Python is a versatile and powerful programming language known for its simplicity and readability. For performance-critical tasks, however, the overhead of the interpreter can become a limiting factor: tight numeric loops in particular run far slower than equivalent compiled code.

This is where Cython comes into play. Cython is a superset of Python that is translated to C and then compiled into an extension module that Python can import. By using Cython, we can combine the best of both worlds - the simplicity and expressiveness of Python with the performance of low-level languages like C, often with significant speedups in the critical parts of a program.

In this article, we will explore how to leverage Cython to write fast, compiled extensions for Python, enabling us to boost performance in critical areas of our code. We will first review some features of Cython, and then we’ll walk through an example to demonstrate how you can write compiled extensions using Cython.

Important Features of Cython

Cython offers several features that make it a powerful tool for writing high-performance extensions for Python:

Static Type Declarations

When variables and function arguments are given static C types, Cython can compile operations on them to plain C or C++ instead of generic Python object operations. This allows for better optimization and faster execution than dynamically typed Python code.

Direct C-Level Interoperability

Cython provides direct access to C libraries and data structures, allowing seamless integration with existing C or C++ codebases. This feature is particularly useful when working with performance-critical tasks that require low-level control.

Efficient Memory Management

Cython supports manual memory management through C-style allocation and deallocation (for example, malloc and free). This lets developers control memory usage and avoid the overhead of allocating Python objects, at the cost of having to free that memory explicitly.

Easy Python Integration

Cython can seamlessly call Python functions and use Python libraries, making it easy to leverage existing Python code. This feature allows developers to write performance-critical sections of code in Cython while utilizing the extensive ecosystem of Python libraries.

Support for Python Debugging

Cython supports Python’s debugging capabilities, including exceptions, tracebacks, and profiling tools. This allows developers to benefit from Python’s powerful debugging ecosystem while working with Cython code.
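
Profiling is usually the first step in deciding which code is worth moving to Cython. The following is a minimal sketch using the standard library's cProfile and pstats modules on plain Python code; the functions sum_of_squares and workload are hypothetical stand-ins, not part of this article's example. Cython's documentation also describes a profile=True compiler directive for making compiled functions visible to the profiler, which is not shown here.

```python
import cProfile
import io
import pstats


def sum_of_squares(n):
    """Hypothetical hot function: sum of i*i for i in range(n)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    """Hypothetical workload that calls the hot function repeatedly."""
    return [sum_of_squares(10000) for _ in range(50)]


# Collect profiling data only while the workload runs.
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Write the five entries with the highest cumulative time to a string.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Functions that dominate the cumulative time in such a report are the natural candidates for a typed Cython rewrite.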



Comparing Speed Between Python and Cython

To illustrate the performance difference between Python and Cython, let’s consider a simple example where we calculate the sum of the squares of the integers from 0 to N - 1, which are the values produced by range(N).
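
Before optimizing anything, it helps to have a reference value to check results against. The sketch below is plain Python and relies on the standard closed-form identity for the sum of squares; the function names are illustrative and are not part of this article's code. The loop covers 0 to n - 1, as Python's range(n) does.

```python
def sum_of_squares_loop(n):
    """Sum i*i for i in range(n) with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_closed_form(n):
    """Same sum via the identity (n - 1) * n * (2n - 1) / 6."""
    return (n - 1) * n * (2 * n - 1) // 6


n = 10000
reference = sum_of_squares_closed_form(n)
print("closed form:", reference)
print("loop agrees:", sum_of_squares_loop(n) == reference)
```

Any faster implementation of the same computation should return the same value for the same n.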

First, install the Cython package with the following pip command. A working C compiler is also required to build the extension later on.

pip install Cython

Next we import the cython module and define two functions, one written in Python (sum_of_squares_vanilla) and the other using Cython (sum_of_squares_cython). We’ll save this code in a .pyx file named sum_of_squares.pyx.

import cython

def sum_of_squares_vanilla(n):
    result = 0
    for i in range(n):
        result += i ** 2
    return result


cpdef sum_of_squares_cython(int n):
    # long long is at least 64 bits wide; for n = 10000 the sum
    # (about 3.3e11) would overflow a 32-bit int.
    cdef long long result = 0
    cdef long long i
    for i in range(n):
        # i * i keeps the arithmetic in C integers
        result += i * i
    return result

Let’s break down the Cython implementation in the above code and highlight some key aspects.

In the Cython function sum_of_squares_cython, there are a few notable differences compared to the Python implementation:

The cpdef keyword

In Cython, we use the cpdef keyword instead of def to define this function. cpdef generates both a fast C-level entry point, used when the function is called from other Cython code, and a Python wrapper, so the function can still be imported and called from ordinary Python.

Static type declarations

The argument n and the local variables result and i all have static C types. n is declared as int, while result and i are declared as long long, because the sum for n = 10000 does not fit in a 32-bit int. With these declarations, Cython treats the variables as plain C integers, and the loop compiles to C-level arithmetic without creating Python objects.

The cdef keyword

We use the cdef keyword to declare the variables result and i with static types. This keyword is specific to Cython; variables declared this way are ordinary C variables rather than Python objects.
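
One caveat with C integer types is that they have a fixed width, whereas Python integers grow as needed. The sketch below, in plain Python, uses ctypes.c_int32 as a stand-in for a 32-bit C int to check whether the sum for n = 10000 would fit. This is an assumption made for illustration: the actual width of int depends on the platform and compiler, and signed overflow in C is undefined behaviour, although it typically wraps around in practice.

```python
import ctypes

n = 10000
exact = sum(i * i for i in range(n))  # Python ints never overflow

INT32_MAX = 2**31 - 1
fits_in_int32 = exact <= INT32_MAX

# ctypes does no overflow checking, so the value is silently truncated
# to 32 bits, similar to what a 32-bit accumulator usually ends up holding.
wrapped = ctypes.c_int32(exact).value

print("exact sum:    ", exact)
print("fits in int32:", fits_in_int32)
print("wrapped value:", wrapped)
```

When the exact value does not fit, a wider C type such as long long for the accumulator avoids the problem.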

To convert the above code into a Cython extension, we need to compile it. To do so, create a setup.py file in the same folder as sum_of_squares.pyx. The setup.py file should contain the following script. It uses setuptools rather than distutils, because distutils is deprecated and was removed from the standard library in Python 3.12; install setuptools with pip if it is not already available.

from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize('sum_of_squares.pyx')
)

Run the above script using the following command:

python setup.py build_ext --inplace

Running this command compiles the Cython code into an extension module. The compiled module will have a .pyd file extension on Windows or a .so file extension on Unix-like systems. In addition, you will see a generated C file (sum_of_squares.c) that contains the C translation of the code above.
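
The exact name of the compiled file includes a platform-specific tag, so it can be hard to spot in the folder. As a small sketch, the standard library can list the file names the current interpreter would accept for this module; module_name is simply the stem of the .pyx file used in this article.

```python
import importlib.machinery

module_name = "sum_of_squares"  # stem of sum_of_squares.pyx

# Every suffix this interpreter accepts for compiled extension modules,
# for example a tagged ".so" on Linux or ".pyd" on Windows.
candidate_files = [module_name + suffix
                   for suffix in importlib.machinery.EXTENSION_SUFFIXES]

for name in candidate_files:
    print(name)
```

After a successful build, one of the printed names should appear next to sum_of_squares.pyx.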

[Figure: files generated when compiling the Cython code]

Let’s now test the performance of both functions we defined in sum_of_squares.pyx.

In the following code, we import the functions sum_of_squares_vanilla and sum_of_squares_cython from the compiled extension (sum_of_squares.pyd on Windows, or the corresponding .so file on Unix-like systems). We then use the Python timeit module to measure the total execution time of 10,000 calls to each function, passing 10000 as the argument.

import timeit
from sum_of_squares import sum_of_squares_vanilla, sum_of_squares_cython

# Measure the execution time of the Python function
python_time = timeit.timeit("sum_of_squares_vanilla(10000)",
                            globals=globals(),
                            number=10000)
print("Python execution time:", python_time)

# Measure the execution time of the Cython function
cython_time = timeit.timeit("sum_of_squares_cython(10000)",
                            globals=globals(),
                            number=10000)
print("Cython execution time:", cython_time)

speedup_factor = python_time / cython_time
print("Speedup factor:", speedup_factor)

When we run the above code, we obtain the following output:

Python execution time: 16.361194699999942
Cython execution time: 0.03972609999982524
Speedup factor: 411.85001044834297

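As we can see, the Cython implementation is significantly faster. In this run it achieves a speedup factor of approximately 411, meaning it is about 411 times faster than the untyped version. The exact figures depend on the machine, the compiler, and the Python version, so expect different numbers on your own system.

Note that because both functions live in the same .pyx file, sum_of_squares_vanilla is also compiled by Cython. The substantial speed improvement therefore comes mainly from the static C types: the typed loop runs as plain C integer arithmetic, whereas the untyped version still operates on Python objects at every step.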
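
A single timeit call can be skewed by whatever else the machine is doing at that moment. One common remedy is to repeat the measurement several times and keep the fastest run. The sketch below does this with timeit.Timer.repeat; the helper best_time and the stand-in function sum_of_squares_py are hypothetical names, and the stand-in is pure Python so that the snippet runs without the compiled extension.

```python
import timeit


def sum_of_squares_py(n):
    """Pure-Python stand-in for the function being timed."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def best_time(func, arg, number=100, repeat=5):
    """Return the fastest per-call time in seconds over several repeats."""
    timer = timeit.Timer(lambda: func(arg))
    runs = timer.repeat(repeat=repeat, number=number)
    # The minimum is the run least disturbed by other activity.
    return min(runs) / number


per_call = best_time(sum_of_squares_py, 10000)
print(f"best time per call: {per_call:.6f} s")
```

The same helper could be pointed at sum_of_squares_vanilla and sum_of_squares_cython once the extension is built, to compare their per-call times.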

In conclusion, Cython is a valuable tool for writing fast, compiled extensions for Python. It allows developers to combine the simplicity and expressiveness of Python with the performance of compiled languages. By utilizing Cython, developers can optimize critical sections of their code, achieve substantial speed improvements, and seamlessly integrate with existing Python libraries and codebases.

