Example 02: Using Thrust library#

Overview#

This example demonstrates how setuptools_cuda can be used in conjunction with Thrust library. We highly recommend that you first read the saxpy example description first, as here we will only focus on the important differences.

In this example, we will create a simple sort function that will sort a numpy array using thrust::sort on a CUDA enabled device. Internally, we will have to handle the transfer of data from numpy to the device, and then in the opposite way. Our signature will therefore look as follows

def sort(data):
   ...

As previously, we will use Cython to make our life easier.

Directory structure#

├── pyproject.toml
├── setup.py
├── test
│   └── test_thrustcu.py
└── thrustcu
    ├── thrustcu_impl.cu
    ├── thrustcu_impl.h
    └── thrustcu.pyx

The directory structure in this example is pretty similar to the one in the first example, and hence we will not discuss the role of each individual file and instead we’ll focus on the relevant contents.

Main file of the extension#

Compared to the previous example, this time we do not implement our own kernel. Instead, we rely on thrust to perform the heavy lifting. The source code for the main file of the extension looks as follows:

#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>


template <typename T>
void _sort(T* data, long long n)
{
  thrust::device_vector<T> data_vec(data, data+n);
  thrust::sort(data_vec.begin(), data_vec.end());
  thrust::copy(data_vec.begin(), data_vec.end(), data);
}

template void _sort(float*, long long);
template void _sort(double*, long long);
template void _sort(int8_t*, long long);
template void _sort(uint8_t*, long long);
template void _sort(int16_t*, long long);
template void _sort(uint16_t*, long long);
template void _sort(int32_t*, long long);
template void _sort(uint32_t*, long long);
template void _sort(int64_t*, long long);
template void _sort(uint64_t*, long long);

We start by including several thrust files:

  • thrust/copy.h for copying data between host and device.

  • thrust/device_vector.h for definition of a device vector, which is a structure similar to a vector in standard C++ library.

  • thrust/sort.h for the actual implementation of parallel sorting.

As previously, we use templating to allow usage of several data types in our function. As to the _sort function itself, it performs the following operations:

  • Creates a device_vector data_vec. By passing it a range of pointers, we initialize the device data to the contents of the original array.

  • Sorts the created device_vector.

  • Copies the data from the device vector back to the host array passed as the argument.

Similarly to the first example, we explicitly instantiate _sort template with several different data types. The fact that we used more data types then previously will be explored later on.

We also create a header file for the _sort function, which contains its declaration.

Cython file#

Our Cython file shares many similarities to the one from the first example. What’s different is that we use numpy types in the humber fused type:

# distutils: language=c++
cimport numpy as np

ctypedef fused number:
    double
    float
    np.int8_t
    np.uint8_t
    np.int16_t
    np.uint16_t
    np.int32_t
    np.uint32_t
    np.int64_t
    np.uint64_t


cdef extern from "thrustcu_impl.h":
    void _sort[T](T* data, long long n);

def sort(number[::1] data):
    _sort(&data[0], len(data))

Note that here we use cimport (it’s not a typo, there’s c there). The cimport instruction is used for importing stuff from another Cython module. If we used import numpy instead, all of those dtypes would be simply treated as Python objects and wouldn’t work in a fused type declaration.

Installing and running the tests#

We once again remind you that before installing the package using setuptools_cuda you should define a CUDAHOME environmental variable pointing to your CUDA installation location.

Similarly to the first example, the package can be installed by running

pip install .

from the examples/02_thrust directory, and the tests can be launched by running:

pytest

from the same directory.