Zero Code Change Acceleration: familiar interfaces and high performance

Tim Head

Friday 15:35 in Ferrum

The interfaces defined by libraries like Numpy, pandas or scikit-learn are the defacto standard APIs in each library's domain. Data scientists use these libraries directly as well as indirectly through libraries that depend on them.

This talk will look at the different approaches that recent efforts have taken to give users both a familiar interface and GPU acceleration. This means users do not have to rewrite their code, learn a new library and benefit from acceleration when using existing libraries.

The cudf team built a pandas accelerator by diving deep into the import system of Python. By hooking into the import system you can replace the result of import pandas as pd with a library that uses cudf where possible and falls back to pandas where necessary. This is great as it means the user’s seaborn code is automatically accelerated, without the user or the seaborn authors having to do anything.

The polars team collaborated with the cudf team to build a GPU engine for polars. They chose a tightly coupled approach where the cuDF library hooks in after the Polars Intermediate Representation is created from the user input. This means using the GPU engine is seamless from the point of view of the user, there is transparent fallback to the CPU and no import statements need changing.

The scikit-learn team is adding experimental support to handle PyTorch and CuPy inputs by using the array API standard. Instead of using the Numpy API to perform array computations, scikit-learn is switching to using the array API. This is a subset of the Numpy API that is supported by several other array libraries. The API of Numpy and PyTorch is similar but not exactly the same, this makes writing code that works with both hard. The array API addresses this problem by providing a unified API. Users can accelerate their scikit-learn code by passing in a CuPy or PyTorch array instead of a Numpy array.

Tim Head

I am a scikit-learn core maintainer and work at NVIDIA.

Before working on scikit-learn I helped build mybinder.org and worked on JupyterHub.

Many years ago I was a particle physicist at CERN in Geneva.