Zero Code Change Acceleration: familiar interfaces and high performance

Tim Head

Wednesday 17:50 in Ferrum

The interfaces defined by libraries like NumPy, pandas, or scikit-learn are the de facto standard APIs in each library's domain. Data scientists use these libraries directly, as well as indirectly through libraries that depend on them.

This talk will look at the different approaches that recent efforts have taken to give users both a familiar interface and GPU acceleration. Users do not have to rewrite their code or learn a new library, and they benefit from acceleration even when using existing libraries.

The cuML team built a scikit-learn accelerator by diving deep into Python's import system. By hooking into the import system, the result of import sklearn can be replaced with a library that uses cuML where possible and falls back to scikit-learn where necessary.
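To give a feel for the mechanism (this is not cuML's actual implementation), here is a minimal sketch of a meta path finder that intercepts import sklearn and serves a stand-in module; the package name my_accelerated_sklearn is hypothetical:

import importlib
import importlib.abc
import importlib.util
import sys

class ProxyLoader(importlib.abc.Loader):
    """Serve an already-importable replacement module."""

    def __init__(self, replacement_name):
        self.replacement_name = replacement_name

    def create_module(self, spec):
        # Import the replacement package and hand it back as if it
        # were the module the user asked for.
        return importlib.import_module(self.replacement_name)

    def exec_module(self, module):
        # Nothing to do: the replacement was fully initialised above.
        pass

class AcceleratorFinder(importlib.abc.MetaPathFinder):
    """Intercept "import sklearn" before the default finders run."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname == "sklearn":
            return importlib.util.spec_from_loader(
                fullname, ProxyLoader("my_accelerated_sklearn")
            )
        return None  # anything else falls through to normal imports

# Finders at the front of sys.meta_path are consulted first, so this
# one wins for "import sklearn" from now on.
sys.meta_path.insert(0, AcceleratorFinder())

In cuML this kind of machinery is packaged behind the cuml.accel entry point, which installs its hooks before any user code executes, so no source changes are needed.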

The scikit-learn team is adding experimental support to handle PyTorch and CuPy inputs by using the array API standard. Instead of using the Numpy API to perform array computations, scikit-learn is switching to using the array API. This is a subset of the Numpy API that is supported by several other array libraries. The API of Numpy and PyTorch is similar but not exactly the same, this makes writing code that works with both hard. The array API addresses this problem by providing a unified API. Users can accelerate their scikit-learn code by passing in a CuPy or PyTorch array instead of a Numpy array.
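A minimal sketch of what this looks like from the user's side, assuming a scikit-learn build with array API support, the array-api-compat package, and a CUDA-capable GPU; the toy data is made up:

import sklearn
import torch
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Opt in to the experimental array API dispatch; scikit-learn then
# performs its computations using the namespace of the input arrays.
sklearn.set_config(array_api_dispatch=True)

# Toy data as PyTorch tensors living on the GPU.
X = torch.randn(1000, 8, device="cuda")
y = (X[:, 0] > 0).to(torch.int64)

# LinearDiscriminantAnalysis is one of the estimators with array API
# support: fit and predict now run on the GPU through PyTorch.
lda = LinearDiscriminantAnalysis().fit(X, y)
predictions = lda.predict(X)
print(type(predictions), predictions.device)  # a torch.Tensor on cuda

The same code works with CuPy arrays, and with plain NumPy arrays it behaves exactly as before.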

Tim Head

I am a scikit-learn core maintainer and work at NVIDIA.

Before working on scikit-learn, I helped build mybinder.org and worked on JupyterHub.

Many years ago I was a particle physicist at CERN in Geneva.