How Narwhals is silently bringing pandas, Polars, DuckDB, PyArrow, and more together

Marco Gorelli

Friday 10:55 in Zeiss Plenary (Spectrum)

Suppose you want to write a data science tool to do feature engineering. Your experience may go like this:

  • Expectation: you can focus on state-of-the-art techniques for feature engineering.
  • Reality: you keep having to make your codebase more complex because a new dataframe library has come out and users are demanding support for it.

Or rather, it might have gone like that in the pre-Narwhals era. Now you can focus on solving the problems your tool set out to solve, and let Narwhals handle the subtle differences between different kinds of dataframe inputs!

Narwhals is a lightweight and extensible compatibility layer between dataframe libraries. It is already used by several open source libraries including Altair, Marimo, Plotly, Scikit-lego, Vegafusion, and more. You will learn how to use Narwhals to build dataframe-agnostic tools.

This is a technical talk aimed at tool-builders. You'll be expected to be familiar with Python and dataframes. We will cover:

  • 2-3 minutes: motivation. Why are there so many dataframe libraries?
  • 2-3 minutes: life before vs after Narwhals - real-world examples of how the data landscape is changing
  • 7-8 minutes: basics of Narwhals, wrapping native objects, expressions vs Series, lazy vs eager
  • 7-8 minutes: advanced Narwhals concepts: row order, non-elementary group-by aggregations, multi-indices, null values, backwards-compatibility promises
  • 2-3 minutes: what comes next?
  • 5 minutes: engaging Q&A / awkward silence

Tool builders will benefit from the talk by learning how to build tools for modern dataframe libraries without sacrificing support for foundational classic libraries such as pandas.
