The Foundation Model Revolution for Tabular Data

Noah Hollmann, Frank Hutter

Friday 15:35 in Titanium3

TabPFN shows how foundation model concepts can advance tabular data analysis in Python. First published as research at ICLR 2023, it has seen strong community adoption, with 1,200+ GitHub stars and 100,000+ downloads. Our upcoming January 2025 release introduces major improvements in speed, scale, and capabilities that we're excited to preview at PyCon.

Detailed Outline:

  1. Context & Evolution (5 min)
  • The challenge of applying deep learning to tabular data
  • Learning from the foundation model revolution in text and vision
  • Key improvements from V0 to V1 based on community feedback
  • Real-world examples where TabPFN shines (and where it doesn't)
  2. Technical Insights (8 min)
  • How we adapted transformers for tabular data
  • Making in-context learning work for structured data
  • Performance characteristics and resource requirements
  • Understanding current limitations and constraints
  3. Live Coding & Integration (12 min)
  • Getting started with TabPFN in 3 lines of code (see the sketch after this outline)
  • Handling real-world data challenges:
    • Missing values and mixed data types
    • Built-in uncertainty estimation
    • Working with similar tasks efficiently
  • Integration with pandas, scikit-learn and the Python ecosystem
  4. Practical Applications (5 min)
  • When to choose TabPFN vs traditional methods
  • Resource requirements and scalability limits
  • What's next for TabPFN
  • Q&A
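
As a preview of the live-coding segment, here is a minimal sketch of the kind of workflow the demo walks through. It assumes the tabpfn package exposes a scikit-learn-compatible TabPFNClassifier with the usual fit/predict/predict_proba methods; exact constructor arguments and supported data sizes may differ between releases, and the dataset used here stands in for whatever tables the demo uses.

```python
# Minimal sketch of a TabPFN workflow (assumes `pip install tabpfn`);
# constructor arguments may differ between releases.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from tabpfn import TabPFNClassifier

# Load a small tabular dataset; TabPFN targets modest-sized tables.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Fitting" stores the training data as context for in-context prediction;
# there is no gradient-based training step here.
clf = TabPFNClassifier()
clf.fit(X_train, y_train)

# Point predictions plus class probabilities for uncertainty estimation.
y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))
print("max class probability, first test row:", np.max(y_proba[0]))
```

The integration part of the session builds on the same interface, covering how the classifier fits alongside pandas DataFrames and scikit-learn pipelines in an existing workflow.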

Key Takeaways:

  • Practical understanding of TabPFN's capabilities and limitations
  • Hands-on experience integrating with Python data science workflows
  • Best practices for working with foundation models on tabular data
  • Insight into emerging approaches for structured data analysis

Noah Hollmann

Frank Hutter

Frank is a Hector-Endowed Fellow and PI at the ELLIS Institute Tübingen and has been a full professor of Machine Learning at the University of Freiburg (Germany) since 2016. Previously, he was an Emmy Noether Research Group Lead at the University of Freiburg from 2013, after completing a PhD (2004-2009) and postdoc (2009-2013) at the University of British Columbia (UBC) in Canada. He received the 2010 CAIAC doctoral dissertation award for the best thesis in AI in Canada, as well as several best paper awards and prizes in international ML competitions. He is a Fellow of ELLIS and EurAI, Director of the ELLIS unit Freiburg, and the recipient of 3 ERC grants. Frank is best known for his research on automated machine learning (AutoML), including neural architecture search, efficient hyperparameter optimization, and meta-learning. He co-authored the first book on AutoML and co-created the prominent AutoML tools Auto-WEKA, Auto-sklearn, and Auto-PyTorch, won the first two AutoML challenges with his team, is co-teaching the first MOOC on AutoML, co-organized 15 AutoML-related workshops at ICML, NeurIPS and ICLR, and founded the AutoML conference as general chair in 2022. In recent years, his focus has been on the intersection of foundation models and AutoML, prominently including the first foundation model for tabular data, TabPFN.