Unlocking the Predictive Power of Relational Data with Automated Feature Engineering

Alexander Uhlig

Thursday 14:20 in Ferrum

This tutorial tackles a common pain point in data science – extracting useful features from relational data spread across multiple interconnected tables. Manually crafting these features is often tedious, error-prone, and heavily reliant on domain expertise.
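
To make the pain point concrete, here is a minimal sketch of hand-written feature engineering over two related tables, using pandas. The toy data, the `cutoff` column, and the `manual_features` helper are assumptions made for illustration only; they are not part of the tutorial material and do not show how FastProp works internally.

```python
import pandas as pd

# Hypothetical population table: one row per customer, with the date
# at which the prediction is made.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "cutoff": pd.to_datetime(["2020-09-01"] * 3),
})

# Hypothetical peripheral table: many transactions per customer.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "t_dat": pd.to_datetime(
        ["2020-08-01", "2020-09-05", "2020-07-10", "2020-08-15", "2020-08-30"]
    ),
    "price": [10.0, 99.0, 5.0, 7.0, 9.0],
})


def manual_features(customers: pd.DataFrame, transactions: pd.DataFrame) -> pd.DataFrame:
    """Aggregate each customer's past transactions into a flat feature table."""
    joined = customers.merge(transactions, on="customer_id", how="left")
    # Keep only transactions strictly before the cutoff to avoid leaking
    # future information into the features.
    joined = joined[joined["t_dat"] < joined["cutoff"]]
    agg = (
        joined.groupby("customer_id")
        .agg(
            n_purchases=("price", "count"),
            total_spent=("price", "sum"),
            avg_price=("price", "mean"),
        )
        .reset_index()
    )
    # Re-attach to the population table so customers without history are kept.
    out = customers.merge(agg, on="customer_id", how="left")
    out[["n_purchases", "total_spent"]] = out[["n_purchases", "total_spent"]].fillna(0)
    return out


features = manual_features(customers, transactions)
print(features)
```

Every aggregation, time window, and join path in a sketch like this has to be chosen and maintained by hand, and the number of candidate features grows quickly with each additional table.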

Why is this important? Relational data powers industries from e-commerce and healthcare to finance. Yet, building predictive models on such datasets often involves laborious feature engineering. getML FastProp – the fastest open-source algorithm for automated feature engineering – streamlines this process, helping data scientists move faster and build better models.

In this hands-on tutorial, we’ll work through two tasks from Stanford’s Relational Learning Benchmark (RelBench) using the H&M Fashion dataset: 1) predict customer churn with a classification model, and 2) forecast item sales with a regression model.

We’ll walk through the code and concepts needed to solve these tasks with getML FastProp, achieving state-of-the-art performance and outperforming both Relational Deep Learning models and an experienced human data scientist.

By the end of this tutorial, you'll be able to:

  • Understand relational learning – Grasp the core challenges and concepts of working with multi-table datasets.
  • Reproduce results – Run the provided notebooks and code to reproduce the results at your own pace.
  • Automate feature engineering – Use getML’s FastProp to extract features directly from relational data.
  • Build and optimize getML pipelines – Develop pipelines for both classification and regression tasks.
  • Integrate into MLOps workflows – Leverage getML alongside LightGBM and Optuna.

This tutorial provides a practical, reproducible framework for working with relational and time-series data, applicable across industries and domains.

Alexander Uhlig

Alexander Uhlig is the CEO of Code17, the company behind getML. With a background in physics, he leads the development of getML and has worked hands-on with data teams to build prediction models across various domains, including healthcare, trading, and e-commerce.