This tutorial tackles a common pain point in data science – extracting useful features from relational data spread across multiple interconnected tables. Manually crafting these features is often tedious, error-prone, and heavily reliant on domain expertise.
Why is this important? Relational data powers industries from e-commerce and healthcare to finance. Yet building predictive models on such datasets often involves laborious feature engineering. getML FastProp – the fastest open-source algorithm for automated feature engineering – streamlines this process, helping data scientists move faster and build better models.
In this hands-on tutorial, we’ll work through two tasks from Stanford’s Relational Learning Benchmark (RelBench) using the H&M Fashion dataset: (1) predicting customer churn with a classification model and (2) forecasting item sales with a regression model.
We’ll walk through the code and concepts needed to solve these tasks with getML FastProp, achieving state-of-the-art performance and outperforming both Relational Deep Learning models and an experienced human data scientist.
By the end of this tutorial, you'll learn how to:
This tutorial provides a practical, reproducible framework for working with relational and time-series data, applicable across industries and domains.