Federated learning has quickly become the preferred approach to training AI models when the training data cannot leave its point of origin, whether due to privacy regulations (e.g. GDPR), legal constraints (e.g. differing jurisdictions), or logistical challenges (e.g. large data volumes, sparse connectivity), among other reasons. Furthermore, contracts and regulations establish boundaries for data sharing, particularly in industries such as healthcare and finance, where preventing misuse is crucial. One could also argue that we are running out of publicly and ethically sourced datasets, for instance to scale large foundation models, and federated learning offers a way to train models on protected data.
The key aim of this tutorial is to introduce this alternative approach to training AI models in a way that is straightforward and accessible.
The tutorial is structured in three parts. Part 1 introduces federated learning and its prototypical architecture. In part 2, we'll dive into a series of live Python code demos showing how to convert a classical centralized machine learning workflow into a federated workflow involving multiple federated clients, and highlight how iterating on a federated research project resembles and differs from its centralized counterpart. Finally, in part 3, we'll demonstrate how to take your research code and deploy it in a production setting using a mix of physical edge devices and VMs.
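To give a flavour of the centralized-to-federated conversion covered in part 2, the toy sketch below implements the core idea behind Federated Averaging (FedAvg) in plain Python. All names here are hypothetical and this is not Flower code: each client runs a local gradient step on its private data, and the server only ever aggregates model weights, never the raw data.

```python
# Toy FedAvg sketch (hypothetical helper names, not the Flower API).
# Model: a single weight w for the 1-D linear model y = w * x.

def local_update(w, data, lr=0.1):
    """One local gradient-descent step on a client's private data."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def fedavg_round(global_w, client_datasets):
    """One federated round: broadcast, local training, weighted average."""
    updates, sizes = [], []
    for data in client_datasets:
        updates.append(local_update(global_w, data))
        sizes.append(len(data))
    total = sum(sizes)
    # Average client updates weighted by local dataset size
    return sum(w * n for w, n in zip(updates, sizes)) / total

# Two clients whose private datasets both follow y = 2x
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(50):
    w = fedavg_round(w, clients)
print(round(w, 2))  # converges to 2.0
```

In a real Flower project, `local_update` corresponds to the client's `fit` logic and `fedavg_round` to the server-side aggregation strategy; the tutorial shows how the framework handles the communication between them.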
Throughout the tutorial, we'll use Flower, a fully open-source federated AI framework written in Python and designed for Python users. With simplicity as one of its main goals, Flower provides multiple features and libraries that accelerate research, such as Flower Baselines (for reproducing federated learning benchmarks) and Flower Datasets (a standalone Python library for easily creating federated datasets). We'll showcase how to use the Flower CLI in both research and production settings.
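The sketch below illustrates, in plain Python, the kind of dataset partitioning that Flower Datasets automates: splitting one centralized dataset into per-client shards. The helper name is hypothetical and this is not the `flwr_datasets` API, just the underlying idea for the simplest (IID) case.

```python
# Toy IID partitioning sketch (hypothetical helper, not flwr_datasets).
import random

def iid_partition(samples, num_clients, seed=42):
    """Shuffle a centralized dataset and split it into equal IID shards."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    # Deal samples round-robin so shard sizes differ by at most one
    return [shuffled[i::num_clients] for i in range(num_clients)]

dataset = list(range(100))          # stand-in for a centralized dataset
shards = iid_partition(dataset, 4)  # one shard per federated client
print([len(s) for s in shards])     # [25, 25, 25, 25]
```

Flower Datasets generalizes this with configurable partitioners (including non-IID splits) over datasets pulled from common sources, which is what the demos in part 2 rely on.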
This tutorial is aimed at people fluent in Python and the command line who have basic knowledge of how a machine learning project is run. Prior experience with Docker is a plus. Any data practitioner is encouraged to attend to learn and discuss how to federate and distribute the training of an ML model.
You will learn:
Bring your own laptop if you'd like to follow along. Some code examples will be executed in Google Colab; others can be run locally in your favourite IDE. A GitHub repo containing the code examples will be shared before the event.
The tutorial session is structured in the following way: