The future of AI training is federated

Chong Shen Ng

Thursday 10:15 in Dynamicum

Federated Learning has quickly become the preferred way to train AI models when the training data cannot leave its point of origin because of privacy regulations (e.g. GDPR), legal constraints (e.g. differing jurisdictions), or logistical challenges (e.g. large data volumes, sparse connectivity). Contracts and regulations also set boundaries for data sharing, particularly in industries such as healthcare and finance, where preventing misuse is crucial. One could also argue that we are running out of publicly available, ethically sourced datasets, for instance to scale large foundation models, and federated learning offers one way to train models on protected data.

The key point of this tutorial is to introduce an alternative approach to training AI models that is straightforward and accessible.

This tutorial has 3 parts. We’ll first introduce federated learning and its prototypical architecture. In part 2, we’ll dive into a series of live Python code demos that show how to convert a classical centralized machine learning workflow into a federated workflow involving multiple federated clients. Along the way, we’ll show where iterating on a federated research project resembles the centralized case and where it differs. Finally, in part 3, we’ll demonstrate how you can take your research code and deploy it in a production setting using a mixture of physical edge devices and VMs.
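
To make the centralized-to-federated conversion concrete, here is a minimal sketch of one common aggregation scheme, federated averaging (FedAvg), in plain Python. It does not use Flower's API and is not the tutorial's demo code; the function names (`local_train`, `fed_avg`, `make_client_data`), the toy linear model, and the synthetic data are all assumptions made for illustration.

```python
import random


def local_train(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one client's private data.

    The model is a toy linear regressor y = w * x + b (an assumption
    chosen to keep the sketch dependency-free).
    """
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def fed_avg(client_weights, client_sizes):
    """Average client parameters, weighted by each client's sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def make_client_data(n, seed):
    """Hypothetical private dataset: noisy samples of y = 3x + 1."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.01)) for x in xs]


# Three simulated clients with different amounts of local data.
clients = [make_client_data(n, seed) for seed, n in enumerate([20, 50, 30])]
sizes = [len(data) for data in clients]

global_weights = [0.0, 0.0]
for _round in range(50):
    # Each client starts from the current global model and trains locally;
    # only the updated parameters are sent back, never the raw data.
    updates = [local_train(global_weights, data) for data in clients]
    global_weights = fed_avg(updates, sizes)

print(global_weights)
```

The centralized training loop survives almost unchanged inside `local_train`; the federated part is the outer loop that broadcasts the global parameters and aggregates the returned updates. A framework typically takes over that outer loop, along with communication and client management.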

Throughout the tutorial, we’ll use Flower, the fully open-source federated AI framework, which is written in Python and designed for Python users. With simplicity as one of its main goals, Flower provides multiple features and libraries to accelerate research, such as Flower Baselines (for reproducing federated learning benchmarks) and Flower Datasets (a standalone Python library for easily creating federated datasets). We’ll showcase how to use the Flower CLI in both research and production settings.

This tutorial is aimed at people who are fluent in Python, comfortable with the command line, and have a basic understanding of how a machine learning project works. It would help if you’ve also used Docker before. Any data practitioner is encouraged to attend the tutorial to learn and discuss how to federate and distribute the training of an ML model.

You will learn:

  • What’s Federated Learning?
    • Basics and real-world examples
  • How to federate your existing ML training code, and more FL-specific steps such as how to:
    • Configure the behaviours of each federated client
    • Persist the state of each client across global rounds
    • Evaluate both aggregated and local models
    • Standardize your FL experiments
    • Track your experiments
  • How to deploy your research code in a production setting, such as how to:
    • Deploy Flower federated learning clients using Docker
    • Set up secure connections and node authentication
    • Run, monitor, and manage federated learning runs

Bring your own laptop if you’d like to follow along. Some code examples will be executed in Google Colab; others can be run locally in your favourite IDE. A GitHub repo containing the code examples will be shared before the event.

The tutorial session is structured in the following way:

  • 0:00 Introduction and getting to know the audience
  • 0:05 What’s Federated Learning? Basics and real-world examples
  • 0:25 Overview of the Flower framework for federated learning
  • 0:30 Quickstart examples with PyTorch: moving from centralized to federated training
  • 1:00 Deploying your research to production
  • 1:20 Feedback and Q&A

Chong Shen Ng

Dr. Chong Shen Ng is a Research Engineer at Flower Labs with over a decade of experience in both research and industry, specializing in federated learning, data science, and parallel computing. As a key developer, he focuses on scaling Flower to deploy privacy-enhanced distributed AI solutions for real-world applications. Chong Shen is passionate about contributing to the open-source community, developing trustworthy AI systems through federated learning, and advancing edge AI technologies. A dedicated advocate for open-source software, he has co-chaired PyData Global events and volunteered at SciPy and PyData London conferences.