BayBE: A Bayesian Back End for Experimental Planning in the Low-To-No-Data Regime

Martin Fitzner, Alexander Hopp, Adrian Šošić

Thursday 10:15 in Ferrum

In the evolving landscape of data science, advanced computational tools are crucial for driving innovation and efficiency. This tutorial introduces the Bayesian Back End (BayBE), an AI-assisted open-source experimental planner developed by Merck KGaA, which uses Bayesian Optimization and machine learning to streamline experimental workflows in the low-to-no-data regime. From chemical reactions to biological assays to coffee machine settings, BayBE helps users find optimal configurations iteratively, which is how many experimentalists already work.

We will start the first part with a brief introduction to Bayesian Optimization, highlighting its principles and advantages in experimental design. We will then showcase BayBE's unique features, including elegant categorical encodings and advanced capabilities such as active learning, transfer learning, and Pareto optimization.

In the second part, we explain some of the code and test design choices behind the open-source Python package baybe. This will include lessons learned about our built-in (de-)serialization engine, CI/CD, advanced hypothesis tests, autodocumentation, and the open-source tools BayBE is built on.

The final part will consist of a hands-on tutorial. We will look at representative problems and guide potential users from formalizing the problem, through performing the iterative loop, to analyzing the results, including an assessment of parameter relevance.
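To make the iterative loop concrete, here is a minimal, standard-library-only sketch of a recommend, measure, update cycle. It is not BayBE's API: the Gaussian-process surrogate with a squared-exponential kernel, the upper-confidence-bound rule, and all names (`rbf`, `solve`, `posterior`, `recommend`, `run_experiment`, the candidate grid) are assumptions chosen purely for illustration.

```python
import math

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def posterior(xs, ys, x_new, noise=1e-6):
    """GP posterior mean and variance at x_new (zero prior mean)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)   # K^-1 y
    v = solve(K, k)        # K^-1 k
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, var

def recommend(candidates, xs, ys, beta=2.0):
    """Pick the unmeasured candidate with the highest upper confidence bound."""
    if not xs:
        # No data yet: start in the middle of the grid.
        return candidates[len(candidates) // 2]
    def ucb(x):
        m, v = posterior(xs, ys, x)
        return m + beta * math.sqrt(v)
    return max((c for c in candidates if c not in xs), key=ucb)

def run_experiment(x):
    """Hypothetical stand-in for a lab measurement (true optimum at x = 0.7)."""
    return -(x - 0.7) ** 2

candidates = [i / 20 for i in range(21)]  # discrete settings in [0, 1]
xs, ys = [], []                           # measurements collected so far
for _ in range(8):
    x = recommend(candidates, xs, ys)     # 1. ask for the next setting
    xs.append(x)
    ys.append(run_experiment(x))          # 2. measure, 3. update the data

best_x = xs[ys.index(max(ys))]            # best setting measured so far
```

In a real campaign, `run_experiment` would be replaced by an actual measurement, and the surrogate model and acquisition rule would typically come from a dedicated library rather than hand-written code.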

Martin Fitzner

Lead Data Scientist at Merck KGaA, Darmstadt, Germany. Interested in combining machine learning, data science, computational natural science, and cheminformatics.

Alexander Hopp

Mathematician who got into coding and enjoys it way too much. One of the three core developers of BayBE, the Bayesian Optimization Package developed at Merck KGaA, Darmstadt. Also working on antibody and retrosynthesis projects.

Interested in everything the intersection between mathematics and computer science has to offer, as well as in best practices for coding. Always curious to learn!

Adrian Šošić

Lead Data Scientist at Merck Life Science KGaA, Darmstadt, Germany. Machine learning and probabilistic modeling.