How to Use Data Science Superpowers in Real Life: A Bayesian Perspective

Tim Lenzen

Wednesday 16:10 in Helium3

In this talk, I want to look at decision making from a slightly different angle. In a world that produces an ever-growing amount of data in every domain, data scientists can shine by using their tools to make data-driven decisions. Often there is even too much data, and the most tedious part of the work is separating the signal from the noise through clever feature engineering. But although big data covers more and more of the world, it is not evenly distributed.

Many decisions we need to make in real life do not follow this pattern. In fact, there are often surprisingly few data points to help us. Are there fundamental differences between everyday decisions and the kind of decisions we automate so well with machine learning in our jobs? In this part of the talk, I will attempt to characterise both types of decisions. We will take a closer look at the implicit assumptions we make when using our machine learning toolbox. This may offer a first explanation of why these tools can be unsuited to answering questions like 'How long should I study for an exam?' or 'Should I accept this new job or not?'.

Enter Bayesian statistics: this part of the talk introduces Bayesian statistics for beginners using simple examples and images. It will highlight the benefits of the method when we are short on data but have additional experience that is not encoded in the data. I will show how prior distributions come in really handy in these circumstances.
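To make the role of a prior concrete, here is a minimal sketch (not taken from the talk) of the classic Beta-Binomial update for a success probability, for example the chance of passing an exam. The observed counts and both priors are hypothetical. With only three observations, the informative prior pulls the estimate noticeably toward prior experience, while the flat prior stays close to the raw frequency.

```python
# Illustrative Beta-Binomial update with a conjugate prior.
# All numbers are hypothetical and chosen only to show the effect of a prior.
from fractions import Fraction


def beta_binomial_update(alpha, beta, successes, failures):
    """Return posterior Beta parameters after observing binary outcomes.

    With a Beta(alpha, beta) prior on a success probability and binomial
    data, the posterior is Beta(alpha + successes, beta + failures).
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)


# Hypothetical data: only three past outcomes, one of them a success.
successes, failures = 1, 2

# Flat prior: no extra experience; Beta(1, 1) is uniform on [0, 1].
flat = beta_binomial_update(1, 1, successes, failures)
# Informative prior: experience says success is usually likely (prior mean 0.8).
informed = beta_binomial_update(8, 2, successes, failures)

print("raw frequency:", Fraction(successes, successes + failures))  # 1/3
print("flat prior posterior mean:", beta_mean(*flat))               # 2/5
print("informed prior posterior mean:", beta_mean(*informed))       # 9/13
```

The conjugate pair is used here only because the update reduces to adding counts, which makes the prior's influence easy to see; libraries such as PyMC handle the general case where no closed form exists.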

After laying the groundwork on Bayesian methods, we will circle back to everyday decisions and see how well the two fit together. On a higher level, this will show what makes decision-making problems a great fit for Bayesian methods. I will introduce this using a practical example: deciding how long one should study for a test or exam. Taking a step-by-step approach, we will explore how this decision can be informed by just a few data points. Aside from finding the key to successful exam preparation, the example also shows some of the basics of working with the PyMC library.
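The talk works through this example with PyMC. As a library-free stand-in, here is a minimal sketch of the same kind of question that uses a grid approximation instead of MCMC sampling, assuming a logistic model P(pass) = sigmoid(a + b * hours). The data points, priors, grid ranges, and the 90% target are all hypothetical and chosen only for illustration; this is not the model presented in the talk.

```python
import math

# Hypothetical observations: (hours studied, passed: 1 = yes, 0 = no).
DATA = [(2, 0), (5, 0), (8, 1), (12, 1)]

# Parameter grid. Restricting b >= 0 encodes the (assumed) prior belief
# that extra study time never lowers the chance of passing.
A_GRID = [-10 + 0.1 * i for i in range(141)]  # intercept a in [-10, 4]
B_GRID = [0.02 * j for j in range(101)]       # slope b in [0, 2]


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x, up to an additive constant."""
    return -0.5 * ((x - mu) / sigma) ** 2


def log_posterior(a, b):
    """Unnormalized log posterior: hypothetical priors plus Bernoulli likelihood."""
    lp = normal_logpdf(a, -3.0, 2.0) + normal_logpdf(b, 0.5, 0.5)
    for hours, passed in DATA:
        x = a + b * hours
        # log P(pass) if passed, else log P(fail) = log_sigmoid(-x)
        lp += log_sigmoid(x) if passed else log_sigmoid(-x)
    return lp


def posterior_weights():
    """Normalized posterior probability of every (a, b) grid point."""
    points = [(a, b) for a in A_GRID for b in B_GRID]
    logs = [log_posterior(a, b) for a, b in points]
    top = max(logs)
    raw = [math.exp(lp - top) for lp in logs]  # subtract max to avoid underflow
    total = sum(raw)
    return [(a, b, w / total) for (a, b), w in zip(points, raw)]


POSTERIOR = posterior_weights()


def pass_probability(hours):
    """Posterior predictive P(pass | hours), averaged over parameter uncertainty."""
    return sum(w * math.exp(log_sigmoid(a + b * hours)) for a, b, w in POSTERIOR)


def hours_needed(target=0.9, max_hours=40):
    """Smallest whole number of hours whose predicted pass chance reaches target."""
    for hours in range(max_hours + 1):
        if pass_probability(hours) >= target:
            return hours
    return None


for h in (4, 8, 12):
    print(f"{h:>2} h -> P(pass) = {pass_probability(h):.2f}")
print("hours for a 90% pass chance:", hours_needed(0.9))
```

In PyMC, the same idea would declare the priors with `pm.Normal`, the likelihood with `pm.Bernoulli`, and draw posterior samples with `pm.sample()` instead of enumerating a grid; the grid version simply makes every step explicit for a two-parameter model.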

The talk will end with some more general thoughts on where to go from here and for which decisions a thorough investigation like the one presented is worthwhile. Yet, once one is familiar with the basics of Bayesian thinking, there might be shortcuts. I will show that we can use these principles as a great tool to improve discussions about important decisions on a broader scale.

Tim Lenzen

I am currently working as a Senior Data Scientist at Ailio. My focus is on helping organizations improve by making better use of their data. I contribute to these transformation projects by bringing in broad expertise in data-related topics, ranging from data engineering and cloud development (AWS, Azure) through data science and machine learning to communication and leadership skills.

After completing my master's in chemistry, I really started my journey in data science and machine learning during my PhD studies in theoretical chemistry. My next step was a role as a data scientist at a company developing IT security software. For five years, I worked on a system that uses machine learning to detect suspicious e-mail traffic. Aside from the technical side of the job, I also built a small team. From this experience, I learnt a lot about leadership and about developing software products on a larger scale.

I strongly believe that using the right data to inform important decisions helps organizations of all kinds improve. However, this is often easier said than done. I am always curious to discover and tackle these interesting challenges, and I am more than happy to share my knowledge and what I have learnt.