Intuitive A/B Test Evaluations for Coders

Thomas Mayer

Friday 14:55 in Ferrum

Making A/B Test Evaluations Intuitive for Coders: A Python-Based Approach

A/B testing is an essential method for data-driven decision-making, but interpreting the results can be daunting. Jargon around p-values and confidence intervals often gets in the way of understanding. This talk simplifies A/B test evaluation with a practical, Python-based approach built on bootstrapping, a flexible resampling method that matches how software engineers think and requires no formal statistics background.

Session Highlights:

  1. Statistical Significance and Hypothesis Testing:
    • Why is statistical testing crucial for A/B tests? Simple comparisons overlook randomness.
    • Using Python, we’ll simulate "what-if" scenarios by shuffling and resampling the data, so participants can compute p-values and see how likely a difference at least as large as the observed one would be if chance alone were at work.
  2. Confidence Intervals with Bootstrapping:
    • Confidence intervals show the range of effect sizes that remain plausible given the data.
    • We’ll resample the experiment data repeatedly to estimate variability and construct intuitive confidence intervals, using only basic tools such as random number generators and loops, with no advanced math required.

Key Takeaways:

  • Hands-on skills to compute p-values and confidence intervals using basic programming concepts.
  • Clear, step-by-step demonstrations of shuffling, resampling, and generating statistical insights.
  • Practical knowledge to move beyond black-box libraries and understand the "why" and "how" behind A/B test evaluations.
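
To make the shuffling idea from the session highlights concrete, here is a minimal sketch of a p-value computed by shuffling group labels. It is not taken from the talk: the function name permutation_p_value, the iteration count, and the conversion data are all made up for illustration, and the test statistic (difference in conversion rates) is an assumption.

```python
import random


def permutation_p_value(control, variant, n_iter=2_000, seed=0):
    """Two-sided p-value for the difference in means, via label shuffling.

    Hypothetical helper for illustration; not the speaker's implementation.
    """
    rng = random.Random(seed)
    observed = sum(variant) / len(variant) - sum(control) / len(control)

    # Under the "no real difference" scenario the group labels are arbitrary,
    # so we pool all observations and deal them out again at random.
    pooled = list(control) + list(variant)
    n_control = len(control)

    at_least_as_extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        fake_control = pooled[:n_control]
        fake_variant = pooled[n_control:]
        diff = (sum(fake_variant) / len(fake_variant)
                - sum(fake_control) / len(fake_control))
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_iter + 1)
    return observed, p_value


# Made-up experiment: 1 = converted, 0 = did not convert.
control = [1] * 100 + [0] * 900   # 10% conversion
variant = [1] * 130 + [0] * 870   # 13% conversion

observed, p_value = permutation_p_value(control, variant)
print(f"observed difference: {observed:.3f}, p-value: {p_value:.3f}")
```

The p-value here is simply the share of shuffles that produced a difference at least as large as the real one; the exact number depends on the seed and the number of iterations.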

By the end of the session, attendees will be equipped to demystify A/B testing with a coder-friendly workflow, empowering them to make confident, data-driven decisions in their projects.

Talk Outline:

  1. Setting the Stage (5 minutes)
    • What is A/B testing?
    • Why isn't it enough to just compare numbers? Why do we need statistics to interpret results?
  2. Statistical Significance and P-Values (5 minutes)
    • Statistical tests (t-test, z-test, binomial test) are frequently used, but what is the intuition behind them?
    • Introducing the basic idea of bootstrapping.
  3. Bootstrapping Explained (8 minutes)
    • Step-by-step illustration of the bootstrapping approach.
    • What is a p-value? An intuitive description using resampling.
  4. Confidence Intervals Explained (7 minutes)
    • Importance of confidence intervals and how they help interpret results.
    • Intuitive computation of confidence intervals using bootstrapping.
    • Impact of sample size on confidence intervals and certainty.
  5. Why These Statistics Matter (5 minutes)
    • Discussion on the practical necessity of statistical techniques.
    • How these methods support data-driven decision-making in A/B testing.
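
The confidence-interval part of the outline, including the effect of sample size, can be sketched along the following lines. This is an illustrative sketch rather than the talk's own code: bootstrap_ci and make_group are hypothetical helpers, the data is invented, and the simple percentile method is assumed for the interval.

```python
import random


def make_group(n, conversions):
    """Made-up data: 1 = converted, 0 = did not convert."""
    return [1] * conversions + [0] * (n - conversions)


def bootstrap_ci(control, variant, n_iter=1_000, level=0.95, seed=0):
    """Percentile bootstrap interval for the difference in conversion rates.

    Hypothetical helper for illustration; not the speaker's implementation.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_iter):
        # Resample each group with replacement, keeping its original size.
        c = rng.choices(control, k=len(control))
        v = rng.choices(variant, k=len(variant))
        diffs.append(sum(v) / len(v) - sum(c) / len(c))

    diffs.sort()
    tail = (1 - level) / 2
    lower = diffs[round(tail * (n_iter - 1))]
    upper = diffs[round((1 - tail) * (n_iter - 1))]
    return lower, upper


# Same conversion rates (10% vs 13%), different sample sizes.
small_ci = bootstrap_ci(make_group(200, 20), make_group(200, 26))
large_ci = bootstrap_ci(make_group(2_000, 200), make_group(2_000, 260))

for name, (lo, hi) in [("200 per group", small_ci), ("2000 per group", large_ci)]:
    print(f"{name}: [{lo:.3f}, {hi:.3f}]  width = {hi - lo:.3f}")
```

With identical conversion rates, the larger experiment should give a noticeably narrower interval, which is the sample-size effect the outline refers to.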

Thomas Mayer

Thomas Mayer holds a PhD in Quantitative Language Comparison and brings a profound background in Machine Learning and Natural Language Processing (NLP) to his work. As Team Lead in the Data Intelligence team at HolidayCheck, Thomas combines his passion for data-driven insights with his expertise in linguistics and AI to drive innovation in the travel industry. With a deep understanding of both technical and business challenges, he plays a pivotal role in leveraging data to enhance customer experiences and inform strategic decisions.