Traditional API architectures face significant challenges in environments where repetitive and frequent requests are required to retrieve data updates. These request-response mechanisms introduce latency, as clients must continually query the server to check for changes, often receiving redundant or outdated information. This approach leads to increased network overhead, inefficient use of server resources and diminished scalability as the number of clients or requests grows. Additionally, frequent requests expand the attack surface, requiring security measures such as access control, rate limiting and query sanitisation to mitigate risks. Managing all of these inherent problems results in systems that are increasingly complex to maintain and improve, while putting considerable implementation effort onto the customer. Join us to find out how transitioning to a streaming architecture can address these issues by providing proactive, event-based data delivery, reducing latency, minimising redundant processing, enhancing scalability and simplifying security management.
Learn how to build a minimalist game physics engine in Rust and make it accessible to Python developers using PyO3. This talk explores fundamental concepts like collision detection and motion dynamics while focusing on Python integration for scripting and testing. Ideal for developers interested in combining Rust’s performance with Python’s ease of use to create lightweight and efficient tools for games or simulations.
NLP and data science could be so easy if all of our data came as clean and plain text. But in practice, a lot of it is hidden away in PDFs, Word documents, scans and other formats that have been a nightmare to work with. In this talk, I'll present a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem. I'll show you how you can go from PDFs to structured data and even build fully custom information extraction pipelines for your specific use case.
Topic modelling has come a long way, evolving from traditional statistical methods to leveraging advanced embeddings and neural networks. Python’s diverse library ecosystem includes tools like Latent Dirichlet Allocation (LDA) using gensim, Top2Vec, BERTopic, and Contextualized Topic Models (CTM). This talk evaluates these popular approaches using a dataset of UK climate change policies, considering use cases relevant to organisations like DEFRA (Department for Environment, Food & Rural Affairs). The analysis explores real-time integration, dynamic topic modelling over time, adding new documents, and retrieving similar ones. Attendees will learn the strengths, limitations, and practical applications of each library to make informed decisions for their projects.
Django's model-view-serializer approach works great for small apps, but as projects grow, you might face challenges like scattered database operations and APIs that are too closely linked to your database models. These issues can make unit testing and scaling harder. I'll share real-world examples of these problems and show how to refactor an app using ideas from Domain-Driven Design (DDD) and hexagonal architecture. We'll look at a before-and-after example to see how these changes can make your app easier to use and debug. Plus, we'll discuss when Django's simplicity is enough and when it's worth adopting a more structured approach. You'll leave with practical tips for transforming your Django projects into systems that can handle increased complexity.
In today's data-driven landscape, understanding causal relationships is essential for effective marketing strategies. This talk will explore the link between Bayesian causal thinking and media mix modeling, utilizing Directed Acyclic Graphs (DAGs), Structural Causal Models (SCMs), and the Data Generation Process (DGP). We will examine how DAGs represent causal assumptions, how SCMs define relationships in media mix models, and how to implement these models within a Bayesian framework. By using media mix models as causal inference tools, we can estimate counterfactuals and causal effects, offering insights into the effectiveness of media investments.
Simplify deploying hybrid Django applications with synchronous views and asynchronous apps. This session covers ASGI support, Docker containerization, and Kamal for seamless, zero-downtime deployments on single-server setups, ideal for hobbyists and small-scale projects.
The pytest tool offers a rapid and simple way to write tests for your Python code. This training gives an introduction with exercises to some distinguishing features, such as its assertions, marks and fixtures. Despite its simplicity, pytest is incredibly flexible and configurable. We'll look at various configuration options as well as the plugin ecosystem around pytest.
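To make those distinguishing features concrete, here is a minimal sketch of the kind of test file the training works towards (the fixture and the `slow` mark are illustrative examples, not part of pytest itself):

```python
import pytest

@pytest.fixture
def basket():
    """A fixture: reusable setup that pytest injects by argument name."""
    return {"apples": 3, "pears": 0}

def test_plain_assert(basket):
    assert basket["apples"] == 3   # pytest rewrites assert for rich failure output

@pytest.mark.parametrize("fruit,count", [("apples", 3), ("pears", 0)])
def test_counts(basket, fruit, count):
    assert basket[fruit] == count

@pytest.mark.slow   # custom mark (register in pytest.ini), selectable via `pytest -m slow`
def test_expensive():
    assert sum(range(10_000)) == 49_995_000
```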
LLMs, machine learning and AI are everywhere, yet their security is often overlooked, leaving your systems vulnerable to serious attacks. What happens when someone tampers with your model’s input, poisons your training data, or steals your model? In this talk, I’ll explore these risks through the lens of the OWASP Machine Learning Security Top 10 using relatable, real-world examples from the climate tech world. I’ll explain how these attacks happen, their impact, and why they matter to you as a Python developer, data scientist, or data engineer. You’ll learn practical ways to defend your models and pipelines, ensuring they’re robust against adversarial forces. Bridging theory and practice, you'll leave equipped with insights and strategies to secure your machine learning systems, whether you’re training models or deploying them in production. By the end, you’ll have a solid understanding of the risks, a toolkit of best practices, and maybe even a new perspective on how important security is everywhere.
In the right corner, we have the go-to dashboarding solution for showcasing ML models or visualizing data, **STREAMLIT** (\*crowd cheers\*). Simple yet powerful, it defends the throne of Python dashboarding, but have you ever tried to create complex interactions with it? Things like drill-downs or logins can make your control flow get messy really quickly (\*crowd nods knowingly\*). And in the left corner, the new contender in the arena of Python web frameworks which, according to its docs, "*excels at building dashboards*", **FastHTML** (\*crowd whoops\*). We will see if this is true, in the **ultimate dashboarding face-off** (\*crowd gasps\*). By building the same dashboard, step by step, in both frameworks and investigating their strengths and weaknesses, we will see which framework can claim the crown.
Construction work in national railroad networks often disrupts train traffic, making it vital to estimate hourly train numbers for effective re-routing. Traditionally managed by humans, this process has been automated due to staff shortages and demographic changes. DB Systel GmbH, Deutsche Bahn's IT provider, leveraged machine learning and artificial intelligence to estimate train traffic during construction. Using Python and frameworks like Pandas, scikit-learn, NumPy, PyTorch and Polars, their solution demonstrated significant benefits in performance and efficiency.
Orchestration is a typical challenge in the data engineering world. Scheduling your data transformation jobs via cron jobs is cumbersome and error-prone. Furthermore, with an increasing number of jobs to manage, the setup quickly becomes unmanageable. Tools like Apache Airflow, Dagster, Luigi, and Prefect are known for addressing these challenges but often require additional resources or investment. With the advent of serverless orchestration tools, many of these disadvantages are mitigated, offering a more streamlined and cost-effective solution. This session provides a comprehensive overview of combining serverless architecture with orchestration. We will start by defining the core concepts of orchestration and serverless technologies and discuss the benefits of integrating them. The talk will then analyze solutions available in the cloud vendor space. Attendees will leave with a well-rounded understanding of the tools and strategies available in serverless orchestration.
Mentorship is a powerful way to shape careers while building meaningful connections in the data field. In this talk, I’ll share my journey as a professional mentor, what the role entails, and the impact it has on both mentees and mentors. Learn how mentorship drives growth, fosters innovation, and creates value for the data community—and why you should consider stepping into this rewarding role.
In natural disaster scenarios, reliable communication is crucial. This talk presents a solution for disaster relief coordination using OpenStreetMap vector maps hosted on a local device in the emergency vehicle with FastAPI, ensuring functionality without an internet connection. By integrating a database of post codes and street names, and leveraging a LoRaWAN gateway to receive positional data and water levels, this system ensures access to critical information even in blackout situations.
Learn how to build a custom On-Air sign using Apache Kafka®, Apache Flink®, and Apache Iceberg™! See how to capture events like Zoom meetings and camera usage with Python, process data with FlinkSQL, analyze trends using Iceberg, and bring it all together with a practical IoT project that easily scales out.
"Why Exceptions Are Just Sophisticated Gotos - and How to Move Beyond" explores a common programming tool with a fresh perspective. While exceptions are a key feature in Python and other languages, they share surprising similarities with the notorious goto statement. This talk examines those parallels, the problems exceptions can create, and practical alternatives for better code. Attendees will gain a clear understanding of modern programming concepts and the evolution of programming.
Quantifying model uncertainty is critical for improving model reliability and making sound decisions. Conformal Prediction is a framework for uncertainty quantification that provides mathematical guarantees of true outcome coverage, allowing stakeholders to make more informed decisions.
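As a flavor of the coverage guarantee, here is a minimal split-conformal sketch in plain NumPy (the regression setting and all names are illustrative, not taken from the talk):

```python
import numpy as np

def conformal_interval(cal_residuals, y_pred_new, alpha=0.1):
    """Split conformal prediction for regression: given absolute residuals on a
    held-out calibration set, return an interval around a new point prediction
    that covers the true outcome with probability >= 1 - alpha."""
    n = len(cal_residuals)
    # conformal quantile with finite-sample correction
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(np.abs(cal_residuals), level)
    return y_pred_new - q, y_pred_new + q

rng = np.random.default_rng(0)
residuals = rng.normal(0, 1, 500)   # stand-in for calibration errors of any model
print(conformal_interval(residuals, y_pred_new=10.0, alpha=0.1))
```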
PDF, a must-have in RAG systems, ensures visual fidelity across platforms and devices, at the expense of compromising the core condition for computers to properly process and interpret text: semantics. Any logical arrangement of text, upon rendering, explodes into dummy visual shards of data that portray the bigger picture for the human eye to perceive, but no longer convey the information computers should grasp. This bottleneck already makes proper ingestion of text-only documents a big challenge, let alone when tables or figures come into play: the ultimate nightmare for PDF parsers, not to say developers. The rest you must have already foreseen: a RAG system barfing unreliable knowledge from bad chunks (based on regular PDF parsing), if those ever get retrieved from a vector database. In this talk you can gather some vision-driven insights on how to leverage the strengths of PDF and language models towards good chunks to be ingested. In other words, how multimodal models can go beyond trivial reverse engineering by decomposing tables into their building blocks, in plain language, as they would be explained to another human; or better yet, as humans would ask questions about such pieces of knowledge. From this strategy, we transfer the same rationale to figures. Come along, gather some insights, and get inspired to break down tables and figures from your own PDFs, and to improve retrieval in your RAG systems.
Join me as I share my 20-year journey with Python and its pivotal role at E.ON. Discover how we transitioned fully to Python, streamlined our development framework, and embraced MLOps principles. Learn about some of our AI projects, including image analysis and real-time inference, and our steps towards open-sourcing code to foster innovation in the energy sector. Explore why Python is our go-to language for data science and collaboration.
In this talk, we'll explore how you can implement your own distributed system for algorithmic trading, leveraging the power of open source without being dependent on trading bot providers. We will discuss the challenges of high-frequency trading (HFT), including processing massive amounts of data with low latency and maintaining reliable risk control, and how to solve them. Furthermore, we will touch on the topic of regulatory requirements in trading. These challenges will be addressed through a distributed system implemented in Python, utilizing Kafka for real-time data streaming, PostgreSQL for persistent storage and DuckDB for high-performance analysis. We will examine approaches to decouple the components so they can be reused and scaled across different markets. Cryptocurrency markets are used as a proving ground for the PoC because their data is easily available to everyone.
"Don't Repeat Yourself" – a phrase that we have all heard many times. In this talk, we will have an overview how to deal with code duplication and how open-source template libraries such as Copier and Cookiecutter can assist us in managing similarly structured repositories. Furthermore, we will explore how code updates can be automated with the help of the open-source library Renovate Bot. By the end of this session, you will gain insights into these solutions while also questioning whether they truly eliminate repetition or merely contribute to another cycle of automation.
Scaling machine learning pipelines is no small feat - especially when you’re managing over 100 of them on AWS SageMaker. In this talk, I’ll take you behind the scenes of how our team at idealo built a Git-based MLOps framework that powers millions of real-time recommendations every minute. I’ll share the challenges we faced, the solutions we implemented, and the lessons we learned while streamlining model versioning, deployment, and monitoring. This session is packed with actionable takeaways for ML engineers, data scientists, and DevOps professionals looking to simplify their MLOps workflows and operate efficiently at scale. Whether you’re running a handful of pipelines or preparing to scale up, this talk will equip you with the tools and strategies to tackle MLOps with confidence.
Dreaming of creating sleek, interactive web apps with just Python? Streamlit is great for dashboards, but what if your needs go beyond that? Discover how Reflex.dev, a cutting-edge full-stack Python framework, lets you level up from dashboards to full-fledged web apps!
Generative AI models have shaken up the German market. Since the release of ChatGPT, AI has been available and usable for everyone. The number of ChatGPT-based agents is growing rapidly, but concerns about privacy, copyright and ethics remain. Regulation and ethical AI go hand in hand, but are often seen as barriers. The presentation will cover the different aspects of ethics and how they are addressed by regulation. It will give an overview of how to use large language models in a safe and practical way. This will not only address the various ethical issues, but also help convince your next customer to invest in your AI-based product.
The Python community is built on a culture of support, inclusion, and collaboration. Sustaining this welcoming environment requires intentional community-building efforts, which often involve repetitive or time-consuming tasks. These tasks, however, can be automated without compromising their value—freeing up time for meaningful human engagement. This talk showcases my project aimed at supporting underrepresented groups in tech, specifically through building Python communities on Mastodon and Bluesky. A key part of this initiative is the "Awesome PyLadies" repository, a curated collection of PyLadies blogs and YouTube channels that celebrates their work. To enhance visibility, I created a PyLadies bot for social media. This bot automates regular posts and reposts tagged content, significantly extending their reach and fostering an engaged community. In this session, I’ll cover:

- The role of automation in community building
- The technical architecture behind the bot
- A hands-on demo on integrating Google’s Gemini into community tools
- Upcoming features and opportunities for collaboration

By combining Python, automation, and modern AI capabilities, we can create thriving, inclusive communities that scale impact while staying true to the human-centered ethos of open source.
The cloud native revolution has impacted all aspects of engineering, and data engineering is not exempt. One of the ongoing challenges in the data engineering world remains local and distributed cloud native storage. In this talk we’ll explore working with distributed file systems in Python through an intro to fsspec: a popular Python library that is well-positioned to address the growing challenge of interacting with storage systems of different kinds in a consistent way. We’ll show hands-on examples of working with fsspec with some of the most popular data tools in the Python community: Pandas, TensorFlow and PyArrow. We’ll demonstrate a real-world implementation of fsspec and how it provides easy extensibility through open source tooling. You’ll come away from this session with a better understanding of how to implement and extend fsspec to work with different cloud native storage systems.
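For a taste of that consistent interface, here is a small sketch (the S3 bucket and file names are placeholders, not from the talk):

```python
import fsspec
import pandas as pd

# The same API regardless of the backing storage system
fs = fsspec.filesystem("file")   # swap "file" for "s3", "gcs", "memory", ...
print(fs.ls("."))

# File-like objects work directly with tools like pandas
# (bucket and key below are placeholders; anon=True is passed through to s3fs)
with fsspec.open("s3://my-bucket/data.csv", "rb", anon=True) as f:
    df = pd.read_csv(f)
```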
Roundnet is a dynamic and fast-growing sport that combines quick reaction, athleticism, and strong community. However, like many emerging sports, it faces challenges in balancing competition, optimizing rules, and increasing accessibility for both players and spectators. This is where Python and data analysis come into play. In this talk, I'll share insights from my role as Data Lead on the International Roundnet rule committee, where we use Python-powered data analysis to make informed decisions about the future of the sport. We'll explore how analyzing gameplay patterns and testing rule changes with simulation can lead to fairer, more exciting games and attract a broader audience.
While text-based RAG systems have been everywhere over the last year and a half, the real power comes from combining multiple modalities. In this session, we'll show how it is possible to create a multimodal RAG system using open-source tools such as vLLM for deploying your LLM, Pixtral for multimodal understanding, and Milvus for storing your vectors. I'll walk you through the architecture and implementation details of setting up your own infrastructure, demonstrating how to effectively process and understand both images and text in a unified system. The talk will include a live demo, showing the real-time implementation and performance of the system. Whether you're looking to reduce dependency on commercial APIs or need more control over your LLM infrastructure, this talk will provide you with practical insights and implementation strategies.
In the data science field, we use all these powerful methods to solve important problems. Most of the time, we do this very well because our data science and machine-learning toolbox fits the problems we tackle quite precisely. Yet, what about our everyday choices or even our most important life decisions? Can we use for our private lives what we advocate for in our jobs, or are these choices inherently different? Many of these real-life decisions are a little different from textbook machine-learning problems. There is often less or hard-to-come-by data, and the decisions are infrequent, but sometimes very consequential. This talk will dive into what makes everyday decisions difficult to handle with our data science toolbox. It will show how Bayesian thinking can help us reason in such cases, especially when there is not a lot of data to rely on.
While Polars is written in Rust and has the advantages of speed and multi-threaded execution, everything slows down as soon as a Python function needs to be applied to the DataFrame. To avoid that, a Polars extension can be used to solve the problem. In this workshop, we will look at how to do it.
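The problem the workshop addresses can be seen in a few lines: a Python callback forces Polars to leave its Rust engine row by row, while the equivalent native expression stays fast (a small illustrative sketch):

```python
import polars as pl

df = pl.DataFrame({"x": range(1_000_000)})

# Slow: map_elements calls back into Python once per row
slow = df.with_columns(
    pl.col("x").map_elements(lambda v: v * 2 + 1, return_dtype=pl.Int64)
)

# Fast: the same logic as a native expression runs entirely in Rust --
# extensions let you recover this speed for logic with no built-in expression
fast = df.with_columns(pl.col("x") * 2 + 1)
```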
Why do AI projects fail? Spoiler: It’s rarely about the technology. Organizations often stumble over unrealistic expectations, siloed data, and cultural roadblocks. This talk dives into the human and organizational dynamics that cause AI initiatives to derail. Using real-world examples, I’ll showcase how missing tram stop location data was found on a hobbyist’s blog when official systems failed. Or how critical historical data for resource planning sat locked in Excel files with forgotten passwords, forcing a desperate password recovery operation. These examples highlight the surprising ways poor data governance and lack of interdisciplinary collaboration sabotage even well-intentioned AI projects. But all is not lost! We’ll explore actionable strategies to navigate these challenges: shifting from shiny tools to R&D-driven approaches, fostering better communication between IT and business teams, and building strong foundations for clean, accessible data. Attendees will leave with insights into what AI success truly requires: cultural readiness, realistic expectations, and a commitment to collaboration. If you’re tired of AI hype and looking for practical solutions, this talk offers a clear roadmap to ensure AI projects deliver measurable impact.
Accessible documentation benefits everyone, from developers to end users. Using the [PyData Sphinx Theme](https://pydata-sphinx-theme.readthedocs.io/en/stable/) as a case study, this talk dives into common accessibility barriers in documentation websites like low contrast colors, missing focus states, etc. and practical ways to address them. Learn about accessibility improvements and take part in a live accessibility audit to see how small changes can make a big difference.
Astronomical surveys are growing rapidly in complexity and scale, necessitating accurate, efficient, and reproducible reduction and analysis pipelines. In this talk we explore Pythonic workflow managers to streamline processing large datasets on distributed computing environments. Modern astronomy generates vast datasets across the electromagnetic spectrum. NASA's flagship James Webb Space Telescope (JWST) provides unprecedented observations that enable deep studies of distant galaxies, cosmic structures, and other astrophysical phenomena. However, these datasets are complex and require intricate calibration and analysis pipelines to transform raw data into meaningful scientific insights. We will discuss the development and deployment of Pythonic tools, including snakemake and pixi, to construct modular, parallelized workflows for data reduction and analysis. Attendees will learn how these tools automate complex processing steps, optimize performance in distributed computing environments, and ensure reproducibility. Using real-world examples, we will illustrate how these workflows simplify the journey from raw data to actionable scientific insights.
Whenever you use a dot in Python you access an attribute. While this seems a very simple operation, behind the scenes many things can happen. This tutorial looks into this mechanism, which is regulated by descriptors. You will learn how a descriptor works and what kinds of problems it can help to solve. Python properties are based on descriptors and solve one type of problem. Descriptors are more general, allow more use cases, and are more reusable. Descriptors are an advanced topic. But once mastered, they provide a powerful tool to hide potentially complex behavior behind a simple dot.
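As a preview of the mechanism, here is a minimal reusable descriptor; the validation use case is one illustrative example of the problems descriptors solve, not the tutorial's exact code:

```python
class Positive:
    """Descriptor enforcing that an attribute stays positive."""
    def __set_name__(self, owner, name):
        self._name = "_" + name          # where the value is actually stored

    def __get__(self, obj, objtype=None):
        if obj is None:                  # accessed on the class itself
            return self
        return getattr(obj, self._name)

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f"{self._name[1:]} must be positive, got {value!r}")
        setattr(obj, self._name, value)

class Order:
    price = Positive()                   # reusable across classes and attributes
    quantity = Positive()

    def __init__(self, price, quantity):
        self.price = price               # the dot goes through Positive.__set__
        self.quantity = quantity

order = Order(9.99, 3)
print(order.price)                       # the dot goes through Positive.__get__
# order.price = -1  would raise ValueError
```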
How do you build a large-scale data lakehouse architecture that makes data available for business analytics in real time, while being more cost-effective, more flexible and faster than the previous proprietary solution? With Python, Kafka and Iceberg, of course! We built a large-scale data lakehouse based on Apache Iceberg for the Schwarz Group, Europe's largest retailer. The system collects business data from thousands of stores, warehouses and offices across Europe. In this talk, we will present our architecture, the challenges we faced, and how Apache Iceberg is shaping up to be the data lakehouse format of the future.
In contemporary data-driven environments, the seamless integration of data into automated workflows is paramount. The reliability of automation, however, is constantly threatened by breaking changes in the source data. The Data-as-Code (DaC) paradigm addresses this challenge by treating data as a first-class citizen within the software development lifecycle.
This talk explores a contract-first approach to API development using the OpenAPI Generator, a powerful tool for automating API generation from a standardized specification. We will cover (1) what you need to run to get a standard implementation of the FastAPI endpoints and data models; (2) how to customize the mustache templates used to generate the API stubs; (3) some ideas on how to customize the CLI; and (4) how to maintain the contract and handle breaking changes to it. We will close the session with a discussion of the challenges of implementing the OpenAPI Generator.
This talk introduces a novel Python library for statistical testing using e-values, offering a refreshing alternative to traditional p-values. We'll explore how this approach enables real-time sequential testing, allowing data scientists to monitor experiments continuously without the statistical penalties of repeated testing. Through practical examples, we'll demonstrate how e-values provide more intuitive evidence measures and enable flexible stopping rules in A/B testing, clinical trials, and anomaly detection. The library implements cutting-edge methods from game-theoretic probability, making advanced sequential testing accessible to Python practitioners. Whether you're conducting A/B tests, monitoring production models, or running clinical trials, this talk will equip you with powerful new tools for sequential data analysis.
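To convey the core idea, here is a from-scratch sketch of a likelihood-ratio e-process, independent of the library's own API (which the talk introduces); the coin-flip setting and all parameters are illustrative:

```python
import numpy as np

# Sequential test of H0: coin is fair (p = 0.5) vs the alternative p = 0.7,
# using a likelihood-ratio e-process. By Ville's inequality, the probability
# that the e-process ever exceeds 1/alpha under H0 is at most alpha, so we
# may stop the moment it does -- no statistical penalty for peeking.
rng = np.random.default_rng(0)
alpha, p0, p1 = 0.05, 0.5, 0.7
e = 1.0
for t in range(1, 1001):
    x = rng.binomial(1, 0.7)   # data actually generated with p = 0.7
    e *= (p1 if x else 1 - p1) / (p0 if x else 1 - p0)
    if e >= 1 / alpha:
        print(f"Reject H0 at step {t}, e-value = {e:.1f}")
        break
```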
Collusion is a complex phenomenon in which companies secretly collaborate to engage in fraudulent practices. This talk presents an innovative methodology for detecting and predicting collusion patterns in different national markets using neural networks (NNs) and graph neural networks (GNNs). GNNs are particularly well suited to this task because they can exploit the inherent network structures present in collusion and many other economic problems. In Python, we use PyTorch and the Deep Graph Library (DGL) to develop and train models on individual market datasets from Japan, the United States, two regions in Switzerland, Italy, and Brazil, focusing on predicting collusion in single markets. In our empirical study, we show that GNNs outperform NNs in detecting complex collusive patterns. This research contributes to the ongoing discourse on preventing collusion and optimizing detection methodologies, providing valuable guidance on the use of NNs and GNNs in economic applications to enhance market fairness and economic welfare.
FastAPI has been growing steadily in popularity over the last few years. A lot of this growth is driven by its relative simplicity and ease of use. In this talk, we'll discuss some practical insights into building a FastAPI application, based on my experience of migrating an existing Flask prototype to FastAPI. We'll explore how FastAPI's core features like Pydantic integration and dependency injection can improve API development, while also talking about the drawbacks of FastAPI.
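The two features mentioned look roughly like this in practice (a generic sketch, not code from the migrated application):

```python
from fastapi import Depends, FastAPI
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    name: str
    price: float

# Shared pagination parameters, injected wherever they are needed
def pagination(skip: int = 0, limit: int = 20) -> dict:
    return {"skip": skip, "limit": min(limit, 100)}

@app.get("/items")
def list_items(page: dict = Depends(pagination)):   # dependency injection
    return {"page": page, "items": []}

@app.post("/items")
def create_item(item: Item):   # request body validated by Pydantic
    return item
```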
Open table formats have revolutionized analytical, columnar storage on cloud object stores with critical features like ACID compliance and enhanced metadata management, once exclusive to proprietary cloud data warehouses. Delta Lake, Iceberg, and Hudi have significantly advanced over traditional open file formats like Parquet and ORC. In an effort to modernize our data architecture, we aimed to replace our Parquet-based bronze layer with Delta Lake, anticipating better query performance, reduced maintenance, native support for incremental processing, and more. While our initial pilot showed promise, we encountered unexpected pitfalls that ultimately brought us back to where we began. Curious? Join me as we shed light on the current state of table formats.
As a data scientist, I value the power of insightful visualizations to unlock unique interpretations of complex data. In my talk, I will introduce an elegant mathematical framework called Formal Concept Analysis (FCA), developed in the 1980s in Darmstadt. FCA transforms binary data into concepts that can be visualized as a hierarchical graph, offering a fresh perspective on multidimensional data analysis. Leveraging this theory and its open-source Python libraries, I am developing an interactive Dash-based tool featuring interactive tables and graphs to explore data insights. To illustrate its potential, I will showcase an optimization of the entire tariffing system of an energy provider company, highlighting how FCA can bring structure and clarity to even such tangled datasets.
In our quest to improve Python deployments, we explored Pixi, a tool designed to enhance dependency management within the Conda ecosystem. This talk recounts our experience integrating Pixi into a setup used in production. We leveraged Pixi to create lockfiles, ensuring consistent builds, and to automate deployments via CI/CD pipelines. This integration led to greater reliability and efficiency, minimizing deployment errors and allowing us to concentrate more on development. Join us as we share how Pixi transformed our deployment process and offer insights into optimizing your own workflows.
This talk introduces supplyseer, an open-source Python library that brings advanced analytics to Supply Chain and Logistics. By combining time series embedding techniques, stochastic process modeling, and geopolitical risk analysis, supplyseer helps organizations make data-driven decisions in an increasingly complex global supply chain landscape. The library implements novel approaches like Takens embedding for demand forecasting, Hawkes processes for modeling supply chain events, and Bayesian methods for inventory optimization. Through practical examples and real-world use cases, we'll explore how these mathematical concepts translate into actionable insights for supply chain practitioners.
Explore the inner workings of deep learning frameworks like TensorFlow and PyTorch by building your own in this workshop. We will start with the fundamental automatic differentiation mechanics and proceed to implementing more complex components like layers, modules and optimizers. This workshop is mainly designed for experienced data scientists who want to expand their intuition about lower-level framework internals.
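The starting point, reverse-mode automatic differentiation on scalars, fits in a short sketch (an illustrative micrograd-style `Value` class, not the workshop's exact code):

```python
class Value:
    """A scalar that remembers how it was computed, for reverse-mode autodiff."""
    def __init__(self, data, _parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = _parents
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological order, then apply the chain rule node by node
        topo, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                topo.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

x, y = Value(2.0), Value(3.0)
z = x * y + x          # z = xy + x
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```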
Computers have long been an integral part of creating music. Virtual instruments and digital audio workstations make creating music easy and accessible. But how do programming languages, and especially Python, fit into this? Python can serve as a tool for creating musical notation and MIDI files. Throughout the session, you’ll learn how to:

- Use Python to create melodies, harmonies, and rhythms.
- Generate music based on rules, randomness, and mathematical principles.
- Visualize and export your compositions as MIDI and sheet music.

By the end of the talk, you’ll have a clear understanding of how to turn simple algorithms into expressive musical works.
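As a small taste, here is how a C major scale can be written to a MIDI file with the mido library (one possible tool; the talk may use others):

```python
from mido import Message, MidiFile, MidiTrack

scale = [60, 62, 64, 65, 67, 69, 71, 72]   # C major scale as MIDI note numbers

mid = MidiFile()
track = MidiTrack()
mid.tracks.append(track)

for note in scale:
    track.append(Message("note_on", note=note, velocity=64, time=0))
    track.append(Message("note_off", note=note, velocity=64, time=480))  # one beat

mid.save("scale.mid")
```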
DuckDB is increasingly becoming a universal tool for accessing and analyzing data. In this talk I will show, with slides and a live demo, what DuckDB is capable of and will dive deeper into how it will influence modern data architectures.
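A few lines hint at why DuckDB feels universal: it queries files and in-memory DataFrames alike (the CSV file name below is a placeholder):

```python
import duckdb
import pandas as pd

# Query a CSV file directly, no loading step required ('taxi.csv' is a placeholder)
duckdb.sql("""
    SELECT passenger_count, AVG(fare_amount) AS avg_fare
    FROM 'taxi.csv'
    GROUP BY passenger_count
    ORDER BY passenger_count
""").show()

# Or query an in-memory pandas DataFrame by its variable name
df = pd.DataFrame({"x": [1, 2, 3]})
print(duckdb.sql("SELECT SUM(x) FROM df").fetchall())
```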
The browser serves as our gateway to the internet—the largest repository of knowledge in human history. Proficiency in its use is a core skill across nearly all professions and is becoming increasingly important for Artificial Intelligence. But can Large Action Models (LAMs) autonomously operate a browser? What exactly are LAMs that promise to translate human intentions into actions? We report on a project that fully automates the job application process using AI: from navigating unfamiliar website structures and filling out forms to handling document uploads and cookie banners.
Bias in machine learning models remains a pressing issue, often disproportionately affecting the most vulnerable groups in society. This talk introduces a Bayesian perspective to effectively tackle these challenges, focusing on improving fairness by modeling and addressing bias directly. You will learn about the interplay between uncertainty, equity, and predictive accuracy, while gaining actionable insights to improve fairness in diverse applications. Using a practical example of a risk-scoring model trained on data with underrepresented minority groups, I will showcase how Bayesian methods compare to traditional techniques, demonstrating their unique potential to mitigate bias while maintaining performance.
Plastic pollution is a significant global challenge. Every year, millions of tons of plastic enter the oceans, impacting marine ecosystems and human health. To address this issue, the United Nations is negotiating a legally binding treaty with representatives from 180 countries, aiming to reduce plastic pollution and promote sustainable practices. We have developed NegotiateAI, an open-source chat application that supports delegations during the UN negotiations on a legally binding agreement to combat plastic pollution. The tool demonstrates how generative AI and Retrieval-Augmented Generation (RAG) can address complex global challenges. Built with Haystack 2.0, Qdrant, HuggingFace Spaces, and Streamlit, it showcases the potential of open-source technologies in tackling issues of global relevance. Whether you are a beginner or an advanced developer, this talk will give you valuable insights into developing impactful AI applications with open source tools in the public sector.
The Cyber Resilience Act (CRA) is focused on improving the security and resilience of digital products. To comply with the CRA, businesses will need to start preparing the necessary evidence if they want to continue to deliver digital products to the EU market once the CRA is in force. Key requirements within the CRA include implementing robust security measures throughout the product life-cycle, adopting secure development practices and implementing proactive vulnerability management processes. This session will show how a number of the CRA's requirements can be met using open-source Python tools.
SQL remains a foundational component of data-driven applications, but ensuring the accuracy and reliability of SQL logic is often challenging. SQL testing can be cumbersome, time-consuming, and error-prone. However, these challenges can be addressed by leveraging the simplicity of Python's testing framework such as pytest, enabling clean, robust, and automated SQL testing.
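A minimal sketch of the idea: run the SQL under test against an in-memory SQLite database provided by a pytest fixture (table schema and query are illustrative):

```python
import sqlite3
import pytest

QUERY = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY customer
"""

@pytest.fixture
def conn():
    """A throwaway database seeded with known data for each test."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 10.0), ("alice", 5.0), ("bob", 7.5)],
    )
    yield conn
    conn.close()

def test_totals_per_customer(conn):
    rows = conn.execute(QUERY).fetchall()
    assert rows == [("alice", 15.0), ("bob", 7.5)]
```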
Dive into the vulnerabilities of LLMs and learn how to prevent them. From prompt injection to data poisoning, we’ll demonstrate real-world attack scenarios and reveal essential countermeasures to safeguard your applications.
If you were writing a data science tool in 2015, you'd have ensured it supported pandas and then called it a day. But it's not 2015 anymore; we've fast-forwarded to 2025. If you write a tool which only supports pandas, users will demand support for Polars, PyArrow, DuckDB, and so many other libraries that you'll feel like giving up. Learn about how Narwhals allows you to write dataframe-agnostic tools which can support all of the above, with zero dependencies, low overhead, static typing, and strong backwards-compatibility promises!
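A dataframe-agnostic function might look like this sketch (assuming Narwhals' `from_native`/`to_native` round trip; the function name and data are illustrative):

```python
import narwhals as nw
from narwhals.typing import IntoFrame
import pandas as pd

def count_rows_per_group(df_native: IntoFrame, key: str):
    """Works unchanged for pandas, Polars, PyArrow, and other supported backends."""
    df = nw.from_native(df_native)            # wrap the caller's dataframe
    result = df.group_by(key).agg(nw.len())   # one expression API for all backends
    return result.to_native()                 # hand back the caller's native type

print(count_rows_per_group(pd.DataFrame({"a": ["x", "x", "y"]}), "a"))
```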
The rise of LLMs has elevated prompt engineering as a critical skill in the AI industry, but manual prompt tuning is often inefficient and model-specific. This talk explores various automatic prompt optimization approaches, ranging from simple ones like bootstrapped few-shot to more complex techniques such as MIPRO and TextGrad, and showcases their practical applications through frameworks like DSPy and AdalFlow. By exploring the benefits, challenges, and trade-offs of these approaches, the attendees will be able to answer the question: is prompt engineering dead, or has it just evolved?
Retrieval-Augmented Generation (RAG) chatbots are a key use case of GenAI in organizations, allowing users to conveniently access and query internal company data. A first RAG prototype can often be created in a matter of days. But why are the majority of prototypes still in the pilot stage? [\[1\]](https://www2.deloitte.com/content/dam/Deloitte/us/Documents/consulting/us-state-of-gen-ai-q3.pdf) In this talk we share our insights from developing a production-grade chatbot at Merck. Our RAG chatbot for R&D experts accesses over 50,000 documents across numerous SharePoint sites and other sources. We identified three key success factors:

1. Developing a robust data pipeline that syncs documents from source systems and that handles enterprise features such as replicating user permissions.
2. Establishing a comprehensive evaluation framework with a clear optimization metric.
3. Driving adoption through an onboarding training and ongoing user engagement, such as regular office hours.

We think that many of these lessons are broadly applicable to RAG chatbots, making this talk valuable for practitioners aiming to implement GenAI solutions in business contexts.
Despite an array of regulations implemented by governments and social media platforms worldwide (e.g. the well-known DSA), the problem of digital abusive speech persists. At the same time, rapid advances in NLP and large language models (LLMs) are opening up new possibilities—and responsibilities—for using this technology to make a positive social impact. Can LLMs streamline content moderation efforts? Are they effective at spotting and countering hate speech, and can they help produce more proactive solutions like text detoxification and counter-speech generation? In this talk, we will dive into the cutting-edge research and best practices of automatic textual content moderation today. From clarifying core definitions to detailing actionable methods for leveraging multilingual NLP models, we will provide a practical roadmap for researchers, developers, and policymakers aiming to tackle the challenges of harmful online content. Join us to discover how modern NLP can foster safer, more inclusive digital communities.
How to integrate security into a software development project? Without jeopardizing timeline or budget? You decide! This interactive session covers crucial decisions for software security, and the audience decides how the story ends...
Many data and simulation scientists use NumPy for its ease of use and good performance on CPU. This approach works well for single-node tasks, but scaling to handle larger datasets or more resource-intensive computations introduces significant challenges. Not to mention, using GPUs requires another level of complexity. We present the cuPyNumeric library, which gives developers the same familiar NumPy interface but seamlessly distributes work across CPUs and GPUs. In this talk we showcase the productivity and performance of the cuPyNumeric library on one of our users' examples, covering some details of its implementation.
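The advertised usage is essentially a one-line change (a sketch based on the library's drop-in claim; exact API coverage depends on the cuPyNumeric version):

```python
import cupynumeric as np   # instead of `import numpy as np`

# Unchanged NumPy code; the runtime distributes the work across CPUs/GPUs
x = np.ones((10_000, 10_000))
y = (x * 2 + 1).sum(axis=0)
print(y[:5])
```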
Automatic Differentiation (AD) is not only the backbone of modern deep learning but also a transformative tool across various domains such as control systems, materials science, weather prediction, 3D rendering, data-driven scientific discovery, and so on. Thanks to a mature ML framework ecosystem, powered by libraries like PyTorch and JAX, AD performs remarkably well at a component level; however, integrating these components into differentiable pipelines still remains a significant challenge. In this talk, we will provide an accessible introduction to (pipeline-level) AD, demonstrate some cool applications you can build with it, and see how to build differentiable pipelines that hold up in the real world.
In this talk, we show how to leverage publicly available tools to control any game or program using hand or body movements. To achieve this, we introduce PosePIE, an open-source programmable input emulator that generates input events on virtual gamepads, keyboards and mice based on gestures recognized by using AI-driven pose estimation. PosePIE is fully configurable by the user through Python scripts, making it easily adaptable to new applications.
From watching AI conquer Super Mario to building production-ready Reinforcement Learning (RL) systems, this talk explores how Python developers can dive into RL without requiring advanced degrees or big tech resources. Drawing on our three-year journey of building a production RL system, I’ll show how developers can leverage RL through practical strategies and accessible tools. Using pi_optimal, an open-source RL toolkit, we’ll bridge the gap between cutting-edge RL research and real-world applications. Attendees will gain actionable insights, implementation techniques, and hands-on experience to confidently start their own RL projects.
Building and deploying scalable, reproducible machine learning pipelines can be challenging, especially when working with orchestration tools like Slurm or Kubernetes. In this talk, we demonstrate how to create an end-to-end ML pipeline for anomaly detection in International Space Station (ISS) telemetry data using only Python code. We show how Kubeflow Pipelines, MLFlow, and other open-source tools enable the seamless orchestration of critical steps: distributed preprocessing with Dask, hyperparameter optimization with Katib, distributed training with PyTorch Operator, experiment tracking and monitoring with MLFlow, and scalable model serving with KServe. All these steps are integrated into a holistic Kubeflow pipeline. By leveraging Kubeflow's Python SDK, we simplify the complexities of Kubernetes configurations while achieving scalable, maintainable, and reproducible pipelines. This session provides practical insights, real-world challenges, and best practices, demonstrating how Python-first workflows empower data scientists to focus on machine learning development rather than infrastructure.
Using LLMs and AI in your Enterprise? Make sure you build Fine Grained Authorization to ensure your LLMs access only the data they are authorized to. This talk will show how you can build Relationship Based Access Control (ReBAC) for fine-grained authorization for your RAG pipelines. The talk also includes a demo using Pinecone, Langchain, OpenAI, and SpiceDB.
This talk explores the use of logits - the raw confidence scores that language models generate before selecting each token. Working directly with logits enables finer control over model behavior. The session covers practical techniques for accessing and utilizing these scores through local models. Topics include detecting model uncertainty, implementing custom stopping conditions, and steering generation without prompt modifications. You will learn how to analyze model confidence patterns and apply this knowledge to real-life use cases.
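For example, with a local Hugging Face model you can read next-token logits directly and turn them into a simple uncertainty signal (a generic sketch using GPT-2; the talk's own examples may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

logits = out.logits[0, -1]               # raw scores for the next token
probs = torch.softmax(logits, dim=-1)

top = torch.topk(probs, 5)               # inspect the model's top candidates
for p, idx in zip(top.values, top.indices):
    print(f"{tok.decode(int(idx))!r}: {p:.3f}")

entropy = -(probs * probs.log()).sum()   # high entropy = uncertain model
print(f"next-token entropy: {entropy:.2f}")
```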
What if we could make the same revolutionary leap for tables that ChatGPT made for text? While foundation models have transformed how we work with text and images, tabular / structured data (spreadsheets and databases) - the backbone of economic and scientific analysis - has been left behind. TabPFN changes this. It's a foundation model that achieves in 2.8 seconds what traditional methods need 4 hours of hyperparameter tuning for - while delivering better results. On datasets up to 10,000 samples, it outperforms every existing Python library, from XGBoost to CatBoost to Autogluon. Beyond raw performance, TabPFN brings foundation model capabilities to tables: native handling of messy data without preprocessing, built-in uncertainty estimation, synthetic data generation, and transfer learning - all in a few lines of Python code. Whether you're building risk models, accelerating scientific research, or optimizing business decisions, TabPFN represents the next major transformation in how we analyze data. Join us to explore and learn how to leverage these new capabilities in your work.
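Usage follows the familiar scikit-learn pattern (a sketch assuming the `tabpfn` package's sklearn-style interface; the dataset is illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()     # no hyperparameter tuning step
clf.fit(X_tr, y_tr)          # "fitting" amounts to conditioning on the data
proba = clf.predict_proba(X_te)
print(proba[:3])
print("accuracy:", (clf.predict(X_te) == y_te).mean())
```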
While trees and human brains don't share that many properties regarding their domain, the analysis of the height of a tree and of cancer in human brains does. This talk provides a not-so-serious introduction to the domain of computer vision for pathological use cases. Besides a general introduction to (digital) pathology and the technical similarities between satellite images (GeoTIFFs) and pathological images (Whole-Slide Images), we will take a look at computer vision (both ML-based and conventional) in Python. Whether you have never done image processing in Python, are an expert (ready to share some tricks with me), or are just curious to see pictures of a human brain, this talk is for you. Warning: this talk contains quite abstract pink-ish pictures of human tissue (and trees^^). If you are unsure whether this is something you are comfortable with, do a quick search for "HE-stained whole-slide image" (or ask a friend).
When it comes to open geographic data, OpenStreetMap is an awesome resource. Getting started and figuring out how to make the most out of the data available can be challenging. Using a personal example: frustration at the apparent lack of post offices in my neighborhood, we'll walk through examples of how to parse, filter, process, and visualize geospatial data with Python. At the end of this talk, you will know how to process geographic data from OpenStreetMap using Python and find out some surprising info that I learned while answering the question: Where have all the post offices gone?
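One convenient way to pull such data is the OSMnx library (a sketch assuming `osmnx >= 1.3`, where `features_from_place` is available; the place and tag mirror the talk's motivating question):

```python
import osmnx as ox

# Fetch every feature tagged amenity=post_office for a place
offices = ox.features_from_place("Berlin, Germany", tags={"amenity": "post_office"})
print(f"{len(offices)} post offices found")
print(offices.head())   # a GeoDataFrame, ready for filtering and plotting
```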
Open source thrives not just on code, but on the diverse perspectives and inclusive practices that shape our communities. In this dynamic talk, we explore the intersection of open source and diversity, shedding light on how to bridge existing gaps and build welcoming environments for underrepresented groups. Through real-world examples, practical strategies, and inspiring stories from existing research on African communities, we'll uncover the immense benefits of diversity and equip you with actionable steps to foster inclusivity. Join us to learn how you can help create a more vibrant and inclusive open-source community that goes beyond code.
API versioning is tough, really tough. We tried multiple approaches to versioning in production and eventually ended up with a solution we love. During this talk you will look into the tradeoffs of the most popular ways to do API versioning, and I will recommend which ones are fit for which products and companies. I will also present my framework, Cadwyn, that allows you to support hundreds of API versions with ease -- based on FastAPI and inspired by Stripe's approach to API versioning. After this session, you will understand which approach to pick for your company to make your versioning cost effective and maintainable without investing too much into it.
Retrieval-Augmented Generation (RAG) has become a cornerstone in enriching GenAI outputs with external data, yet traditional frameworks struggle with challenges like data noise, domain specialization, and scalability. In this talk, Tuhin will dive into the open-source frameworks Fast GraphRAG and InstructLab, which address these limitations by combining knowledge graphs with the classical PageRank algorithm and fine-tuning, delivering a precision-focused, scalable, and interpretable solution. By leveraging the structured context of knowledge graphs, Fast GraphRAG enhances data adaptability, handles dynamic datasets efficiently, and provides traceable, explainable outputs, while InstructLab adds domain depth to the LLM through fine-tuning. Designed for real-world applications, this combination bridges the gap between raw data and actionable insights, redefining intelligent retrieval. The talk will showcase Fast GraphRAG’s transformative features, coupled with domain-specific fine-tuning via InstructLab, and demonstrate their potential to elevate RAG’s capabilities in handling the evolving demands of large language models (LLMs) for developers, researchers, and businesses.
We all love Python - but we especially love it for its unique ability as a glue language. In this talk we will show a number of ways of extending Python: using Rust, C and Cython, C++, CUDA and Mojo! We will use the pixi package manager and the open source conda-forge distribution to demonstrate how to easily build custom Python extensions with these languages. The main challenge with custom extensions is about distributing them. The new pixi build feature makes it easy to build a Python extension into a conda package as well as wheel file for PyPI. Pixi will manage not only Python, but also the compilers and other system-level dependencies.
GetYourGuide, a global marketplace for travel experiences, reached diminishing returns with its XGBoost-based ranking system. We switched to a Deep Learning pipeline in just nine months, maintaining high throughput and low latency. We iterated on over 50 offline models and conducted more than 10 live A/B tests, ultimately deploying a PyTorch transformer that yielded significant gains. In this talk, we will share our phased approach—from a simple baseline to a high-impact launch—and discuss the key operational and modeling challenges we faced. Learn how to transition from tree-based methods to neural networks and unlock new possibilities for real-time ranking.
Bayesian methods are not commonly seen in Civil Engineering and Structural Dynamics. In this talk we explore how RxInfer.jl and the Julia Programming Language can simplify Bayesian modeling by implementing a Kalman filter for tracking the dynamics of a structural system. Perfect for engineers, researchers, and data scientists eager to apply probabilistic modelling and Bayesian methods to real-world engineering challenges.
A/B testing is a critical tool for making data-driven decisions, yet its statistical underpinnings—p-values, confidence intervals, and hypothesis testing—are often challenging for those without a background in statistics. Coders frequently encounter these concepts but lack a straightforward way to compute and interpret them using their existing skill set. This talk presents a practical approach to A/B test evaluations tailored for coders. By utilizing Python’s random number generator and basic loops, it introduces bootstrapping as an accessible method for calculating p-values and confidence intervals directly from data. The goal is to simplify statistical concepts and provide coders with an intuitive understanding of how to evaluate test results without relying on complex formulas or statistical jargon.
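The whole approach fits in a handful of lines (a sketch with toy simulated conversions; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.binomial(1, 0.10, 5_000)   # control conversions (toy data)
b = rng.binomial(1, 0.12, 5_000)   # variant conversions (toy data)

observed = b.mean() - a.mean()
pooled = np.concatenate([a, b])

# p-value: resample under the null hypothesis that both groups are identical
null_diffs = np.array([
    rng.choice(pooled, b.size).mean() - rng.choice(pooled, a.size).mean()
    for _ in range(10_000)
])
p_value = np.mean(np.abs(null_diffs) >= abs(observed))

# confidence interval: resample each group as observed (percentile bootstrap)
boot_diffs = np.array([
    rng.choice(b, b.size).mean() - rng.choice(a, a.size).mean()
    for _ in range(10_000)
])
ci = np.percentile(boot_diffs, [2.5, 97.5])
print(f"diff={observed:.4f}  p={p_value:.4f}  95% CI=({ci[0]:.4f}, {ci[1]:.4f})")
```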
Ever craved your favorite dish, only to find its key ingredient missing from the store? You're not alone - stock outs can have significant consequences for businesses, resulting in frustrated customers and lost sales. On the other hand, overstocking can lead to wasted storage costs and potential write-offs. The replenishment system is responsible for striking the right balance between these opposing risks. The key to successful replenishment is making accurate predictions about future demand. This presentation takes a deep dive into the intricate world of demand forecasting, at Europe's largest retailer. We will demonstrate how enhancing simple machine learning methods with domain knowledge allows to generate hundreds of millions of high-quality forecasts every day.
License issues can haunt you at night. You spend days, weeks, and months developing beautiful software. But then it happens. You realize that an essential dependency is GPL-3.0 licensed. All your code is now infected with this license. Now you are forced to either:

1. Rewrite all parts relying on the other library, or
2. Open-source your codebase under the GPL-3.0 license.

How could this have been avoided? Join the talk and find out! First, we’ll give you a brief introduction to different software licenses and their implications. Second, we’ll show you how to automate your license checking using open-source software.
The entire industry approves of unit testing, but almost no one can fully agree on how to do it correctly, or even on what unit tests are. As a result, unit tests are often associated with a slower development cycle and an overall less enjoyable workflow. I'll show you how testing turns into hell in real enterprises through the most common anti-patterns, and then I'll show you that most of them are avoidable with modern tooling like mutation testing, snapshot testing, dirty-equals, and many more. We'll discuss how to make tests speed up your development and make refactoring easy.
Flame cutting is a method in which metals are efficiently cut using precise control of the oxygen jet and consistent mixing of fuel gas. The condition of the nozzle changes over time: deposits formed during the cutting process can degrade the flame quality, reducing the precision of the cut. Traditionally, nozzles suspected of wear are sent back for manual inspection, where experts evaluate the flame visually and audibly to determine whether repair or replacement is needed. This project leverages machine learning to optimize this process by analyzing acoustic emission data.
Recent time series foundation models such as LagLlama, Chronos, Moirai, TinyTimesMixer promise zero-shot forecasting for arbitrary time series. One central claim of foundation models is their ability to perform zero-shot forecasting, that is, to perform well with no training data. However, performance claims of foundation models are difficult to verify, as public benchmark datasets may have been a part of the training data, and only the already trained weights are available to the user. Therefore, performance in specific use cases must be verified on the use case data itself, to ensure a reliable assessment of forecasting performance. sktime allows users to easily produce a performance benchmark of any collection of forecasting models, foundation models, simple baselines, or custom methods, on their internal use case data.
This intermediate-level talk demonstrates how to leverage Language Model Query Language (LMQL) for structured generation and tool usage with open-source models like Llama. You will learn how to build a RAG system that enforces output constraints, handles tool calls, and maintains response structure - all while using open-source components. The presentation includes hands-on examples where audience members can experiment with LMQL prompts, showcasing real-world applications of constrained generation in production environments.
Image generation using AI has made huge progress over the last years, and many people still think that DALL-E with a text prompt is the best way to generate images. There are well-known models like Stable Diffusion and Flux, which can be used with easy-to-use frontends like A1111 or Invoke AI, but if you want to do more complex or bleeding-edge workflows, you need something else. In this talk, I want to show you ComfyUI, an open-source node-based GUI written in Python where you can build complex pipelines that are otherwise only possible using plain code.
The rapid evolution of AI technologies, particularly since the emergence of Large Language Models, has transformed the data science landscape from a field of steady progress to one of constant breakthroughs. This acceleration creates unique challenges for practitioners, from managing FOMO to battling imposter syndrome. Drawing from personal experience transitioning from mathematical modeling to modern AI development, this talk explores practical strategies for staying current while maintaining sanity. We'll discuss building effective learning structures, creating collaborative knowledge-sharing environments, and finding the right balance between innovation and implementation. Attendees will leave with actionable insights on navigating technological change while fostering sustainable growth in their teams and careers.
SAP's data often remains locked away, hindering the creation of a complete data picture. This talk presents a hands-on proof of concept leveraging SAP Datasphere, Python and PySpark to bridge an Azure-based, data mesh-inspired open data lake with a centralized SAP BI environment. This presentation will delve into the architecture of SAP Datasphere and its integration interfaces with Python. It will explore network integration, authentication, authorization and resource management options, as well as data integration patterns. The presentation will summarize the evaluated features and limitations discovered during the PoC.
Many managed MLOps platforms, while convenient, often fall short on flexibility, require complex integrations, and cause vendor lock-in. In this talk, we’ll share our experience transitioning from managed MLOps tools to a self-hosted solution built on Kubernetes. We’ll focus on how we leveraged open-source tools like Feast, MLflow, and Ray to build a more flexible, scalable, and customizable platform that is now in use at Rewe Digital. By migrating to this self-hosted architecture, we gained greater control over our ML pipelines, reduced our dependency on third-party services, and created a more adaptable infrastructure for our ML workloads.
The development of open source software is increasingly recognized as a critical contribution across many disciplines, yet the mechanisms for credit and citation vary significantly. This talk uses astronomy as a case study to explore shared challenges in attributing software contributions across research and industry. It will review the evolution of journal recommendations and policies over the past decade, alongside emerging publishing practices offering insights into their impact on the recognition of software contributions. An analysis of citation patterns for widely used libraries (numpy, scipy, astropy) highlights trends over time and their dependence on publication venues and policies. The talk will conclude with strategies for both developers and users for improving the recognition of software, fostering collaboration and sustainability in software ecosystems. All data and analysis code will be made available in a public repository, supporting transparency and further study.
At Userlike, Celery is the backbone of our application, orchestrating over 100 million tasks per month. In this talk, I’ll share real-world insights into scaling Celery: optimizing performance, avoiding common pitfalls, handling failures, and building a resilient architecture.
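For a flavour of the resilience patterns involved, a minimal sketch (broker URL and task body are hypothetical): bounded automatic retries with exponential backoff, plus a soft time limit so a stuck task cannot hold a worker hostage.

```python
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task(
    bind=True,
    autoretry_for=(ConnectionError,),   # retry only on transient failures
    retry_backoff=True,                 # exponential backoff between retries
    retry_kwargs={"max_retries": 5},    # give up eventually
    soft_time_limit=30,                 # raise inside the task if it runs long
)
def sync_conversation(self, conversation_id: int):
    ...  # call out to the external service here
```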
Do you ever worry about your code becoming spaghetti-like and difficult to maintain? Master the art of crafting clean, maintainable, and adaptable software by harnessing the power of design patterns. This presentation will empower you with a clear, structured understanding of these reusable solutions to address common programming challenges. We'll delve into design patterns’ key categories: Behavioral, Structural, and Creational, as well as explore their functionality and how they can be applied in your daily development workflow. For each category, we'll also explore a practical design pattern in detail and showcase real-world applications of these patterns, along with small-scale code examples that illustrate their practical implementation. You'll gain valuable insight into how these patterns can translate into real-world development scenarios, such as facilitating communication between objects (Behavioral), separating interfaces from implementation for flexibility (Structural), and enabling flexible, decoupled object creation at runtime (Creational).
Not all resources in today’s cloud environments have native Terraform providers. That’s where Python’s Typer library can step in, offering a flexible, production-ready command-line interface (CLI) framework to help fill in the gaps. In this session, we’ll explore how to integrate Typer with Terraform to manage resources that fall outside Terraform’s direct purview. We’ll share a real-life example of how Typer was used alongside Terraform to automate and streamline the management of an otherwise unsupported API. You’ll learn how Terraform can invoke Python scripts—passing arguments and parameters to control complex operations—while still benefiting from Terraform’s declarative model and lifecycle management. We’ll also discuss best practices for defining resource lifecycles to ensure easy maintainability and consistency across deployments. By the end, participants will see how combining Terraform’s robust infrastructure-as-code approach with Python’s versatility and Typer’s user-friendly CLI can create a powerful, cohesive strategy for managing even the trickiest resources in production environments.
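As an illustration of the pattern (resource names and the backing API are hypothetical), a Typer CLI can expose `create`/`destroy` commands that Terraform invokes, for example from a `local-exec` provisioner:

```python
import typer

app = typer.Typer()

@app.command()
def create(name: str, size: int = typer.Option(10, help="Resource size")):
    """Create the otherwise-unsupported resource via its REST API."""
    typer.echo(f"creating {name} with size {size}")
    # e.g. requests.post("https://api.example.com/resources", json={...})

@app.command()
def destroy(name: str):
    """Tear the resource down again on `terraform destroy`."""
    typer.echo(f"destroying {name}")

if __name__ == "__main__":
    app()
```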
A key feature of the Python data ecosystem is its reliance on simple but efficient primitives that follow well-defined interfaces to make tools work seamlessly together (cf. http://data-apis.org/). NumPy provides an in-memory representation for tensors. Dask provides parallelisation of tensor access. Xarray provides metadata linking tensor dimensions. **Zarr** provides a missing feature, namely scalable, persistent storage for annotated hierarchies of tensors. Defined through a community process, the Zarr specification enables the storage of large out-of-memory datasets locally and in the cloud. Implementations exist in C++, C, Java, Javascript, Julia, and Python, enabling cross-language access to the same stores. This talk presents a systematic approach to understanding and adopting the newer version of [Zarr-Python](https://github.com/zarr-developers/zarr-python), i.e. Zarr-Python 3, by explaining the new API, deprecations, the new storage backend, the improved codec pipeline, etc. I will also show the performance improvements in ZP-3 while creating, reading and writing async Zarr arrays across local and remote storage like AWS S3.
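A minimal sketch of the core idea, assuming Zarr-Python's `zarr.open` convenience API: a chunked array persisted to local disk, where swapping the path for an `s3://` URL (via an fsspec-backed store) targets remote storage instead.

```python
import numpy as np
import zarr

# Create a chunked, persistent 2D array on local disk.
z = zarr.open(
    "example.zarr", mode="w",
    shape=(4_000, 4_000), chunks=(1_000, 1_000), dtype="f4",
)
z[:] = np.random.random((4_000, 4_000)).astype("f4")  # writes chunk files
print(z.info)  # shape, chunking, compression, storage details
```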
RAG has transformed AI systems by combining retrieval and generation, but traditional workflows often struggle with the dynamic demands of real-world applications, such as multi-step queries or integrating external APIs. Agentic behavior enhances RAG by enabling LLMs to make decisions, call tools, and adapt workflows dynamically. In this talk, we’ll define agentic behavior, explore its core features such as routing, tool integration, and reasoning, and demo its practical implementation in Python using Haystack.
According to the World Health Organization (WHO), an estimated 1.3 billion people (1 in 6 individuals) experience a disability, and nearly 2.2 billion people (1 in 5 individuals) have vision impairment. Improving the accessibility of visualizations will enable more people to participate in and engage with our data analyses. In this talk, we’ll discuss some principles and best practices for creating more accessible data visualizations. It will include tips for individuals who create visualizations, as well as guidelines for the developers of visualization software to help ensure your tools can help downstream designers and developers create more accessible visualizations.
Large Language Models (LLMs) are transforming digital products, but their non-deterministic behaviour challenges predictability and testing, making observability essential for quality and scalability. This talk presents **observability for LLM-based applications**, spotlighting three tools: Langfuse, OpenLIT, and Phoenix. We'll share best practices about what and how to monitor LLM features and explore each tool's strengths and limitations. Langfuse excels in tracing and quality monitoring but lacks OpenTelemetry support and customization. OpenLIT, while less mature, integrates well with existing observability stacks using **OpenTelemetry**. Phoenix stands out in debugging and experimentation but struggles with real-time tracing. The comparison will be enhanced by **live coding examples**. Attendees will walk away with an improved understanding of observability for **GenAI applications** and will understand which tool to use for their use case.
Have you ever asked yourself how parameters for an LLM are counted, or wondered why Gemma 2B is actually closer to a 3B model? You have no clue about what a KV-Cache is? (And, before you ask: no, it's not a Redis fork.) Do you want to find out how much GPU VRAM you need to run your model smoothly? If your answer to any of these questions was "yes", or you have another doubt about inference with LLMs - such as batching, or time-to-first-token - this talk is for you. Well, except for the Redis part.
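To preview the flavour of the talk, a back-of-the-envelope sketch (rule-of-thumb formulas that ignore activations and runtime overhead; the shapes below are roughly Llama-2-7B-like):

```python
# Rough VRAM estimates: weight memory plus KV-cache size for an fp16 model.
def weights_gib(n_params: float, bytes_per_param: int = 2) -> float:
    return n_params * bytes_per_param / 2**30

def kv_cache_gib(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_val=2):
    # 2x for keys and values, stored per layer, per head, per token
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_val / 2**30

print(f"weights: {weights_gib(7e9):.1f} GiB")                       # ~13.0 GiB
print(f"KV cache @ 4096 tokens: {kv_cache_gib(32, 32, 128, 4096, 1):.1f} GiB")  # ~2.0 GiB
```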
Raw large language models struggle with complex reasoning. New techniques have emerged that allow these models to spend more time thinking before giving an answer. Direct token sampling can be seen as system-1 thinking and explicit step-by-step reasoning as system-2. How can this reasoning ability be improved and what is the future?
This talk will explore how Python can be leveraged to build robust DDoS defense mechanisms, focusing on real-time threat detection, mitigation strategies, and system resilience. We will dive into key Python libraries, best practices, and techniques to protect your applications from large-scale DDoS attacks while ensuring high availability.
In this tutorial, you will speed-date with board and card games that can be used to teach Data Science. You will play one game for 15 minutes, reflect on the Data Science concepts it involves, and then rotate to the next table. As a result, you will experience multiple ideas that you can use to make complex ideas more understandable and enjoyable. We would like to demonstrate how gamification can be used not only to produce short puzzles and quizzes, but also as a tool for reasoning about complex problem-solving strategies. We will bring a set of carefully selected games that have proven effective in teaching statistics, programming, machine learning and other Data Science skills. We also believe that it will probably be fun to participate in this tutorial.
Every Python developer faces performance challenges, from slow data processing to memory-intensive operations. While external libraries like Numba or Cython offer solutions, understanding core Python optimization techniques is crucial for writing efficient code. This talk explores practical optimization strategies using Python's built-in capabilities, demonstrating how to achieve significant performance improvements without external dependencies. Through real-world examples from machine learning pipelines and data processing applications, we'll examine common bottlenecks and their solutions. Whether you're building data pipelines, web applications, or ML systems, these techniques will help you write faster, more efficient Python code.
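As one example of dependency-free optimization, memoizing a pure function with `functools.lru_cache` and measuring the effect with `timeit`:

```python
import timeit
from functools import lru_cache

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

@lru_cache(maxsize=None)  # built-in memoization, no external dependency
def fib_cached(n):
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)

print(timeit.timeit(lambda: fib(25), number=10))         # exponential blow-up
print(timeit.timeit(lambda: fib_cached(25), number=10))  # orders of magnitude faster
```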
In this session, I’ll share our journey of migrating key parts of a Python application to Rust, resulting in over 200% performance improvement. Rather than focusing on quick Rust-to-Python integration with PyO3, this talk dives into the complexities of implementing such a migration in an enterprise environment, where reliability, scalability, and observability are crucial. You’ll learn from our mistakes, how we identified suitable areas for Rust integration, and how we extended our observability tools to cover Rust components. This session offers practical insights for improving performance and reliability in Python applications using Rust.
Is implementing authorization on your API endpoints an afterthought? Who should have access to your API endpoints? Is it secure? This talk covers using OAuth 2.0 to secure API endpoints built on FastAPI, following industry-recognized best practices. Come on a journey with me, taking your API endpoints from merely functional to functional AND secure. By following secure identity standards, you’ll be equipped with a deeper understanding of the critical need for authorization.
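A minimal sketch of the FastAPI side (the token check is a stub; a real implementation would verify a JWT's signature, issuer, audience, and expiry):

```python
from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import OAuth2PasswordBearer

app = FastAPI()
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")

def validate_token(token: str):
    # Stub: replace with real JWT verification (signature, iss, aud, exp).
    return "alice" if token == "valid-token" else None

def get_current_user(token: str = Depends(oauth2_scheme)) -> str:
    user = validate_token(token)
    if user is None:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED)
    return user

@app.get("/items")
def read_items(user: str = Depends(get_current_user)):
    # Authorization enforced by the dependency, not scattered in handlers.
    return {"user": user, "items": []}
```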
Two truths and a lie: you have been hacked and you don’t know it yet; you haven’t been hacked because it’s not yet convenient (for the attackers); you think your application is secure. Security is hard: hard to measure, and always a catch-up game. Join this talk if you are interested in understanding a bit more about modern security and how it goes way beyond the code of your app.
Let’s explore the visual grammars, references and cultural norms at play in the field of AI; from Kismet to Spot®, from Clippy to Claude. As a sector we can be hyper-focused on technical process and function, to the extent that it blinkers our understanding of the cultural and political impacts of our work. Aesthetics infuse every aspect of technology. Aesthetic interpretations are manifold and mutable, constructed in concert with the observer and not fully defined by the original designer. AI technologies add additional layers of subtext: character, consciousness, agency, intent. Despite this murkiness, or perhaps because of it, this talk makes a passionate argument for engaging with historical aesthetic movements, for building our shared professional knowledge of fads and fashions, not just from the past 40 years of internet culture, but also the past 140 years of ideology, technology, and thought.
Since its introduction in 2016, Federated Learning (FL) has become a key paradigm for training AI models in scenarios where training data cannot leave its source. This applies in many industrial settings where centralizing data is challenging for a combination of reasons, including but not limited to privacy, legal constraints, and logistics. The main focus of this tutorial is to introduce this alternative approach to training AI models in a straightforward and accessible way. We’ll walk you through the basics of an FL system, how to iterate on your workflow and code in a research setting, and finally how to deploy your code to a production environment. You will learn all of this using a real-world application based on open-sourced datasets and the open-source federated AI framework [Flower](https://github.com/adap/flower), which is written in Python and designed for Python users. Throughout the tutorial, you’ll have access to hands-on open-sourced code examples to follow along.
Do you need to compare sets of points in a plane? Identify a potential cyclic event in high-dimensional time series data? Find the second or the third highest peak of a noisily sampled function? Topological data analysis (TDA) is not a universal hammer, but it might just be the 16 mm wrench for your 16 mm hex head bolt. There is no shortage of Python libraries implementing TDA methods for various settings, but navigating the options can be challenging without prior familiarity with the topic. In my talk I will demonstrate the utility of the tool with several simple examples, list various libraries used by the TDA community, and dive a bit deeper into the methods to explain what the libraries implement and how to interpret and work with the outputs.
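For a taste, a tiny example with one popular TDA library, `ripser`: the persistent homology of a noisy circle, whose single prominent H1 feature is the loop itself.

```python
import numpy as np
from ripser import ripser

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
points = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(scale=0.05, size=(200, 2))

diagrams = ripser(points)["dgms"]
h1 = diagrams[1]                       # (birth, death) pairs of 1-dim features
persistence = h1[:, 1] - h1[:, 0]      # long-lived features are "real" structure
print("most persistent loop:", h1[np.argmax(persistence)])
```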
Vector databases are everywhere, powering LLMs. But indexing embeddings in bulk, especially multivector embeddings like ColPali and ColBERT, is memory-intensive. Vector streaming solves this problem by parallelizing parsing, chunking, and embedding generation, and indexing continuously, chunk by chunk, instead of in bulk. This not only increases speed but also makes the whole task more memory-efficient. The library supports many vector databases, such as Pinecone, Weaviate, and Elastic.
As we develop business-critical software, we often need to rely on external APIs to get the job done. And not all services are born equal: while an ideal world would provide well-operated APIs with over-met service levels, the real world is usually way worse than that. Timeouts, HTTP errors, cascading failures, unclear or changing contracts, approximate protocol implementations ... and even the oh-so-human bad faith while trying to pinpoint the root cause. Most of us have written hacks to handle commonly seen failures, from quick-and-dirty fixes to well-thought-out resilience pattern implementations, but this is hard to do correctly, and rarely a business priority worth the right amount of time and money. We'll present the options, from direct dependencies (not framework-dependent, though some families emerge, e.g. async/sync) to a service/proxy-based approach.
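As one example from the direct-dependency family, a hedged sketch using `tenacity` (endpoint is hypothetical): retrying a flaky HTTP call with exponential backoff and a hard attempt cap.

```python
import requests
from tenacity import (
    retry, retry_if_exception_type, stop_after_attempt, wait_exponential,
)

@retry(
    retry=retry_if_exception_type((requests.Timeout, requests.ConnectionError)),
    wait=wait_exponential(multiplier=0.5, max=10),  # backoff: 0.5s, 1s, 2s, ...
    stop=stop_after_attempt(5),                     # then give up and re-raise
)
def fetch_orders():
    resp = requests.get("https://api.example.com/orders", timeout=2)
    resp.raise_for_status()
    return resp.json()
```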
The PyData ecosystem is home to some of the best and most popular tools for doing data-science. Every data-scientist alive today has used pandas and scikit-learn and even Large Language Models know how to use them! For many years there have also been alternative implementations with similar interfaces and libraries with completely new approaches that focus on achieving the ultimate in performance and hardware acceleration. This talk will look at the recent efforts to give users the best of both worlds: a familiar and widely used interface as well as high performance.
This hands-on tutorial will dive into the fundamentals of building multi-agent systems using the CrewAI Python library. Starting from the basics, we’ll cover key concepts, explore advanced features, and guide you step-by-step through building a complete application from scratch. We’ll discuss implementing guardrails, securing interactions, and preventing query injection vulnerabilities along the way.
When I was hired as a Scientist for Machine Learning, experts said ML would never work in weather forecasting. Nowadays, I get to contribute to Anemoi, a full-featured ML weather forecasting framework used by international weather agencies to research, build, and scale AI weather forecasting models. The project started out as a curiosity by my colleagues and soon scaled as a result of its initial success. As machine learning stories go, this is a story of change, adaptation and making things work. In this talk, I'll share some practical lessons: how we evolved from a mono-package with four people working on it to multiple open-source packages with 40+ internal and external collaborators. Specifically, how we managed the explosion of over 300 config options without losing all of our sanity, building a separation of packages that works for both researchers and operations teams, as well as CI/CD and testing that constrains how many bugs we can introduce in a given day. You'll learn concrete patterns for growing Python packaging for ML systems, and balancing research flexibility with production stability. As a bonus, I'll sprinkle in anecdotes where LLMs like chatGPT and Copilot massively failed at facilitating this evolution. Join me for a deep dive into the real challenges of scaling ML systems - where the weather may be hard to predict, but our code doesn't have to be.
Unlock the full potential of modern web scraping by combining Python, Scrapy, and Playwright to extract data from dynamic, JavaScript-heavy sites—exemplified by LEGO product pages. This talk introduces Model Context Protocol (MCP) servers for orchestrating advanced data fetching, refining CSS selectors, and integrating Large Language Models for automated code suggestions. Learn how to scale ethically, handle concurrency, and respect site policies, while maintaining flexible, maintainable pipelines for diverse use cases from research to robotics.
ORMs like Django and SQLAlchemy have become indispensable in Python development, simplifying the interaction between applications and databases. Yet, their built-in schema migration tools often fall short in projects that require advanced database features or robust CI/CD integration. In this talk, we’ll explore how you can go beyond the limitations of your ORM’s migration tool. Using Atlas—a language-agnostic schema management tool—as a case study, we’ll demonstrate how Python developers can automate migration planning, leverage advanced database features, and seamlessly integrate database changes into modern CI/CD pipelines.
The Transformer architecture, originally designed for machine translation, has revolutionized deep learning with applications in natural language processing, computer vision, and time series forecasting. Recently, its capabilities have extended to sequence-to-sequence tasks involving log data, such as telemetric event data from computer games. This talk demonstrates how to apply a Transformer-based model to game log data, showcasing its potential for sequence prediction and representation learning. Attendees will gain insights into implementing a simple Transformer in Python, optimizing it through hyperparameter tuning, architectural adjustments, and defining an appropriate vocabulary for game logs. Real-world applications, including clustering and user level predictions, will be explored using a dataset of over 175 million events from an MMORPG. The talk will conclude with a discussion of the model's performance, computational requirements, and future opportunities for this approach.
Demos and prototypes for generative AI (GenAI) projects can be quickly created with tools like Streamlit, offering impressive results for users within hours. However, scaling these solutions from prototypes to robust systems introduces significant challenges. As user demand grows, hacks and workarounds in tools like Streamlit lead to unreliability and debugging frustrations. This talk explores the journey of overcoming these obstacles, evolving to a stable tech stack with Qdrant, Postgres, Litellm, FastAPI, and Streamlit. Aimed at beginners in GenAI, it highlights key lessons.
Snapshot tests are invaluable when you are working with large, complex, or frequently changing expected values in your tests. Introducing inline-snapshot, a Python library designed for snapshot testing that integrates seamlessly with pytest, allowing you to embed snapshot values directly within your source code. This approach not only simplifies test management but also boosts productivity by improving the maintenance of the tests. It is particularly useful for integration testing and can be used to write your own abstractions to test complex APIs.
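A minimal sketch of how this looks in practice: running pytest with `--inline-snapshot=create` fills the empty `snapshot()` call in directly in the source file, and later changes are reviewed as ordinary diffs.

```python
from inline_snapshot import snapshot

def split_words(text: str) -> list[str]:
    return text.split()

def test_split_words():
    # After the first `pytest --inline-snapshot=create` run, the snapshot
    # value below is written into this file by the library.
    assert split_words("hello snapshot world") == snapshot(
        ["hello", "snapshot", "world"]
    )
```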
Retrieval Augmented Generation (RAG) is a powerful technique for searching across unstructured documents, but it often falls short when the task demands an understanding of intricate relationships between entities. GraphRAG addresses this by leveraging knowledge graphs to capture these relationships, but it struggles with scalability and handling diverse unstructured formats. In this talk, we’ll explore how HybridRAG combines the strengths of both approaches, RAG for scalable unstructured data retrieval and GraphRAG for semantic richness, to deliver accurate and contextually relevant answers. We’ll dive into its application, challenges, and the significant improvements it offers for question-answering systems across various domains.
In general elections, voters often face the challenge of navigating complex political landscapes and extensive party manifestos. To address this, we developed Electify, an interactive application that utilizes Retrieval-Augmented Generation (RAG) to provide concise summaries of political party positions based on individual user queries. During its first roll-out for the European Election 2024, Electify attracted more than 6,000 active users. This talk will explore its development and deployment. It will focus on its technical architecture, the integration of data from party manifestos and parliamentary speeches, and the challenges of ensuring political neutrality and providing accurate replies. Additionally, we will discuss user feedback and ethical considerations, focusing on how generative AI can enhance voter information systems.
Is your Django application still relying on SQL LIKE queries for search? In this talk, we'll explore why basic text matching falls short of modern user expectations and how to implement proper search functionality without complexity. We'll introduce django-semantic-search, a practical package that bridges the gap between Django's ORM and powerful semantic search capabilities. Through practical code examples and real-world use cases, you'll learn how to enhance your application's search experience from basic keyword matching to understanding user intent. Whether you're building a content platform, e-commerce site, or internal tool, you'll walk away with concrete steps to implement production-ready search that your users will actually enjoy using.
Ever wondered how to use GenAI to enable self-service analytics through prompting? In this talk, I will share my experience of building a multi-tenant conversational analytics set-up that is built into a Software-as-a-Service (SaaS) platform. This talk is intended for AI engineers, data scientists, software engineers and anyone interested in using GenAI to power conversational analytics using open-source tools. I will discuss the challenges faced in designing and implementing it, as well as the lessons learned along the way. We'll answer questions such as: why offer analytics through prompting? Why multi-tenancy, and what makes it so difficult? How to build it into an existing product? What makes open-source the preferred choice over proprietary solutions? What could the implications be for the analytics field?
Modern open source Python data packages offer the opportunity to build and deploy pure-Python, production-ready data platforms. Engineers can and do play a big role in helping companies become data-driven by centralising data, cleaning and modelling it, and presenting it back to the business. Now more than ever, engineers and companies of any size can build data products and insights at relatively low cost. In this talk we’ll walk through the key components of this stack, the tooling options available, and demo a deployable containerised Python data stack.
Forecasting can often feel like interpreting vague signals—unclear yet full of potential. In this talk, we’ll cover advanced techniques for tuning forecasting models in professional settings, moving beyond the basics to explore methods that enhance both accuracy and interpretability. You’ll learn: how to set clear business goals for ML model tuning and align technical work with business needs, including balancing forecast granularity and accuracy and selecting statistically sound metrics; practical data preparation methods, including business-driven data cleaning and detecting data problems with statistical and business-driven approaches; advanced feature selection techniques such as recursive feature elimination and SHAP values, alongside hyperparameter tuning strategies including Bayesian optimization and ensemble methods; how generative AI can support model tuning by automating feature generation and hyperparameter search and by enhancing model explainability through SHAP and LIME; and real-world case studies, including how Blue Yonder’s data science team optimized demand forecasting models for retail and supply chain applications. We'll also discuss common mistakes like overfitting and data leakage, best practices for reliable validation, and the importance of domain knowledge in successful forecasting. Whether you're a seasoned data scientist or exploring time series forecasting, you'll gain advanced insights and techniques you can apply immediately.
Building Streamlit apps is easy for Data Scientists - but when it’s time to deploy them to the cloud, challenges like slow model loading, scalability, and security can become major hurdles. This talk bridges two perspectives: the Data Scientist who builds the app and the MLOps engineer who deploys it. We'll dive into optimizing model loading from Hugging Face Hub, implementing features like autoscaling and authentication, and securing your app against potential threats. By the end of this talk, you’ll be ready to design Streamlit apps that are functional and deployment-ready for the cloud.
Building a good scoring model is just the beginning. In the age of critical AI applications, understanding and quantifying uncertainty is as crucial as achieving high accuracy. This talk highlights conformal prediction as the definitive approach to both uncertainty quantification and probability calibration, two extremely important topics in Deep Learning and Machine Learning. We’ll explore its theoretical underpinnings, practical implementations using TorchCP, and transformative impact on safety-critical fields like healthcare, robotics, and NLP. Whether you're building predictive systems or deploying AI in high-stakes environments, this session will provide actionable insights to level up your modelling skills for robust decision-making.
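To illustrate the core mechanism independent of any framework (TorchCP packages this and much more for PyTorch models), a minimal split conformal sketch on synthetic regression data:

```python
# Split conformal prediction: calibrate a quantile of nonconformity scores
# on held-out data to get distribution-free prediction intervals.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=1000)

X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_train, y_train)

alpha = 0.1  # target 90% coverage
scores = np.abs(y_cal - model.predict(X_cal))               # nonconformity scores
q_level = np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores)
q_hat = np.quantile(scores, q_level)                        # conformal quantile

pred = model.predict(rng.normal(size=(1, 3)))[0]
print(f"90% prediction interval: [{pred - q_hat:.2f}, {pred + q_hat:.2f}]")
```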
As publishers increasingly adopt AI agents for content generation and analysis, ensuring output quality and reliability becomes critical. This talk introduces a novel quality assurance framework built with DSPy that addresses the unique challenges of evaluating AI agents in publishing workflows. Using real-world examples from newsroom implementations, I will demonstrate how to design and implement systematic testing pipelines that verify factual accuracy, content consistency, and compliance with editorial standards. Attendees will learn practical techniques for building reliable agent evaluation systems that go beyond simple metrics to ensure AI-generated content meets professional publishing standards.
Generalized linear models (GLMs) are interpretable, relatively quick to train, and specifying them helps the modeler understand the main effects in the data. This makes them a popular choice today to complement other machine-learning approaches. `glum` was conceived with the aim of offering the community an efficient, feature-rich, and Python-first GLM library with a scikit-learn-style API. More recently, we have been striving to keep up with the PyData community's ongoing push for dataframe-agnosticism. While `glum` was originally heavily based on `pandas`, with the help of `narwhals` we are close to being able to fit models on any dataset that the latter supports. This talk presents our experiences with achieving this goal.
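A minimal sketch of `glum`'s scikit-learn-style API (column names are hypothetical; thanks to the `narwhals` work, a polars frame should increasingly work where pandas does):

```python
import pandas as pd
from glum import GeneralizedLinearRegressor

df = pd.DataFrame({
    "n_claims": [0, 1, 0, 2, 1, 0],
    "age": [23, 45, 31, 52, 40, 28],
    "exposure": [1.0, 0.8, 1.0, 0.5, 1.0, 0.9],
})

# Poisson GLM with a little L2 regularization, scikit-learn style.
model = GeneralizedLinearRegressor(family="poisson", alpha=0.01)
model.fit(df[["age", "exposure"]], df["n_claims"])
print(model.intercept_, model.coef_)
```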
Reinforcement Learning and related algorithms, such as Deep Q-Learning (DQL), have led to major breakthroughs in different fields. DQL, for example, is at the core of the AIs developed by DeepMind that achieved superhuman levels in such complex games as Chess, Shogi, and Go ("AlphaGo", "AlphaZero"). Reinforcement Learning can also be beneficially applied to typical problems in finance, such as algorithmic trading, dynamic hedging of options, or dynamic asset allocation. The workshop addresses the problem of limited data availability in finance and solutions to it, such as synthetic data generation through GANs. It also shows how to apply the DQL algorithm to typical financial problems. The workshop is based on my new O'Reilly book "Reinforcement Learning for Finance -- A Python-based Introduction".
Linear Regression is the workhorse of statistics and data science. Some data scientists even go so far as to argue that "linear regression is all you need". In this talk, we will introduce three ways to run regression models faster by using smarter algorithms, implemented in the scikit-learn & fastreg (sparse solvers), pyfixest (Frisch-Waugh-Lovell), and duckreg (regression compression via duckdb) libraries.
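As a flavour of one of the three approaches, a hedged `pyfixest` sketch on synthetic data: the `y ~ x | group` formula absorbs the group fixed effect in the spirit of Frisch-Waugh-Lovell instead of materialising hundreds of dummy columns.

```python
import numpy as np
import pandas as pd
import pyfixest as pf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "y": rng.normal(size=500),
    "x": rng.normal(size=500),
    "group": rng.integers(0, 20, size=500),  # fixed-effect grouping variable
})

fit = pf.feols("y ~ x | group", data=df)  # group effects absorbed, not estimated
print(fit.summary())
```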
Create, reflect, and earn—with purpose. In this workshop, you’ll not only build your own AI agent but also confront the ethical questions it raises, from its impact on jobs to its potential for social good. Together, we’ll explore how to harness AI for empowerment while uncovering pathways to turn your skills into meaningful value. This workshop is designed to equip Python enthusiasts with the tools to create their own AI agent while fostering a deeper understanding of the societal implications of this technology. Through hands-on learning, collaborative discussions, and practical monetization strategies, you’ll leave with more than just code—you’ll gain a vision of how AI can be wielded responsibly and profitably.
The invention of the clock and the organization of time in zones have helped synchronize human activities across the globe. While timekeepers are better at planning and sticking to the plan, time optimists somehow believe that time is malleable and extends the closer the deadline gets. Nevertheless, whether you are an organized timekeeper or a creative timebender, external factors can affect your commute. In this talk, we will define the different components necessary to build a personalized commute virtual agent in Python. The agent will help you analyze your historical lateness records, estimate future delays, and suggest the best time to leave home based on these predictions. It will be powered by an LLM and will use a technique called Function Calling to recognize the user intent from the conversation history and provide informed answers.
Docker is a game-changer for deploying AI applications, offering portable and consistent environments across platforms. In this tutorial, we’ll explore how to build and deploy a Retrieval-Augmented Generation (RAG) application using Docker. RAG applications combine data retrieval with generative AI to produce contextually relevant and accurate responses, making them powerful tools for interactive Q&A systems. We will build a document-based Q&A application that allows users to upload files, retrieve context from them, and answer questions interactively. We will use LlamaIndex to build the RAG pipeline and use both open-source LLM from Groq and closed-source LLM, such as OpenAI models, for LLM access. This tutorial will guide you through the entire lifecycle—from building the app to deploying it using Docker on Hugging Face Spaces. Whether you’re new to Docker or experienced with RAG, this guide offers a hands-on approach to deploying scalable and efficient AI solutions.
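The core of such a RAG pipeline in LlamaIndex fits in a few lines (this sketch assumes default settings with an `OPENAI_API_KEY` set; Groq or other LLMs can be swapped in via LlamaIndex's settings):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Index the user-uploaded files (directory name is a placeholder).
documents = SimpleDirectoryReader("uploads").load_data()
index = VectorStoreIndex.from_documents(documents)

# Retrieve relevant context and generate a grounded answer.
query_engine = index.as_query_engine()
print(query_engine.query("What does the contract say about termination?"))
```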
Machine Learning can transform transportation—improving safety, optimizing routes, and reducing delays—yet it also presents ethical concerns. From potential algorithmic bias to opaque decision-making processes, trust and fairness are at stake. In this talk, I will show how Explainable AI (XAI) can offer practical solutions to these ethical dilemmas. Instead of focusing on the technical underpinnings, we will discuss how transparency, accountability, and fairness can be enhanced in AI-driven transportation systems. Using a real-world example, I will demonstrate how XAI provides the groundwork for building ethical, trustworthy, and socially responsible AI solutions in public transportation systems.
Time series forecasting in the retail industry is uniquely challenging: Datasets often include stockouts that censor actual demand, promotional events cause irregular demand spikes, new product launches face cold-start issues, and diverse demand patterns within an imbalanced product portfolio create modeling challenges. In this talk, we’ll explore proven, real-world strategies and examples to address these problems. Learn how to successfully handle censored demand caused by stockouts, effectively incorporate promotional effects, and tackle the variability of diverse products using clustering and ensembling strategies. Whether you’re a seasoned data scientist or a Python developer exploring forecasting, the goal of this session is to introduce you to the key challenges in retail forecasting and equip you with actionable insights to successfully overcome them in real-life scenarios.
Search is everywhere, yet effective Information Retrieval remains one of the most underestimated challenges in modern technology. While Retrieval-Augmented Generation has captured significant attention, the foundational element - Information Retrieval - often remains underexplored. In this talk, we put Information Retrieval center stage by asking: How do we know that user queries and data 'speak' the same language? How do we evaluate the relevance and completeness of search results? And how do we prioritize what gets displayed? Or do we even want to hide specific content? We try to answer these questions by introducing the audience to the art and science of Information Retrieval, exploring metrics such as precision, recall, and desirability. We’ll examine key challenges, including ambiguity, query relaxation, and the interplay between sparse and dense search techniques. Through a live demo using public content from Sendung mit der Maus, we show how hybrid search improves upon vector and keyword based search in isolation.
Observability is challenging and often requires vendor-specific instrumentation. Enter OpenTelemetry: a vendor-agnostic standard for logs, metrics, and traces. Learn how to instrument Python applications with OpenTelemetry and send telemetry to your preferred observability backends.
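A minimal sketch of manual instrumentation with the OpenTelemetry Python SDK, exporting spans to the console (swap in an OTLP exporter to target the backend of your choice):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire up a tracer provider that prints finished spans to stdout.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("handle-request") as span:
    span.set_attribute("user.id", "42")  # attach searchable metadata
    ...  # application logic happens inside the span
```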
Many LLM benchmarks focus on reasoning and coding tasks. These are exciting tasks! But the majority of LLM usage is still in writing and editing related tasks, and there's a surprising lack of benchmarks on these. In this talk you'll learn what it took to create a writing benchmark, and which model performs best!
Getting your model into production isn’t a trivial task, but it’s only half the battle. Ensuring that your model continues to deliver great performance over time is even more critical. In this talk I’ll present a selection of things that can kill your model’s performance, zoom in on multivariate data drift, and present two methods to detect this type of drift in your production data.
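One family of detection ideas can be sketched in a few lines: fit PCA on reference data and watch the reconstruction error of production batches, which rises when the joint feature distribution shifts (the data below is synthetic; this is an illustration, not the talk's exact method).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
reference = rng.normal(size=(5000, 10))
production = reference + rng.normal(scale=0.5, size=(5000, 10))  # drifted copy

scaler = StandardScaler().fit(reference)
pca = PCA(n_components=5).fit(scaler.transform(reference))

def reconstruction_error(X):
    Z = pca.transform(scaler.transform(X))
    X_hat = pca.inverse_transform(Z)
    return np.mean(np.linalg.norm(scaler.transform(X) - X_hat, axis=1))

print("reference: ", reconstruction_error(reference))
print("production:", reconstruction_error(production))  # noticeably higher
```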
Building performant Python often means reaching for C extensions. This talk explores an alternative: leveraging Rust to create blazing-fast Python modules that also benefit the Rust ecosystem. I will share practical strategies from building `semantic-text-splitter`, a library for fast and accurate text segmentation used in both Python and Rust, demonstrating how to bridge the gap between these two languages and unlock new possibilities for performance and cross-language collaboration.
The term "Responsible AI" has seen a threefold increase in search interest compared to 2020 across the globe. As developers, the questions like "How can we build large language model-enabled applications that are responsible and accountable to its users?" encountered in the conversation more often than before. And the discussion is further compounded by concerns surrounding uncertainty, bias, explainability, and other ethical considerations. In this session, the speaker will guide you through fmeval, an open-source library designed to evaluate Large Language Models (LLMs) across a range of tasks. The library provides notebooks that you can integrate into your daily development process, enabling you to identify, measure, and mitigate potential responsible AI issues throughout your system development lifecycle.
Generative AI development introduces unique security challenges that traditional methods often overlook. This talk explores practical threat modeling techniques tailored for AI practitioners, focusing on real-world scenarios encountered in daily development. Through relatable examples and demonstrations, attendees will learn to identify and mitigate common vulnerabilities in AI systems. The session covers user-friendly security tools and best practices specifically designed for AI development. By the end, participants will have practical strategies to enhance the security of their AI applications, regardless of their prior security expertise.
Inspecting Docker images is crucial for building secure and efficient containers. In this session, we will analyze the structure of a Python-based Docker image using various tools, focusing on best practices for minimizing image size and reducing layers with multi-stage builds. We’ll also address common security pitfalls, including proper handling of build and runtime secrets. While this talk offers valuable insights for anyone working with Docker, it is especially beneficial for Python developers seeking to master clean and secure containerization techniques.
The geometries in GeoPandas, using the Shapely library, are assumed to be in projected coordinates on a flat plane. While this approximation is often just fine, for global data this runs into its limitations. This presentation introduces spherely, a Python library for working with vector geometries on the sphere, and its integration into GeoPandas.
Can deep learning/AI models forget? In this talk, you'll explore the realm of machine unlearning, where researchers and practitioners aim to remove memorized examples from machine learning models. This is increasingly relevant as models grow more overparameterized and as GDPR/privacy concerns around large-scale model development and use intensify.
Relational data can be a goldmine for classical Machine Learning applications — yet extracting useful features from multiple tables, time windows, and primary-foreign key relationships is notoriously difficult. In this code tutorial, we’ll use the H&M Fashion dataset to demonstrate how getML FastProp automates feature engineering for both classification (churn prediction) and regression (sales prediction) with minimal manual effort, outperforming both Relational Deep Learning and a skilled human data scientist according to the RelBench leaderboard. This tutorial is perfect for data scientists looking to leverage their relational and time-series data effectively for any kind of predictive analytics application.
So you've happily used the Raspberry Pi for your homelab projects, of course with Python based solutions as we all do. You've been down the rabbit hole with everything about temperature and humidity measurements, energy and solar tracking, video recording and time-lapse photography, object detection and security surveillance. You don't just buy these things off the shelf. You want to deeply understand what it takes to create such a thing, and you've been quite happy with your results so far, and learned a lot. But for many simple applications ... the power draw! Yes, you say, it's just 5 watts for a Raspberry Pi. Not a big deal in terms of cost. But you'll always need a power adapter and a free socket. You've heard of these people using microcontrollers that run on batteries or even solar, for days, weeks, even months. That's exciting, but there's also a catch: these people write code in C-like languages, they build firmware to make their projects run. And it's all bare metal! That seems very different. That'll be a steep learning curve to take ... Or is it? Well, there's MicroPython to the rescue. Let me take you with me on a journey to build a simple microcontroller-based application that reads a power meter and sends the readings over WiFi for more in-depth processing somewhere else.
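To set expectations, a hedged MicroPython sketch (board, pin number, network credentials, and endpoint are all hypothetical; many meters expose an S0 output emitting one pulse per watt-hour): count pulses and post the readings over WiFi.

```python
import time
import network
import urequests
from machine import Pin

# Join the WiFi network (credentials are placeholders).
wlan = network.WLAN(network.STA_IF)
wlan.active(True)
wlan.connect("my-ssid", "my-password")
while not wlan.isconnected():
    time.sleep(0.5)

pulses = 0

def on_pulse(pin):
    global pulses
    pulses += 1

# S0-style pulse output wired to GPIO 4: one falling edge per watt-hour.
Pin(4, Pin.IN, Pin.PULL_UP).irq(trigger=Pin.IRQ_FALLING, handler=on_pulse)

while True:
    time.sleep(60)
    urequests.post("http://192.168.1.10:8000/readings", json={"wh": pulses})
    pulses = 0
```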
Defining what constitutes AI skills has always been ambiguous. As AI adoption accelerates across industries and the European AI Act mandates companies to ensure AI literacy among their staff, organizations face even more challenges in defining and developing AI competencies. In this talk, we'll present a comprehensive framework developed by the appliedAI Institute's experts that categorizes AI skills across technical, regulatory, strategic, and innovation domains. We'll also share initial data on current AI skills levels and upskilling needs and provide practical strategies for organizations to assess, develop, and acquire the AI capabilities required for their specific needs.
Forecasting and causal inference are distinct but fundamental tasks in data science. While forecasting predicts future outcomes based on history, causal inference explores the "why" behind those outcomes and helps simulate "what if" scenarios. Confusing the two can lead to misleading results. At Blue Yonder, we encountered a case where a customer's forecasting model predicted demand accurately based on price. However, when they used the model for simulations to explore "what if" scenarios, the results were counterintuitive: lower prices led to lower demand. I will share how we resolved this issue and emphasize the importance of incorporating causal thinking when addressing questions like "Why did this happen?" or "What if I do X?" In this talk, I’ll show how to identify common pitfalls, like confounders, when integrating causal inference into forecasting workflows. We’ll also explore Bayesian models, powered by Markov Chain Monte Carlo (MCMC) methods, to bridge the gap between forecasting and causality using the PyMC library, with practical examples. By the end of this talk, you’ll learn how to:
- Consider causal reasoning when building models that forecast well but can also be used for interventions and "what if" scenarios.
- Visualize causal hypotheses with Directed Acyclic Graphs (DAGs) to understand relationships.
- Leverage PyMC to build Bayesian models for testing causal hypotheses and answering "what if" questions.
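As a preview of the PyMC part (data and parameter names are synthetic, not the talk's case study), a minimal model that infers a price elasticity and then answers a "what if we lower the price by 10%?" question from the posterior:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
price = rng.uniform(1.0, 5.0, size=200)
demand = np.exp(2.0 - 0.8 * np.log(price)) * rng.lognormal(sigma=0.1, size=200)

with pm.Model() as model:
    intercept = pm.Normal("intercept", 0, 2)
    elasticity = pm.Normal("elasticity", 0, 1)  # effect of log-price on log-demand
    sigma = pm.HalfNormal("sigma", 1)
    mu = intercept + elasticity * np.log(price)
    pm.Normal("log_demand", mu=mu, sigma=sigma, observed=np.log(demand))
    idata = pm.sample(1000, tune=1000, random_seed=0)

# Simulate the intervention: multiply price by 0.9 and read off demand change.
el = idata.posterior["elasticity"].mean().item()
print(f"estimated elasticity: {el:.2f}, expected demand change: {0.9 ** el - 1:+.1%}")
```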
Many Python developers are enhancing their Rust knowledge and want to take the next step in translating their understanding of advanced concepts like asynchronous programming. In this talk, I'll help you take that step by juxtaposing Python's asyncio with Rust's async ecosystems, tokio and async-std. Through real-world examples and insights from conversations with graingert, co-author of Python's Anyio, we'll explore how each language approaches asynchronous execution, highlighting similarities and differences in syntax, performance, and ecosystem support. This talk aims to persuade you that by leveraging Rust's powerful type system and compiler guarantees, we can build fast, reliable async code that's less prone to race conditions and concurrency bugs. Whether you're a Pythonista venturing into Rust or a Rustacean curious about Python's concurrency model, this session will provide practical insights to help you navigate async programming across both languages. Welcome to Rüstzeit: Prepare to navigate async programming across both ecosystems.
"We've only tested the happy path - now users are finding all sorts of creative ways to break the app." What is already a cause for headaches in traditional software engineering turns into a large challenge when the application is based on machine learning models: Data distribution may change from training phase to deployment. Even worse, humans interacting with the model may adjust their behaviour to the model making the gap between original training environment and deployment even larger. When deployed in a public environment the model may be exposed to users trying to game the system. When re-trained it may be exposed to users trying to poison the pool of training data. We will take a tour of historic cases of models being gamed: What are the lessons we learnt a long time ago building e-mail spam filters? What happened when high search engine rankings started to be linked to monetary income? How can personalization and targeted advertising be exploited to influence public discourse? “… it should be clear that improvements in communication tend to divide mankind …” by Harold Innis in Changing Concepts of Time This keynote will turn interactive engaging the audience in sharing their stories on users playing interesting games with deployed models - including counter moves rolled out. If we are to learn from IT security experience, one important ingredient to address these issues is a combination of collaboration and transparency - across organisations.
The relationship between humans and machines, especially in the context of Artificial Intelligence (AI), is shaped by hopes, concerns, and moral questions. On the one hand, advances in AI offer great promise: it can help us solve complex problems, improve healthcare, streamline workflows, and much more. Yet, at the same time, there are legitimate concerns about the control over this technology, its potential impact on jobs and society, and ethical issues related to discrimination and the loss of human autonomy. In this talk I will explore and illustrate the complex tension between innovation and moral responsibility in AI research.
Please note, this is a children's workshop. Recommended age 10-16 years. Experienced use of keyboard and mouse, first words in English (for programming) are required. // Welcome, mini-Pythonistas! In this workshop, we’ll dive into the world of Zümi, a programmable car that’s much more than just wheels and motors. With built-in sensors, lights, and a camera, Zümi can learn to recognize colors, respond to gestures, and even identify faces — all with your help!
The ESA Euclid mission, launched in July 2023, is on a quest to unravel the mysteries of dark energy and dark matter: the enigmatic components that make up 95% of the Universe. By mapping one-third of the sky with unprecedented precision, Euclid is building the largest 3D map of the cosmos. This talk explores how cosmologists bridge theory and Euclid observations to reveal the hidden nature of dark energy and dark matter. We will delve into the challenges of cosmological inference, where advanced statistical methods and Python-based pipelines compare theoretical models against Euclid's vast datasets, and we will explain how Bayesian inference, machine learning, and state-of-the-art simulations are revolutionizing our understanding of the cosmos.
In this talk, Leandro will examine the significant benefits of combining open source principles with artificial intelligence. He will walk through the need for openness in language models to build trust, maintain control, mitigate biases, and achieve true alignment and show how open models are rapidly gaining momentum in the AI landscape, challenging proprietary systems through community-driven innovation. Finally, he will then talk about emerging trends and what the community needs to build for the next generation of models.
In this talk, we will take a deeper technical look at AI coding agents, their design, and how they can carry out coding tasks with the support of large language models. We will follow the journey from the user entering a prompt to how it converts into actions that complete the task. After that, we will look at the impact these agents could make in the industry, whether or not you as a developer should use an AI coding agent, and what a user should be cautious of when using such an agent.
From coffee machine settings to chemical reactions to website AB testing - iterative make-test-learn cycles are ubiquitous. The [Bayesian Back End](https://emdgroup.github.io/baybe/stable/) (BayBE) is an open-source experimental planner enabling users to smartly navigate such black-box optimization problems in iterative settings. This tutorial will i) introduce the core concepts enabled by combining Bayesian optimization and machine learning; ii) explain our software design choices, robust tests and open-source libraries this is built on; and iii) provide a short practical hands-on session.
This talk addresses the critical need for use case-specific evaluation of Large Language Model (LLM)-powered applications, highlighting the limitations of generic evaluation benchmarks in capturing domain-specific requirements. It proposes a workflow for designing evaluation pipelines to optimize LLM-based applications, consisting of three key activities: human-expert evaluation and benchmark dataset curation, creation of evaluation agents, and alignment of these agents with human evaluations using the curated datasets. The workflow produces two key outcomes: a curated benchmark dataset for testing LLM applications and an evaluation agent that scores their responses. The presentation further addresses limitations and best practices to enhance the reliability of evaluations, ensuring LLM applications are better tailored to specific use cases.
AI-generated synthetic data is gaining traction as a privacy-safe solution for data access and sharing. This data is created from original datasets, maintaining privacy without compromising utility. In this Session, we'll cover the fundamental concepts of AI-generated synthetic data and demonstrate how easy it is to generate synthetic data within your local compute environment using the open-source Synthetic Data SDK.
Duplicate records can have a negative impact on many areas of a business. Current methods to detect duplicate records use traditional NLP techniques known as “Entity Matching”. An improvement to this traditional method can be achieved by incorporating GenAI techniques that do not entail any calls to OpenAI. Not only does this produce better matches, but it also keeps the data safe, since no information is transferred externally.
Explore the power of Human-in-the-Loop (HITL) for GenAI agents! Learn how to build AI systems that augment your abilities, not replace your judgment, especially when high-stakes actions are involved. This session will focus on practical implementation using Python and Langchain to stay in control.
Jeannie is an LLM-based agentic workflow implemented in Python to automate task management for field workers in the energy sector. This system addresses inefficiencies and safety risks in tasks like PV panel installation and powerline repair. Using open-source tools (LangChain family, OpenStreetMap and OpenWeatherMap APIs), Jeannie retrieves tasks, fetches weather and directions, identifies past incidents via RAG, and emails tailored reports with safety warnings. This presentation offers a case study of Jeannie’s implementation for E.ON in Germany, demonstrating how daily task automation enhances worker safety and efficiency. Attendees will discover how to create agentic systems with Python, integrate APIs, and apply RAG for safety applications, with access to open-source code and data for replicating the workflow.
Understanding the structure and content of data frames is crucial when working with tabular data — a core requirement for the robust pipelines we build at QuantCo. Libraries such as `pandera` or `patito` already exist to ease the process of defining data frame schemas and validating that data frames comply with these schemas. However, when building production-ready data pipelines, we encountered limitations of these libraries. Specifically, we were missing support for strict static type checking, validation of interdependent data frames, and graceful validation including introspection of failures. To remedy the shortcomings of these libraries, we started building `dataframely` at the beginning of last year. Dataframely is a declarative data frame validation library with first-class support for polars data frames. Over the last year, we have gained experience in using `dataframely` both for analytical and production code across several projects. The result was a drastic improvement of the legibility of our pipeline code and our confidence in its correctness. To enable the wider data engineering community to benefit from similar effects, we have recently open-sourced `dataframely` and are keen on introducing it in this talk.
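A minimal sketch of the declarative style (column names are hypothetical, based on dataframely's schema API for polars):

```python
import dataframely as dy
import polars as pl

class TripSchema(dy.Schema):
    trip_id = dy.Int64(primary_key=True)
    distance_km = dy.Float64(nullable=False)
    fare_eur = dy.Float64(nullable=False)

df = pl.DataFrame({
    "trip_id": [1, 2],
    "distance_km": [3.2, 7.5],
    "fare_eur": [8.9, 17.4],
})

# Validation both checks the data and yields a statically typed frame.
validated = TripSchema.validate(df, cast=True)
```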
Join us as we explore integrating Gurobi and prescriptive analytics into your Python ecosystem. In this session, you’ll discover model-building techniques that leverage NumPy and SciPy.sparse as well as the data structures of pandas. We’ll also show you how to seamlessly integrate trained regressors from scikit-learn as constraints in your optimization models. Elevate your workflows and unlock new decision-making capabilities with Gurobi in Python.
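As a warm-up for the session, a minimal gurobipy model (requires a Gurobi license): two continuous variables, two linear constraints, one objective.

```python
import gurobipy as gp
from gurobipy import GRB

m = gp.Model("demo")
x = m.addVar(name="x")  # continuous, >= 0 by default
y = m.addVar(name="y")

m.setObjective(3 * x + 2 * y, GRB.MAXIMIZE)
m.addConstr(x + y <= 4, name="capacity")
m.addConstr(x - y >= 1, name="min_gap")

m.optimize()
print(x.X, y.X, m.ObjVal)  # optimal values: x=4, y=0, objective=12
```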
**Rosenxt** is the host of a number of ventures aiming to provide next-level solutions for demanding problems in a variety of industries, based on decades of engineering excellence. Some of them address challenges in water environments, ranging from water pipelines to offshore applications. As different as these areas may seem, the solutions we build for them have a lot in common, whether it's the necessary power supply, movement and steering concepts, or sensing approaches. All of them benefit from generalized, smart solutions that we design as components that can later be orchestrated and configured in various setups to fulfill quite different purposes. This presentation explores the versatility of a Raspberry Pi-based hardware platform combined with a Python-based application stack to bridge development and deployment of various basic components, such as motors and motor controllers, lift foils, steering units and controls. By utilizing a unified platform, we demonstrate how the same system can seamlessly transition from test bench measurements during hardware component development to real-world applications across industries. The talk highlights how this approach creates a robust framework that helps streamline workflows, enhance scalability and reduce costs.