Lessons learned in bringing a RAG chatbot with access to 50k+ diverse documents to production

Bernhard Schäfer, Nico Mohr

Thursday 16:55 in Titanium3

Building a prototype RAG chatbot with frameworks like LangChain can be straightforward. However, scaling it into a production-grade application introduces complex challenges. In this talk, we share our lessons learned from developing a RAG chatbot designed to assist research and development (R&D) experts.

Our chatbot was developed to handle and provide access to a large collection of unstructured knowledge: over 50,000 documents stored across more than 20 SharePoint sites and other sources. We faced significant hurdles in:

  • Data Pipeline Engineering: Crafting a modular and scalable pipeline capable of periodically syncing documents, handling dynamic user permissions, and efficiently processing large volumes of unstructured data (a minimal pipeline sketch follows this list).
  • Evaluation Framework Development: Implementing an effective testing strategy without the availability of static ground truth data. We employed automated testing with frameworks like pytest, utilized LLM-as-a-judge, and integrated tracing to iteratively refine our dataset and maintain high answer quality.
  • RAG Design and Prompting Strategies: Addressing challenges in document chunking, citation integration, reranking retrieved results, and applying permission and PII filters to ensure compliance and accuracy in responses (see the retrieval sketch below).
  • User Adoption: Driving user adoption through onboarding training and ongoing engagement, such as regular office hours and feedback mechanisms.
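
As a rough illustration of the data pipeline point, the sketch below shows one way to structure ingestion as small, swappable stages. It is a minimal, hypothetical example in plain Python: the names, the Document shape, and the naive chunking are assumptions for illustration, not the pipeline presented in the talk.

```python
# Hypothetical sketch of a modular ingestion pipeline: each stage is a small,
# swappable callable, so new sources and processing steps can be added or rerun
# independently. Names and structure are illustrative only.
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)  # e.g. source site, allowed user groups


# A stage turns a stream of documents into a stream of documents
# (load, clean, chunk, embed, index, ...), which keeps stages independently testable.
Stage = Callable[[Iterable[Document]], Iterable[Document]]


def chunk(docs: Iterable[Document], max_chars: int = 1500) -> Iterable[Document]:
    """Naive fixed-size chunking stage; a real pipeline would split on document structure."""
    for doc in docs:
        for i in range(0, len(doc.text), max_chars):
            yield Document(
                doc_id=f"{doc.doc_id}#chunk{i // max_chars}",
                text=doc.text[i : i + max_chars],
                metadata=doc.metadata,
            )


def run_pipeline(docs: Iterable[Document], stages: list[Stage]) -> Iterable[Document]:
    """Apply each stage in order; a scheduler can rerun this periodically per source."""
    for stage in stages:
        docs = stage(docs)
    return docs
```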

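For the RAG design point, the following sketch combines a permission filter with a reranking step at retrieval time, so users never see content from sites they cannot access. It is illustrative only: the candidate format, the allowed_groups metadata, and the score callable (in practice typically a cross-encoder reranker) are assumptions rather than the system described in the talk.

```python
# Illustrative permission-aware retrieval step (not the talk's actual code):
# candidate chunks are filtered against the requesting user's groups before reranking.
from typing import Callable


def retrieve(
    query: str,
    candidates: list[dict],              # each: {"text": ..., "allowed_groups": set(...), ...}
    user_groups: set[str],
    score: Callable[[str, str], float],  # relevance model, e.g. a cross-encoder reranker
    top_k: int = 5,
) -> list[dict]:
    # 1) Permission filter: drop chunks from sites the user is not allowed to read.
    permitted = [c for c in candidates if c["allowed_groups"] & user_groups]

    # 2) Rerank the permitted candidates with the stronger relevance model.
    ranked = sorted(permitted, key=lambda c: score(query, c["text"]), reverse=True)
    return ranked[:top_k]
```
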
We emphasize the importance of applying data science principles to GenAI projects:

  • Start Simple and Iterate: Begin with a basic implementation as a baseline and iteratively enhance functionality based on testing and user feedback.
  • Test-Driven Development: Identify key test scenarios early and use them to drive development, ensuring that improvements are measurable and aligned with evolving user needs (a minimal test sketch follows this list).
  • Focus on Key Metrics: Establish clear metrics to optimize against, aiding in making informed decisions throughout the development process.
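
To make the test-driven point concrete, here is a minimal pytest sketch of the kind of scenario test referred to above, using an LLM-as-a-judge check. ask_chatbot, judge_answer, and the example cases are hypothetical placeholders, not the authors' actual test suite.

```python
import pytest

# Placeholder scenarios: (question, criteria the grader LLM checks the answer against).
TEST_CASES = [
    ("Which document describes the onboarding process for new lab members?",
     "Names a specific document and cites its source."),
    ("Summarize the travel reimbursement policy.",
     "Is grounded in retrieved documents and contains no personal data."),
]


def ask_chatbot(question: str) -> str:
    """Placeholder for the RAG application under test."""
    raise NotImplementedError("call the chatbot endpoint here")


def judge_answer(question: str, answer: str, criteria: str) -> bool:
    """Placeholder LLM-as-a-judge: prompt a grader model with the question, answer,
    and criteria, parse its verdict, and log the rationale to the tracing backend."""
    raise NotImplementedError("call the grader LLM here")


@pytest.mark.parametrize("question,criteria", TEST_CASES)
def test_answer_quality(question, criteria):
    answer = ask_chatbot(question)
    assert judge_answer(question, answer, criteria), f"Judge rejected answer: {answer!r}"
```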

Main Takeaways for the Audience:

  • Understand the critical role of robust, modular data pipelines in handling dynamic and unstructured data sources for LLM applications.
  • Learn strategies for developing effective evaluation frameworks in complex domains where traditional ground truth data may be lacking.
  • Gain insights into advanced RAG design techniques that enhance chatbot performance and reliability.
  • Recognize the substantial data engineering and software development efforts required to transition a prototype to a production-grade LLM solution.

By sharing our experiences, we aim to give attendees practical insights into deploying robust RAG chatbots and into turning a functional prototype into a reliable, scalable application that fulfills enterprise requirements.

Bernhard Schäfer

Bernhard is a Senior Data Scientist at Merck with a PhD in deep learning and over five years of experience applying data science and data engineering across different industries. For more information, you can connect with him on LinkedIn. 🙂

Nico Mohr

Nico works as a Senior Machine Learning Engineer at Merck, focusing on developing applications powered by LLMs. His background bridges software engineering and data science, with experience spanning classical data science, computer vision, and discrete optimization, and he has deployed several machine learning solutions in production environments.