Lessons learned in bringing a RAG chatbot with access to 50k+ diverse documents to production

Bernhard Schäfer

Thursday 16:55 in Titanium3

Building a prototype RAG chatbot with frameworks like LangChain can be straightforward. However, scaling it into a production-grade application introduces complex challenges. In this talk, we share our lessons learned from developing a RAG chatbot designed to assist research and development (R&D) experts.

Our chatbot was developed to effectively handle and provide access to a large collection of unstructured knowledge, consisting of over 50,000 documents stored across more than 20 SharePoint sites and other sources. We faced significant hurdles in:

  • Data Pipeline Engineering: Crafting a modular and scalable pipeline capable of periodically syncing documents, handling dynamic user permissions, and efficiently processing large volumes of unstructured data.
  • Evaluation Framework Development: Implementing an effective testing strategy without the availability of static ground truth data. We employed automated testing with frameworks like pytest, utilized LLM-as-a-judge, and integrated tracing to iteratively refine our dataset and maintain high answer quality.
  • RAG Design and Prompting Strategies: Addressing challenges in document chunking, citation integration, reranking retrieved results, and applying permission and PII filters to ensure compliance and accuracy in responses.
  • User Adoption: Driving user adoption through onboarding training and ongoing engagement, such as regular office hours and feedback mechanisms.

We emphasize the importance of applying data science principles to GenAI projects:

  • Start Simple and Iterate: Begin with a basic implementation as a baseline and iteratively enhance functionality based on testing and user feedback.
  • Test-Driven Development: Identify key test scenarios early and use them to drive development, ensuring that improvements are measurable and aligned with growing user needs.
  • Focus on Key Metrics: Establish clear metrics to optimize against, aiding in making informed decisions throughout the development process.

Main Takeaways for the Audience:

  • Understand the critical role of robust, modular data pipelines in handling dynamic and unstructured data sources for LLM applications.
  • Learn strategies for developing effective evaluation frameworks in complex domains where traditional ground truth data may be lacking.
  • Gain insights into advanced RAG design techniques that enhance chatbot performance and reliability.
  • Recognize the substantial data engineering and software development efforts required to transition a prototype to a production-grade LLM solution.

By sharing our experiences, attendees will gain practical insights into deploying robust RAG chatbots, transforming a functional prototype into a reliable, scalable application that fulfills enterprise requirements.

Bernhard Schäfer

Bernhard is a Senior Data Scientist at Merck. For more information you can connect with him on LinkedIn. :-)