Thursday 16:55
in Titanium3
Building a prototype retrieval-augmented generation (RAG) chatbot with frameworks like LangChain can be straightforward. However, scaling it into a production-grade application introduces complex challenges. In this talk, we share the lessons we learned from building a RAG chatbot that assists research and development (R&D) experts.
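To make that starting point concrete, here is a minimal sketch of the kind of prototype the talk begins from: generic LangChain components over a local folder of text files standing in for SharePoint content. Exact import paths and class names vary across LangChain versions, and the model names and folder are assumptions, not our production setup. The sketch keeps everything in memory and ignores permissions, syncing, and evaluation, which is exactly where the production challenges below begin.

```python
# Illustrative prototype only: import paths differ across LangChain versions;
# the model names and the local "docs/" folder are assumptions.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

# Load and chunk a local folder of text files (a stand-in for SharePoint exports).
docs = DirectoryLoader("docs/", glob="**/*.txt", loader_cls=TextLoader).load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)

# Embed the chunks into an in-memory vector store and wire up a retrieval chain.
vectorstore = FAISS.from_documents(chunks, OpenAIEmbeddings())
chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
)

print(chain.invoke({"query": "What is the approval process for new lab equipment?"})["result"])
```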
Our chatbot was built to handle and provide access to a large collection of unstructured knowledge: over 50,000 documents spread across more than 20 SharePoint sites and other sources. We faced significant hurdles in:
- Data Pipeline Engineering: Crafting a modular and scalable pipeline that periodically syncs documents, handles dynamic user permissions, and efficiently processes large volumes of unstructured data (see the sync sketch after this list).
- Evaluation Framework Development: Implementing an effective testing strategy without static ground truth data. We employed automated testing with frameworks like pytest, used LLM-as-a-judge scoring, and integrated tracing to iteratively refine our dataset and maintain high answer quality (see the pytest sketch after this list).
- RAG Design and Prompting Strategies: Addressing challenges in document chunking, citation integration, reranking retrieved results, and applying permission and PII filters to keep responses compliant and accurate (see the reranking sketch after this list).
- User Adoption: Driving user adoption through onboarding training and ongoing engagement, such as regular office hours and feedback mechanisms.
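For the pipeline bullet, the sketch below shows the general shape of an incremental sync step: compare each document's last-modified timestamp against what is already indexed and re-process only what changed, carrying access metadata along for later permission filtering. The SharePoint client, the index object, and the field names are placeholders, not our implementation.

```python
# Shape of an incremental sync step (placeholders throughout: list_documents,
# fetch_content, and index stand in for real SharePoint and vector store
# clients; field names are assumptions).
from datetime import datetime, timezone

def sync_site(site_url: str, index, list_documents, fetch_content) -> int:
    """Upsert documents that changed since they were last indexed; return the count."""
    updated = 0
    for doc in list_documents(site_url):                        # e.g. SharePoint list items
        doc_id = doc["id"]
        modified = datetime.fromisoformat(doc["last_modified"])
        indexed_at = index.get_indexed_time(doc_id)             # None if never indexed
        if indexed_at is None or modified > indexed_at:
            index.upsert(
                doc_id=doc_id,
                text=fetch_content(doc_id),
                metadata={
                    "source": site_url,
                    "allowed_groups": doc["allowed_groups"],    # kept for permission filtering at query time
                    "indexed_at": datetime.now(timezone.utc).isoformat(),
                },
            )
            updated += 1
    return updated
```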
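For the evaluation bullet, a hedged illustration of wrapping an LLM-as-a-judge check in pytest follows; ask_chatbot, the judge prompt, the example questions, and the model name are placeholders rather than our actual test suite.

```python
# Sketch of an LLM-as-a-judge check in pytest. ask_chatbot, the questions,
# the prompt wording, and the judge model are placeholders (assumptions).
import json
import pytest
from openai import OpenAI

client = OpenAI()

def ask_chatbot(question: str) -> str:
    """Placeholder: call the RAG chain under test, e.g. chain.invoke(...)['result']."""
    raise NotImplementedError

def judge(question: str, answer: str) -> int:
    """Ask a judge model to rate the answer from 1 (poor) to 5 (excellent)."""
    prompt = (
        "Rate the following answer to the question on a 1-5 scale for "
        'correctness and completeness. Reply as JSON: {"score": <int>}.\n\n'
        f"Question: {question}\nAnswer: {answer}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)["score"]

@pytest.mark.parametrize("question", [
    "Which SharePoint site documents the stability study process?",  # illustrative
    "Who approves changes to the R&D data retention policy?",        # illustrative
])
def test_answer_quality(question):
    answer = ask_chatbot(question)       # the RAG chain under test
    assert judge(question, answer) >= 4  # fail the build if quality regresses
```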
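And for the RAG design bullet, a minimal sketch of permission filtering plus cross-encoder reranking over retrieved chunks might look like the following; the metadata fields, group names, and reranker model are assumptions, not our production configuration.

```python
# Sketch only: the "allowed_groups" metadata field, group names, and the
# cross-encoder model are assumptions.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def filter_and_rerank(query: str, candidates: list[dict], user_groups: set[str], top_k: int = 5) -> list[dict]:
    """Drop chunks the user may not see, then rerank the rest by relevance."""
    # Permission filter: keep a chunk only if the user belongs to at least one
    # of the groups recorded in its metadata (populated during ingestion).
    allowed = [c for c in candidates if user_groups & set(c["metadata"]["allowed_groups"])]

    # Rerank with a cross-encoder that scores (query, passage) pairs jointly.
    scores = reranker.predict([(query, c["text"]) for c in allowed])
    ranked = sorted(zip(allowed, scores), key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in ranked[:top_k]]

# Example: candidates as returned by a vector store, with access metadata attached.
candidates = [
    {"text": "Stability study SOP ...", "metadata": {"allowed_groups": ["rnd-analytics"]}},
    {"text": "HR onboarding guide ...", "metadata": {"allowed_groups": ["hr-internal"]}},
]
print(filter_and_rerank("stability study procedure", candidates, user_groups={"rnd-analytics"}))
```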
We emphasize the importance of applying data science principles to GenAI projects:
- Start Simple and Iterate: Begin with a basic implementation as a baseline and iteratively enhance functionality based on testing and user feedback.
- Test-Driven Development: Identify key test scenarios early and use them to drive development, ensuring that improvements are measurable and aligned with evolving user needs.
- Focus on Key Metrics: Establish clear metrics to optimize against, aiding in making informed decisions throughout the development process.
Main Takeaways for the Audience:
- Understand the critical role of robust, modular data pipelines in handling dynamic and unstructured data sources for LLM applications.
- Learn strategies for developing effective evaluation frameworks in complex domains where traditional ground truth data may be lacking.
- Gain insights into advanced RAG design techniques that enhance chatbot performance and reliability.
- Recognize the substantial data engineering and software development efforts required to transition a prototype to a production-grade LLM solution.
By sharing our experiences, we aim to give attendees practical insights into deploying robust RAG chatbots and transforming a functional prototype into a reliable, scalable application that fulfills enterprise requirements.
Bernhard Schäfer
Bernhard is a Senior Data Scientist at Merck. For more information, you can connect with him on LinkedIn. :-)