While text-based RAG systems have been ubiquitous over the last year and a half, their real power emerges when multiple modalities are combined.
In this session, we'll show how to build a multimodal RAG system entirely with open-source tools: vLLM for serving the model, Pixtral for multimodal understanding, and Milvus for storing and searching your vectors.
We'll walk through the architecture and implementation details of setting up your own infrastructure, demonstrating how to process and understand both images and text in a single, unified retrieval system. The session includes a live demo showing the system running in real time.
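To make the unified retrieval idea concrete, here is a minimal sketch of the core step: embedding both images and text into one vector space and retrieving the nearest neighbors regardless of modality. The vectors and captions below are toy stand-ins for Pixtral embeddings, and brute-force cosine similarity stands in for a Milvus similarity search; none of this is the talk's actual implementation.

```python
import math

# Toy corpus mixing image and text entries. In the real system, Pixtral
# (served via vLLM) would produce the embeddings and Milvus would index them;
# here, hand-written 3-d vectors and a linear scan illustrate the flow.
DOCS = [
    {"id": "img_001", "modality": "image",
     "vector": [0.9, 0.1, 0.0], "caption": "diagram of the ingestion pipeline"},
    {"id": "txt_042", "modality": "text",
     "vector": [0.1, 0.9, 0.0], "caption": "notes on vLLM deployment"},
    {"id": "img_007", "modality": "image",
     "vector": [0.8, 0.2, 0.1], "caption": "architecture overview slide"},
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vector, k=2):
    """Return the k documents nearest to the query, across all modalities."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vector, d["vector"]),
                    reverse=True)
    return ranked[:k]

# A query embedding close to the "image" cluster pulls back both image docs.
hits = retrieve([1.0, 0.0, 0.0])
print([h["id"] for h in hits])  # → ['img_001', 'img_007']
```

Because image and text entries live in the same vector space, a single search returns the best matches of either modality; that is the property the Milvus-backed system exploits at scale.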
Whether you're looking to reduce dependency on commercial APIs or need more control over your LLM infrastructure, this talk will provide you with practical insights and implementation strategies.