Information retrieval goes beyond keyword matching: it’s about intent, context, and delivering relevant and accurate results. As retrieval-augmented generation (RAG) applications gain traction, understanding the retrieval process becomes more important for developers, data scientists, and search engineers.
We start with the Why. People search for different reasons: lookup, research, and inspiration. How well each of these needs is served depends on the key IR metrics of a search engine: precision, recall, and desirability. Having introduced these fundamentals, we go into common retrieval challenges, such as ambiguity, mismatched vocabularies, and the impact of context.
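As a rough illustration of two of the metrics named above, the following sketch computes precision and recall for a single query. The function name `precision_recall` and the document IDs are hypothetical and chosen only for this example. Desirability is left out because the abstract does not define how it is measured.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty result lists or empty relevance judgments.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: 4 results returned, 3 documents judged relevant.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall of about 0.67).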
To address these challenges, we then turn to advanced search techniques, comparing sparse (keyword-based) and dense (vector-based) retrieval and highlighting their strengths and limitations. We’ll explore hybrid search as a powerful approach that blends these techniques. In a live demo, using crawled data from the Sendung mit der Maus, we’ll showcase a hybrid search setup built with Mistral, Elasticsearch, and Streamlit. Although the dataset is in German, the core concepts and search dynamics should be easy to follow for non-German speakers as well.
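One common way to blend a sparse and a dense result list is reciprocal rank fusion (RRF). The sketch below is only an illustration under that assumption: the abstract does not say which fusion method the demo uses, and the function name, the constant `k=60`, and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    so documents ranked highly by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword search and a vector search.
sparse_hits = ["maus_01", "maus_07", "maus_03"]
dense_hits = ["maus_07", "maus_12", "maus_01"]
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
print(fused)
```

Because RRF works on ranks rather than raw scores, it avoids having to normalize keyword scores and vector similarities onto a common scale.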
The talk concludes with key takeaways on building effective search systems and a look ahead at future developments in contextualized search.
Tentative Outline:
* Introduction to Information Retrieval (~ 5 min)
    * Why do we search? Lookup, research, inspiration
    * Core metrics: precision, recall, desirability
* Challenges in Search and Retrieval (~ 5 min)
    * Ambiguity
    * Discrepancy in query and content
    * The impact of context
* Search Techniques (~ 10 min)
    * Sparse vs dense retrieval: comparing keyword and vector search (semantic search, embeddings, synsets, decompounders)
    * Hybrid search: Combining sparse and dense approaches
* Hybrid Search in Action (< 10 min)
    * Setting up a hybrid search with Mistral, Elasticsearch, and Streamlit
    * Live Demo: exploring search in Lach- & Sachgeschichten from Sendung mit der Maus
* Takeaways & Outlook (< 5 min)
The talk is directed at anyone interested in building or improving search systems. Attendees will gain a deeper understanding of the tools, methodologies, and metrics essential for building robust and explainable search systems.