Basic large language models struggle with complex reasoning. New techniques, broadly referred to as "test time compute" have emerged that allow these models to spend more time processing before giving an answer. Direct token sampling can be seen as analogous to system-1 thinking and explicit step-by-step reasoning as system-2. Many top AI researchers and companes are now working on building system-2 into AI systems to improve general reasoning.
We will review the newest open research on test time computation including promising techniques that have appeared in top entries for François Chollet's ARC-AGI challenge. While OpenAI has shamefully kept the research behind their o1, o3 and o-N models secret, other researchers have worked in public, demonstrating how to use test time compute to greatly boost model performance with the right fine-tuning and test time procedures.
This talk will explore the latest developments in the rapidly developing area of system-2 AI reasoning, the engine behind the only significant gains in LLM performance recently. Giving LLMs system-2 like capabilities improves problem solving, code generation quality and reduces hallucinations, get up to speed on research behind these techniques.