Large Action Models (LAMs) were first introduced by Rabbit with the launch of its R1 device, with the aim of creating end-to-end trained models that automatically translate human instructions into actions. Since then, the definition of LAMs has evolved to encompass Large Language Models (LLMs) used in multi-agent settings. Notable examples include Anthropic's "Computer Use" feature for its Claude models and Google's Project Mariner. These projects allow LLMs to operate a web browser or computer in a human-like manner by viewing the screen, moving the cursor, clicking buttons, and typing text. In doing so, they fulfill the original promise of LAMs: translating human instructions into automated actions.
We present an innovative application of LAMs that automates the job application process using AI. Our system autonomously navigates unfamiliar website structures, fills out forms, handles document uploads, and manages cookie banners without human intervention. This level of automation streamlines the application process for job seekers while ensuring accurate and timely submissions.
To achieve this, we leveraged the LaVague framework, which employs a modular, agent-based approach.
These agents work iteratively, performing tasks step by step until either the objective is achieved or a maximum number of steps is reached.
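The iterative loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not LaVague's actual API: the names `run_agent`, `plan_next`, `execute`, and `AgentRun` are hypothetical, and the planner here is a scripted stand-in for a model that would normally inspect the page and decide on the next instruction.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AgentRun:
    """Summary of one agent run (hypothetical structure for illustration)."""
    success: bool
    steps_taken: int
    history: List[str] = field(default_factory=list)


def run_agent(
    objective: str,
    plan_next: Callable[[str, List[str]], Optional[str]],
    execute: Callable[[str], None],
    max_steps: int = 10,
) -> AgentRun:
    """Plan and execute one instruction at a time.

    Stops when the planner returns None (objective reported as achieved)
    or when max_steps instructions have been executed.
    """
    history: List[str] = []
    for step in range(max_steps):
        instruction = plan_next(objective, history)
        if instruction is None:
            # The planner reports that nothing is left to do.
            return AgentRun(success=True, steps_taken=step, history=history)
        execute(instruction)
        history.append(instruction)
    # Step budget exhausted before the planner reported completion.
    return AgentRun(success=False, steps_taken=max_steps, history=history)


def scripted_planner(steps: List[str]) -> Callable[[str, List[str]], Optional[str]]:
    """Stand-in planner that replays a fixed list of instructions."""
    def plan_next(objective: str, history: List[str]) -> Optional[str]:
        return steps[len(history)] if len(history) < len(steps) else None
    return plan_next


# Example run with an assumed, simplified application flow.
flow = ["dismiss cookie banner", "fill in name", "upload CV", "click submit"]
executed: List[str] = []
result = run_agent("Submit the application form", scripted_planner(flow), executed.append)

# The same flow with a step budget that is too small to finish.
truncated = run_agent("Submit the application form", scripted_planner(flow), lambda _: None, max_steps=2)
```

Keeping the planner and the executor as separate callables mirrors the modular split described above: either one can be replaced (for example, by a real browser driver) without touching the loop or its stopping conditions.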
By building a custom solution around the LaVague framework, tailored specifically to the job application process, we successfully automated the entire workflow. In our presentation, we discuss our overall architecture and the challenges encountered during development, and we share lessons learned for practical adoption.
Large Action Models like these highlight the transformative potential of AI in automating intricate tasks, bridging the gap between understanding human intentions and executing them in dynamic, real-world scenarios.