Inspiration
Current web agents navigate by converting massive HTML DOM trees or screenshots into enormous prompts. This brute-force parsing approach can push the context window to as much as 50,000 tokens by step 15 of a simple workflow, triggering a Latency & Cost Death Spiral that makes production deployment prohibitively expensive.
Furthermore, LLMs suffer from "lost in the middle" attention drift when flooded with irrelevant boilerplate such as sidebar menus and cookie banners, and can lose track of their original objective. If an unexpected two-factor authentication (2FA) prompt appears, the accumulated context no longer matches the page, and a traditional agent gets stuck in a failure loop.
Inspired by the Memory Inception paper: link
What it does
Agent Inception sidesteps the context-window bottleneck to enable fast, low-cost, resilient web automation. Instead of forcing the LLM to re-read raw website code on every click, our system uses what we call Zero-Prompt Navigation: the agent's chat history stays nearly empty, containing only the user's high-level goal.
As the browser loads each new page, the system identifies the viewport state and injects a pre-compiled architectural "cheat sheet" for that page into the model's short-term memory layer. When an unexpected disruption occurs, the browser triggers Stealth Steering: it hot-swaps a focused mitigation memory block into the context, resolves the disruption with native browser tools, and then swaps the original website map back in without losing track of the long-term objective.
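The swap-and-restore behaviour of Stealth Steering can be pictured as a single memory slot with a stack of parked modules. This is a minimal sketch under our own assumptions, not the project's implementation: `AgentContext`, `load`, `steer`, and the module ids are hypothetical names, and module ids stand in for the actual pre-compiled memory blocks. The goal is never touched, and the original site map is restored even if resolving the interruption fails.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentContext:
    """Hypothetical split of the agent's state: a fixed goal plus one swappable memory slot."""

    goal: str
    active_module: Optional[str] = None
    _parked: List[Optional[str]] = field(default_factory=list, init=False, repr=False)

    def load(self, module_id: str) -> None:
        # Normal navigation: install the site map for the current viewport state.
        self.active_module = module_id

    @contextmanager
    def steer(self, mitigation_id: str):
        # Stealth Steering: park the current site map and load the mitigation block.
        self._parked.append(self.active_module)
        self.active_module = mitigation_id
        try:
            yield self
        finally:
            # Restore the parked site map, even if the resolution step raised.
            self.active_module = self._parked.pop()


ctx = AgentContext(goal="Export the Q3 invoices as CSV")
ctx.load("billing_portal.invoices_page")
with ctx.steer("mitigation.two_factor_prompt"):
    # A native browser tool would resolve the 2FA prompt here (omitted).
    print(ctx.active_module)  # mitigation.two_factor_prompt
print(ctx.active_module)      # billing_portal.invoices_page
```

Using a stack rather than a single saved value means nested interruptions (for example, a cookie banner appearing on top of a 2FA prompt) unwind in the right order.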
How we built it
Agent Inception introduces an optimization concept called Zero-Prompt Navigation.
Instead of repeatedly describing webpages through text serialization, the architecture activates pre-computed, text-conditioned memory modules that encode structural knowledge about common interfaces and recovery flows. The visible chat prompt stays strictly focused on the user's high-level intent, while structural layout constraints are loaded directly into the transformer's attention layers rather than into the prompt text.
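The write-up does not spell out how modules reach the attention layers. Given the roadmap's mention of KV-equivalent memory blocks, one plausible reading is prefix key/value injection: precomputed keys and values are prepended to the ones produced by the prompt, so the model attends to them without their ever appearing as prompt tokens. The toy below shows that idea for one attention head with 2-dimensional vectors in plain Python; real models hold per-layer, per-head tensors, and all function names here are hypothetical.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def attend_with_memory(query, prompt_kv, memory_kv):
    # Hypothetical injection: memory keys/values are prepended to the prompt's,
    # so the query attends to them although they are not prompt tokens.
    mem_keys, mem_values = memory_kv
    prompt_keys, prompt_values = prompt_kv
    return attend(query, mem_keys + prompt_keys, mem_values + prompt_values)


# Toy example: one "goal" token in the prompt, one precomputed memory entry.
prompt_kv = ([[1.0, 0.0]], [[1.0, 0.0]])
memory_kv = ([[0.0, 1.0]], [[0.0, 1.0]])
query = [0.5, 0.5]

print(attend(query, *prompt_kv))                        # [1.0, 0.0]
print(attend_with_memory(query, prompt_kv, memory_kv))  # [0.5, 0.5]
```

The prompt still holds a single token in both calls; only the injected memory changes what the query retrieves.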
User Intent
↓
Short Prompt
↓
Retrieve Relevant Memory Module
↓
Inject Structural Guidance
↓
Browser Action
↓
Adapt to Interruptions (Stealth Steering)
↓
Continue Original Task
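The pipeline can be traced end to end with a small simulation. This is a sketch under stated assumptions, not the project's code: the registry contents, the keyword and URL heuristics in `detect_interruption` and `classify_page`, and the module ids are all hypothetical, and the browser action itself is left out. The point it illustrates is that the prompt at every step is just the user's goal, while the injected memory module changes with the page and with interruptions.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical registry mapping a detected page state to a pre-compiled memory module.
MEMORY_REGISTRY: Dict[str, str] = {
    "login": "site_map.login_form",
    "search_results": "site_map.search_results",
    "checkout": "site_map.checkout_flow",
}

# Hypothetical mitigation modules for interruptions that break the normal flow.
MITIGATIONS: Dict[str, str] = {
    "two_factor": "mitigation.two_factor_prompt",
    "cookie_banner": "mitigation.cookie_banner",
}


def detect_interruption(html: str) -> Optional[str]:
    # Toy keyword heuristics; a real detector would inspect the DOM or viewport.
    if re.search(r"one-time code|verification code", html, re.IGNORECASE):
        return "two_factor"
    if re.search(r"accept (all )?cookies", html, re.IGNORECASE):
        return "cookie_banner"
    return None


def classify_page(url: str) -> str:
    # Toy URL-based classifier for the viewport state.
    for marker, state in (("/login", "login"), ("/search", "search_results"), ("/checkout", "checkout")):
        if marker in url:
            return state
    return "unknown"


def run_task(goal: str, observations: List[Tuple[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Record which memory module each step would inject; the prompt is always just the goal."""
    steps = []
    for url, html in observations:
        interruption = detect_interruption(html)
        if interruption is not None:
            # Stealth Steering: resolve the interruption with its mitigation block first.
            steps.append({"prompt": goal, "memory": MITIGATIONS[interruption]})
        # Continue the original task with the page's own structural map.
        # (The browser action would run here with this module loaded.)
        steps.append({"prompt": goal, "memory": MEMORY_REGISTRY.get(classify_page(url))})
    return steps


goal = "Buy the cheapest USB-C cable"
steps = run_task(goal, [
    ("https://shop.example/login", "<form>Email</form>"),
    ("https://shop.example/login", "<p>Enter the verification code we sent</p>"),
    ("https://shop.example/search?q=usb-c", "<ul>results</ul>"),
])
for step in steps:
    print(step["memory"])
```

For these three observations the injected modules are the login map, the 2FA mitigation block, the login map again, and finally the search-results map.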
This structural separation allows the agent to:
- Keep a flat, compact prompt footprint across long workflows.
- Isolate reasoning about the objective from noisy layout changes.
- Recover quickly from unexpected browser context shifts.
- Avoid the compounding token costs of re-sending page context at every step.
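The cost gap behind the flat-prompt claim can be checked with simple arithmetic. The sizes below are assumptions chosen to match the roughly 50,000-token context at step 15 mentioned earlier: a 200-token goal, about 3,300 tokens per serialized page, and a 50-token per-step note for the flat prompt. The sketch counts only prompt tokens; the compute spent attending to injected memory is not modelled.

```python
def full_history_tokens(steps: int, goal_tokens: int, page_tokens: int) -> int:
    """Total input tokens when every step re-sends the goal plus all page
    snapshots seen so far (context at step k = goal + k * page)."""
    return sum(goal_tokens + k * page_tokens for k in range(1, steps + 1))


def flat_prompt_tokens(steps: int, goal_tokens: int, note_tokens: int) -> int:
    """Total input tokens when each step sends only the goal plus a short note;
    page structure is assumed to arrive as injected memory, not prompt text."""
    return steps * (goal_tokens + note_tokens)


# Assumed sizes (not measured): goal, serialized page, per-step note.
STEPS, GOAL, PAGE, NOTE = 15, 200, 3_300, 50

print(GOAL + STEPS * PAGE)                     # 49700 tokens in context at step 15
print(full_history_tokens(STEPS, GOAL, PAGE))  # 399000
print(flat_prompt_tokens(STEPS, GOAL, NOTE))   # 3750
```

The full-history total grows quadratically with step count (it contains the sum 1 + 2 + ... + n), while the flat total grows linearly: doubling the workflow to 30 steps nearly quadruples the first figure but only doubles the second. That quadratic compounding, rather than exponential growth, is what drives the cost spiral.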
Challenges we ran into
Working through the documentation for tools that were new to us.
Running a small language model (SLM) on our local machines and setting up GPU instances on AWS.
Building a demo product that illustrates the technical problem we set out to solve.
Accomplishments that we're proud of
Learning new concepts and turning them into a product that demonstrates a technical approach to improving agents' memory.
What we learned
The new tools we worked with, and a wide range of concepts around AI agents.
What's next for AgentInception
Our development roadmap moves from simulation models toward an immutable, global memory-routing system:
The Global Memory Layer: An open-source, decentralized registry hosting pre-compiled, KV-equivalent memory blocks for every major enterprise software framework.
Zero-Token OS Agents: Expanding the context-injection loop beyond web browsers into local desktop operating environments to fluidly navigate massive desktop apps.
True Enterprise Scalability: Shifting web agents from slow, expensive developer novelties into industrial-grade automation utilities operating at native software speeds.
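One way to make registry entries immutable, as the Global Memory Layer item envisions, is content addressing: a block's id is a hash of its contents, so an id can never silently point at different data. The sketch below is our own assumption, not a described design: `MemoryBlock`, `MemoryRegistry`, the `"acme-crm"` framework name, and the string payloads are hypothetical, and in practice the payload might reference precomputed KV tensors rather than text. Blocks are immutable, while a separate index points each (framework, page state) pair at the latest published block, much like a branch pointer in git.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MemoryBlock:
    framework: str   # hypothetical: the application or UI framework the block describes
    page_state: str  # hypothetical: e.g. "login", "two_factor"
    payload: str     # hypothetical: guidance text or a pointer to precomputed KV tensors


class MemoryRegistry:
    """Hypothetical content-addressed registry: a block's id is the SHA-256 of its contents."""

    def __init__(self) -> None:
        self._blocks: Dict[str, MemoryBlock] = {}
        self._latest: Dict[Tuple[str, str], str] = {}

    @staticmethod
    def block_id(block: MemoryBlock) -> str:
        # Canonical JSON so the same contents always hash to the same id.
        canonical = json.dumps([block.framework, block.page_state, block.payload], separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def publish(self, block: MemoryBlock) -> str:
        bid = self.block_id(block)
        self._blocks[bid] = block  # idempotent: same contents, same id
        self._latest[(block.framework, block.page_state)] = bid
        return bid

    def lookup(self, framework: str, page_state: str) -> Optional[MemoryBlock]:
        bid = self._latest.get((framework, page_state))
        return None if bid is None else self._blocks[bid]

    def get(self, bid: str) -> MemoryBlock:
        return self._blocks[bid]


registry = MemoryRegistry()
v1 = registry.publish(MemoryBlock("acme-crm", "login", "email field, then password, then submit"))
v2 = registry.publish(MemoryBlock("acme-crm", "login", "SSO button first, then email field"))
print(registry.lookup("acme-crm", "login").payload)  # SSO button first, then email field
print(registry.get(v1).payload)                       # email field, then password, then submit
```

Agents that pin a block id get exactly the same memory every time, while agents that look up by page state pick up the newest published version.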
Built With
- airbyte
- amazon-web-services
- clickhouse
- nextjs
- render