Inspiration
We wanted to build the fastest browser agent possible without sacrificing accuracy. Existing agents were too slow, so we focused entirely on cutting latency and on atomic tools that do more work per model call.
What it does
KK Agent is a high-speed multimodal browser agent. It navigates real-world web apps (Calendar, Email, Social) by visually processing the page and executing precise actions. It is specifically tuned to minimize the time between "seeing" and "acting."
How we built it
We stripped away every millisecond of latency through aggressive optimization:
- Atomic Tools: Built replace_text and type_dropdown to combine multiple steps (click, clear, type, enter) into single actions, drastically reducing model round-trips.
- Image Pipeline: Implemented local downscaling of screenshots before sending them to the API, cutting processing time significantly.
- Latency Tuning: Reduced browser timeouts to 500ms and minimized typing delays for near-instant execution.
- Model Config: Configured thinking_level and media_resolution to "low" on Gemini 3 Pro to prioritize raw speed.
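
To make the atomic-tool idea concrete, here is a minimal sketch of how a tool like replace_text might fold click, clear, type, and Enter into one call. This is not the project's actual code: the Page interface, the FakePage stand-in, and the submit parameter are assumptions made for illustration, and a real agent would wire the same steps to its browser automation library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class Page(Protocol):
    """Hypothetical minimal page interface (an assumption, not a real library API)."""

    def click(self, selector: str) -> None: ...
    def clear(self, selector: str) -> None: ...
    def type(self, selector: str, text: str) -> None: ...
    def press(self, key: str) -> None: ...


@dataclass
class FakePage:
    """In-memory stand-in that records every low-level step it receives."""

    values: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def click(self, selector: str) -> None:
        self.log.append(f"click {selector}")

    def clear(self, selector: str) -> None:
        self.values[selector] = ""
        self.log.append(f"clear {selector}")

    def type(self, selector: str, text: str) -> None:
        self.values[selector] = self.values.get(selector, "") + text
        self.log.append(f"type {selector}")

    def press(self, key: str) -> None:
        self.log.append(f"press {key}")


def replace_text(page: Page, selector: str, text: str, submit: bool = True) -> None:
    """Single tool call that performs click, clear, type and, optionally, Enter."""
    page.click(selector)
    page.clear(selector)
    page.type(selector, text)
    if submit:
        page.press("Enter")


# Usage: one tool call from the model results in four browser steps.
page = FakePage(values={"#title": "old title"})
replace_text(page, "#title", "Team standup")
print(page.values["#title"])  # Team standup
print(len(page.log))          # 4
```

With separate click, clear, type, and press tools, the model would need four round-trips to achieve the same result; the atomic version needs one.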
Challenges we ran into
- The "Naughty" Timeout: Dropping browser page timeouts to 500ms was risky. It made the agent incredibly fast but prone to race conditions, which we had to handle gracefully.
- Tab Detection: The model struggled to distinguish between navigation tabs and buttons, requiring specific prompt tweaks to fix.
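
One possible way to tolerate the race conditions that a very short timeout invites is to retry an action a bounded number of times when it times out. The sketch below is an assumption about how that could look, not a description of how KK Agent handles it; with_retries, attempts, and backoff_s are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3, backoff_s: float = 0.05) -> T:
    """Run `action`, retrying with a growing pause whenever it raises TimeoutError."""
    last_error: Optional[TimeoutError] = None
    for attempt in range(attempts):
        try:
            return action()
        except TimeoutError as exc:
            last_error = exc
            # Pause before the next try, but not after the final failure.
            if attempt < attempts - 1:
                time.sleep(backoff_s * (attempt + 1))
    raise TimeoutError(f"gave up after {attempts} attempts") from last_error


# Usage: an action that times out twice before the page is ready.
calls = {"n": 0}

def flaky_click() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("element not ready within the timeout")
    return "clicked"

print(with_retries(flaky_click))  # clicked
```

The trade-off is that each individual wait stays short, while the rare slow page costs a few extra attempts instead of failing the whole task.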
Accomplishments that we're proud of
- 35s Benchmark: We successfully brought the Calendar task time down to ~35 seconds through our local downscaling and delay reduction strategies.
- Reliable Inputs: The replace_text tool completely solved issues with clearing fields, making form filling bulletproof.
What we learned
- Downscaling Wins: Shrinking screenshots locally before upload is faster than sending full-res images to the cloud.
- Atomic > Chatty: Combining actions into single tools is the single biggest performance booster.
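
To illustrate why downscaling pays off, the sketch below computes the target size for a screenshot whose longest side is capped, along with the fraction of pixels that remains. The 960-pixel cap and the function names are assumptions chosen for the example; the actual resize would be done with an imaging library, and the project's real settings are not stated here.

```python
from typing import Tuple


def downscaled_size(width: int, height: int, max_side: int = 960) -> Tuple[int, int]:
    """Return a size that keeps the aspect ratio with the longest side capped at max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def pixel_ratio(original: Tuple[int, int], resized: Tuple[int, int]) -> float:
    """Fraction of the original pixel count that remains after resizing."""
    return (resized[0] * resized[1]) / (original[0] * original[1])


# Usage: a 1920x1080 screenshot capped at 960 pixels on its longest side.
original = (1920, 1080)
resized = downscaled_size(*original, max_side=960)
print(resized)                         # (960, 540)
print(pixel_ratio(original, resized))  # 0.25
```

Halving each dimension leaves a quarter of the pixels to encode, upload, and process, which is where the time saving comes from.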
What's next for KK Agent
- Refining the "Nano" model experiments for even faster local execution.
- Re-enabling parallel task execution once the single-thread performance is fully maximized.