Inspiration

We wanted to build the fastest browser agent possible without sacrificing accuracy. Existing agents were too slow, so we focused on cutting latency and on atomic tools that collapse several browser steps into a single action.

What it does

KK Agent is a high-speed multimodal browser agent. It navigates real-world web apps (Calendar, Email, Social) by visually processing the page and executing precise actions. It is specifically tuned to minimize the time between "seeing" and "acting."

How we built it

We stripped away every millisecond of latency through aggressive optimization:

  • Atomic Tools: Built replace_text and type_dropdown to combine multiple steps (click, clear, type, enter) into single actions, drastically reducing model round-trips.
  • Image Pipeline: Implemented local downscaling of screenshots before sending them to the API, cutting processing time significantly.
  • Latency Tuning: Reduced browser timeouts to 500ms and minimized typing delays for near-instant execution.
  • Model Config: Set thinking_level and media_resolution to "low" on Gemini 3 Pro to prioritize raw speed.
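
To make the Atomic Tools idea concrete, here is a minimal sketch of what a replace_text-style tool could look like. It is not our actual implementation: the Page interface, RecordingPage, the type_text method, and the "#title" selector are hypothetical stand-ins so the example runs without a browser. The point is that the model issues one tool call while the click, clear, type, and enter steps all run locally.

```python
from typing import List, Protocol, Tuple


class Page(Protocol):
    """Hypothetical minimal page interface (not a real library API)."""

    def click(self, selector: str) -> None: ...
    def press(self, key: str) -> None: ...
    def type_text(self, text: str) -> None: ...


class RecordingPage:
    """Stand-in page that records actions instead of driving a browser."""

    def __init__(self) -> None:
        self.actions: List[Tuple[str, str]] = []

    def click(self, selector: str) -> None:
        self.actions.append(("click", selector))

    def press(self, key: str) -> None:
        self.actions.append(("press", key))

    def type_text(self, text: str) -> None:
        self.actions.append(("type", text))


def replace_text(page: Page, selector: str, text: str, submit: bool = True) -> None:
    """Focus a field, clear it, type new text, and optionally press Enter."""
    page.click(selector)      # focus the field
    page.press("Control+A")   # select whatever is already there
    page.press("Backspace")   # clear the selection
    page.type_text(text)      # enter the new value
    if submit:
        page.press("Enter")   # confirm the input


# One tool call from the model's point of view, five local browser actions.
page = RecordingPage()
replace_text(page, "#title", "Team sync")
```

Without a tool like this, each of those steps would be a separate model round-trip, usually with a fresh screenshot in between.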

Challenges we ran into

  • The "Naughty" Timeout: Dropping browser page timeouts to 500ms was risky. It made the agent much faster but prone to race conditions, which we had to handle gracefully.
  • Tab Detection: The model struggled to distinguish between navigation tabs and buttons, requiring specific prompt tweaks to fix.
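
One common way to handle the race conditions that a short timeout introduces is to retry the action a few times before giving up. The sketch below illustrates that pattern only; it is an assumption, not a description of how KK Agent actually recovers. The names retry_on_timeout and flaky_click, the attempt count, and the pause length are all hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_timeout(action: Callable[[], T], attempts: int = 3, pause_s: float = 0.05) -> T:
    """Run `action`, retrying whenever it raises TimeoutError.

    Re-raises the last TimeoutError if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts, surface the failure
            time.sleep(pause_s)  # give the page a moment to settle
    raise AssertionError("unreachable")


# Simulated action that times out twice before the element is ready.
calls = {"n": 0}


def flaky_click() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("element not ready within 500ms")
    return "clicked"


result = retry_on_timeout(flaky_click)
```

The trade-off is that the happy path stays fast, and only the occasional slow page pays for the extra attempts.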

Accomplishments that we're proud of

  • 35s Benchmark: We successfully brought the Calendar task time down to ~35 seconds through our local downscaling and delay reduction strategies.
  • Reliable Inputs: The replace_text tool completely solved issues with clearing fields, making form filling bulletproof.

What we learned

  • Downscaling Wins: Shrinking screenshots locally before upload is faster than sending full-resolution images to the cloud.
  • Atomic > Chatty: Combining several actions into one tool call was our single biggest performance boost.
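
To show why downscaling pays off, here is a small sketch that computes the target size for a screenshot while preserving its aspect ratio. The function name downscaled_size, the 1920x1080 input, and the 768-pixel cap are assumptions for illustration, not our actual settings. The resize itself would be done with an image library, which is left out so the example stays dependency-free.

```python
from typing import Tuple


def downscaled_size(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Return (width, height) scaled so the longer side is at most max_side.

    Aspect ratio is preserved; images already small enough are unchanged.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longer = max(width, height)
    if longer <= max_side:
        return width, height
    scale = max_side / longer
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical full-HD screenshot capped at 768 pixels on the longer side.
new_w, new_h = downscaled_size(1920, 1080, max_side=768)

# Fraction of the original pixel count that still has to be uploaded.
pixel_ratio = (new_w * new_h) / (1920 * 1080)
```

With these example numbers the screenshot shrinks to 768x432, roughly 16% of the original pixel count, so there is far less data to encode, upload, and process per step.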

What's next for KK Agent

  • Refining the "Nano" model experiments for even faster local execution.
  • Re-enabling parallel task execution once the single-thread performance is fully maximized.

Updates

One note: the type_dropdown tool, which I created to reduce the number of tool calls and screenshots required, currently fails on the Calendar time dropdown, though it would likely succeed on Google Calendar.
