🚀 REAL Hackathon Browser Agent

Inspiration

The hackathon challenged us to build an agent that operates reliably under real-world constraints: fast, accurate, and versatile enough to handle practical browser tasks. The competitive setup and the REAL benchmark motivated us to aim for an agent that functions like a true digital assistant.


What it does

Our agent can autonomously perform everyday browser operations such as:

  • Applying to jobs or filling out forms
  • Sending emails
  • Setting calendar events
  • Booking hotels
  • Navigating links and multi-page flows

Essentially, it acts as a general-purpose browser automation assistant powered by vision-language reasoning.


How we built it

We experimented with two approaches:

  1. Enhanced Prompt Engineering + Reflection Loop
    We began with an existing agent framework and attempted to strengthen reliability using a self-reflection feedback loop.
    The goal: enable the agent to critique its past actions and improve.

  2. Orchestrator-Based Architecture
    When reflection alone didn’t yield stable performance, we added an orchestrator model to structure tasks, guide decisions, and maintain coherence through complex multi-step interactions.

This combination gave us a more stable and efficient agent pipeline.
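
As a rough illustration of how the two ideas above can fit together, here is a minimal sketch, not our actual pipeline. It assumes three hypothetical callables: `plan` (the orchestrator, splitting a goal into subtasks), `act` (executing one subtask in the browser), and `reflect` (critiquing a failed attempt and producing a hint for the retry). In a real agent each would wrap a model call. Here they are plain stubs so the control flow can be run on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    subtask: str
    ok: bool
    critique: str = ""  # filled in only when the attempt failed


@dataclass
class RunLog:
    steps: List[Step] = field(default_factory=list)


def run_task(
    goal: str,
    plan: Callable[[str], List[str]],
    act: Callable[[str, str], bool],
    reflect: Callable[[str, str], str],
    max_retries: int = 2,
) -> Tuple[RunLog, bool]:
    """Run each planned subtask, reflecting and retrying on failure."""
    log = RunLog()
    for subtask in plan(goal):
        hint = ""
        for _ in range(max_retries + 1):
            ok = act(subtask, hint)
            step = Step(subtask, ok)
            log.steps.append(step)
            if ok:
                break
            # Reflection: turn the failure into a hint for the next attempt.
            hint = reflect(subtask, hint)
            step.critique = hint
        else:
            # Retries exhausted for this subtask: stop the whole task.
            return log, False
    return log, True


# Hypothetical stubs standing in for the model-backed components.
def demo_plan(goal: str) -> List[str]:
    return ["open form", "fill name", "submit"]


def demo_act(subtask: str, hint: str) -> bool:
    # "fill name" fails until the agent has a hint from reflection.
    return not (subtask == "fill name" and not hint)


def demo_reflect(subtask: str, previous_hint: str) -> str:
    return f"retry '{subtask}' after re-reading the screenshot"


log, success = run_task("apply to a job", demo_plan, demo_act, demo_reflect)
```

In this sketch the orchestrator fixes the order of subtasks up front, while reflection only influences retries of the current subtask. That split keeps each failure local instead of letting a critique rewrite the whole plan.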


Challenges we ran into

  • API credit limits restricted how many iterations we could test.
  • Connecting the flow between components caused issues, especially when integrating multiple models.
  • Debugging multi-agent control with a VLM was harder than expected.
  • We spent nearly 4 hours debugging a “headless: false” issue in the browser runtime.

Accomplishments we’re proud of

  • Our agent successfully completed a large variety of REAL-style tasks.
  • Despite limited time and compute, we were able to create a pipeline that runs reliably across multiple task types.
  • In our runs, the hybrid approach (reflection + orchestrator) gave more consistent results than reflection alone.

What we learned

  • Multi-agent systems with VLMs are extremely hard to synchronize — keeping track of state, screenshots, and browser feedback loops is non-trivial.
  • Real-world agents require tight control, error recovery, and robust interface mapping.
  • Debugging browser automation under time pressure teaches patience and resilience.
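
To make the point about state tracking and error recovery concrete, here is a small sketch under stated assumptions. `AgentState`, `ActionError`, and `run_with_recovery` are hypothetical names for illustration and are not part of our codebase or any library. The idea is to checkpoint the agent's view of the browser before each action and roll back to it when the action fails, so a half-applied step does not corrupt later decisions.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    url: str = "about:blank"
    step: int = 0
    history: List[str] = field(default_factory=list)


class ActionError(Exception):
    """Raised by an action that could not complete (hypothetical)."""


def run_with_recovery(
    state: AgentState,
    action: Callable[[AgentState], None],
    max_attempts: int = 3,
) -> AgentState:
    """Apply `action`, restoring the last checkpoint after each failure."""
    for attempt in range(1, max_attempts + 1):
        checkpoint = copy.deepcopy(state)
        try:
            action(state)
            state.step += 1
            return state
        except ActionError as exc:
            # Discard any partial changes and record what went wrong.
            state = checkpoint
            state.history.append(f"attempt {attempt} failed: {exc}")
    raise ActionError(f"gave up after {max_attempts} attempts")


def make_flaky_click(fail_times: int) -> Callable[[AgentState], None]:
    """Build a stub action that fails `fail_times` times, then succeeds."""
    remaining = {"n": fail_times}

    def click(state: AgentState) -> None:
        state.url = "https://example.test/half-loaded"  # partial mutation
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise ActionError("element not found")
        state.url = "https://example.test/done"
        state.history.append("clicked submit")

    return click


result = run_with_recovery(AgentState(), make_flaky_click(1))
```

The rollback here only restores the agent's own bookkeeping. A real browser would also need to be brought back to a known page, for example by reloading or navigating back, which this sketch does not attempt.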

What’s next for the REAL Hackathon Browser Agent

We plan to:

  • Polish the agent into a long-term entrant for the 3-month REAL global leaderboard challenge.
  • Improve error recovery, latency, and batching.
  • Add deeper reasoning layers and stronger UI element detection.
  • Push toward a production-grade autonomous browser assistant.

Built With

  • openrouter
  • python
  • qwen3
  • real-bench