🚀 REAL Hackathon Browser Agent
Inspiration
This hackathon challenged us to build an agent that operates reliably under real-world constraints: fast, accurate, and versatile enough to handle practical browser tasks. The competitive setup and the REAL benchmark motivated us to aim for an agent that functions like a true digital assistant.
What it does
Our agent can autonomously perform everyday browser operations such as:
- Applying to jobs or filling out forms
- Sending emails
- Setting calendar events
- Booking hotels
- Navigating links and multi-page flows
Essentially, it acts as a general-purpose browser automation assistant powered by vision-language reasoning.
How we built it
We experimented with two approaches:
Enhanced Prompt Engineering + Reflection Loop
We began with an existing agent framework and attempted to strengthen reliability using a self-reflection feedback loop.
The goal: enable the agent to critique its past actions and improve.
Orchestrator-Based Architecture
When reflection alone didn’t yield stable performance, we added an orchestrator model to structure tasks, guide decisions, and maintain coherence through complex multi-step interactions.
This combination gave us a more stable and efficient agent pipeline.
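As a rough illustration of how the two pieces can fit together, here is a minimal sketch, not our actual pipeline. It assumes a hypothetical orchestrator that splits a goal into subtasks and retries each failed subtask with a critique from a reflection step. The names `run_agent`, `stub_plan`, `stub_act`, and `stub_reflect` are invented for this example, and the stubs stand in for real model and browser calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    action: str
    ok: bool
    critique: str = ""


@dataclass
class AgentState:
    goal: str
    history: List[Step] = field(default_factory=list)


def run_agent(
    goal: str,
    plan: Callable[[str], List[str]],
    act: Callable[[str, str], Tuple[str, bool]],
    reflect: Callable[[str, str], str],
    max_attempts: int = 2,
) -> Tuple[AgentState, bool]:
    """Orchestrator loop: plan subtasks, act on each, reflect on failures."""
    state = AgentState(goal)
    for subtask in plan(goal):
        hint = ""  # critique carried over from the previous failed attempt
        for _ in range(max_attempts):
            action, ok = act(subtask, hint)
            critique = "" if ok else reflect(subtask, action)
            state.history.append(Step(action, ok, critique))
            if ok:
                break
            hint = critique
        else:
            # Every attempt at this subtask failed; stop the run.
            return state, False
    return state, True


# Hypothetical stubs standing in for the orchestrator model, the acting
# model plus browser, and the reflection model.
def stub_plan(goal: str) -> List[str]:
    return ["open form", "fill name", "submit"]


def stub_act(subtask: str, hint: str) -> Tuple[str, bool]:
    # Pretend "fill name" fails until reflection supplies a hint.
    if subtask == "fill name" and not hint:
        return "type into #wrong-field", False
    return f"do: {subtask}", True


def stub_reflect(subtask: str, action: str) -> str:
    return "target the field labelled Name"


state, done = run_agent("apply to job", stub_plan, stub_act, stub_reflect)
for step in state.history:
    print(step)
```

Keeping the plan, act, and reflect steps as separate callables means each one can be backed by a different model, which is one way to keep state in a single place when several models are involved.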
Challenges we ran into
- API credit limitations restricted our ability to extensively test iterations.
- Connecting the stages of the flow was error-prone, especially when integrating multiple models.
- Debugging multi-agent control with a VLM was harder than expected.
- We spent nearly 4 hours debugging a “headless: false” issue in the browser runtime.
Accomplishments we’re proud of
- Our agent successfully completed a large variety of REAL-style tasks.
- Despite limited time and compute, we were able to create a pipeline that runs reliably across multiple task types.
- We validated that a hybrid approach (reflection + orchestrator) gave more consistent results in our runs than reflection alone.
What we learned
- Multi-agent systems with VLMs are extremely hard to synchronize — keeping track of state, screenshots, and browser feedback loops is non-trivial.
- Real-world agents require tight control, error recovery, and robust interface mapping.
- Debugging browser automation under time pressure teaches patience and resilience.
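The error-recovery point above can be sketched as a small retry wrapper. This is an illustrative pattern under our own assumptions, not code from the agent: `with_recovery`, `flaky_click`, and the recovery callback are hypothetical names, and the flaky step simulates a browser action that fails twice before succeeding.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def with_recovery(
    step: Callable[[], T],
    recover: Callable[[Exception], None],
    retries: int = 3,
    delay: float = 0.0,
) -> T:
    """Run a step, calling a recovery hook and retrying when it raises."""
    last_error: Optional[Exception] = None
    for attempt in range(retries):
        try:
            return step()
        except Exception as err:
            last_error = err
            recover(err)  # e.g. re-take a screenshot or reload the page
            time.sleep(delay * (attempt + 1))  # linear backoff between tries
    raise RuntimeError(f"step failed after {retries} attempts") from last_error


# Simulated browser action: fails twice, then succeeds.
calls = {"n": 0}
recovered: List[str] = []


def flaky_click() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("element not found")
    return "clicked"


result = with_recovery(flaky_click, lambda err: recovered.append(str(err)))
print(result, calls["n"], recovered)
```

In a real agent the recovery hook is where fresh state would be captured before the next attempt, so the model does not act on a stale screenshot.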
What’s next for the REAL Hackathon Browser Agent
We plan to:
- Polish the agent into a long-term entrant for the 3-month REAL global leaderboard challenge.
- Improve error recovery, latency, and batching.
- Add deeper reasoning layers and stronger UI element detection.
- Push toward a production-grade autonomous browser assistant.
Built With
- openrouter
- python
- qwen3
- real-bench