Real-Agent

🚀 REAL Hackathon Browser Agent

Inspiration

This hackathon challenge pushed us to build an agent that operates reliably under real-world constraints: fast, accurate, and versatile enough to handle practical browser tasks. The competitive setup and the REAL benchmark motivated us to aim for an agent that works like a true digital assistant.

What it does

Our agent can autonomously perform everyday browser operations such as:

Applying to jobs or filling out forms
Sending emails
Setting calendar events
Booking hotels
Navigating links and multi-page flows

Essentially, it acts as a general-purpose browser automation assistant powered by vision-language reasoning.

How we built it

We experimented with two approaches:

Enhanced Prompt Engineering + Reflection Loop
We began with an existing agent framework and attempted to strengthen reliability using a self-reflection feedback loop.
The goal: enable the agent to critique its past actions and improve.
Orchestrator-Based Architecture
When reflection alone didn’t yield stable performance, we added an orchestrator model to structure tasks, guide decisions, and maintain coherence through complex multi-step interactions.

This combination gave us a more stable and efficient agent pipeline.
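
To make the combination concrete, here is a minimal sketch of how an orchestrator can wrap a reflection loop. It is not our actual code: the names (Orchestrator, plan, act, check, reflect, demo) are hypothetical, and the model calls are replaced by plain stub functions so the control flow runs without a browser or API key.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One attempted action and whether it achieved its subgoal."""
    action: str
    ok: bool


@dataclass
class Orchestrator:
    # All four callables are assumptions standing in for model or browser calls.
    plan: Callable[[str], List[str]]        # task -> ordered subgoals
    act: Callable[[str, List[str]], str]    # (subgoal, critiques so far) -> action
    check: Callable[[str, str], bool]       # (subgoal, action) -> did it work?
    reflect: Callable[[str, str], str]      # (subgoal, failed action) -> critique
    max_retries: int = 2
    history: List[Step] = field(default_factory=list)

    def run(self, task: str) -> bool:
        """Work through the subgoals in order, retrying each with accumulated critiques."""
        for subgoal in self.plan(task):
            critiques: List[str] = []
            for _ in range(self.max_retries + 1):
                action = self.act(subgoal, critiques)
                ok = self.check(subgoal, action)
                self.history.append(Step(action, ok))
                if ok:
                    break
                # Reflection: feed a critique of the failed action into the next attempt.
                critiques.append(self.reflect(subgoal, action))
            else:
                return False  # retries exhausted for this subgoal
        return True


def demo() -> Orchestrator:
    """Build an orchestrator from stubs; the first 'fill recipient' attempt is wrong."""
    def plan(task: str) -> List[str]:
        return ["open compose", "fill recipient", "send"]

    def act(subgoal: str, critiques: List[str]) -> str:
        if subgoal == "fill recipient" and not critiques:
            return "type into subject field"  # deliberate mistake on first try
        return f"do: {subgoal}"

    def check(subgoal: str, action: str) -> bool:
        return action == f"do: {subgoal}"

    def reflect(subgoal: str, action: str) -> str:
        return f"'{action}' did not achieve '{subgoal}'"

    return Orchestrator(plan, act, check, reflect)
```

Calling demo().run("send an email") walks through three subgoals and recovers from the one deliberate mistake after a single critique. The design point is that the orchestrator owns the plan and the retry budget, while reflection only influences the next attempt at the current subgoal.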

Challenges we ran into

API credit limitations restricted our ability to extensively test iterations.
Flow connection issues, especially when integrating multiple models.
Debugging multi-agent control with a VLM was harder than expected.
We spent nearly 4 hours debugging a “headless: false” issue in the browser runtime.

Accomplishments we’re proud of

Our agent successfully completed a wide variety of REAL-style tasks.
Despite limited time and compute, we built a pipeline that runs reliably across multiple task types.
In our runs, the hybrid approach (reflection + orchestrator) gave more consistent results than reflection alone.

What we learned

Multi-agent systems with VLMs are extremely hard to synchronize: keeping state, screenshots, and browser feedback loops in step with each other is non-trivial.
Real-world agents require tight control, error recovery, and robust interface mapping.
Debugging browser automation under time pressure teaches patience and resilience.
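
As an illustration of the state-tracking and error-recovery lessons above, the sketch below retries a browser action while always re-observing the page first, so the agent never acts on a stale screenshot. Everything here is hypothetical (Observation, run_with_recovery, FakeBrowser); the fake browser stands in for a real session and simply fails a fixed number of times.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Observation:
    """Snapshot of browser state; screenshot_id is a hypothetical handle."""
    url: str
    screenshot_id: str
    step: int


def run_with_recovery(
    action: Callable[[Observation], None],
    observe: Callable[[], Observation],
    recover: Callable[[Exception], None],
    max_attempts: int = 3,
    delay_s: float = 0.0,
) -> Observation:
    """Run one action with retries, taking a fresh observation before each attempt."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        obs = observe()  # fresh state for every attempt
        try:
            action(obs)
            return observe()  # post-action state for the next step
        except Exception as err:
            last_error = err
            recover(err)  # e.g. reload the page or dismiss a dialog
            time.sleep(delay_s * (attempt + 1))  # linear backoff
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


class FakeBrowser:
    """Stand-in for a real browser session; fails the first `failures` actions."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.step = 0
        self.recoveries = 0

    def observe(self) -> Observation:
        return Observation(
            url="https://example.test/form",
            screenshot_id=f"shot-{self.step}",
            step=self.step,
        )

    def click_submit(self, obs: Observation) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise TimeoutError("element not found")
        self.step += 1

    def recover(self, err: Exception) -> None:
        self.recoveries += 1
```

With FakeBrowser(failures=2), run_with_recovery(browser.click_submit, browser.observe, browser.recover) succeeds on the third attempt. With more failures than attempts it raises RuntimeError, which gives the caller one clear place to decide whether to replan or abort.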

What’s next for the REAL Hackathon Browser Agent

We plan to:

Polish the agent into a long-term entrant for the 3-month REAL global leaderboard challenge.
Improve error recovery, latency, and batching.
Add deeper reasoning layers and stronger UI element detection.
Push toward a production-grade autonomous browser assistant.