Inspiration

The web is supposed to be for everyone — but it isn't. Cluttered interfaces, confusing flows, and zero margin for error make basic tasks like refilling a prescription or paying a bill genuinely hard for elderly users and people with disabilities. We kept noticing this gap and asked: what if you never had to navigate a website at all?

What it does

You speak a task out loud. Navigator opens a real browser, navigates to the right site, fills out the forms, and narrates every step back to you in plain English. Before any irreversible action — submitting a form, confirming a payment — it stops and asks your permission. A full-screen YES/NO prompt listens for your voice. Nothing happens without your go-ahead.

How we built it

The backend uses browser-use to give Gemini direct control over a Playwright-driven Chromium browser. A second Gemini pass (our "Narrationifier") translates raw browser actions into plain-English narration, which is then streamed to the frontend over Server-Sent Events (SSE). The React frontend is intentionally minimal (one mic button, a narration feed, and a full-screen confirmation modal) and is driven entirely by the event stream. Voice input and output run entirely through the Web Speech API.
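To illustrate the streaming step, here is a minimal sketch of how narration and confirmation requests could be framed as Server-Sent Events. The event names ("narration", "confirm"), the payload fields, and the helper functions are assumptions made for this example, not the project's actual schema. Only the wire format (an "event:" line, a "data:" line, then a blank line) comes from the SSE standard.

```python
import json


def format_sse(event: str, payload: dict) -> str:
    """Serialize one event as a Server-Sent Events frame.

    json.dumps emits no newlines by default, so a single "data:" line is enough.
    """
    data = json.dumps(payload, ensure_ascii=False)
    # A frame ends with a blank line; the browser's EventSource dispatches it then.
    return f"event: {event}\ndata: {data}\n\n"


def narration_event(step: int, text: str) -> str:
    """Hypothetical frame for one plain-English narration sentence."""
    return format_sse("narration", {"step": step, "text": text})


def confirmation_event(action: str) -> str:
    """Hypothetical frame asking the frontend to show the YES/NO modal."""
    return format_sse("confirm", {"action": action})


if __name__ == "__main__":
    print(narration_event(1, "Opening the pharmacy website."), end="")
    print(confirmation_event("Submit the refill request"), end="")
```

On the frontend, an EventSource listener for each event name could append narration to the feed and open the modal when a "confirm" frame arrives.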

Challenges we ran into

Getting browser-use and Gemini to behave reliably was the hardest part. The LLM makes dozens of sequential decisions per task and would occasionally misread page state, hallucinate selectors, or stall mid-flow. Fixing this took careful prompt engineering and fallback handling throughout. Building the confirmation pause — freezing the agent mid-execution and resuming cleanly after user input — also required more async work than expected.
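The confirmation pause can be sketched with a plain asyncio future: the agent coroutine awaits it, and the handler that receives the user's spoken answer resolves it. This is a simplified illustration, not our production code. The ConfirmationGate class, its method names, and the 60-second timeout are assumptions made for the example, and it does not show how the gate hooks into browser-use.

```python
import asyncio
from typing import Optional


class ConfirmationGate:
    """Pauses an agent coroutine until the user answers yes or no."""

    def __init__(self) -> None:
        self._pending: Optional[asyncio.Future] = None

    @property
    def waiting(self) -> bool:
        """True while the agent is paused on an unanswered question."""
        return self._pending is not None and not self._pending.done()

    async def ask(self, action: str, timeout: float = 60.0) -> bool:
        """Suspend the caller until answer() is called; treat a timeout as 'no'."""
        self._pending = asyncio.get_running_loop().create_future()
        try:
            return await asyncio.wait_for(self._pending, timeout)
        except asyncio.TimeoutError:
            return False  # fail closed: no answer means no irreversible action
        finally:
            self._pending = None

    def answer(self, approved: bool) -> None:
        """Called by whatever receives the user's YES/NO; ignored if nothing is pending."""
        if self.waiting:
            self._pending.set_result(approved)


async def demo(approved: bool) -> list:
    gate = ConfirmationGate()
    log = []

    async def agent() -> None:
        log.append("filled form")
        if await gate.ask("Submit the refill request"):
            log.append("submitted")
        else:
            log.append("cancelled")

    task = asyncio.create_task(agent())
    while not gate.waiting:      # let the agent run until it pauses at the gate
        await asyncio.sleep(0)
    gate.answer(approved)        # simulate the user's spoken reply
    await task
    return log


if __name__ == "__main__":
    print(asyncio.run(demo(True)))
```

Treating a timeout as a refusal keeps the default safe: if the user never answers, the irreversible step is skipped rather than executed.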

Accomplishments that we're proud of

The full voice → browse → narrate → confirm loop works end-to-end on real websites, which is hard to pull off in a hackathon. We're also proud of the confirmation modal, which is full-screen, voice-aware, and impossible to dismiss by accident, and of the Narrationifier, which consistently produces natural sentences from raw browser events.

What we learned

  • AI agents need guardrails at every layer.
  • Latency compounds fast when a task takes dozens of sequential LLM decisions, each followed by a narration pass.
  • Designing for people who can't tolerate complexity forces you to cut everything non-essential.

What's next for easybrowser.tech

Next, we plan to expand task coverage (banking, appointments, government services), add persistent user profiles so Navigator remembers your pharmacy and provider, support mobile, and build a caregiver mode that lets family members set up and monitor tasks on behalf of a loved one.
