Inspiration
Daily life now runs on phone trees and web portals. If you're homebound, low-vision, low-literacy, or elderly, that quietly locks you out of basic dignity-level tasks: refilling a prescription, renewing a registration, paying a bill. Existing tools assume you can see a screen and supervise an agent the whole way. The people who most need help are the ones those tools were never built for.
Open Door is our swing at that gap: an agent that talks to you, works the web for you, and stops to ask out loud before it ever spends your money or does something it can't undo.
What it does
You say what you need ("I need to refill my metformin, I can't get to the pharmacy").
Open Door:
- Plans it out loud: One Claude call turns your spoken goal into a visible, ordered plan, with each step labeled by which "body" handles it and why
- Works the portal for you: A real browser navigates the pharmacy site, finds the prescription, and reads the actual out-of-pocket cost off the page
- Stops at the gate: Before the one irreversible step, submitting the refill, it speaks the cost aloud ("This will charge you $14 and submit your refill. Should I go ahead?") and waits. You answer by voice: "yes" or "no"
- Only then acts: On "yes," it submits and reads back the confirmation. On "no," it stops cold and tells you nothing was charged
The plan is the hero of the screen: steps light up as they run, the gated step pauses red, and the question is spoken, so a person who can't see or touch the screen can complete a real, costly action entirely by voice.
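To make the plan concrete, here is a minimal sketch of what a gated plan and its defensive parsing could look like. The field names (`description`, `body`, `why`, `irreversible`), the body names, and the `parse_plan` helper are assumptions for illustration, not Open Door's actual schema.

```python
import json
from dataclasses import dataclass

ALLOWED_BODIES = {"browse", "speak"}  # assumed names for the "bodies"
REQUIRED_KEYS = {"description", "body", "why"}


@dataclass(frozen=True)
class Step:
    description: str    # what the step does, shown on screen
    body: str           # which "body" handles it
    why: str            # why that body was chosen
    irreversible: bool  # True means dispatch must gate this step


def parse_plan(raw: str) -> list[Step]:
    """Parse planner output defensively; raise ValueError on anything malformed."""
    # Tolerate prose around the JSON by slicing from the first '[' to the last ']'.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no JSON array found in planner output")
    try:
        items = json.loads(raw[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc

    steps = []
    for item in items:
        if not isinstance(item, dict) or REQUIRED_KEYS - item.keys():
            raise ValueError(f"malformed step: {item!r}")
        if item["body"] not in ALLOWED_BODIES:
            raise ValueError(f"unknown body: {item['body']!r}")
        steps.append(Step(
            description=str(item["description"]),
            body=item["body"],
            why=str(item["why"]),
            # Assumed policy: anything but an explicit false is treated as
            # irreversible, so a missing flag errs toward asking.
            irreversible=item.get("irreversible") is not False,
        ))
    return steps
```

Treating a missing `irreversible` flag as `True` is a design choice in this sketch: a planner mistake then costs one extra spoken question rather than one unapproved charge.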
How we built it
The architecture is built around seams — every external service is swappable, so the whole thing runs offline against mocks and flips to live with one env var.
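A minimal sketch of one such seam, assuming a hypothetical `OPEN_DOOR_LIVE` env var and hypothetical `Speaker`, `MockSpeaker`, and `LiveSpeaker` names (the real variable and class names are not given here):

```python
import os
from typing import Mapping, Optional, Protocol


class Speaker(Protocol):
    """The seam: callers depend on this interface, never on a vendor client."""
    def say(self, text: str) -> None: ...


class MockSpeaker:
    """Offline stand-in: records lines instead of calling a TTS service."""
    def __init__(self) -> None:
        self.spoken: list[str] = []

    def say(self, text: str) -> None:
        self.spoken.append(text)


class LiveSpeaker:
    """Placeholder for the real TTS client; the network call is omitted."""
    def say(self, text: str) -> None:
        raise NotImplementedError("wire the real TTS client in here")


def make_speaker(env: Optional[Mapping[str, str]] = None) -> Speaker:
    """Pick the implementation from one env var (name is hypothetical)."""
    env = os.environ if env is None else env
    live = env.get("OPEN_DOOR_LIVE", "0") == "1"
    return LiveSpeaker() if live else MockSpeaker()
```

Because the mock is the default, a missing or misspelled variable falls back to offline behavior rather than to a live service.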
- Planner (Anthropic, Claude Opus 4.8) - The centerpiece: A strict-JSON planner with adaptive thinking, defensive parsing, and a content-guard that retries if the spoken wording reads like a stage direction instead of real speech. It generalizes: give it a DMV renewal or a utility bill and it produces a correct, gated plan
- Browse leg (Browserbase + Playwright) - Drives a real Chromium through the portal: It runs locally by default (free) and switches to Browserbase cloud with one env flag, using the same Playwright code over CDP. Self-healing navigation lets the run survive messy pages. Stop-before-submit is structural: the effector never decides to submit; it only clicks the button when dispatch hands it the gated step, which only happens after a human "yes"
- Speak leg (Deepgram) - Both directions: TTS (Aura) voices the gate question and every spoken line; STT (Nova) hears your goal at intake and your yes/no at the gate. The spoken confirmation is load-bearing, not decoration
- Dispatch + the human gate - A pausable state machine: it physically parks the browser on the irreversible button and refuses to proceed until a human decides. Negative answers win on ambiguity. It never proceeds unless it clearly heard "yes." Declining skips the rest of the plan so it can never falsely report success
- Observability (Sentry) - Every effector is instrumented: every gate leaves a breadcrumb of exactly what the human approved. We exercised it for real (killed the portal mid-browse and watched the capture land), because un-triggered observability doesn't count
- Frontend - A single-page hero UI: It streams live state over Server-Sent Events (push on change, not polling), plus a connected-services landing page that frames Open Door as a platform for all of daily life's errands
22 regression tests, an 8/8 live health check, and an offline-first build kept it honest.
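The gate logic described above can be sketched in a few lines. This is a simplified illustration, not the actual dispatch code: the word lists, `heard_yes`, and `run_plan` are assumed names, and the real state machine also has to pause and resume around live audio.

```python
import re
from typing import Callable, Iterable, Tuple

YES_WORDS = {"yes", "yeah", "yep", "sure", "okay", "ok"}
NO_WORDS = {"no", "nope", "not", "don't", "dont", "stop", "cancel", "wait"}


def heard_yes(transcript: str) -> bool:
    """True only for a clear yes; any negative word, or silence, means no."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    if words & NO_WORDS:
        return False  # negatives win on ambiguity ("yes... no, wait")
    return bool(words & YES_WORDS)


def run_plan(
    steps: Iterable[Tuple[str, bool]],  # (description, irreversible) pairs
    act: Callable[[str], None],         # performs one step
    ask: Callable[[str], str],          # speaks the question, returns the reply
) -> list[str]:
    """Run steps in order, gating every irreversible one on a spoken yes."""
    log = []
    for description, irreversible in steps:
        if irreversible and not heard_yes(ask(description)):
            log.append(f"declined: {description}")
            break  # halt: nothing downstream runs, so no false "all done"
        act(description)
        log.append(f"done: {description}")
    return log
```

The effector (`act`) is only ever called after the gate check, so the decision to submit lives in dispatch rather than in the browser code.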
Challenges we ran into
- Making spoken output sound human: The planner kept reading step descriptions aloud ("Confirm which pharmacy holds their prescription"). We fixed it with a sharper prompt plus a deterministic guard that detects stage-direction phrasing and retries for real second-person speech
- Generalist planner vs. scripted hands: The planner imagines portal features (home delivery) the mock fixture doesn't have. We made the browse leg self-healing and tolerant of missing features, so any goal completes rather than timing out
- A "no" that still said yes: Early on, declining the gate still ran the downstream "all done" steps. We made declining halt the plan — the bug that most violated our own thesis, and the one we're proudest to have caught
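As an illustration of the first challenge, a deterministic guard with retry might look like the sketch below. The heuristics (imperative openers, third-person references) and the names `reads_like_stage_direction` and `spoken_line` are assumptions; the actual detection rules are not described here.

```python
import re
from typing import Callable

# Assumed heuristic: stage directions tend to open with an imperative verb
# aimed at the agent, or refer to the listener in the third person.
_IMPERATIVE_OPENER = re.compile(r"^(confirm|ask|tell|check|verify|inform)\b", re.I)
_THIRD_PERSON = re.compile(r"\b(their|they|them|the user|the patient)\b", re.I)


def reads_like_stage_direction(line: str) -> bool:
    """Flag lines that describe what to say instead of actually saying it."""
    text = line.strip()
    return bool(_IMPERATIVE_OPENER.search(text) or _THIRD_PERSON.search(text))


def spoken_line(generate: Callable[[int], str], max_attempts: int = 3) -> str:
    """Call generate(attempt) until a line passes the guard, then return it."""
    for attempt in range(max_attempts):
        line = generate(attempt)
        if not reads_like_stage_direction(line):
            return line
    raise ValueError("planner kept producing stage directions")
```

In use, `generate` would wrap the planner call, so a flagged line triggers another attempt instead of being spoken.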
Accomplishments that we're proud of
The spoken safety gate. The agent parks on the irreversible button and asks aloud, and you answer aloud. It's a small thing that makes a powerful agent safe to hand to someone who can't supervise it. That's the whole point.
What we learned
Building a careful agent is mostly about designing where it stops, not where it acts. The seams and the gate were more engineering than the "doing," and that's the right ratio for something that spends a vulnerable person's money.
What's next for Open Door
Generalize the browse leg to natural-language web navigation so it handles any real portal (the planner already generalizes); add real account connections per service; and build a pending-errands queue and an in-app action history.
Built With
- anthropic
- browserbase
- claude
- deepgram
- fastapi
- playwright
- python
- sentry
