Digital Bridge: Project Story

Inspiration

Watching a family member spend 45 minutes trying to refill a prescription online, only to give up and call a busy phone line, made the problem impossible to ignore. More than 54 million Americans are aged 65 or older, and many of them struggle daily with complex web portals for healthcare, pharmacy, and utility services. The interfaces aren't designed for them. We asked:

What if they never had to touch the portal at all?

What We Built

Digital Bridge is a voice-driven accessibility agent that translates natural language into real web actions. A user speaks a request ("I need to see Doctor Smith next Tuesday"), and the system handles everything: parsing intent, navigating the portal, filling forms, and submitting, then returning a screenshot as proof of completion.

The automation pipeline can be modeled as a simple function:

$$f(\text{voice}) \rightarrow \text{intent JSON} \rightarrow \text{browser actions} \rightarrow \text{confirmation}$$

where intent extraction minimizes the ambiguity error $\epsilon$ through strict zero-shot prompting:

$$\epsilon = P(\text{missing field} \mid \text{transcript})$$

We drive $\epsilon$ toward zero by prompting the user for clarification rather than failing silently.
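
In code, the clarification loop amounts to validating the parsed intent against the fields each action requires. A minimal sketch (the `REQUIRED_FIELDS` table and function names are illustrative, not the actual backend):

```python
# Illustrative sketch of the clarification loop: if the parsed intent JSON
# is missing any required field, ask the user instead of failing silently.
REQUIRED_FIELDS = {
    "schedule": ["doctor", "date"],   # assumed field names for illustration
    "refill": ["medication"],
}

def missing_fields(intent: dict) -> list[str]:
    """Return the required fields absent from a parsed intent."""
    required = REQUIRED_FIELDS.get(intent.get("action", ""), [])
    return [f for f in required if not intent.get(f)]

def clarify_or_proceed(intent: dict) -> str:
    """Either proceed with automation or voice a clarification question."""
    missing = missing_fields(intent)
    if missing:
        return f"Could you tell me the {missing[0]}?"
    return "proceed"
```

Rather than rejecting an incomplete request, the agent asks one targeted question per missing field, which keeps $\epsilon$ near zero in practice.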

How We Built It

The stack is intentionally lean for a 24-hour sprint:

  • Frontend — index.html + app.js with the Web Speech API for on-device transcription. High-contrast, brutalist accessibility design.
  • Backend — FastAPI (main.py) exposes POST /process-voice and streams real-time status via Server-Sent Events (SSE).
  • LLM Layer — llm_parser.py sends the transcript to Gemini (primary) or Claude (fallback) with a strict JSON output prompt: {"action": "schedule", "doctor": "smith", "date": "2026-04-22"}
  • Automation — automation.py uses Playwright to spin up a Chromium instance, navigate the portal, fill fields by CSS selector, submit, and capture a Base64 screenshot of the success page.
  • Infrastructure — Docker Compose bundles the FastAPI backend, Playwright/Chromium, and the dummy portal into a single docker compose up --build deployment, tested on AWS EC2.

Challenges

1. LLM output reliability — Getting consistent, parseable JSON from a noisy voice transcript was harder than expected. We solved it with aggressive system prompting, explicit output-format examples, and a fallback clarification loop — the app never crashes, it just asks for what it needs.

2. Playwright in Docker — Chromium inside a container requires specific system dependencies that aren't obvious. We lost ~2 hours to dependency hell before landing on a stable Dockerfile that installs all required libs cleanly.
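
One way to sidestep the dependency hunt is to build on Playwright's official Python image, which already bundles Chromium and its system libraries. An illustrative Dockerfile (image tag, ports, and entrypoint are assumptions for the sketch, not our exact file):

```dockerfile
# Sketch: the official Playwright Python image ships Chromium plus every
# required system library, avoiding a manual apt-get dependency hunt.
# The version tag is illustrative — pin whichever release matches your
# installed playwright package.
FROM mcr.microsoft.com/playwright/python:v1.44.0-jammy

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Alternatively, on a plain Python base image, `playwright install --with-deps chromium` installs the browser together with its system dependencies in one step.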

3. Real-time feedback UX — Headless browser automation feels like a black box to the user. We implemented SSE status streaming so every step — "Parsing intent… Navigating portal… Submitting form…" — is visible in real time. Trust through transparency.
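
Each SSE frame is just a `data:` line followed by a blank line, which the browser's `EventSource` parses natively. A minimal sketch of the status stream (function and step names are illustrative, not the actual main.py):

```python
import json

def sse_event(status: str, detail: str = "") -> str:
    """Format one Server-Sent Events frame: a 'data:' line ending in a
    blank line, the wire format EventSource expects."""
    payload = json.dumps({"status": status, "detail": detail})
    return f"data: {payload}\n\n"

# Hypothetical pipeline steps, one SSE frame per step.
STEPS = ["Parsing intent…", "Navigating portal…", "Submitting form…"]

def progress_stream():
    """Generator suitable for a streaming response body."""
    for step in STEPS:
        yield sse_event("progress", step)
    yield sse_event("done")
```

In FastAPI, a generator like this can back a `StreamingResponse` with media type `text/event-stream`, giving one-way progress updates without WebSocket handshakes.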

4. Selector stability — Real portals change their DOM constantly. For the hackathon we built a dummy portal with predictable id-tagged selectors, plus a /automation/validate/{portal} endpoint to check selector health before any voice command runs.
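
The heart of that health check is the question "does every element id the automation relies on still exist on the page?". The real endpoint drives Playwright; this simplified, dependency-free sketch of the same idea uses the stdlib HTML parser (names are illustrative):

```python
from html.parser import HTMLParser

class IdCollector(HTMLParser):
    """Collect every id attribute on a page."""
    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id":
                self.ids.add(value)

def validate_selectors(html: str, required_ids: list[str]) -> list[str]:
    """Return the required element ids missing from the portal page.
    An empty list means the selectors are healthy and automation may run."""
    collector = IdCollector()
    collector.feed(html)
    return [i for i in required_ids if i not in collector.ids]
```

Running this check before each voice command turns a silent mid-automation failure into an upfront "portal changed, selectors need updating" signal.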

What We Learned

  • Zero-shot prompting with strict output schemas is surprisingly robust for intent extraction on messy speech transcripts.
  • SSE is underused — it's simpler than WebSockets and perfect for one-way progress streaming.
  • Accessibility-first design constraints (high contrast, large tap targets, voice-only flow) actually produce cleaner interfaces overall.
  • You can ship something genuinely useful in 24 hours if you ruthlessly protect the happy path and defer edge cases.

Built With

  • JavaScript (Web Speech API)
  • Python, FastAPI
  • Playwright + Chromium
  • Gemini and Claude APIs
  • Docker Compose, AWS EC2
