Web accessibility tools often require screen-reader expertise or browser extensions that users can’t install on shared machines. We wanted a friction-free, speech-only layer that works on any single-page app, costs (almost) nothing to run, and shows how far AWS’s serverless stack—and Bedrock—can go in democratizing access.
What it does
- Record voice in the browser.
- Upload clip to S3.
- Lambda #1 – StoreConn-Voice ($connect / $disconnect routes) saves & prunes WebSocket connectionIds in DynamoDB.
- Lambda #2 – Transcribe Trigger: an S3 event starts an Amazon Transcribe async job.
- Transcribe drops JSON in transcribe-output/ → Lambda #3 – Bedrock Intent runs.
- Bedrock (Claude-3 Sonnet) returns an intent such as {"action":"click","selector":"#nav-book"}.
- Lambda #3 fans out the intent via API Gateway WebSocket to each open tab.
- Front-end JavaScript (served from Amplify hosting) clicks, navigates, or inputs text for hands-free control.
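The middle of the pipeline (S3 upload → Transcribe job → transcript text) can be sketched roughly as follows. This is a minimal illustration, not the project's actual Lambda code: the function names, the en-US language code, and deriving the job name from the object key are all assumptions. Only the transcribe-output/ prefix comes from the steps above.

```python
import re
from urllib.parse import unquote_plus

# Prefix taken from the write-up; every other name here is hypothetical.
OUTPUT_PREFIX = "transcribe-output/"


def job_params_from_s3_event(event: dict, output_bucket: str) -> dict:
    """Build keyword arguments for Transcribe's start_transcription_job."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded (space -> '+').
    key = unquote_plus(record["object"]["key"])
    # Job names may only contain letters, digits, '.', '_' and '-'.
    # Assumption: clip keys are unique, so the derived job name is too.
    job_name = re.sub(r"[^0-9A-Za-z._-]", "-", key)[:200]
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "LanguageCode": "en-US",  # assumption: English-only commands
        "OutputBucketName": output_bucket,
        # A key ending in '/' stores the result as <job name>.json under it.
        "OutputKey": OUTPUT_PREFIX,
    }


def extract_transcript(transcribe_output: dict) -> str:
    """Pull the spoken text out of a Transcribe result document."""
    return transcribe_output["results"]["transcripts"][0]["transcript"].strip()
```

In a Lambda, the first function's result would be passed on with something like `boto3.client("transcribe").start_transcription_job(**params)`, and the second would run on the JSON that Transcribe writes to the output prefix.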
How we built it
Three Python 3.12 Lambdas:
- StoreConn-Voice (WS connect/disconnect)
- Transcribe Trigger (start job)
- Bedrock Intent (parse transcript → Bedrock → broadcast)
Minimal SPA: plain HTML/JS; no front-end framework.
Prompt-engineered Bedrock to output strict JSON that uses only whitelisted selectors.
Live-reloading Amplify hosting for rapid UX tweaking.
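Even with a strict prompt, it can help to validate the model's reply before broadcasting it. The sketch below shows one way to enforce strict JSON, a selector whitelist, and an alias map for near-miss selectors. The whitelist, the aliases, and the action names other than "click" are hypothetical placeholders, not the project's real configuration.

```python
import json
from typing import Optional

# Hypothetical values; the real lists depend on the page being controlled.
ALLOWED_ACTIONS = {"click", "navigate", "input"}
ALLOWED_SELECTORS = {"#nav-book", "#nav-home", "#search"}
SELECTOR_ALIASES = {"#book": "#nav-book", "#nav-booking": "#nav-book"}


def parse_intent(model_text: str) -> Optional[dict]:
    """Return a validated intent dict, or None if the reply is unusable."""
    try:
        intent = json.loads(model_text)
    except json.JSONDecodeError:
        return None  # the model replied with prose instead of JSON
    if not isinstance(intent, dict):
        return None
    action = intent.get("action")
    selector = intent.get("selector")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return None
    if not isinstance(selector, str):
        return None
    # Map known near-misses onto real selectors before checking the whitelist.
    selector = SELECTOR_ALIASES.get(selector, selector)
    if selector not in ALLOWED_SELECTORS:
        return None  # hallucinated selector: drop it rather than broadcast
    return {**intent, "selector": selector}
```

Returning None for anything unexpected keeps the front end from ever acting on a selector that was not approved in advance.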
Challenges we ran into
Unescaped {} in a Python .format prompt template killed Bedrock calls (KeyError "action"); the literal JSON braces had to be escaped.
API Gateway WS management URL vs. client WSS URL: confusing the two led to a doubled “@connections” path and 404s.
Bedrock sometimes hallucinated selectors; solved with an alias map and a stricter prompt.
Transcribe async adds ~45 s latency; streaming wasn’t available in the free tier.
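The KeyError from the first challenge is easy to reproduce. `str.format` treats any `{...}` in the template as a replacement field, so a JSON example embedded in the prompt gets parsed as a field named `"action"`. The prompt wording below is a made-up stand-in for the real prompt. The two fixes shown are doubling the literal braces, or avoiding `.format` altogether.

```python
import json

# Broken: .format reads {"action":...} as a replacement field named "action".
BROKEN_TEMPLATE = (
    'Reply only with JSON like {"action":"click","selector":"#nav-book"}\n'
    "Command: {transcript}"
)

# Fixed: literal braces are doubled, so only {transcript} is substituted.
FIXED_TEMPLATE = (
    'Reply only with JSON like {{"action":"click","selector":"#nav-book"}}\n'
    "Command: {transcript}"
)


def build_prompt(transcript: str) -> str:
    """Fill the prompt using the brace-escaped template."""
    return FIXED_TEMPLATE.format(transcript=transcript)


def build_prompt_concat(transcript: str) -> str:
    """Alternative that sidesteps .format by serialising the example."""
    example = json.dumps(
        {"action": "click", "selector": "#nav-book"}, separators=(",", ":")
    )
    return "Reply only with JSON like " + example + "\nCommand: " + transcript
```

Calling `BROKEN_TEMPLATE.format(transcript="open bookings")` raises `KeyError: '"action"'`, while both builder functions return the same prompt text.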
Accomplishments we’re proud of
- Zero servers—full pipeline idles at $0.18/mo (S3 + Dynamo storage).
- Works on any SPA without code changes; just drop in app.js.
- Live demo navigates between tabs and fills forms with voice only.
- Added TTL pruning → no zombie WebSocket IDs, no manual clean-up.
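TTL pruning in DynamoDB relies on a numeric attribute that holds an expiry time in epoch seconds. A minimal sketch of the item the connect handler might store is shown below. The attribute name `expiresAt` and the two-hour lifetime are assumptions (two hours matches API Gateway's maximum WebSocket connection duration), not values taken from the project.

```python
import time
from typing import Optional

# Assumption: connections never outlive API Gateway's 2-hour limit.
TTL_SECONDS = 2 * 60 * 60


def connection_item(connection_id: str, now: Optional[float] = None) -> dict:
    """Build the item stored on $connect, with an epoch-seconds expiry."""
    now = int(time.time()) if now is None else int(now)
    return {
        "connectionId": connection_id,
        # Hypothetical attribute name; it must be the table's TTL attribute.
        "expiresAt": now + TTL_SECONDS,
    }


def is_live(item: dict, now: Optional[float] = None) -> bool:
    """TTL deletion is not immediate, so readers should also filter by expiry."""
    now = int(time.time()) if now is None else int(now)
    return item["expiresAt"] > now
```

With a boto3 Table resource, the item would be written with `table.put_item(Item=connection_item(connection_id))`, once TTL has been enabled on the table for the chosen attribute.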
What we learned
- Bedrock’s Claude-3 is shockingly good at structured JSON if you remind it in every prompt.
- API Gateway WebSockets + Dynamo TTL = nearly effortless real-time fan-out.
- Small UX touches (on-screen log overlay, auto-WS reconnect) make demos rock-solid.
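The fan-out step can be sketched as below, assuming the stored connection IDs are already loaded from DynamoDB. Both function names are hypothetical. The first derives the https management endpoint from the client's wss URL and strips a trailing `/@connections`, since the SDK appends that path segment on its own. The second posts the intent to each connection and collects the IDs that API Gateway reports as gone.

```python
import json
from urllib.parse import urlparse


def management_endpoint(client_wss_url: str) -> str:
    """Map wss://<host>/<stage> to the https endpoint the SDK client needs."""
    parsed = urlparse(client_wss_url)
    path = parsed.path.rstrip("/")
    # The SDK adds '/@connections/<id>' itself; including it here doubles it.
    if path.endswith("/@connections"):
        path = path[: -len("/@connections")]
    return f"https://{parsed.netloc}{path}"


def broadcast(api_client, connection_ids: list, intent: dict) -> list:
    """Send the intent to every connection; return the IDs that are gone."""
    payload = json.dumps(intent).encode("utf-8")
    stale = []
    for connection_id in connection_ids:
        try:
            api_client.post_to_connection(ConnectionId=connection_id, Data=payload)
        except api_client.exceptions.GoneException:
            # The tab closed without a clean $disconnect; caller can delete it.
            stale.append(connection_id)
    return stale
```

Here `api_client` would be created with `boto3.client("apigatewaymanagementapi", endpoint_url=management_endpoint(wss_url))`. Deleting the returned stale IDs complements TTL pruning for tabs that vanish before their item expires.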
What’s next for VoiceNavAI
Transcribe Streaming to cut first-response time to <3 s.
Multilingual commands (update prompt + language-auto-detect).
ARIA role introspection: auto-generate selector whitelist per page.
Chrome extension wrapper to inject app.js on any site, no dev changes required.
Built With
- amazon-amplify
- amazon-dynamodb
- amazon-web-services
- amplify
- api
- apigateway
- aws-cdk
- bedrock
- css
- html
- iam
- javascript
- lambda
- python
- s3
- transcribe