Inspiration

My grandmother lived alone in her apartment in a mid-sized Czech city until she was 87. She'd forget her blood pressure medication, miss doctor appointments, and once got lost walking home from the pharmacy — just three blocks away. She could barely see her phone screen, and touchscreen apps were completely unusable for her.

I kept thinking: why can't she just talk to her phone and have it take care of everything? Not type. Not tap tiny buttons. Just speak naturally, like talking to a caring friend.

When Amazon Nova 2 Sonic launched with real-time speech-to-speech and tool calling capabilities, I realized this was finally possible — a voice assistant that doesn't just answer questions, but proactively cares: reminds you about medications, tells you where you are when you're disoriented, reads your medication labels through the camera, and greets you every morning with your schedule and weather.

Sonic2Life is the assistant I wish my grandmother had.

What it does

Sonic2Life is a 100% voice-first Progressive Web App designed for elderly people and visually impaired users. The entire interface is a single large button — press it and talk. The AI handles everything else.

Core capabilities

  • Natural voice conversation — real-time speech-to-speech via Amazon Nova 2 Sonic with dynamic language detection (speaks whatever language the user speaks). Warm, patient persona that never rushes and always repeats when asked
  • Medication management — tracks medication schedules, sends proactive push reminders ("Time for Warfarin!"), confirms intake via voice or notification buttons ("Taken" / "Snooze 15min"), and logs compliance history in SQLite
  • Camera & photo analysis — tap the camera button, take a photo, and the AI automatically describes what it sees using Amazon Nova 2 Lite vision. Reads medication labels, identifies objects, describes surroundings. Works on mobile with a "Photo First" flow — take the photo first, then start talking
  • Location awareness — uses GPS + Amazon Location Service (MCP) to answer "Where am I?", find nearby pharmacies and shops, provide walking directions, and optimize multi-stop routes
  • Calendar & events — manages appointments via voice (add, update, cancel), sends morning briefings with the day's schedule, and pre-event reminders
  • Emergency contacts & SMS — stores family and caregiver contacts with relationships. Say "text my son that I need help" and the AI finds the right contact by relationship and sends SMS via Amazon SNS
  • Weather-aware advice — checks current conditions and 3-day forecast via Open-Meteo, suggests appropriate clothing and outdoor activity safety for seniors
  • Web search — voice-triggered web search via DuckDuckGo for news, general knowledge, and health information
  • Persistent memory — remembers user preferences, names, habits, and important information across sessions
  • Proactive notifications — a background scheduler monitors medication times and upcoming events, sending push notifications even when the app is closed, with actionable buttons that feed responses back to the server
  • User profile personalization — admin sets the user's name, and the assistant greets them personally. Language is auto-detected from the user's name and conversation
  • Admin panel — full web dashboard for caregivers (family members) to manage medications, events, contacts, push subscriptions, notification history, and settings — without needing voice commands

How I built it

The architecture uses a dual-model approach with a clever single-tool interface pattern:

  1. Amazon Nova 2 Sonic (Bedrock) handles all voice interaction — bidirectional speech-to-speech streaming with built-in VAD and barge-in support. It sees only ONE tool: askAgent. This dramatically improved tool-calling accuracy compared to exposing all tools directly.
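The single-tool pattern can be sketched roughly like this — the tool spec mirrors Bedrock-style toolSpec JSON, and the handler/agent names here are simplified stand-ins, not the project's exact code:

```python
# Sketch of the single-tool pattern: the voice model sees exactly one tool
# spec, and every tool call it makes is forwarded to a backend agent.
# The spec format mirrors Bedrock toolSpec JSON; handler names are illustrative.

ASK_AGENT_SPEC = {
    "toolSpec": {
        "name": "askAgent",
        "description": "Answer any question or perform any task for the user.",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            }
        },
    }
}

def handle_tool_call(tool_name: str, tool_input: dict, agent) -> str:
    """Route the voice model's single tool call to the orchestrating agent."""
    if tool_name != "askAgent":
        return f"Unknown tool: {tool_name}"
    # The agent (Nova 2 Lite via Strands) picks among the 29+ real tools.
    return agent(tool_input["query"])
```

Because the voice model never has to choose between tools, its only decision is *whether* to ask the agent — the hard routing problem moves to a model better suited for it.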

  2. Behind askAgent, a Strands Agent powered by Amazon Nova 2 Lite orchestrates 29+ specialized tools — medications (5 tools), calendar events (5 tools), memory (3 tools), emergency contacts (4 tools), SMS, weather, web search, photo analysis, and two MCP servers (Amazon Location Service + AWS Knowledge Base).

  3. Amazon Nova 2 Lite also powers the vision pipeline — when the user takes a photo, it's automatically analyzed via the Bedrock Converse API and the description is injected into the active voice session. No voice command needed.
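The vision step can be sketched with the standard Converse API call; the model ID, prompt text, and helper names below are assumptions, not the project's exact values:

```python
# Sketch of the photo-analysis step: pair the captured JPEG with a
# description prompt in a Converse API message. Model ID and prompt text
# are assumptions, not the project's exact values.

def build_vision_message(jpeg_bytes: bytes) -> list:
    """Construct the Converse `messages` payload for photo analysis."""
    return [{
        "role": "user",
        "content": [
            {"image": {"format": "jpeg", "source": {"bytes": jpeg_bytes}}},
            {"text": "Describe this photo for an elderly user. "
                     "If it shows a medication label, read it aloud."},
        ],
    }]

def describe_photo(jpeg_bytes: bytes, model_id: str = "amazon.nova-lite-v1:0") -> str:
    import boto3  # deferred import keeps build_vision_message dependency-free
    client = boto3.client("bedrock-runtime")
    resp = client.converse(modelId=model_id, messages=build_vision_message(jpeg_bytes))
    return resp["output"]["message"]["content"][0]["text"]
```

The returned description is what gets injected into the active Nova Sonic session as context.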

  4. Amazon SNS delivers emergency SMS to family contacts. The agent resolves contacts by relationship ("my son", "my doctor") and sends personalized messages.

  5. Frontend is a vanilla JS PWA with continuous audio streaming via Web Audio API (AudioWorklet ring buffer). GPS coordinates flow via WebSocket and are automatically injected into every agent call — the AI always knows where the user is.

  6. "Photo First" mobile flow — taking a photo on mobile suspends the browser and kills the WebSocket, so photos are queued as pendingPhoto and auto-sent when the voice session reconnects, then auto-analyzed and spoken aloud after the greeting.

  7. Push notification system uses VAPID/WebPush with dual delivery: SSE for instant in-app banners + Web Push for background system notifications. Medication notifications include "Taken" and "Snooze 15min" buttons with a feedback loop — responses are persisted in SQLite and the scheduler respects active snoozes.

  8. Background scheduler (asyncio) periodically checks medication times and upcoming events, sending proactive notifications. Three failed deliveries auto-remove stale subscriptions.
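The scheduler's core loop can be sketched as below — field names, the snooze bookkeeping, and the 60-second interval are assumptions about the implementation:

```python
import asyncio
from datetime import datetime, time

# Sketch of the scheduler's core check: which medications are due now and
# not snoozed? Field names and the 60-second interval are assumptions.

def due_reminders(meds: list[dict], now: datetime, snoozes: dict) -> list[dict]:
    """Return medications whose scheduled time has arrived and whose snooze,
    if any, has expired."""
    due = []
    for med in meds:
        scheduled = time.fromisoformat(med["time"])  # e.g. "08:00"
        snoozed_until = snoozes.get(med["name"])
        if snoozed_until and now < snoozed_until:
            continue  # active snooze: wait until it expires
        if now.time() >= scheduled and not med.get("taken_today"):
            due.append(med)
    return due

async def scheduler_loop(get_meds, get_snoozes, notify, interval: float = 60.0):
    """Periodically check for due medications and push reminders."""
    while True:
        for med in due_reminders(get_meds(), datetime.now(), get_snoozes()):
            await notify(f"Time for {med['name']}!")
        await asyncio.sleep(interval)
```

Keeping `due_reminders` a pure function makes the snooze/taken logic trivially testable, separate from the asyncio plumbing.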

  9. Cookie-based authentication replaced nginx Basic Auth (which broke PWA/Service Worker). Public paths for static assets enable proper SW caching and PWA installation.

  10. SQLite stores all persistent data — medications, events, memory, contacts, medication logs, notification responses, push subscriptions, VAPID keys, and user settings.

  11. The backend is Python 3.12 with FastAPI, deployed via Docker.

AWS Services used

  • Amazon Nova 2 Sonic (Bedrock) — real-time speech-to-speech voice conversation
  • Amazon Nova 2 Lite (Bedrock) — Strands Agent reasoning + camera photo analysis (vision)
  • Amazon Location Service (MCP) — geocoding, POI search, routing, directions
  • AWS Knowledge Base (MCP) — AWS documentation search for agent context
  • Amazon SNS — emergency SMS delivery to family contacts

Challenges I ran into

  1. py_vapid 1.9.4 broke everything. The Vapid02 class removed public_key_urlsafe_base64(), and private_pem() returned a PEM format that pywebpush 2.3.0 couldn't deserialize (ASN.1 parsing errors). We had to manually extract the raw EC keys using the cryptography library — the 32-byte private scalar and the uncompressed EC point for the public key, both as urlsafe base64.
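
The workaround looked roughly like this — a fresh P-256 key is generated here for self-containment, whereas the app extracts the same material from its stored VAPID key:

```python
import base64
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import ec

# Extract raw VAPID key material manually, bypassing py_vapid's removed
# helpers. A fresh key is generated here for self-containment; the app
# loads its persisted key instead.

key = ec.generate_private_key(ec.SECP256R1())

# 32-byte private scalar, urlsafe base64 without padding
raw_private = key.private_numbers().private_value.to_bytes(32, "big")
vapid_private = base64.urlsafe_b64encode(raw_private).rstrip(b"=").decode()

# 65-byte uncompressed EC point (0x04 || X || Y) for the public key
raw_public = key.public_key().public_bytes(
    serialization.Encoding.X962,
    serialization.PublicFormat.UncompressedPoint,
)
vapid_public = base64.urlsafe_b64encode(raw_public).rstrip(b"=").decode()
```

These two urlsafe-base64 strings are exactly the formats pywebpush and the browser's pushManager.subscribe() expect.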

  2. pywebpush mutates vapid_claims. When sending to multiple push subscribers, pywebpush adds an aud field from the first subscriber's endpoint into the shared claims dict. The second subscriber (different push service) gets the wrong audience. Fix: always pass vapid_claims.copy().
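
The pitfall can be demonstrated in miniature — the mutation is simulated below with a stand-in for pywebpush's claims handling, since the real `webpush()` call needs a live push service:

```python
from urllib.parse import urlparse

# Demonstration of the pitfall: pywebpush writes an `aud` claim derived from
# each subscriber's endpoint into the claims dict it is given. Sharing one
# dict across subscribers leaks the first audience into later sends.

base_claims = {"sub": "mailto:admin@example.com"}

def simulated_webpush(endpoint: str, vapid_claims: dict) -> dict:
    """Stand-in for pywebpush.webpush's claims handling (only sets aud if missing)."""
    p = urlparse(endpoint)
    vapid_claims.setdefault("aud", f"{p.scheme}://{p.netloc}")
    return vapid_claims

# Bug: the second subscriber inherits the first subscriber's audience.
shared = dict(base_claims)
simulated_webpush("https://fcm.googleapis.com/fcm/send/abc", shared)
buggy = simulated_webpush("https://updates.push.mozilla.org/wpush/v2/def", shared)

# Fix: give each send its own copy of the claims.
fixed = simulated_webpush("https://updates.push.mozilla.org/wpush/v2/def",
                          dict(base_claims))
```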

  3. Audio streaming is counterintuitive. Our instinct was to gate audio sending based on UI state (only send when "listening"). But Nova Sonic requires continuous audio streaming — it handles VAD server-side. Stopping audio broke the entire conversation flow. The frontend state machine is purely visual.

  4. MCP config is documentation, not configuration. We spent hours wondering why Amazon Location MCP wasn't working despite being in mcp_config.json. Turns out that file is just documentation — actual MCP connections require explicit MCPClient instances in Python code.

  5. Push notifications silently fail without HTTPS. On non-localhost addresses, pushManager.subscribe() doesn't throw an error — it just silently does nothing. Debugging this took longer than implementing the entire push system.

  6. Nginx Basic Auth kills PWA. Service Workers can't cache assets behind HTTP Basic Auth — cache.addAll() fetches without credentials, fails silently, and beforeinstallprompt never fires. We tried credentials: 'include' in the SW and custom nginx locations with auth_basic off, but Nginx Proxy Manager's Access Lists use satisfy all, which overrides everything. Final fix: we moved auth entirely into FastAPI with session cookies.

  7. Mobile camera suspends the browser. When the user taps the camera button on Android, the native camera app opens and Chrome goes to background — killing the WebSocket and audio session. We implemented a "Photo First" flow: the photo is stored as pendingPhoto, and when the user taps "Talk" again, it's automatically sent and analyzed after the greeting plays.

  8. SNS sandbox is per-region. We verified a phone number in eu-central-1 (Frankfurt) but the app runs in eu-north-1 (Stockholm). SMS silently succeeded (returned sns_message_id) but never arrived. Took embarrassingly long to realize sandbox phone numbers don't transfer between regions.

Accomplishments that I'm proud of

  • The single-tool interface pattern — Nova Sonic sees only askAgent, while 29+ tools work behind the scenes. This dramatically improved the voice model's tool-calling accuracy compared to exposing all tools directly.

  • 6 AWS services integrated into a cohesive voice experience — Nova 2 Sonic, Nova 2 Lite, Amazon Location Service, AWS Knowledge Base, Amazon SNS, and Bedrock runtime.

  • Truly proactive care — the assistant doesn't wait to be asked. It sends medication reminders, morning briefings, and event alerts via push notifications. When the user taps a notification, the app opens and the AI speaks first.

  • Actionable notifications with feedback loop — push notifications have dynamic buttons ("Taken" / "Snooze 15min") that persist responses in SQLite. The scheduler respects active snoozes and re-sends reminders when they expire. Three response sources: in-app banner, system notification click, and voice confirmation.

  • Automatic photo analysis — take a photo of your medication and the AI reads the label and tells you what it is, without any voice command needed. The "Photo First" flow handles the mobile browser suspension gracefully.

  • Emergency SMS with relationship matching — say "text my son that I need help" and the system finds the right contact by the relationship field, composes a message, and sends it via Amazon SNS.

  • It actually works on a real phone — installable PWA on Android Chrome with push notifications, GPS tracking, camera, and offline caching. One big button. My 70-year-old neighbor tested it and said: "This is the first app I can actually use."

  • Complete admin panel — caregivers (family members) can manage everything through a web dashboard: medications, events, contacts, push subscriptions, notification history, and user profile — without needing to use voice commands.

What I learned

  • Speech-to-speech changes everything. There's no intermediate text — the model hears emotion, hesitation, confusion in the voice and responds accordingly. This is fundamentally different from STT→LLM→TTS pipelines.

  • The single-tool pattern is a game changer. Giving a voice model 29 tools leads to confusion and hallucinated tool calls. Giving it ONE tool (askAgent) and letting a reasoning model orchestrate behind the scenes produces dramatically better results.

  • System prompts for voice AI need to be explicit. Nova Sonic won't call tools unless the system prompt contains explicit rules like "You MUST use askAgent when the user asks about location/weather/medications." Subtle hints don't work.

  • Accessibility is a design constraint, not a feature. When your users can't see the screen, every design decision changes. We removed all visual dependencies and made the UI work with zero visual attention.

  • The push notification ecosystem is fragile. Between VAPID key formats, browser compatibility (Firefox Android dropped Web Push entirely), HTTPS requirements, and subscription lifecycle management — getting push notifications working reliably across devices was the hardest part of the project.

  • MCP servers are powerful but underdocumented. Amazon Location Service via MCP gives you geocoding, routing, and POI search with minimal code — but the integration patterns aren't well documented yet.

  • Auth and PWA don't mix easily. Any form of HTTP-level authentication (Basic Auth, nginx access lists) breaks Service Worker caching and PWA installation. Application-level auth with cookies is the only reliable approach.

What's next for Sonic2Life

  • Amazon Bedrock Guardrails — filter dangerous health advice before the assistant speaks it aloud
  • Amazon Comprehend Medical — automatically extract medication names, dosages, and conditions from natural conversation
  • Health vitals logging — track blood pressure, blood sugar, and weight via voice with trend analysis
  • Shopping list — "Add milk to my shopping list" with push notification when near a store (using existing GPS + Location Service)
  • Amazon Cognito — multi-user support for care facilities with per-user profiles
  • Caregiver reports via Amazon SES — daily health and compliance summaries emailed to family members
  • Multi-language expansion — currently auto-detects Czech and English; adding German, Spanish, and other languages for broader accessibility
