Naada — AI Sound Therapy Companion

Inspiration

I've seen people around me — friends, family, colleagues — struggle with stress, anxiety, and sleeplessness. Therapy is expensive. Meditation apps feel generic. And sound healing, which has centuries of practice in Indian tradition (ragas, singing bowls, mantras), requires trained practitioners most people will never meet.

I thought: what if AI could be that practitioner? Not a chatbot that sends you a playlist, but something that actually sees your face, hears your voice, and responds with the exact sounds your mind needs — right now, in real time?

That's how Naada was born. The name comes from Sanskrit "नाद" — the concept of cosmic vibration, the idea that sound is the fundamental fabric of healing.

What it does

Naada is a real-time multimodal AI sound therapy companion. You open a browser, allow camera and mic, and just start talking.

Sees you — camera reads your facial expressions every 3 seconds to detect stress, sadness, anxiety, or calm
Hears you — voice streams continuously as 16kHz PCM audio for tone and word analysis
Talks to you — a warm AI companion (powered by Gemini 2.5 Flash) responds with voice, not text
Heals you — plays scientifically tuned therapy sounds adapted to your detected emotional state
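
To make the "16kHz PCM" part of the list above concrete, here is a minimal Python sketch of turning floating-point microphone samples into raw PCM bytes. The 16-bit little-endian mono layout is an assumption (the writeup only states the sample rate), and the function names are hypothetical; in Naada itself this step happens in the browser.

```python
import struct

SAMPLE_RATE_HZ = 16_000  # sample rate stated in the writeup


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes.

    The 16-bit depth is an assumption; the writeup does not state it.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range samples
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    return struct.pack("<%dh" % len(ints), *ints)


def chunk_duration_ms(pcm_bytes):
    """Duration of a mono 16-bit chunk at SAMPLE_RATE_HZ, in milliseconds."""
    return len(pcm_bytes) * 1000 / (2 * SAMPLE_RATE_HZ)
```

Under these assumptions a 100 ms chunk is 3,200 bytes, which gives a feel for how much audio data each WebSocket frame would carry.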

It supports 17 therapy sound types including Tibetan singing bowls, binaural beats, delta waves, solfeggio frequencies, and clinical protocols for ADHD, PTSD, tinnitus, Parkinson's, and stuttering.

The standout feature: a live generative Indian classical music composer that creates unique ragas in real time using Karplus-Strong string synthesis. No two sessions ever sound the same.
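
For readers unfamiliar with Karplus-Strong, the idea can be sketched in a few lines of Python. This is the textbook form of the algorithm, not Naada's actual code (which runs in the browser on Web Audio); the function name, the decay value, and the 44.1 kHz sample rate are assumptions chosen for illustration.

```python
import random


def karplus_strong(frequency_hz, duration_s, sample_rate=44_100, decay=0.996, seed=None):
    """Synthesize a plucked-string tone as a list of float samples."""
    rng = random.Random(seed)
    # The delay-line length sets the pitch: one period of the target frequency.
    period = max(2, int(sample_rate / frequency_hz))
    # The "pluck" is a burst of white noise filling the delay line.
    buf = [rng.uniform(-1.0, 1.0) for _ in range(period)]
    out = []
    for i in range(int(duration_s * sample_rate)):
        idx = i % period
        nxt = (idx + 1) % period
        out.append(buf[idx])
        # Averaging neighbouring samples acts as a low-pass filter, and
        # decay < 1 makes the string die away like a real one.
        buf[idx] = decay * 0.5 * (buf[idx] + buf[nxt])
    return out
```

Because the initial noise burst is random, each pluck has a slightly different timbre, which is one plausible reason generated sessions would not repeat exactly.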

Other features: guided meditation (5 styles), sound mixing pad, real-time wellness scoring, affirmation cards, session insights, mood heatmap calendar, chakra energy map, delayed auditory feedback for stuttering, neurotone voice analysis, Spotify integration, PWA support, and 30+ languages.

How we built it

Backend: Python 3.11 + FastAPI with a single WebSocket endpoint. The Google ADK Runner handles bidirectional streaming between the browser and the Gemini Live API. The code is split into 13 modular Python files, with 16 agent tools organized into 4 domains (mood, therapy, meditation, spotify). An event processor intercepts tool calls from Gemini, looks each one up in a dispatch table, and translates it into a typed JSON message for the frontend.
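
The dispatch-table idea can be sketched roughly as follows. The tool names, handler functions, and message fields here are all hypothetical stand-ins; the real project has 16 tools whose names and payloads are not listed in this writeup.

```python
import json


def on_play_therapy(args):
    # Hypothetical handler: build the message the frontend would act on.
    return {"type": "therapy_start", "sound": args["sound"], "volume": args.get("volume", 0.3)}


def on_log_mood(args):
    # Hypothetical handler for a mood-domain tool.
    return {"type": "mood_update", "mood": args["mood"]}


# Dispatch table: tool name -> handler. Adding a tool means adding one entry.
DISPATCH = {
    "play_therapy_sound": on_play_therapy,
    "log_mood": on_log_mood,
}


def tool_call_to_message(name, args):
    """Translate one tool call into a typed JSON string for the frontend."""
    handler = DISPATCH.get(name)
    if handler is None:
        return json.dumps({"type": "unknown_tool", "tool": name})
    return json.dumps(handler(args))
```

The appeal of this design is that the event processor never needs an if/elif chain: unknown tools fall through to a single default, and each domain can register its own handlers.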

Frontend: Vanilla JavaScript with a mixin pattern. The main NaadaApp class is extended by TherapyController, SessionTracker, and UIEffects mixins. There are two separate Web Audio engines: one for file-based sounds and synthesized clinical protocols, and another for real-time algorithmic raga composition, which uses Karplus-Strong string synthesis with proper aroha/avaroha scales, 7 instrument models, and 5 talas.
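
As an illustration of what "proper aroha/avaroha scales" means in code, here is a small Python sketch using Raga Bhimpalasi, whose ascent skips notes that its descent includes. It is not Naada's composer: the tonic frequency, the equal-tempered tuning, the random-walk strategy, and all function names are assumptions, and tala (rhythm) is left out entirely.

```python
import random

SA_HZ = 261.63  # hypothetical tonic (Sa) placed at C4

# Raga Bhimpalasi as semitone offsets from Sa.
AROHA = [0, 3, 5, 7, 10, 12]           # ascent: S g M P n S'
AVAROHA = [12, 10, 9, 7, 5, 3, 2, 0]   # descent: S' n D P M g R S


def semitone_to_hz(semitone, tonic_hz=SA_HZ):
    """Equal-tempered pitch; a simplification of traditional tuning."""
    return tonic_hz * 2 ** (semitone / 12)


def next_note(current, ascending):
    """Step to the nearest allowed note in the current direction."""
    if ascending:
        higher = [s for s in AROHA if s > current]
        return min(higher) if higher else current
    lower = [s for s in AVAROHA if s < current]
    return max(lower) if lower else current


def generate_phrase(length, seed=None):
    """Random walk that only ascends via AROHA and descends via AVAROHA."""
    rng = random.Random(seed)
    note, ascending = 0, True
    phrase = [note]
    while len(phrase) < length:
        if note >= 12:
            ascending = False          # turn around at the upper Sa
        elif note <= 0:
            ascending = True           # turn around at the lower Sa
        elif rng.random() < 0.25:
            ascending = not ascending  # occasional change of direction
        note = next_note(note, ascending)
        phrase.append(note)
    return phrase
```

Each semitone in the phrase can be passed through semitone_to_hz to get a frequency for a string-synthesis voice. The key property is that Re and Dha (offsets 2 and 9) can only ever be reached while descending.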

Streaming: One WebSocket carries everything — camera frames (JPEG base64), mic audio (PCM binary), agent voice (24kHz PCM), tool call results (JSON), and transcripts. Audio ducking ensures the agent voice is always clear over therapy sounds.
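
The ducking logic can be sketched as a small state machine. The 8% and 30% levels and the 3-second figure are the values reported under Challenges; everything else (the class name, the linear ramp, the 0.25-second ramp time, and reading "3-second duck" as "stay ducked for 3 seconds after the user interrupts") is an assumption. It is written in Python for illustration, while the real implementation lives in the browser's Web Audio graph.

```python
SPEECH_GAIN = 0.08       # therapy volume while the agent speaks
IDLE_GAIN = 0.30         # therapy volume during silence
INTERRUPT_HOLD_S = 3.0   # assumed meaning of the 3-second duck


class Ducker:
    def __init__(self, ramp_s=0.25):
        self.ramp_s = ramp_s          # hypothetical time for a full ramp
        self.gain = IDLE_GAIN
        self.agent_speaking = False
        self.hold_until = 0.0

    def on_agent_speech(self, speaking):
        self.agent_speaking = speaking

    def on_user_interrupt(self, now):
        # Keep therapy sounds ducked for a while after the user cuts in.
        self.hold_until = now + INTERRUPT_HOLD_S

    def target(self, now):
        ducked = self.agent_speaking or now < self.hold_until
        return SPEECH_GAIN if ducked else IDLE_GAIN

    def step(self, now, dt):
        """Move the gain linearly toward its target; dt is elapsed seconds."""
        goal = self.target(now)
        max_delta = (IDLE_GAIN - SPEECH_GAIN) * dt / self.ramp_s
        delta = goal - self.gain
        if abs(delta) <= max_delta:
            self.gain = goal
        else:
            self.gain += max_delta if delta > 0 else -max_delta
        return self.gain
```

Ramping rather than jumping between levels is what avoids audible clicks; in a Web Audio setting the same effect would typically come from scheduling a ramp on a GainNode's gain parameter.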

Deployment: Docker container on Google Cloud Run with auto-scaling (0-10 instances), session affinity for WebSocket persistence, and Cloud Build CI/CD triggered by git push.

Challenges we ran into

Gemini Live API disconnect handling — When a user closes the browser, Gemini sends WebSocket close code 1000, which the ADK library raises as an APIError with a full traceback. It's actually a clean disconnect, not an error. It took multiple iterations to suppress these properly at both the logging and stderr levels.
Audio ducking timing — Getting therapy sounds to duck smoothly when the agent speaks (and restore when it stops) without audio glitches. Settled on 8% volume during speech, 30% during silence, with a 3-second duck on user interruption.
Generative raga authenticity — Making algorithmically generated Indian classical music sound musically correct. Had to implement proper raga grammar (aroha/avaroha), note durations following tala cycles, and tanpura drone tuning.
GCP deployment permissions — gcr.io had permission issues; had to switch to Artifact Registry and manually grant IAM roles to Cloud Build service accounts.
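
For the disconnect-handling challenge, one way to quiet the logging side is a standard-library logging filter, sketched below. How the ADK's APIError actually exposes the close code is not described in this writeup, so the `code` attribute checked in is_clean_close and the substring matching in the filter are both assumptions; only the meaning of code 1000 (normal closure) is taken as given.

```python
import logging

NORMAL_CLOSE_CODE = 1000  # WebSocket "normal closure"


def is_clean_close(exc):
    """True if an exception looks like a normal closure.

    Assumes the exception carries the close code in a `code` attribute,
    which is a guess about the library's error type.
    """
    return getattr(exc, "code", None) == NORMAL_CLOSE_CODE


class CleanCloseFilter(logging.Filter):
    """Drop error-level records that only report a normal WebSocket close."""

    def filter(self, record):
        message = record.getMessage().lower()
        looks_clean = str(NORMAL_CLOSE_CODE) in message and "close" in message
        # Returning False tells the logging module to discard the record.
        return not (record.levelno >= logging.ERROR and looks_clean)
```

The filter would be attached with addFilter to whichever logger or handler emits the traceback. Output written directly to stderr bypasses logging entirely, so it would need separate handling, which matches the two-level suppression described above.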

Accomplishments that we're proud of

True multimodal real-time interaction — camera + voice + AI response + sound therapy all flowing through one WebSocket simultaneously
17 therapy types with clinical backing — every sound maps to published research (Chaieb et al. for binaural beats, Goldsby et al. for Tibetan bowls, MIT Tsai Lab for 40Hz gamma stimulation)
Live generative ragas — real Indian classical music composed note-by-note in the browser, not pre-recorded loops
Zero-install access — works in any browser, installable as PWA, no app store needed
Clean modular architecture — 13 focused backend modules, dispatch table for tool calls, mixin pattern for frontend

What we learned

Gemini's Live API with native audio is incredibly powerful for building conversational agents that feel natural — the interruption handling alone makes it feel like talking to a real person
Google ADK's bidirectional streaming mode simplifies what would otherwise be extremely complex WebSocket orchestration
Sound therapy is a genuinely underserved space where AI can make a real difference — the intersection of ancient healing practices and modern AI is rich with possibility
Building for real-time requires thinking about every millisecond — audio latency, frame timing, duck/restore transitions all matter for the experience to feel seamless

What's next for Naada — AI Sound Therapy Companion

Persistent session history — save mood journeys across sessions with Cloud Firestore
Therapist dashboard — let licensed therapists monitor patient sessions and customize protocols
Wearable integration — connect to Apple Watch / Fitbit for real heart rate and HRV data instead of camera-based rPPG (remote photoplethysmography) estimation
Group therapy rooms — multiple users in one session with shared soundscapes
Research mode — export anonymized session data for clinical sound therapy studies
Mobile native — Flutter app with background audio for therapy that continues outside the browser