Karuna AI: The Bilingual Empathetic Voice Companion

A minimal, distraction-free breathing orb. No text, no chat—just presence.
Seamlessly responsive on iOS and Android for continuous, pocketable empathy.
True sub-second audio streaming via FastAPI, WebSockets, and Gemini 2.0 Flash.
Your emotional journey mapped as a persistent, visual constellation of memory.
A stunning dark-mode desktop view utilizing Three.js for generative particle physics.
The visual centerpiece. It dynamically expands and contracts to the RMS amplitude of your voice.
Visualizing the real-time, native audio connection bypassing slow text-to-speech pipelines.
Built on the ancient philosophy of reflecting human truths rather than offering clinical solutions.

🌌 Inspiration: The Problem with "Helpful" AI

"This being human is a guest house. Every morning a new arrival." — Rumi

We noticed a glaring issue with modern AI: it is obsessively clinical and solution-oriented. When someone is grieving, anxious, or overwhelmed, they don’t want a 5-step bulleted list on how to fix their life; they just want to be heard.

Inspired by the ancient concepts of Karuna (Compassion) and Daya (Empathy), we wanted to build an AI that doesn't look away from suffering, but actively chooses to "sit in the fire" with you.

🪞 What it does: The Mirror of Daya

Karuna AI is a purely voice-driven, bilingual companion. There is no chat interface to distract you—just a beautifully calm, breathing orb that reacts dynamically to your voice amplitude and emotional sentiment.

It maps your unresolved thoughts into a persistent "Dark Passage" constellation—a visual memory of your emotional journey. To maintain this empathetic alignment, Karuna operates on an attunement function, scoring emotional distance over time:

$$ S_{attunement} = \sum_{t=1}^{T} \left( \alpha \cdot E_{user}(t) - \beta \cdot | V_{model}(t) - V_{user}(t) | \right) $$

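(Where $E_{user}(t)$ is the user's emotional presence at step $t$, $| V_{model}(t) - V_{user}(t) |$ is the gap between the model's and the user's vocal pacing and amplitude, and $\alpha$ and $\beta$ weight the two terms. A higher score means the model is staying closer to the user.)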
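As a rough illustration of the attunement score, the sketch below evaluates the sum over a short sequence of time steps. The function name, the per-step values, the treatment of $V$ as a single scalar, and the weights `alpha` and `beta` are all assumptions made for this example; the formula above does not specify how these quantities are measured.

```python
from typing import Sequence


def attunement_score(
    e_user: Sequence[float],
    v_model: Sequence[float],
    v_user: Sequence[float],
    alpha: float = 1.0,  # assumed weight on emotional presence
    beta: float = 1.0,   # assumed weight on the vocal mismatch penalty
) -> float:
    """Sum alpha * E_user(t) - beta * |V_model(t) - V_user(t)| over all steps."""
    if not (len(e_user) == len(v_model) == len(v_user)):
        raise ValueError("all sequences must cover the same time steps")
    return sum(
        alpha * e - beta * abs(vm - vu)
        for e, vm, vu in zip(e_user, v_model, v_user)
    )


# Hypothetical values for three time steps.
score = attunement_score(
    e_user=[0.5, 0.75, 1.0],
    v_model=[0.5, 0.5, 1.0],
    v_user=[0.5, 0.75, 1.0],
    alpha=1.0,
    beta=2.0,
)
print(score)
```

Only the second step contributes a penalty here, because it is the only step where the model's value differs from the user's.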

🏗️ How we built it: Architecture

We used the Gemini 2.0 Flash Multimodal Live API over WebSockets to stream native audio in both directions with sub-second latency, avoiding a separate text-to-speech stage.

| Component | Technology | Function |
| --- | --- | --- |
| Brain | Gemini 2.0 Flash Live API | Native bidirectional audio processing. |
| Backend | FastAPI (Python) | High-throughput WebSocket server via `google-adk`. |
| Memory | Google Cloud Firestore | NoSQL storage for the "Constellation" mapping. |
| Frontend | Three.js & Web Audio API | Generative UI and Dhvani ambient drone rendering. |
| Hosting | Render (Docker) | Containerized deployment for the voice pipeline. |

The Firestore Passage Tool

To create the persistent memory, we implemented asynchronous tool calling within the Gemini pipeline:

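from google.cloud import firestore

db = firestore.Client()  # assumes application default credentials are configured

@tool  # the project's tool-registration decorator; its import is not shown in the source
def save_to_passage(uncertainty_text: str, theme: str):
    """Saves a core user uncertainty to the Dark Passage constellation."""
    # user_id is not defined in this excerpt; it is assumed to come from the active session.
    db.collection("users").document(user_id).collection("passage").add({
        "thought": uncertainty_text,
        "theme": theme,
        "timestamp": firestore.SERVER_TIMESTAMP,  # filled in by Firestore at write time
    })
    return "Saved."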
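To turn the saved thoughts into something the constellation view can draw, the records need to be read back and clustered. The sketch below shows one possible way to group them by theme. It works on plain dictionaries shaped like the documents written by `save_to_passage`; the function name `group_passage_by_theme` and the sample records are hypothetical, and the actual project may structure this differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_passage_by_theme(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Cluster saved thoughts by theme, preserving the order they arrive in."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        clusters[record["theme"]].append(record["thought"])
    return dict(clusters)


# Hypothetical records shaped like the documents written by save_to_passage.
records = [
    {"thought": "Will I ever feel settled?", "theme": "belonging"},
    {"thought": "I miss who I used to be.", "theme": "grief"},
    {"thought": "Nobody here knows me.", "theme": "belonging"},
]
constellation = group_passage_by_theme(records)
print(constellation["belonging"])
```

Each theme could then become one cluster of points in the visualization, with one point per thought.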

🫁 UI & Audio Engineering

The visual centerpiece is the Karuna Orb. We abandoned standard loading spinners for a generative orb that breathes with your voice. The orb's pulse amplitude $A(t)$ is dynamically driven by the incoming audio waveform's Root Mean Square (RMS) energy:

$$ A(t) = A_{resting} + k \sqrt{\frac{1}{N} \sum_{n=0}^{N-1} x[n]^2} $$

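Here $x[n]$ is the $n$-th sample in a frame of $N$ audio samples, $A_{resting}$ is the orb's size in silence, and $k$ is a gain factor. When the user is speaking, the RMS term $\sqrt{\frac{1}{N} \sum_{n=0}^{N-1} x[n]^2}$ grows and expands the orb, creating the visceral impression that the AI is physically "breathing in" the user's words.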
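The amplitude formula can be sketched in a few lines. The project's frontend presumably computes this in the browser; Python is used here only to keep the example short. The function name `orb_amplitude` and the default values for `a_resting` and `k` are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def orb_amplitude(
    samples: Sequence[float],
    a_resting: float = 1.0,  # assumed resting size of the orb
    k: float = 0.5,          # assumed gain from RMS energy to orb size
) -> float:
    """Return A(t) = A_resting + k * RMS(samples) for one audio frame."""
    if not samples:
        return a_resting  # an empty frame leaves the orb at rest
    mean_square = sum(x * x for x in samples) / len(samples)
    return a_resting + k * math.sqrt(mean_square)


# One frame of normalized samples in [-1, 1]; the values are illustrative.
frame = [0.5, -0.5, 0.5, -0.5]
print(orb_amplitude(frame))  # RMS is 0.5, so the result is 1.0 + 0.5 * 0.5
```

Because RMS squares each sample, positive and negative swings of the waveform both push the orb outward, and a silent frame returns it to `a_resting`.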

🧗 Challenges we ran into

Deploying WebSockets: A persistent, bidirectional WebSocket connection for native audio is difficult to sustain in serverless environments.
Library Patching: We had to patch the `google-adk` source code during the build sequence to force the Live API onto the v1beta endpoints and bypass strict quota limits.
Prompt Engineering: Keeping the Gemini model from reverting to its standard "helpful assistant" persona required strict guardrails and explicit anti-patterns in the system prompt.

🚀 What's next for Karuna AI

 Expand the "Vardaan" (Generative poetic reflections) capability.
 Integrate deeper recognition for localized Hindi/Urdu dialects.
 Allow users to actively explore their "Constellation" memory map using spatial computing (WebXR).

Built With

docker
fastapi
gemini-2.0-flash
google-cloud-firestore
google-gemini
javascript
python
render
three.js
uvicorn
web-audio-api
websockets

Updates

Shyam Sharma started this project — Apr 25, 2026 06:03 PM EDT
