## 🌌 Inspiration: The Problem with "Helpful" AI

"This being human is a guest house. Every morning a new arrival." — Rumi

We noticed a glaring issue with modern AI: it is obsessively clinical and solution-oriented. When someone is grieving, anxious, or overwhelmed, they don’t want a 5-step bulleted list on how to fix their life; they just want to be heard.

Inspired by the ancient concepts of Karuna (Compassion) and Daya (Empathy), we wanted to build an AI that doesn't look away from suffering, but actively chooses to "sit in the fire" with you.

## 🪞 What it does: The Mirror of Daya

Karuna AI is a purely voice-driven, bilingual companion. There is no chat interface to distract you—just a beautifully calm, breathing orb that reacts dynamically to your voice amplitude and emotional sentiment.

It maps your unresolved thoughts into a persistent "Dark Passage" constellation—a visual memory of your emotional journey. To maintain this empathetic alignment, Karuna operates on an attunement function, scoring emotional distance over time:

$$ S_{attunement} = \sum_{t=1}^{T} \left( \alpha \cdot E_{user}(t) - \beta \cdot | V_{model}(t) - V_{user}(t) | \right) $$

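(Where \(E_{user}(t)\) is the user's emotional presence at time \(t\), \(| V_{model}(t) - V_{user}(t) |\) is the mismatch in vocal pacing and amplitude between the model and the user, and \(\alpha\), \(\beta\) are weighting coefficients. The score rises with presence and falls as the mismatch grows, so keeping it high means keeping that mismatch small.)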
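As a rough illustration of the arithmetic, the sum can be sketched in a few lines of Python. This is a minimal sketch, not the project's implementation: the function name `attunement_score`, the default weights, and the treatment of \(V\) as a single scalar per time step are all assumptions made for the example.

```python
from typing import Sequence


def attunement_score(
    e_user: Sequence[float],
    v_model: Sequence[float],
    v_user: Sequence[float],
    alpha: float = 1.0,  # assumed weight on emotional presence
    beta: float = 1.0,   # assumed weight on the vocal mismatch penalty
) -> float:
    """Sum alpha * E_user(t) - beta * |V_model(t) - V_user(t)| over all time steps."""
    if not (len(e_user) == len(v_model) == len(v_user)):
        raise ValueError("all series must cover the same time steps")
    return sum(
        alpha * e - beta * abs(vm - vu)
        for e, vm, vu in zip(e_user, v_model, v_user)
    )


# Toy values: identical presence, with and without a vocal mismatch.
presence = [0.5, 0.75, 1.0]
matched = attunement_score(presence, [1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = attunement_score(presence, [1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
print(matched, mismatched)  # 2.25 0.75
```

With the same emotional presence, the run where the model's vocal values track the user's scores higher, which is the behavior the formula is meant to reward.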

## 🏗️ How we built it: Architecture

We used the Gemini 2.0 Flash Multimodal Live API over WebSockets to get sub-second native audio interaction, with no separate text-to-speech step adding lag.

| Component | Technology | Function |
| --- | --- | --- |
| Brain | Gemini 2.0 Flash Live API | Native bidirectional audio processing. |
| Backend | FastAPI (Python) | High-throughput WebSocket server via google-adk. |
| Memory | Google Cloud Firestore | NoSQL storage for the "Constellation" mapping. |
| Frontend | Three.js & Web Audio API | Generative UI and Dhvani ambient drone rendering. |
| Hosting | Render (Docker) | Containerized deployment for the voice pipeline. |

### The Firestore Passage Tool

To create the persistent memory, we implemented asynchronous tool calling within the Gemini pipeline:

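```python
from google.cloud import firestore

db = firestore.Client()  # Firestore client; uses application default credentials

# `tool` is the decorator used in the original snippet (its import is not shown there);
# `user_id` is assumed to be set for the active session before the tool runs.
@tool
def save_to_passage(uncertainty_text: str, theme: str) -> str:
    """Saves a core user uncertainty to the Dark Passage constellation."""
    # Each thought becomes one document under users/{user_id}/passage.
    db.collection("users").document(user_id).collection("passage").add({
        "thought": uncertainty_text,
        "theme": theme,
        "timestamp": firestore.SERVER_TIMESTAMP,  # filled in by Firestore on write
    })
    return "Saved."
```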

## 🫁 UI & Audio Engineering
The visual centerpiece is the Karuna Orb. We abandoned standard loading spinners for a generative orb that breathes with your voice. The orb's pulse amplitude \(A(t)\) is dynamically driven by the incoming audio waveform's Root Mean Square (RMS) energy:

$$ A(t) = A_{resting} + k \sqrt{\frac{1}{N} \sum_{n=0}^{N-1} x[n]^2} $$

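When the user speaks, the RMS term \(\sqrt{\frac{1}{N} \sum_{n=0}^{N-1} x[n]^2}\) grows with the loudness of the samples \(x[n]\) in the current \(N\)-sample window and, scaled by the gain \(k\), expands the orb beyond its resting size \(A_{resting}\). The effect is a sense that the AI is physically "breathing in" the user's words.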
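The amplitude formula can be checked with a short Python sketch. The project's frontend uses the Web Audio API, so this only illustrates the arithmetic and is not the rendering code itself. The function name `orb_amplitude`, the default values for `a_resting` and `k`, and the choice to return the resting size for an empty window are all assumptions.

```python
import math
from typing import Sequence


def orb_amplitude(
    samples: Sequence[float],
    a_resting: float = 1.0,  # assumed resting size of the orb
    k: float = 0.5,          # assumed gain applied to the RMS energy
) -> float:
    """Return A = A_resting + k * RMS(samples) for one window of audio samples."""
    if not samples:
        # No audio in this window: keep the orb at its resting size.
        return a_resting
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return a_resting + k * rms


silence = orb_amplitude([0.0, 0.0, 0.0, 0.0])
speech = orb_amplitude([0.5, -0.5, 0.5, -0.5])
print(silence, speech)  # 1.0 1.25
```

Because the samples are squared before averaging, positive and negative swings of the waveform both push the orb outward, so the pulse follows loudness rather than the sign of the signal.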

## 🧗 Challenges we ran into

- Deploying WebSockets: Deploying a true, persistent bidirectional WebSocket connection for native audio is incredibly challenging in serverless environments.
- Library Patching: We had to dynamically patch the google-adk source code during the build sequence to force the Live API onto the v1beta endpoints to bypass strict quota limits.
- Prompt Engineering: Tuning the system prompt to prevent the Gemini model from reverting to its standard "helpful assistant" persona required strict psychological guardrails and anti-patterns.

## 🚀 What's next for Karuna AI

- Expand the "Vardaan" (Generative poetic reflections) capability.
- Integrate deeper recognition for localized Hindi/Urdu dialects.
- Allow users to actively explore their "Constellation" memory map using spatial computing (WebXR).
