MindfulAI - Our Hackathon Journey

Inspiration

In the race to build smarter AI, we've forgotten something fundamental: Emotional Intelligence.

Modern AI systems optimize for accuracy, speed, and capability - but they speak in the same flat, monotone voice whether you're celebrating a promotion or grieving a loss. They analyze your words but miss the tremor in your voice. They generate responses but don't feel the weight of what you're sharing.

The AI revolution neglected the Emotional Quotient (EQ).

We asked ourselves: What if AI could actually listen - not just to words, but to tone? What if it could respond not just with the right information, but with the right feeling? What if talking to AI felt less like querying a database and more like confiding in a friend?

The Gap We Saw

| Traditional AI | What Humans Need |
|---|---|
| Flat, consistent voice | Tone that adapts to emotion |
| Text analysis only | Voice tone recognition |
| Same response style always | Empathetic variation |
| "I understand" (but doesn't) | Actually demonstrates understanding |

Our Novel Approach

MindfulAI bridges this gap with two key innovations:

  1. Emotion-Adaptive Voice Output - When you're anxious, MindfulAI speaks slower, softer, calmer. When you're excited, it matches your energy. The AI doesn't just say it understands - it shows it through how it speaks.

  2. User Tone Detection - Beyond analyzing what you say, we analyze how you say it. The intensity in your voice, the hesitation, the relief - these signals inform the AI's understanding and response.

This isn't just another chatbot with a voice skin. This is AI that finally has emotional intelligence.

What We Learned

Technical Insights

1. Memory Decay Mathematics

One of our key innovations was implementing memory decay for emotional state tracking. Emotions shouldn't persist at full intensity forever - they naturally fade:

$$w_{t+1} = w_t \cdot e^{-\lambda \Delta t}$$

Where:

  • $w_t$ = emotion weight at time $t$
  • $\lambda$ = decay constant (we used 0.15)
  • $\Delta t$ = time since last mention (measured in conversational exchanges)

This creates more natural conversations where the AI doesn't fixate on emotions mentioned 10 exchanges ago.
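
As a minimal sketch (assuming emotion weights live in a plain dict keyed by emotion name, with $\Delta t$ counted in exchanges as above; the 0.05 noise floor is an illustrative choice):

import math

DECAY_LAMBDA = 0.15  # decay constant from the formula above

def decay_emotion_weights(weights: dict[str, float], delta_t: float) -> dict[str, float]:
    """Apply w_{t+1} = w_t * e^(-lambda * dt) to every tracked emotion."""
    factor = math.exp(-DECAY_LAMBDA * delta_t)
    # Prune weights that fall below a small noise floor so stale emotions vanish
    return {emotion: w * factor for emotion, w in weights.items()
            if w * factor > 0.05}

With $\lambda = 0.15$, an emotion last mentioned ten exchanges ago keeps only $e^{-1.5} \approx 22\%$ of its weight - exactly the "don't fixate" behavior we wanted.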

2. Emotion Blending

Instead of overwriting emotions, we blend them for continuity:

$$w_{new} = \max(w_{prev} \cdot 0.6 + I_{new} \cdot 0.4, I_{new} \cdot 0.8)$$

Where $I_{new}$ is the new intensity. This prevents jarring emotional jumps.
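
For intuition, take a previous anxiety weight of 0.8 and a new reading of intensity 0.3 (numbers chosen purely for illustration):

$$w_{new} = \max(0.8 \cdot 0.6 + 0.3 \cdot 0.4,\; 0.3 \cdot 0.8) = \max(0.60,\; 0.24) = 0.60$$

The weight eases down to 0.60 instead of snapping to 0.3. The second term works in the opposite direction: a strong new emotion (intensity 0.9 against a previous 0.2) lands at 0.72 rather than being averaged away.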

3. The Power of "Why NOT"

Explainability isn't just about why the AI did something - it's equally important to explain why it didn't do something else. Our technique selection now includes rejection reasons:

{
  "technique": "validation",
  "reason": "User needs to feel heard",
  "why_not": {
    "cognitive_reframing": "Too early - user not ready",
    "grounding_exercise": "No anxiety symptoms present"
  }
}
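
A minimal sketch of the selection step that produces this structure (the candidates shape, scores, and helper name are illustrative, not our exact pipeline):

def select_technique(candidates: dict[str, dict]) -> dict:
    """Pick the top-scoring technique and record why each runner-up lost."""
    chosen = max(candidates, key=lambda name: candidates[name]["score"])
    return {
        "technique": chosen,
        "reason": candidates[chosen]["reason"],
        "why_not": {name: info["rejection"]
                    for name, info in candidates.items() if name != chosen},
    }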

Human Insights

  • Voice changes everything - Text feels transactional; voice feels like connection
  • Transparency builds trust - Users engage more when they can see the AI's reasoning
  • Silence is powerful - Sometimes the best response is a pause, not more words

How We Built It

Architecture Overview

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   React + TS    │────▶│  FastAPI + WS   │────▶│   Gemini 2.5    │
│   (Frontend)    │◀────│   (Backend)     │◀────│   (Vertex AI)   │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                                 │
                    ┌────────────┼────────────┐
                    ▼            ▼            ▼
              ┌──────────┐ ┌──────────┐ ┌──────────┐
              │ Kafka    │ │ Postgres │ │ElevenLabs│
              │ Events   │ │ Sessions │ │  Voice   │
              └──────────┘ └──────────┘ └──────────┘

Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 18 + TypeScript + Vite | Voice-first UI with real-time updates |
| Backend | FastAPI + WebSockets | Async processing, streaming responses |
| AI | Gemini 2.5 Flash (Vertex AI) | Emotion analysis, technique selection, response generation |
| Voice | ElevenLabs | Emotion-adaptive voice synthesis |
| Events | Confluent Kafka | Observable AI cognition streaming |
| Database | PostgreSQL | Session persistence, audit trails |

Key Components

1. Conversation State Machine

  • 6 phases: Opening → Exploration → Deepening → Technique → Integration → Closing
  • Automatic phase transitions based on user signals (sketched after this list)
  • Memory decay applied every exchange
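
A condensed sketch of the machine (the ready flag and two-exchange minimum are illustrative; the real transition rules key off richer user signals):

from enum import Enum, auto

class Phase(Enum):
    OPENING = auto()
    EXPLORATION = auto()
    DEEPENING = auto()
    TECHNIQUE = auto()
    INTEGRATION = auto()
    CLOSING = auto()

PHASE_ORDER = list(Phase)

def maybe_advance(phase: Phase, exchanges_in_phase: int, ready: bool) -> Phase:
    """Move one phase forward when the user signals readiness; CLOSING is terminal."""
    if ready and exchanges_in_phase >= 2 and phase is not Phase.CLOSING:
        return PHASE_ORDER[PHASE_ORDER.index(phase) + 1]
    return phase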

2. Emotion-Adaptive Voice

  • Voice parameters adjust based on detected emotion (example mapping below)
  • Anxious user → slower, calmer voice
  • Happy user → warmer, more energetic voice
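
A sketch of the mapping idea. The field names echo ElevenLabs-style voice settings, but every value (and the speed field itself) is an illustrative assumption, not our tuned configuration:

# Emotion -> voice-parameter table (values are illustrative assumptions)
VOICE_PROFILES = {
    "anxious": {"stability": 0.85, "style": 0.2, "speed": 0.90},  # slower, calmer
    "sad":     {"stability": 0.80, "style": 0.3, "speed": 0.95},  # soft and steady
    "happy":   {"stability": 0.55, "style": 0.7, "speed": 1.05},  # warm, energetic
    "neutral": {"stability": 0.70, "style": 0.5, "speed": 1.00},
}

def voice_settings_for(emotion: str) -> dict:
    # Unknown emotions fall back to the neutral profile
    return VOICE_PROFILES.get(emotion, VOICE_PROFILES["neutral"])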

3. Observable AI Cognition Dashboard

  • 14+ event types streamed via Kafka
  • Real-time visualization of AI decision-making
  • Full audit trail with correlation IDs (emitter sketched below)
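
A sketch of the emitter built on the standard confluent-kafka Producer; the topic name, broker address, and event envelope are illustrative choices:

import json

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder broker

def emit_event(event_type: str, correlation_id: str, reason: str, **payload) -> None:
    """Publish one cognition event; keying by correlation ID keeps each session's
    events ordered, and the reason field keeps the stream human-readable."""
    event = {"type": event_type, "correlation_id": correlation_id,
             "reason": reason, **payload}
    producer.produce("mindfulai.cognition", key=correlation_id,
                     value=json.dumps(event).encode())
    producer.poll(0)  # serve delivery callbacks without blocking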

4. Safety System

  • Crisis detection with keyword + context analysis (keyword layer sketched below)
  • Immediate resource provision (988, crisis lines)
  • Never provides medical advice
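
A deliberately simplified sketch of the keyword layer (the phrases are examples only; as noted above, this runs alongside context analysis, and a match routes straight to resources):

CRISIS_PHRASES = {"want to end it all", "hurt myself", "no reason to live"}  # examples

CRISIS_MESSAGE = (
    "I'm really glad you told me. Please reach someone who can help right now: "
    "call or text 988 (Suicide & Crisis Lifeline)."
)

def crisis_check(message: str) -> str | None:
    """Return crisis resources on any match - never advice, never therapy."""
    lowered = message.lower()
    if any(phrase in lowered for phrase in CRISIS_PHRASES):
        return CRISIS_MESSAGE
    return None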

Challenges We Faced

Challenge 1: The "Sarcasm Problem"

Problem: Our breakthrough detection was triggering on sarcastic statements like "Oh sure, that makes total sense" (said dismissively).

Solution: We added emotion context requirements - breakthroughs only count when accompanied by positive emotions (relief, calm, gratitude):

positive_emotions = ["relief", "calm", "gratitude", "hopeful", "curious"]
if emotion not in positive_emotions:
    return None  # Not a real breakthrough

Challenge 2: The "Repetition Trap"

Problem: The AI would repeat similar responses, especially when users gave short acknowledgments like "yeah" or "okay".

Solution: We implemented multiple anti-repetition mechanisms:

  • Track used response patterns with fingerprinting (sketch below)
  • Track questions asked to avoid re-asking
  • Explicit anti-repetition guidance in prompts
  • Exercise discussion blocking after completion
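
A minimal sketch of the fingerprinting mechanism (normalization here is just case and whitespace, chosen for illustration):

import hashlib

class RepetitionGuard:
    """Remember fingerprints of sent responses and flag near-duplicates."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def fingerprint(self, response: str) -> str:
        normalized = " ".join(response.lower().split())  # collapse case/whitespace
        return hashlib.sha1(normalized.encode()).hexdigest()[:12]

    def is_repeat(self, response: str) -> bool:
        fp = self.fingerprint(response)
        if fp in self.seen:
            return True
        self.seen.add(fp)
        return False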

Challenge 3: Phase Transition Counter Bug

Problem: Our phase transition events were reporting exchanges_in_previous_phase: 0 instead of the actual count.

Root Cause: The counter was reset before the event was emitted:

# BUG: Counter reset happens inside _check_phase_transition
self._check_phase_transition(memory, user_message)  # Resets counter to 0
emit_event(exchanges_in_previous_phase=memory.exchanges_in_phase)  # Always 0!

Solution: Store the count before calling the transition check:

exchanges_before = memory.exchanges_in_phase  # Save it first!
self._check_phase_transition(memory, user_message)
emit_event(exchanges_in_previous_phase=exchanges_before)  # Correct value

Challenge 4: Emotional "Jumpiness"

Problem: Emotions would jump from 0.8 anxiety to 0.3 calm in a single exchange, which felt unnatural.

Solution: Implemented emotion blending instead of overwriting:

# Before: Jarring overwrites
memory.emotion_weights[emotion] = intensity

# After: Smooth blending
blended = max(previous * 0.6 + intensity * 0.4, intensity * 0.8)
memory.emotion_weights[emotion] = blended

Challenge 5: Kafka Event Explosion

Problem: We were emitting so many events that Kafka costs would scale poorly.

Solution: Strategic event emission - only emit when something meaningful changes, not every micro-update. Added reason fields to every event for human readability.
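
A sketch of the "meaningful change" gate (the 0.1 threshold is an illustrative choice):

def should_emit(prev: dict[str, float], curr: dict[str, float],
                threshold: float = 0.1) -> bool:
    """Emit only when some emotion weight moved by at least the threshold."""
    return any(abs(curr.get(e, 0.0) - prev.get(e, 0.0)) >= threshold
               for e in set(prev) | set(curr))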

What Makes MindfulAI Different

| Feature | Traditional Chatbots | MindfulAI |
|---|---|---|
| Interaction | Text-only | Voice-first |
| Transparency | Black box | Observable cognition |
| Memory | Stateless or full history | Decaying memory (natural) |
| Explainability | None | Why + Why NOT |
| Auditability | Logs only | Real-time Kafka stream |
| Emotion handling | Simple sentiment | Multi-emotion with intensity |

Future Vision

  • Multi-language support via Gemini's multilingual capabilities
  • Therapist dashboard for professionals to review AI sessions
  • Personalized voice cloning for consistency across sessions
  • Federated learning for privacy-preserving model improvements
  • Integration with wearables for physiological context (heart rate, stress levels)

Built With

React, TypeScript, Vite, FastAPI, WebSockets, Gemini 2.5 Flash (Vertex AI), ElevenLabs, Confluent Kafka, PostgreSQL
