Serenity: AI-Powered Therapeutic Avatar

Inspiration

The inspiration for Serenity came from witnessing the growing mental health crisis and the barriers people face in accessing therapy. Traditional therapy can be expensive, intimidating, and difficult to access. We wanted to create a bridge - a safe, always-available space where people could practice opening up about their struggles without fear of judgment. Seeing how people naturally anthropomorphize technology and find comfort in AI companionship, we envisioned creating a therapeutic avatar that could provide genuine emotional support with human-like warmth.

What it does

Serenity is an AI-powered therapeutic avatar that provides real-time conversational mental health support. Users can talk or type about their feelings, trauma, or daily struggles, and the avatar, Dr. Elara or Dr. Theo, responds with empathy, contextual understanding, and appropriate emotional expressions. Key features:

Emotionally-responsive avatars that show facial expressions matching the conversation (sadness, empathy, concern, joy)
Real-time voice synthesis with natural-sounding therapeutic voices
Context-aware memory that remembers names, relationships, and traumatic events
Multimodal interaction combining voice, text, and visual emotional feedback
Local AI processing for privacy-focused conversations
Therapeutic dialogue patterns based on evidence-based counseling techniques

How I built it

Frontend: Built a React frontend with Vite for fast development, using React Hooks and Context for state management and React Router DOM for navigation.
Backend: Created a Node.js and Express backend utilizing WebSockets for real-time communication and REST endpoints for TTS and emotion processing.
Voice Synthesis: Integrated the Speechmatics TTS Preview API, which offers voices such as Sarah, Theo, Megan, and Jack, and streamed the resulting audio as WAV.
AI Intelligence: Set up Ollama for local LLM inference (Llama 3.1, DeepSeek, Mistral) with a custom emotion detection engine covering 15+ states and a trauma-aware memory system.
Immersion System: Developed a video avatar system with emotion-triggered playback, gender-specific environmental sounds, and custom CSS animations.
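
To illustrate how emotion-triggered playback can work, here is a minimal sketch of a keyword-based detector that maps a message to an avatar clip. It is not Serenity's actual engine: the emotion labels, keyword lists, clip paths, and the function names detectEmotion and clipFor are all assumptions made for this example, and only a handful of the 15+ states are shown.

```typescript
// Hypothetical subset of emotion states; the real engine covers 15+ states.
type Emotion = "sadness" | "concern" | "joy" | "empathy" | "neutral";

// Illustrative keyword lists, not the project's actual lexicon.
const EMOTION_KEYWORDS: { emotion: Emotion; keywords: string[] }[] = [
  { emotion: "sadness", keywords: ["sad", "lonely", "cry", "hopeless", "lost"] },
  { emotion: "concern", keywords: ["worried", "anxious", "scared", "panic", "afraid"] },
  { emotion: "joy", keywords: ["happy", "excited", "proud", "grateful", "relieved"] },
  { emotion: "empathy", keywords: ["hurt", "grief", "miss", "trauma", "abuse"] },
];

// Hypothetical clip paths; each emotion triggers one avatar video.
const AVATAR_CLIPS: Record<Emotion, string> = {
  sadness: "/avatars/sadness.mp4",
  concern: "/avatars/concern.mp4",
  joy: "/avatars/joy.mp4",
  empathy: "/avatars/empathy.mp4",
  neutral: "/avatars/neutral.mp4",
};

// Count keyword hits per emotion and return the highest-scoring one.
// Ties go to the emotion listed first; no hits at all means "neutral".
function detectEmotion(text: string): Emotion {
  const words: string[] = text.toLowerCase().match(/[a-z']+/g) ?? [];
  let best: Emotion = "neutral";
  let bestScore = 0;
  for (const { emotion, keywords } of EMOTION_KEYWORDS) {
    const score = words.filter((w) => keywords.includes(w)).length;
    if (score > bestScore) {
      best = emotion;
      bestScore = score;
    }
  }
  return best;
}

// Resolve the clip the avatar should play for a given message.
function clipFor(text: string): string {
  return AVATAR_CLIPS[detectEmotion(text)];
}
```

A keyword count is about the simplest approach that could work. It keeps detection fast enough to run on every message, at the cost of missing negation and sarcasm, which a model-based classifier would handle better.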

Challenges I ran into

Real-time Emotion Synchronization: Getting avatar expressions, voice synthesis, and emotional responses perfectly synchronized so the timing felt natural rather than robotic.
Audio Pipeline Issues: Managing browser audio limitations and blob URLs to maintain smooth playback across emotional states without memory leaks.
Context Memory Management: Building a working memory system that persisted key details and traumatic context without overflowing the local LLM's context window.
Local LLM Performance: Optimizing prompts and response handling to maintain a responsive conversation while running heavy models locally.
API Integration: Navigating limited documentation and inconsistencies in preview APIs to create robust fallback implementations.
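
One way to approach the context memory challenge is to rank remembered facts by priority, keep only those that fit a token budget, and send them with a short window of recent turns. The sketch below assumes a hypothetical MemoryFact shape, a rough four-characters-per-token estimate, and invented helper names (selectFacts, buildChatRequest). The request body uses the model, messages, and stream fields of Ollama's chat endpoint. None of this is confirmed as Serenity's implementation.

```typescript
// Hypothetical shape for a remembered detail; higher priority means more important.
interface MemoryFact {
  text: string;
  priority: number;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  stream: boolean;
  messages: ChatMessage[];
}

// Rough heuristic (about 4 characters per token), not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily keep the highest-priority facts that still fit in the budget.
function selectFacts(facts: MemoryFact[], budget: number): MemoryFact[] {
  const sorted = [...facts].sort((a, b) => b.priority - a.priority);
  const chosen: MemoryFact[] = [];
  let used = 0;
  for (const fact of sorted) {
    const cost = estimateTokens(fact.text);
    if (used + cost > budget) continue; // skip facts that would overflow
    chosen.push(fact);
    used += cost;
  }
  return chosen;
}

// Build a chat request body: system prompt with selected facts,
// the last maxTurns history messages, then the new user message.
function buildChatRequest(
  model: string,
  facts: MemoryFact[],
  history: ChatMessage[],
  userText: string,
  factBudget = 200,
  maxTurns = 6,
): ChatRequest {
  const remembered = selectFacts(facts, factBudget)
    .map((f) => `- ${f.text}`)
    .join("\n");
  const system: ChatMessage = {
    role: "system",
    content: `You are a supportive listener.\nKnown context:\n${remembered}`,
  };
  const recent = history.slice(-maxTurns);
  return {
    model,
    stream: false,
    messages: [system, ...recent, { role: "user", content: userText }],
  };
}
```

The resulting object could be sent as JSON in a POST to a local Ollama server's /api/chat endpoint. Capping both the facts and the recent turns keeps the prompt size predictable, which also helps response time on heavier local models.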

Accomplishments that I'm proud of

Genuinely Empathetic AI: Creating a system that remembers, follows up, and demonstrates high emotional intelligence, making users feel truly "heard."
Privacy-First Architecture: Keeping LLM inference for sensitive mental health conversations on the user's device by running local models instead of cloud-hosted ones.
Perfect Synchronization: Achieving seamless coordination between vocal tone, facial expressions, and therapeutic responses.
Scalable Avatar System: Building an extensible architecture where new personas and therapeutic approaches can be added as modular plugins.
Real Therapeutic Value: Observing users open up about difficult topics they hadn't shared previously, an encouraging early sign of the tool's value.

What I learned

Emotional AI Complexity: Learned that conveying empathy requires a precise understanding of pacing, pauses, and the micro-transitions between facial expressions.
Therapeutic Dialogue: Studied Carl Rogers' person-centered therapy and trauma-informed care to shape the AI's dialogue logic.
Real-time Architecture: Developed expertise in managing complex asynchronous processes like simultaneous voice synthesis and emotion detection.
HCI Psychology: Discovered how subtle details like eye contact timing and vocal pacing significantly impact a user's sense of connection.
Ethics in AI: Established a framework for responsible AI therapy, including crisis handling and knowing when to recommend human intervention.

What's next for Serenity

Persona Expansion: Adding specialized avatars for child psychology, grief counseling, and career coaching.
Progress Tracking: Implementing mood visualization and guided wellness modules for meditation and breathing.
Cloud Sync & Mobile: Developing companion apps with secure, private cloud synchronization across devices.
Clinical Integration: Working with professional therapists to use Serenity as a supplementary tool for between-session support.
Advanced Biofeedback: Integrating real-time heart rate and voice stress analysis to further refine the AI's emotional response accuracy.
Immersive Environments: Moving toward VR/AR integration to provide spatial audio and fully immersive therapeutic nature scenes.

Serenity represents just the beginning of how AI can support mental wellness - not as a replacement for human connection, but as a bridge to it, available anytime someone needs to be heard.

Built With

api
concurrently
css
dotenv
eslint
express.js
html5
node-fetch
node.js
ollama
react
sdk
vite
websocket
