The Story of Voco-Coach: Voice, Emotion, and AI

🌟 Inspiration

Communication is the most human thing we do, yet it is often the source of our greatest stress. Whether it’s a high-stakes meeting, a difficult conversation with a loved one, or managing social anxiety, we often lack a "safe space" to practice and receive objective feedback.

I was inspired to build Voco-Coach after realizing that while we have fitness trackers for our steps and heart rate, we lack a "fitness tracker" for our emotional and vocal health. I wanted to bridge the gap between AI-driven data and human-centric therapy, creating a tool that doesn’t just record what you say, but understands how you say it and how you feel while saying it.

🧠 What I Learned

Building Voco-Coach was a masterclass in Agentic AI and Affective Computing.

  • Context is King: I learned that LLMs like Google Gemini 2.0 Flash are incredibly capable of detecting nuance, but they require structured data (biomarkers) to provide truly "empathetic" feedback.
  • The Power of Socratic Feedback: Instead of the AI telling the user what to do, I learned that the most effective growth happens when the AI asks the right questions. This led to the development of the Socratic Journaling feature.
  • Real-time Challenges: Handling live audio streams in a browser environment while maintaining a "Glassmorphism" UI taught me a lot about performance optimization in Next.js.

🛠️ How I Built It

Voco-Coach is built on a modern, high-performance stack designed for low latency and high emotional resonance.

The Technical Architecture

  1. The Intelligence Layer: I utilized Google Gemini 2.0 Flash for its speed and deep reasoning. It powers the "Calm Scores" and the Socratic questioning logic.
  2. The Vocal Layer: To make the AI feel human, I integrated ElevenLabs API, allowing the platform to speak back to the user with natural, emotionally expressive voices.
  3. The Data Layer: Using SQLite with Prisma ORM, I created a complex schema to track everything from 7-day stress trends to therapist-student chat histories.
  4. The Interface: I chose a Glassmorphism design (Tailwind CSS + Framer Motion) to evoke a sense of clarity and calm, essential for a therapy-focused application.
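To make the data layer concrete, here is a minimal sketch of the 7-day stress-trend aggregation mentioned above. The `SessionRecord` shape and field names are assumptions for illustration; the real Prisma models may differ.

```typescript
// Hypothetical shape of one analyzed session pulled from the database.
interface SessionRecord {
  recordedAt: Date;    // when the session was captured
  stressScore: number; // 0–100, higher = more stressed
}

// Average stress per calendar day over the last 7 days.
// Index 0 is today, index 6 is six days ago; empty days report 0.
function sevenDayTrend(records: SessionRecord[], now: Date = new Date()): number[] {
  const dayMs = 24 * 60 * 60 * 1000;
  const buckets: number[][] = Array.from({ length: 7 }, () => []);
  for (const r of records) {
    const age = Math.floor((now.getTime() - r.recordedAt.getTime()) / dayMs);
    if (age >= 0 && age < 7) buckets[age].push(r.stressScore);
  }
  return buckets.map(b =>
    b.length ? b.reduce((sum, x) => sum + x, 0) / b.length : 0
  );
}
```

In the real app this would run over rows fetched via Prisma; the sketch only shows the bucketing logic.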

The Mathematical Core

To calculate vocal health and stress, the application uses a weighted average of pitch variation (σₚ) and clarity (C). A simplified version of our Calm Score (S) can be expressed as:

S = (w₁ · (1 − σₚ) + w₂ · C) / (w₁ + w₂)

Where:

  • σₚ represents the pitch variance over time, normalized to [0, 1] (steadier speech scores higher).
  • C is the voice clarity percentage.
  • w₁, w₂ are weights adjusted based on the user's historical baseline.
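The formula above can be sketched in code as follows. The default weights and the [0, 1] normalization of σₚ are assumptions for illustration, not the production values.

```typescript
// Minimal sketch of the Calm Score: a normalized weighted average of
// pitch steadiness (1 − σp) and clarity (C). Inputs are assumed to be
// pre-normalized to [0, 1]; the default weights are placeholders.
function calmScore(
  pitchVariance: number, // σp in [0, 1]
  clarity: number,       // C in [0, 1]
  w1 = 0.4,              // weight for pitch steadiness
  w2 = 0.6,              // weight for clarity
): number {
  return (w1 * (1 - pitchVariance) + w2 * clarity) / (w1 + w2);
}
```

A perfectly steady, perfectly clear voice scores 1; a maximally varied, unclear one scores 0, and the weights shift the balance between the two biomarkers.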

🚧 Challenges Faced

The journey wasn't without its hurdles:

  • Latency vs. Realism: Getting the AI to respond fast enough to feel like a "live" conversation while still processing complex emotional biomarkers was difficult. I solved this by using the Gemini 2.0 Flash model, which offers significantly lower latency for real-time applications.
  • Role-Based Security: Designing a system where a single user could potentially be an Admin, a Therapist, and a Student required a robust JWT-based RBAC (Role-Based Access Control) system. Ensuring that data remained siloed and secure was a top priority.
  • Tone Analysis Accuracy: Traditional sentiment analysis often misses the "vibe." I had to refine the prompts to ensure the AI looked for cognitive distortions in text rather than just "sad" or "happy" keywords.
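The multi-role access check described above can be sketched as follows. The role names and token payload shape are assumptions; in practice the JWT would be verified with a signing library before this check runs.

```typescript
// Hedged sketch of the RBAC check: a single account may hold several
// roles, and a route is allowed if the token carries at least one of
// the roles that route requires.
type Role = "ADMIN" | "THERAPIST" | "STUDENT";

interface TokenPayload {
  sub: string;   // user id (standard JWT subject claim)
  roles: Role[]; // roles granted to this account
}

function authorize(payload: TokenPayload, required: Role[]): boolean {
  return required.some(role => payload.roles.includes(role));
}
```

For example, a therapist-only endpoint might call `authorize(payload, ["THERAPIST", "ADMIN"])`, keeping student data siloed from accounts that lack either role.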

🚀 The Future

Voco-Coach is just the beginning. I am looking toward a future where we can integrate wearable data to correlate vocal biomarkers with physiological stress (HRV).

Voco-Coach isn't just an app; it’s a companion for anyone looking to find their voice and master their emotions in an increasingly digital world.

