Inspiration

Mental health decline is often a silent trajectory. When individuals experience prolonged periods of stress or depression, their primary support networks—therapists, partners, and close friends—are often disconnected from their day-to-day emotional state.

We realized that traditional journaling causes high cognitive friction (leading to low adherence), and standard mood-tracking apps rely on subjective 1-to-10 sliders that fail to capture a person's true, subconscious emotional exhaustion. People can lie to a mood slider, but they cannot lie to their vocal cords.

This inspired us to build InnerVoice: an ecosystem that completely removes the friction of self-reflection and automatically bridges the gap between the patient and their guardians before a crisis occurs.

What it does

InnerVoice is an AI-driven Emotional Wellness Tracker. Users simply speak their mind for 60 seconds into their device.

Instead of just parsing text, InnerVoice listens. It uses locally run ML models to extract acoustic features (pitch, speech rate, energy variance, conversational pauses) and uncover latent emotions behind the words.

Additionally, InnerVoice breaks the isolation of mental health struggles through its Trusted Circle architecture. The platform automatically broadcasts Weekly Emotional Trend Reports (synthesized by an LLM) to a pre-authenticated support system to proactively facilitate early human intervention.
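A sketch of how a Trusted Circle report can be dispatched through Brevo's transactional email REST API with httpx. The endpoint and `api-key` header follow Brevo's public API; the sender address, subject line, and helper names are placeholders, not the production values:

```python
BREVO_URL = "https://api.brevo.com/v3/smtp/email"  # Brevo transactional email endpoint

def build_trend_report_email(recipients: list[str], summary_html: str) -> dict:
    """Assemble the JSON body Brevo expects for a weekly trend report."""
    return {
        "sender": {"name": "InnerVoice", "email": "reports@innervoice.example"},  # placeholder sender
        "to": [{"email": addr} for addr in recipients],
        "subject": "InnerVoice: Weekly Emotional Trend Report",
        "htmlContent": summary_html,
    }

def send_trend_report(api_key: str, payload: dict) -> int:
    """POST the report through Brevo's API; returns the HTTP status code."""
    import httpx  # imported lazily so payload assembly has no network dependency
    resp = httpx.post(
        BREVO_URL,
        json=payload,
        headers={"api-key": api_key, "accept": "application/json"},
        timeout=10.0,
    )
    return resp.status_code
```

Separating payload assembly from the network call keeps the report format testable without sending real email.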

How we built it

We developed InnerVoice as a robust full-stack application designed for speed, privacy, and accuracy:

  • Frontend: Built with Next.js 14, React context, and TailwindCSS. We used Framer Motion for micro-animations and Recharts for rendering longitudinal emotional trend data in an intuitive dashboard.
  • Backend: Powered by FastAPI on Python 3.10+, using SQLAlchemy with SQLite so all data stays local to the instance.
  • Acoustic ML Pipeline: We used librosa to extract acoustic markers associated with distress (RMS energy, zero-crossing rates) and processed the audio in-memory through a locally hosted Hugging Face model (wav2vec2-lg-xlsr-en-speech-emotion-recognition).
  • Transcription & Synthesis: OpenAI's Whisper model handles immediate audio-to-text transcription, while an LLM synthesizes the complex data into digestible weekly trends.
  • Trusted Circle API: We integrated the Brevo Transactional Email REST API via httpx to trigger automated alerts to the user's secure network.
  • Deployment: The entire stack is containerized in a unified Dockerfile optimized for deployment on Hugging Face Spaces.
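A sketch of what such a unified Dockerfile might look like. The `app.main:app` module path and `requirements.txt` layout are assumptions for illustration; port 7860 is the port Hugging Face Spaces routes traffic to:

```dockerfile
FROM python:3.10-slim

# librosa needs libsndfile; ffmpeg helps Whisper decode browser-recorded audio
RUN apt-get update && apt-get install -y --no-install-recommends \
        libsndfile1 ffmpeg \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /code
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Hugging Face Spaces serves Docker apps on port 7860
EXPOSE 7860
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]
```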

Challenges we ran into

  • Acoustic Overload: Handling raw audio feature extraction in real time is notoriously heavy. We initially hit persistent Out-Of-Memory (OOM) errors that surfaced as socket hang-ups. We solved this by processing chunks in-memory and immediately garbage-collecting the audio, storing only the quantified acoustic metrics.
  • Subjectivity of Emotion: Emotion is subjective. We had to calibrate our Hugging Face models carefully to understand the difference between healthy pauses in speech and pauses caused by emotional exhaustion or anxiety.
  • Privacy: Audio data is incredibly sensitive. We engineered the architecture so that raw audio is never persisted to the database. It exists only temporarily in the backend ML pipeline and is securely wiped once the float data and text string are extracted.

Accomplishments that we're proud of

  • Zero-Friction UI: We successfully built a journaling experience that requires absolutely no typing—just a single button press.
  • Deep Acoustic Insights: We achieved a strong correlation between users' vocal energy (RMS) and their tracked emotional timelines.
  • The Trusted Circle: We built a seamless bridge that translates complex, multi-day psychological AI data into a readable, actionable email for a therapist or loved one.

What we learned

  • We learned the profound limitations of purely semantic LLMs. Words can be deceiving, but acoustics rarely lie. Diving into librosa and wav2vec2 taught us how much human biology is physically altered by stress.
  • We learned how to orchestrate a complex, multi-service backend within a single Dockerized Hugging Face space for seamless deployment.

What's next for InnerVoice: The algorithm that listens

Moving forward, we want to expand the AI's contextual awareness. We plan to integrate wearable API data (like Apple Health or Garmin) to map sleep deprivation, heart-rate variability, and activity levels alongside the vocal acoustic data.

Ultimately, we want InnerVoice to become a standard integration for tele-health providers, allowing therapists to passively monitor non-critical patients and intervene the moment the acoustic data predicts a downward spiral.
