MindGuardian AI: A Voice-First Mental Health Companion

💡 Inspiration

Mental health crises rarely announce themselves loudly. More often, they build quietly — in sleepless nights, in the slight flattening of someone's voice, in the gradual retreat from things they once loved. I've watched people close to me struggle to articulate what they were feeling, not because they lacked the words, but because typing them out felt clinical, cold, and impossibly hard in the moment.

I wanted to build something different: a companion that listens — not just to what you say, but how you say it.

The idea for MindGuardian AI came from a simple belief: the most natural interface for emotional expression is the human voice, and yet nearly every mental health tool on the market buries users in forms, sliders, and text boxes. What if the act of checking in with yourself felt as natural as talking to a trusted friend?


🔨 How I Built It

MindGuardian AI is a React + TypeScript + Vite + Tailwind CSS progressive web app powered by Groq's Llama 3.3 70B Versatile model for fast, empathetic conversational AI.

Voice Pipeline

The voice agent sits at the heart of the experience:

  1. Web Speech API handles speech-to-text (STT) and text-to-speech (TTS) with no third-party service to integrate (recognition runs through the browser's built-in speech engine).
  2. Web Audio API captures numerical biomarkers from the audio stream in real-time: energy levels, pitch variability, speech rate, and pause frequency.
  3. A live waveform is rendered frame-by-frame on an HTML Canvas, giving the user visual feedback that they're being heard.
  4. Once a session ends, the raw audio stream is immediately discarded. Only the derived numerical biomarkers are kept.
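The biomarker step (2 and 4 above) can be sketched as a pure function over a frame of PCM samples, so the raw audio never needs to be retained. This is a minimal illustration, not the app's actual code; the function and field names are hypothetical, and real pitch/speech-rate extraction needs more signal processing than shown here.

```typescript
// Hypothetical biomarker extraction: reduce a PCM frame to numeric
// summaries, so only the summaries (never the audio) are stored.
export interface VoiceBiomarkers {
  rmsEnergy: number;  // overall loudness of the frame
  pauseRatio: number; // fraction of near-silent samples
}

export function extractBiomarkers(
  samples: Float32Array,
  silenceThreshold = 0.02,
): VoiceBiomarkers {
  let sumSquares = 0;
  let silent = 0;
  for (const s of samples) {
    sumSquares += s * s;
    if (Math.abs(s) < silenceThreshold) silent++;
  }
  return {
    rmsEnergy: Math.sqrt(sumSquares / samples.length),
    pauseRatio: silent / samples.length,
  };
}
```

In the browser, the `Float32Array` would come from an `AnalyserNode`'s `getFloatTimeDomainData` on each animation frame, which is also where the waveform canvas gets its data.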

AI Layer

All LLM calls hit the Groq /api/chat proxy (never exposing the key to the browser) and return structured JSON responses. The system prompt instructs the model to:

  • Detect 5 cognitive distortion patterns (catastrophizing, all-or-nothing thinking, mind reading, emotional reasoning, should statements)
  • Generate empathetic, non-clinical responses calibrated to emotional state
  • Escalate appropriately through a 3-tier system: gentle nudge → trusted-person prompt → full crisis resources

The model's output drives both the conversation UI and FHIR Observation records simultaneously.
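A hedged sketch of the request/response contract with the proxy. The `/api/chat` path and the 3-tier escalation come from the description above; the payload fields, type names, and exact schema are assumptions for illustration.

```typescript
// Illustrative types for the structured JSON contract (assumed shape).
export interface ChatTurn {
  role: "system" | "user" | "assistant";
  content: string;
}

export interface StructuredReply {
  reply: string;          // empathetic response text for the UI
  distortions: string[];  // e.g. ["catastrophizing", "mind reading"]
  escalationTier: 1 | 2 | 3;
}

export function buildChatRequest(turns: ChatTurn[]): {
  method: "POST";
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // The server-side proxy attaches the Groq API key;
    // the browser never sees it.
    body: JSON.stringify({ messages: turns }),
  };
}

// Usage (browser):
//   const res = await fetch("/api/chat", buildChatRequest(turns));
//   const reply: StructuredReply = await res.json();
```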

Health Data: FHIR R4

Rather than a custom schema, all mental health data is stored using the HL7 FHIR R4 standard. This was a deliberate architectural choice — it means the data is interoperable, exportable, and follows conventions familiar to healthcare systems.

Resources used:

| FHIR Resource | Purpose |
|---|---|
| Patient | User identity and demographics |
| Observation | Mood, sleep, stress (LOINC-coded) + voice biomarkers |
| Communication | Session transcripts |
| Condition | Detected cognitive distortion patterns |
| CarePlan | Personalized intervention suggestions |

Everything lives in localStorage["mindguardian_fhir_store"] — no server, no cloud, no account required.
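A minimal sketch of the localStorage-backed store. The key `mindguardian_fhir_store` matches the text; the storage interface is injected so the logic can run outside a browser, and the bundle helpers are illustrative, not the app's actual API.

```typescript
// Anything that looks like localStorage (getItem/setItem) works here.
interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORE_KEY = "mindguardian_fhir_store";

interface FhirBundle {
  resourceType: "Bundle";
  entry: { resource: unknown }[];
}

export function loadBundle(store: KVStore): FhirBundle {
  const raw = store.getItem(STORE_KEY);
  return raw ? JSON.parse(raw) : { resourceType: "Bundle", entry: [] };
}

export function addResource(store: KVStore, resource: unknown): void {
  const bundle = loadBundle(store);
  bundle.entry.push({ resource });
  store.setItem(STORE_KEY, JSON.stringify(bundle));
}
```

In the app itself, `window.localStorage` would be passed as the store; exporting the user's data is then just serializing the bundle.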

Behavioral Drift Engine

Tracking a single snapshot of someone's mood is nearly useless. What matters is change over time. The drift engine computes:

$$\text{DriftScore} = \sum_{i} w_i \cdot \Delta_i$$

where $\Delta_i$ is the deviation of each metric from the personal baseline, and $w_i$ is a domain-specific weight (e.g., sleep quality is weighted more heavily than energy). When DriftScore exceeds a threshold, the burnout trajectory model activates.
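The formula above translates directly into a small weighted sum; the metric names and weight values in this sketch are hypothetical.

```typescript
// Weighted drift score: sum of each metric's deviation from the
// personal baseline, scaled by a domain-specific weight.
export function driftScore(
  current: Record<string, number>,
  baseline: Record<string, number>,
  weights: Record<string, number>,
): number {
  let score = 0;
  for (const metric of Object.keys(weights)) {
    const delta = Math.abs(current[metric] - baseline[metric]);
    score += weights[metric] * delta;
  }
  return score;
}

// Example: sleep weighted more heavily than energy (illustrative values).
// driftScore({ sleep: 4, energy: 5 }, { sleep: 7, energy: 6 },
//            { sleep: 2, energy: 1 })  // 2*3 + 1*1 = 7
```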

The burnout trajectory uses linear regression over the most recent $n$ observations to project forward:

$$\hat{y}(t) = \beta_0 + \beta_1 t$$

This lets the app surface a gentle warning like "Your sleep and energy levels have been declining for 5 days" before the user themselves recognizes the pattern.
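The projection is an ordinary least-squares fit over `(t, y)` observation pairs, matching the $\hat{y}(t) = \beta_0 + \beta_1 t$ form above; the function name is illustrative.

```typescript
// Least-squares line fit: returns intercept b0 and slope b1.
// A negative b1 on sleep or energy indicates a declining trend.
export function fitTrend(
  points: [number, number][],
): { b0: number; b1: number } {
  const n = points.length;
  const meanT = points.reduce((s, [t]) => s + t, 0) / n;
  const meanY = points.reduce((s, [, y]) => s + y, 0) / n;
  let num = 0;
  let den = 0;
  for (const [t, y] of points) {
    num += (t - meanT) * (y - meanY);
    den += (t - meanT) ** 2;
  }
  const b1 = num / den;
  return { b0: meanY - b1 * meanT, b1 };
}
```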

Lag-correlation is also computed between metrics — for example, detecting that poor sleep at $t$ tends to predict high stress at $t+2$ for a given user, enabling more personalized insights.
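Lag-correlation here can be read as a Pearson correlation between one series at time $t$ and another at $t + \text{lag}$; a value near 1 at lag 2 would support the sleep-predicts-stress example. This is a generic sketch, not the app's exact implementation.

```typescript
// Pearson correlation between x[t] and y[t + lag].
export function lagCorrelation(x: number[], y: number[], lag: number): number {
  const xs = x.slice(0, x.length - lag);
  const ys = y.slice(lag);
  const n = xs.length;
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let dx = 0;
  let dy = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    dx += (xs[i] - mx) ** 2;
    dy += (ys[i] - my) ** 2;
  }
  return num / Math.sqrt(dx * dy);
}
```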

Internationalization & Accessibility

The app supports 5 languages: English, Spanish, French, Hindi, and Arabic — including full RTL layout for Arabic. Crisis resources are localized per language. The mobile-first PWA layout uses a bottom nav on mobile and a sidebar on desktop, with full dark mode support.
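The RTL switch for Arabic reduces to deriving a document direction from the active locale; this helper is a hypothetical sketch of that mapping.

```typescript
// Locales whose scripts run right-to-left (only Arabic among the
// five supported languages).
const RTL_LOCALES = new Set(["ar"]);

export function localeDir(locale: string): "ltr" | "rtl" {
  // Match on the language subtag so regional variants like
  // "ar-EG" also render right-to-left.
  return RTL_LOCALES.has(locale.split("-")[0]) ? "rtl" : "ltr";
}

// Usage (browser):
//   document.documentElement.dir = localeDir(currentLocale);
```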


🧱 Challenges I Faced

1. **Keeping voice truly private.** The Web Audio API gives access to raw PCM audio data, and the temptation to store it is real — it contains far richer information than any numeric summary. But storing audio would be a fundamental privacy violation for a mental health app. Designing a pipeline that derives biomarkers numerically and immediately discards the stream required careful sequencing of async operations and a lot of testing to ensure nothing slipped through.

2. **Structured JSON from an LLM, reliably.** Getting the Llama model to return clean JSON every single time — across emotional, sometimes fragmented user inputs — was harder than expected. The solution was a combination of a very explicit system prompt, a strict output schema described in the prompt, and a client-side fallback parser that handles edge cases gracefully rather than crashing.

3. **FHIR as a real constraint, not just a buzzword.** It would have been easy to slap "FHIR-compatible" on a custom schema. Actually conforming to FHIR R4 — using correct LOINC codes, proper resource references, valid effectiveDateTime formats, and conformant Observation categories — took significant research. The payoff is that a user can export their bundle and a healthcare provider could theoretically ingest it.

4. **Meaningful drift detection without false alarms.** Early versions of the drift engine were too sensitive — a single bad night would trigger a burnout alert. Calibrating the baseline builder (requiring at least 3 days before activating drift scoring) and tuning the weights so that isolated events don't dominate the score required building out the full demo seed data and simulating dozens of realistic usage patterns.
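The fallback parsing strategy from challenge 2 can be sketched as a layered attempt: strict `JSON.parse` first, then salvaging the first object-shaped span from a response with stray text around it, and finally a safe default instead of a crash. The function name and generics are illustrative.

```typescript
// Parse an LLM response that should be JSON but may have text around it.
export function parseStructured<T>(raw: string, fallback: T): T {
  try {
    return JSON.parse(raw) as T; // happy path: clean JSON
  } catch {
    // Salvage attempt: grab the outermost {...} span, if any.
    const match = raw.match(/\{[\s\S]*\}/);
    if (match) {
      try {
        return JSON.parse(match[0]) as T;
      } catch {
        /* fall through to the safe default */
      }
    }
    return fallback;
  }
}
```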


📚 What I Learned

  • Voice as an interface is underexplored in health tech — even simple audio features add a layer of emotional context that text simply cannot capture.
  • Privacy-by-design is an architectural decision, not a feature — it has to be built in from the start, not bolted on.
  • FHIR is genuinely powerful — designing around an open health data standard from day one is worth the overhead.
  • LLMs need very explicit structure to behave predictably — especially in emotionally sensitive contexts where an off-brand response can break trust instantly.
  • Accessible, multilingual design is not a stretch goal — for a mental health app that aims to help everyone, it's a core requirement.

🚀 What's Next

  • Replace localStorage with an encrypted IndexedDB store for larger history windows
  • Add wearable integration (heart rate, HRV) via Web Bluetooth for richer physiological signals
  • Build a lightweight backend option for users who want cross-device sync with end-to-end encryption
  • Peer-reviewed validation of the cognitive distortion detection against clinical instruments (PHQ-9, GAD-7)
  • Guided breathing and grounding exercises triggered automatically when distress signals are detected

Built With

  • react
  • typescript
  • vite
  • tailwindcss
  • groq