Inspiration

Mental-health support often arrives in a generic assistant voice. Research on self-reference and hearing one’s own voice suggests that listening to yourself can engage awareness and emotional processing differently than hearing a stranger. We wanted to test a simple idea: your cloned voice becomes part of the intervention—not just narration from a bot—while a neutral agent still provides structure, questions, and analysis. InnerVoice is that experiment, grounded in references in /docs (e.g. InnerSelf, Shirvani, Costa et al., Kim & Song) and styled with Alan’s health palette so it feels at home in a care context.

What it does

InnerVoice runs short sessions (about 8–15 minutes) through clear phases: optional onboarding voice capture; anchoring (mood plus a spoken anchor); exploration (a CBT-style flow with a limited number of user turns); analysis, which produces an InnerVoice nudge (a short first-person I-statement in the user's cloned TTS voice, targeting the main cognitive distortion); InnerVoice replay (validation, reframing, intention); feedback sliders; and a closing synthesis. The agent operates in two modes: EXCHANGE (neutral TTS for support, inquiry, and analysis) and INNERVOICE (the cloned voice for affirmations, reformulation, and replay). Crisis wording triggers emergency resources (e.g. US 988, France 3114, EU 112), and copy is validated for safety. The app is not a substitute for professional care.
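The phase flow above can be sketched as a tiny state machine. This is an illustrative sketch, not the actual server code — the real logic lives in sessionEngine.ts, and the phase names, turn budget, and `nextPhase` helper here are assumptions:

```typescript
// Illustrative sketch of the session phase progression (not the real sessionEngine.ts).
type Phase =
  | "onboarding"   // optional voice capture
  | "anchoring"    // mood + spoken anchor
  | "exploration"  // CBT-style flow, limited user turns
  | "analysis"     // nudge generation targeting the main distortion
  | "replay"       // validation / reframing / intention in the cloned voice
  | "feedback"     // sliders
  | "closing";     // synthesis

const ORDER: Phase[] = [
  "onboarding", "anchoring", "exploration", "analysis", "replay", "feedback", "closing",
];

// Advance to the next phase; exploration only advances once the turn budget is spent.
function nextPhase(current: Phase, userTurns = 0, maxTurns = 3): Phase {
  if (current === "exploration" && userTurns < maxTurns) return "exploration";
  const i = ORDER.indexOf(current);
  return i < ORDER.length - 1 ? ORDER[i + 1] : current;
}
```

Keeping the phase order in a single array makes the bounded, linear arc explicit: the session can only move forward, and only exploration can hold the user in place.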

How we built it

Mobile: Expo / React Native with Zustand for state, and Mistral STT/TTS accessed through the backend (app.config.js shares API config with the server). Backend: an Express server in server/ with SQLite persistence, a session state machine (sessionEngine.ts), and Mistral integration for transcription, TTS, and agents; prompts live in prompts.ts, explorationAgent.ts, classifySession.ts, and innervoiceNudge.ts, with a client-side mirror in constants/prompts.ts where needed. Dev ops: npm run server:dev serves the API on port 8787, ffmpeg handles server-side audio conversion when needed, and Expo infers the API URL from the Metro host on port 8787 unless EXPO_PUBLIC_INNERVOICE_API_URL is set.
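The API URL fallback described above might look roughly like this on the client. A minimal sketch under assumptions: the `resolveApiUrl` helper and its signature are hypothetical, though the EXPO_PUBLIC_INNERVOICE_API_URL variable and dev port 8787 come from the project:

```typescript
// Hypothetical sketch of client-side API base URL resolution.
// EXPO_PUBLIC_INNERVOICE_API_URL and port 8787 are real project conventions;
// the function itself is illustrative.
function resolveApiUrl(env: Record<string, string | undefined>, inferredHost?: string): string {
  // An explicit override wins (useful when Metro's inferred LAN host is wrong).
  const override = env["EXPO_PUBLIC_INNERVOICE_API_URL"];
  if (override) return override.replace(/\/$/, "");
  // Otherwise fall back to the host Metro infers, on the server's dev port.
  const host = inferredHost ?? "localhost";
  return `http://${host}:8787`;
}
```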

Challenges we ran into

End-to-end voice pipeline: recording, audio formats, optional ffmpeg conversion, and reliable TTS in the cloned voice across devices and networks. Split-stack dev UX: juggling phone/simulator, Metro, and local API URL discovery when Wi‑Fi networks or IPs change (Metro needs a restart when the network shifts). Session design under time pressure: keeping exploration bounded (e.g. three user turns after the welcome) while still feeling coherent and therapeutic. Safety and tone: validating InnerVoice text (first person, no questions, self-harm heuristics) and aligning the classifier's distortion ids (stable French snake_case in JSON) with the English UI labels in constants/cognitiveDistortions.ts.
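The InnerVoice text checks mentioned above (first person, no questions, self-harm heuristics) could be sketched as a single validator. This is an assumption-laden illustration, not the real rules in innervoiceNudge.ts; the term list and regex are placeholders:

```typescript
// Illustrative sketch of nudge validation heuristics; the real checks and
// the real self-harm term list live server-side and are more thorough.
const SELF_HARM_TERMS = ["hurt myself", "end it"]; // placeholder list, NOT the real one

function isValidNudge(text: string): boolean {
  const t = text.trim().toLowerCase();
  // Must be a first-person I-statement.
  const firstPerson = /\b(i|i'm|i've|my)\b/.test(t);
  // Nudges are affirmations, never questions.
  const noQuestion = !t.includes("?");
  // Reject anything matching self-harm wording.
  const safe = SELF_HARM_TERMS.every((term) => !t.includes(term));
  return firstPerson && noQuestion && safe;
}
```

Running every generated nudge through a deterministic gate like this, after the model call, is what lets the cloned voice only ever speak validated text.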

Accomplishments that we're proud of

A clear narrative arc in-product—from anchor to nudge to replay—instead of endless chat. Dual-voice behavior that matches the concept: neutral agent vs your voice for the moments that land emotionally. A working prototype loop: session API + Expo app + history backed by SQLite, with Alan-aligned branding. Ethics hooks baked in: crisis routing, validation, and honest positioning as a prototype—not clinical care.

What we learned

Voice + structure beats vague “open conversation” for a demo: phases and limits make the experience testable and explainable. Server-mediated STT/TTS keeps keys and heavy lifting off the client while adding moving parts (latency, errors, audio plumbing). Research-backed UX still has to be engineered: prompts, classification, and safety rules are where the product becomes trustworthy enough to show.
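The classification-to-UI alignment is one concrete example of that engineering work. A minimal sketch, assuming hypothetical ids and labels — the real mapping lives in constants/cognitiveDistortions.ts:

```typescript
// Illustrative mapping from the classifier's stable French snake_case ids
// to English UI labels. Ids and labels below are invented examples.
const DISTORTION_LABELS: Record<string, string> = {
  pensee_tout_ou_rien: "All-or-nothing thinking",
  surgeneralisation: "Overgeneralization",
  lecture_de_pensee: "Mind reading",
};

function labelFor(id: string): string {
  // Fall back to the raw id so an unknown classifier output never breaks the UI.
  return DISTORTION_LABELS[id] ?? id;
}
```

Keeping the ids stable in JSON while translating only at the display layer means prompts and stored sessions never churn when UI copy changes.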

What's next for InnerVoice

Harden reliability (errors, retries, offline messaging) and latency for TTS/STT. Richer evaluation: user testing on whether cloned-voice replay actually changes felt safety or insight vs neutral TTS. Deeper personalization (voice profile quality, session length) and optional clinician-aligned content review. Clearer path to production: privacy, data retention, regional crisis numbers, and regulatory framing if this ever leaves the hackathon setting.
