VoiceLegacy — Project Story
Inspiration
Most people never think about their voice until it's already changing.
For people living with ALS, Parkinson's, MS, or other progressive conditions, the loss of speech isn't sudden — it's gradual. And the hardest part isn't the silence that comes after. It's knowing it's coming and not knowing what to do before it arrives.
We were thinking about what it would mean to hear someone you love speak after they no longer can. Not a recording from years ago — their actual voice, saying the words they want to say today. That question became VoiceLegacy.
The insight that drove everything: voice banking technology already exists in clinical settings, but it's inaccessible, expensive, and designed for specialists — not for a person sitting at their kitchen table who just got a diagnosis. We wanted to build the version that anyone could use, in an afternoon, before it's too late.
What It Does
VoiceLegacy is a consent-first voice and message banking tool for people at risk of losing their speech.
Users record a short guided session — about 2 minutes of natural speech — and VoiceLegacy creates a personalized voice profile. From there, they build a Legacy Phrase Bank: a private, categorized collection of the words, names, jokes, and expressions that make their communication feel like them. Family. Daily needs. Comfort phrases. Emergency phrases. Inside jokes.
When they want to speak, they type what they want to say. Gemma 4 rewrites it in their personal communication style — warmer, shorter, or matched to something already in their phrase bank. ElevenLabs speaks it back in their preserved voice. MongoDB keeps their phrase bank private and persistent, growing more personal over time.
The product never replaces a voice. It helps someone hold onto theirs.
How We Built It
Stack:
- Next.js 14 (App Router) — full-stack framework, API routes, and deployment
- ElevenLabs — Instant Voice Clone for voice preservation, TTS for playback
- Gemma 4 via Google AI Studio — tone rewriting, phrase suggestions, communication style personalization
- MongoDB Atlas — phrase bank storage, voice profiles, user communication style
- Clerk — authentication, triggered after the user hears their clone for the first time
- Tailwind CSS — UI styling
- Vercel — deployment
The most important product decision we made was the flow order. Users record their voice and hear their clone before we ask them to create an account. The emotional moment comes first. The ask comes after. That's not just UX — it's honesty about what the product is offering.
Consent is a visible, designed feature — not a checkbox. Before any recording begins, users see a clear statement of purpose, a privacy guarantee, a data deletion option, and an explicit "not for impersonation" notice. We treated that screen with the same design care as everything else.
Challenges We Faced
Audio quality is everything and nothing is guaranteed. ElevenLabs Instant Voice Clone is remarkably good — but it's only as good as the audio you give it. Background noise, inconsistent microphone distance, and uneven pacing all degrade the output. We had to carefully design the recording prompts and guide users through a consistent, natural session rather than just opening a microphone and hoping for the best. Getting the browser's MediaRecorder API to produce clean, consistent audio across different devices took more iteration than expected.
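The capture settings and the container-format fallback we describe above can be sketched roughly like this. This is a minimal illustration, not VoiceLegacy's actual code: the constraint property names are standard `MediaTrackConstraints`, but the helper names and candidate list are our own, and support checking is injected so the logic can run outside a browser.

```typescript
// Capture constraints aimed at clean, consistent clone input.
// Property names are standard MediaTrackConstraints; the values
// here are illustrative choices, not VoiceLegacy's exact config.
const CAPTURE_CONSTRAINTS = {
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
    channelCount: 1, // mono is enough for voice cloning
    sampleRate: 44100,
  },
} as const;

// MediaRecorder container support varies by browser, so pick the
// first candidate the runtime actually supports. The support check
// is passed in, which keeps this testable outside the browser.
function pickMimeType(
  isSupported: (type: string) => boolean,
  candidates: string[] = ["audio/webm;codecs=opus", "audio/webm", "audio/mp4"],
): string | undefined {
  return candidates.find(isSupported);
}
```

In the browser you would pass `MediaRecorder.isTypeSupported` as the predicate and hand the chosen type to `new MediaRecorder(stream, { mimeType })`.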
The "sound like me" feature needed a real answer. It's easy to put a button that says "sound like me." It's harder to make it actually do something defensible. We solved this during onboarding: users describe their natural communication style, and we store that as a profile. Every Gemma 4 rewrite prompt includes it as context. Simple, honest, and it actually works.
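The stored style profile might look something like the sketch below. The field names and rendering are assumptions for illustration, not VoiceLegacy's actual schema; the point is that the profile is plain structured data that can be folded into every rewrite as a single context line.

```typescript
// Illustrative shape of the communication-style profile captured at
// onboarding. Field names are our assumptions, not the real schema.
interface StyleProfile {
  userId: string;   // Clerk user id, once the account exists
  voiceId: string;  // ElevenLabs voice id used for playback
  tone: string;     // e.g. "warm, understated"
  habits: string[]; // e.g. "short sentences", "signs off with 'love you'"
}

// Render the profile into the context line prepended to every rewrite.
function styleContext(p: StyleProfile): string {
  return `Tone: ${p.tone}. Habits: ${p.habits.join("; ")}.`;
}
```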
Consent for voice data is genuinely complex. Voice cloning is ethically sensitive. We spent real time designing the consent flow — not just legally, but humanly. What does it mean to preserve someone's voice? Who has the right to do this on behalf of someone else? We built in explicit confirmation, visible data controls, and clear language that this tool supports AAC (Augmentative and Alternative Communication) — it doesn't replace it.
Anonymous voice clones need a home. Users record before signing in, which means we briefly hold a voice profile with no owner. We solved this with a sessionStorage claim flow: the voice_id returned by ElevenLabs is stored client-side, and the moment the user creates an account, a claim route writes it permanently to MongoDB, tied to their Clerk identity. One clean handoff, no orphaned data.
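The claim handoff can be sketched as two small steps. This is a hedged illustration, not the actual implementation: the storage key and the injected `postClaim` callback (standing in for a `fetch` to a claim route) are our own names, and storage is abstracted behind an interface so the logic is testable outside a browser.

```typescript
// Minimal interface over sessionStorage so the flow runs anywhere.
interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Illustrative key name; not VoiceLegacy's actual key.
const CLAIM_KEY = "vl.pendingVoiceId";

// After ElevenLabs returns a clone, park its id client-side.
function stashVoiceId(store: KVStore, voiceId: string): void {
  store.setItem(CLAIM_KEY, voiceId);
}

// Once the user has an account, claim the pending clone exactly once.
// `postClaim` stands in for a call to the server-side claim route,
// which would write { userId, voiceId } to MongoDB.
async function claimVoiceId(
  store: KVStore,
  postClaim: (voiceId: string) => Promise<void>,
): Promise<string | null> {
  const voiceId = store.getItem(CLAIM_KEY);
  if (!voiceId) return null;
  await postClaim(voiceId);
  store.removeItem(CLAIM_KEY); // no orphaned client state afterwards
  return voiceId;
}
```

In the browser, `window.sessionStorage` satisfies the `KVStore` shape directly, and the claim runs right after Clerk reports a signed-in user.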
Prompting Gemma 4 for tone, not just text. Getting Gemma to rewrite a message in someone's personal style — not just rephrase it — required careful prompt engineering. Generic rewrite prompts produce generic output. We had to build a layered system prompt that includes the user's stored communication style, the intent of the rewrite, and guardrails to keep the message short and natural. The difference between a good rewrite and a bad one comes down to how well you define "sounding like someone."
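The layering described above can be sketched as a small prompt builder. The structure (style context, rewrite intent, guardrails) mirrors what we describe; the exact wording, field names, and intent labels are illustrative assumptions, not the production prompt.

```typescript
// Illustrative inputs to a rewrite; names are our own.
interface RewriteRequest {
  message: string;                              // what the user typed
  intent: "warmer" | "shorter" | "match-bank";  // why they're rewriting
  styleProfile: string;                         // stored at onboarding
}

// Assemble the layered system prompt: role, personal style,
// intent, guardrails, then the message itself.
function buildRewritePrompt(req: RewriteRequest): string {
  const goal =
    req.intent === "match-bank"
      ? "match a phrase already in their bank"
      : req.intent;
  return [
    "You rewrite messages in the user's own voice. Do not add new information.",
    `The user's communication style: ${req.styleProfile}`,
    `Rewrite goal: make it ${goal}.`,
    "Keep it short and natural, as the user would actually say it.",
    `Message: "${req.message}"`,
  ].join("\n");
}
```

The guardrail lines matter as much as the style line: without "do not add new information" and the length constraint, generic rewrites drift longer and blander with every pass.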
What We Learned
We learned that the hardest part of building something emotionally meaningful is resisting the urge to over-engineer it.
Every time we added a feature, we asked: does this make the product feel more personal, or more like software? The phrase bank got simpler. The recording flow got shorter. The consent screen got more human.
We also learned that when ElevenLabs plays someone's voice back to them for the first time — even a teammate's voice, used only for the demo — the room gets quiet. That's the product working. Everything else is just infrastructure.
What's Next
VoiceLegacy is a prototype built in 24 hours. But the problem it addresses is real and long-term.
Future directions include family access sharing so loved ones can play phrases on a tablet, caregiver dashboards, integration with existing AAC devices, and expanded language support. The phrase bank model is designed to scale — the more someone uses it, the more personal it becomes.
The goal was never to build a voice cloning demo. It was to build something worth keeping.
Built With
Next.js · ElevenLabs · Gemma 4 · Google AI Studio · MongoDB Atlas · Clerk · Tailwind CSS · Vercel