Inspiration

Two million Americans live with aphasia. Tens of thousands more live with ALS, locked-in syndrome, late-stage MS, or severe motor disorders that have stripped them of speech.

They have something to say. Their bodies just can't say it anymore.

We kept coming back to a thought: people often lose their voice before they lose what they wanted to say. A father with ALS still wants to tell his daughter he loves her. A stroke survivor with aphasia still has the words in their head; they just can't get them out. The technology to give them their voice back exists today. It just hasn't been put together for them.

So we built Cheshire.

The Cheshire Cat appears and disappears, leaving only its smile. The voices we're building Cheshire for are the same way: fading, but still there.

What it does

A patient performs a tiny gesture they can still make: a wrist tilt, a finger tap, a head nod. Cheshire captures the motion through phone sensors, classifies it with a personalized LSTM model, and speaks the matching phrase aloud in the patient's own pre-illness voice, cloned with ElevenLabs.

The user does not need to type. They do not need to talk. They only need one small repeatable movement.

| Gesture | Phrase |
|---|---|
| Small wrist circle | "I'm hungry." |
| Two finger taps | "I love you, Mom." |
| Wrist tilt right | "I need help." |
| Wrist tilt left | "Yes." |
| Small forward motion | "Thank you." |

Five gestures. Five training samples each. Their voice, back.

How we built it

The pipeline runs in five stages:

1. Sensor capture. Phyphox streams accelerometer and gyroscope data from a phone strapped to the patient's wrist or finger. The data is sampled at 50 Hz and windowed into 1-second time-series segments.
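
For reference, the windowing step looks roughly like the minimal sketch below. The six-channel layout (accelerometer x/y/z plus gyroscope x/y/z), the half-second hop, the per-window standardization, and the `push_sample` helper are our own illustration of the idea, not anything Phyphox provides:

```python
import numpy as np
from collections import deque

SAMPLE_RATE = 50                  # Hz, matching the Phyphox stream
WINDOW_SIZE = SAMPLE_RATE * 1     # 1-second windows -> 50 samples
HOP = SAMPLE_RATE // 2            # emit a window every 0.5 s (assumption)
NUM_CHANNELS = 6                  # accel x/y/z + gyro x/y/z

_buffer = deque(maxlen=WINDOW_SIZE)
_samples_since_window = 0

def push_sample(sample):
    """Append one 6-channel reading; return a (50, 6) window every HOP samples."""
    global _samples_since_window
    _buffer.append(sample)
    _samples_since_window += 1
    if len(_buffer) == WINDOW_SIZE and _samples_since_window >= HOP:
        _samples_since_window = 0
        window = np.asarray(_buffer, dtype=np.float32)
        # per-window standardization so the classifier sees scale-free motion
        return (window - window.mean(axis=0)) / (window.std(axis=0) + 1e-8)
    return None
```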

2. Gesture classification. A personalized stacked LSTM, adapted from Guillaume Chevalier's HAR architecture, classifies each window. Each user gets their own model — five training samples per gesture is enough because the classifier only needs to discriminate between this user's specific motions.
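
A sketch of the downsized per-user model is below; the layer widths and dropout rates shown are illustrative rather than the exact tuned values:

```python
import tensorflow as tf

NUM_GESTURES = 5
WINDOW_SIZE, NUM_CHANNELS = 50, 6   # 1 s at 50 Hz, accel + gyro

def build_gesture_model() -> tf.keras.Model:
    """Two-layer stacked LSTM, kept small so five samples per class can train it."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(WINDOW_SIZE, NUM_CHANNELS)),
        tf.keras.layers.LSTM(32, return_sequences=True),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(NUM_GESTURES, activation="softmax"),
    ])
```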

3. Authorization. Every speech act is authorized through Auth0 AI Agents with a scoped token. Voice cloning is one of the highest-risk AI capabilities deployed today — a cloned voice saying something the real person never authorized would be devastating. So consent is built into the architecture, not bolted on. The AI can only speak phrases the patient has explicitly trained.
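
Conceptually, the gate in front of synthesis looks like the sketch below: the token must carry a scope naming the exact trained phrase, or nothing is spoken. The scope naming scheme and the helpers here are our own shorthand for the idea, not Auth0's API:

```python
# Hypothetical gate in front of voice synthesis. The token has already been
# verified upstream; here we only check that its scopes name the exact trained
# phrase. The "speak:<phrase_id>" scheme is our illustration, not Auth0's.
TRAINED_PHRASES = {"im_hungry", "i_love_you_mom", "i_need_help", "yes", "thank_you"}

def can_speak(token_claims: dict, phrase_id: str) -> bool:
    granted = set(token_claims.get("scope", "").split())
    return phrase_id in TRAINED_PHRASES and f"speak:{phrase_id}" in granted

def authorize_speech(token_claims: dict, phrase_id: str) -> None:
    if not can_speak(token_claims, phrase_id):
        raise PermissionError(f"Token is not scoped to speak '{phrase_id}'")
    # hand off to the voice-synthesis step only after this check passes
```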

4. Voice synthesis. ElevenLabs generates the phrase in the patient's own cloned voice, reconstructed from a 30-second pre-illness sample. When Daniel says "I love you, Mom," it sounds like Daniel.
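
The synthesis step is a single HTTP call to the ElevenLabs text-to-speech endpoint, roughly as sketched below; the voice ID is a placeholder, and the request shape should be checked against the current ElevenLabs docs:

```python
import os
import requests

ELEVEN_API_KEY = os.environ["ELEVEN_API_KEY"]
VOICE_ID = "patients_cloned_voice_id"   # placeholder: the patient's cloned voice

def synthesize(text: str, out_path: str = "phrase.mp3") -> str:
    """Generate `text` in the cloned voice and save the audio to disk."""
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": ELEVEN_API_KEY, "Content-Type": "application/json"},
        json={"text": text},
        timeout=30,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path
```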

5. Logging and analytics. Every speech act is written to Snowflake. Caregivers see longitudinal communication patterns — phrases per week, recurring asks, gesture accuracy over time. For ALS patients especially, motor accuracy degrading at a measurable rate is a clinical signal. Cheshire becomes passive monitoring without adding burden.
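
Each speech act lands in Snowflake as one row, written through the Python connector along these lines; the warehouse, database, and table names are placeholders for our setup:

```python
import os
import snowflake.connector

conn = snowflake.connector.connect(
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    warehouse="CHESHIRE_WH",   # placeholder names for our environment
    database="CHESHIRE",
    schema="EVENTS",
)

def log_speech_act(patient_id: str, gesture: str, phrase: str, confidence: float):
    """Append one speech act so caregivers can query longitudinal patterns."""
    conn.cursor().execute(
        "INSERT INTO speech_acts (patient_id, gesture, phrase, confidence, spoken_at) "
        "VALUES (%s, %s, %s, %s, CURRENT_TIMESTAMP())",
        (patient_id, gesture, phrase, confidence),
    )
```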

An Arduino UNO R4 WiFi sits next to the patient's hand with an ultrasonic sensor and a tiered LED system, giving nonverbal confirmation that the system understood the gesture before the cloned voice plays.

Challenges we ran into

Designing for trust, not just functionality. Voice cloning is dangerous. We spent more time on the consent flow, the Auth0 scoping story, and the confidence-thresholding behavior than on the model itself. If Cheshire ever speaks the wrong phrase in the patient's own voice, that's a high-trust failure — much worse than a normal app bug. Every screen had to reflect that.

Personalized models from tiny datasets. A real ALS patient may not be able to give us a hundred clean training samples. Five is realistic. Getting the LSTM to converge on five samples per class without overfitting required smaller hidden dimensions, dropout between layers, and stratified validation. The Chevalier HAR architecture had to be downsized.
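
The training loop that came out of this looks roughly like the sketch below, reusing `build_gesture_model` from the classification step; the stand-in random data and the hyperparameters (epochs, batch size, early-stopping patience) are illustrative:

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Stand-in data so the snippet runs: 5 gestures x 5 windows of shape (50, 6).
# In Cheshire these come from the patient's five recorded samples per gesture.
rng = np.random.default_rng(0)
X = rng.normal(size=(25, 50, 6)).astype("float32")
y = np.repeat(np.arange(5), 5)

# Stratified split keeps at least one example of every gesture in validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = build_gesture_model()   # the downsized stacked LSTM sketched earlier
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=60, batch_size=4,
    callbacks=[tf.keras.callbacks.EarlyStopping(patience=10,
                                                restore_best_weights=True)],
)
```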

Porting TensorFlow 1.x code. The original LSTM repo was written for TensorFlow 1.x: all tf.contrib, all tf.placeholder, none of it runs on modern TF. We rewrote it in TF 2.x / Keras with the same architectural choices.

Designing a UI that didn't feel like a hackathon mockup. Our first screens looked like every healthcare app: stock blue, clinical, sterile. Cheshire is about voice, identity, and intimacy — not infirmity. We rebuilt the design system around deep purple and space blue, Instrument Serif italic for emotional moments, with glassmorphism and soft glows that feel less "clinical" and more "consciousness fading and returning."

Honest scope. A medical-grade deployment would need IRB approval, HIPAA audits, FDA review, and clinical validation. We made sure the README is explicit about what's working end-to-end versus what's prototype.

What we learned

Voice cloning is no longer the bottleneck. From a 30-second sample, ElevenLabs produces a clone good enough that a daughter wouldn't second-guess hearing her father's voice say "I love you."

The bottleneck is authorization and trust. Auth0 AI Agents existed exactly for this kind of moment — AI acting on behalf of a real human, in a way that needs to be scoped and accountable. Designing speech acts as authorized events instead of arbitrary outputs changed how we thought about every screen.

Snowflake also became more interesting than we expected. We went in thinking of it as just "where the logs go." It's actually the foundation for the clinical-signal layer — measuring gesture accuracy decline over time turns the same dataset into a passive monitoring tool that doesn't add burden to the patient.

LSTMs still work. Transformers get all the attention right now (no pun intended), but for short time-series classification on small per-user datasets, a tight stacked LSTM converges in seconds and runs anywhere.

What's next

If we keep building this past the hackathon, in priority order:

  1. Tremor-robust feature pipeline — wavelet-based denoising so late-stage motor symptoms don't break the model (a rough sketch of the idea follows this list)
  2. Adaptive online learning — update the personalized model continuously as the patient's motion drifts
  3. Real Auth0 AI Agents production integration — replace the visual mock with live token issuance and validation
  4. Caregiver mobile app — companion app so the dashboard isn't desktop-only
  5. Clinical pilot — partner with an ALS clinic for a 10-patient feasibility study
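
For the tremor-robust pipeline in particular, the rough idea is standard wavelet shrinkage on each sensor channel before windowing, something like this sketch using PyWavelets; the wavelet choice, decomposition level, and threshold rule are all assumptions to be validated:

```python
import numpy as np
import pywt

def wavelet_denoise(channel: np.ndarray, wavelet: str = "db4", level: int = 2) -> np.ndarray:
    """Soft-threshold detail coefficients to suppress tremor-band noise on one channel."""
    coeffs = pywt.wavedec(channel, wavelet, level=level)
    # universal threshold estimated from the finest detail band
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(channel)))
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(channel)]
```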

Some people lose their voice before they lose what they wanted to say. Cheshire is for them.
