Be My Voices

Inspiration

Over 40 million Americans experience some form of communication disorder, including millions who suffer from stroke-induced aphasia or severe voice loss. Speech recovery goes far beyond making sounds; it is tied to a person's identity, autonomy, and safety. Without a voice, individuals face severe isolation and struggle to advocate for their basic medical needs. We wanted to build a solution that restores not just the ability to speak, but the emotional nuance of human connection.

What it does

Be My Voices is an AI-powered assistive communication wearable for patients with motor-speech and neurological conditions such as ALS, Alzheimer's, Parkinson's, and post-stroke aphasia.

To personalize the experience, patients (or their caretakers) can upload legacy audio or video recordings of the patient speaking before the onset of their condition. The app uses these files to clone and preserve their original voice.

When a patient's speech into the device is unclear or fragmented, the application transcribes it, reconstructs the intended meaning using Google Gemini, and synthesizes a corrected sentence in the saved voice clone. Additionally, an integrated EEG headset estimates the user's emotional state, so that the synthesized audio can convey the feeling and tone behind their words.
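The three stages described above can be sketched as a small function that takes each stage as a plain callable. This is a minimal illustration, not the project's actual code: `run_pipeline`, `Utterance`, `Affect`, and the three callables are hypothetical names, and in the real application the callables would wrap the speech-to-text, Gemini, and text-to-speech services.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Affect:
    """Hypothetical EEG-derived estimate; each value is assumed to lie in [0, 1]."""
    stress: float
    valence: float
    arousal: float


@dataclass
class Utterance:
    raw_transcript: str   # what the speech-to-text stage heard
    recovered_text: str   # the reconstructed sentence
    audio: bytes          # synthesized audio in the cloned voice


def run_pipeline(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    recover: Callable[[str], str],
    synthesize: Callable[[str, Optional[Affect]], bytes],
    affect: Optional[Affect] = None,
) -> Utterance:
    """Transcribe, recover the intended sentence, then synthesize it."""
    raw = transcribe(audio_in).strip()
    if not raw:
        # Nothing intelligible was heard, so skip recovery rather than
        # asking a language model to invent a sentence from nothing.
        return Utterance(raw_transcript="", recovered_text="", audio=b"")
    recovered = recover(raw)
    audio_out = synthesize(recovered, affect)
    return Utterance(raw_transcript=raw, recovered_text=recovered, audio=audio_out)


if __name__ == "__main__":
    # Stub stages stand in for the real services.
    result = run_pipeline(
        b"\x00\x01",
        transcribe=lambda _audio: "wa ter pleas",
        recover=lambda _text: "Could I have some water, please?",
        synthesize=lambda text, _affect: text.encode("utf-8"),
    )
    print(result.recovered_text)
```

Passing the stages in as callables keeps the control flow testable without network access, and the empty-transcript guard reflects one possible design choice: staying silent is safer than speaking a guessed sentence on the patient's behalf.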

How we built it

We built Be My Voices as a full-stack web application integrated with a Superone wearable device.

Frontend: Built with React and Vite to handle file uploads for voice cloning, voice selection, live dictation, and audio playback.
Backend: Powered by FastAPI with a PostgreSQL database to manage session history and securely store the personalized cloned voice profiles.
AI Pipeline: We used ElevenLabs Speech-to-Text for initial transcription, Google Gemini for semantic sentence recovery, and ElevenLabs Text-to-Speech for voice generation and cloning, which lets users create a custom voice from legacy media files.
BCI Integration: We implemented an experimental hardware path using the Muse 2 EEG headband. Normal low-latency playback uses Eleven Flash v2.5, while our EEG-assisted mode routes through Eleven v3 so that the generated voice tone can be shaped by real-time estimates of affective state (stress, valence, and arousal).
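The model routing described in the BCI item can be sketched as a pure function that picks a model and, in EEG-assisted mode, attaches a tone hint to the text. Everything here is an assumption for illustration: the model identifier strings, the thresholds, the tone labels, and the idea of passing tone as a bracketed tag are not taken from the project and should be checked against the ElevenLabs documentation before use.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed identifier strings for the two models named above.
FAST_MODEL = "eleven_flash_v2_5"
EXPRESSIVE_MODEL = "eleven_v3"


@dataclass(frozen=True)
class Affect:
    """Hypothetical affect estimate from the EEG path; values assumed in [0, 1]."""
    stress: float
    valence: float
    arousal: float


def tone_label(affect: Affect) -> str:
    """Map an affect estimate to a coarse tone label (illustrative thresholds)."""
    if affect.stress >= 0.7:
        return "tense"
    if affect.arousal >= 0.6:
        return "excited" if affect.valence >= 0.5 else "frustrated"
    return "calm" if affect.valence >= 0.5 else "sad"


def plan_synthesis(text: str, affect: Optional[Affect]) -> dict:
    """Choose the model and the text to send to the text-to-speech stage."""
    if affect is None:
        # No EEG reading available: favor latency to keep the conversation moving.
        return {"model_id": FAST_MODEL, "text": text}
    # EEG-assisted mode: accept higher latency in exchange for expressiveness.
    return {"model_id": EXPRESSIVE_MODEL, "text": f"[{tone_label(affect)}] {text}"}


if __name__ == "__main__":
    print(plan_synthesis("I need my medication.", None))
    print(plan_synthesis("I need my medication.", Affect(stress=0.9, valence=0.2, arousal=0.8)))
```

Keeping the decision in one small function means the fallback is explicit: if the headset drops out and no affect estimate arrives, playback reverts to the fast path instead of stalling.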

Challenges we ran into

Achieving a seamless, real-time communication loop was our biggest hurdle. We spent significant time debugging browser audio recording behavior, autoplay restrictions, and backend connectivity. Balancing low-latency output against highly expressive voice generation was a constant tradeoff. The Muse 2 integration also proved difficult: capturing the live EEG stream required extra libraries, LSL device discovery, and handling hardware noise in a hackathon environment.

Accomplishments that we're proud of

We built a working end-to-end BCI and AI pipeline. Rather than stopping at a basic text-to-speech demo, we integrated real-time dictation, custom voice cloning, a hands-free workflow, and an experimental EEG emotion-tracking mode within a single, cohesive application.

What we learned

Building for assistive communication is not just a machine learning challenge; it is fundamentally a UX and reliability challenge. We also learned how to strategically route AI models based on the task: fast, lightweight models are crucial for maintaining a conversational loop, while heavier, expressive models are necessary when emotionally guided BCI playback is the priority.

What's next for Be My Voices

We plan to harden the system through real-world user testing, specifically focusing on validating the pipeline in noisy environments. We also want to stabilize the live Muse 2 integration, expand our tone-control parameters, and refine the hands-free workflow to better support individuals with varying degrees of physical and verbal impairment.