Inspiration
The spoken word is the most intimate connection we share with other humans; for most of us, speech was the first way we communicated ideas, creativity, and emotion. Yet our world is home to 64 million people who live with childhood-induced or paralysis-induced mutism (the inability to speak), and 400,000 of those suffer from amyotrophic lateral sclerosis (ALS), a neurodegenerative disease that destroys motor function and speech. So we asked ourselves: for someone with no motor control and no ability to speak, how can we not only restore their ability to communicate, but also add emotion and expression to their speech? That question is why we created a system that deterministically decodes EEG brainwaves into text: because the mapping from detected actions to characters is a fixed code rather than a statistical guess, the decoding itself involves no ambiguity.
While sign language is a physical alternative to speaking, it is not a replacement for emotionally rich vocal speech; moreover, sign language is infeasible for those with motor impairments. Some existing solutions try to replicate vocal speech by using electrical stimulus or EEG to predict what a non-speaking person wants to say (e.g., the assistive communication system on Stephen Hawking's wheelchair), but these are highly expensive and cannot convey emotional or artistic expression and imagination.
Artistic and emotional expression is a core element of being human. We not only derive text from the brainwaves, we also estimate an emotional state (from the alpha, beta, theta, gamma, and delta bands of the EEG) and express it through two mediums: 1) Generated image: a visual representation that brings the user's imagination to life by using the decoded text and emotional state to guide an AI-generated image, and 2) Speech: by incorporating the user's emotional state, we add natural inflections to a synthesized voice, creating a more authentic and expressive reading of the text.
This is ultimately where NeuroScribe comes in. We aim to transform how those with speech and motor impairments communicate by developing:
- A novel method that deterministically converts EEG signals to text: any sentence you want to express can be conveyed through minor facial contractions.
- A means for creativity and emotional expression through image and speech (based on the generated text from the brainwaves)
What it does
- Wear a Brain-Computer Interface (BCI) headset that collects EEG signals from your scalp.
- Capture the faint electrical signals produced by the user's brain and convert them, character by character, into words.
- Express these words as images and emotionally-charged speech!
How we built it
- Assembled BCI (OpenBCI) to collect real-time EEG and 3-axis accelerometer (ACC) time-series data.
- We pair the core 8-channel EEG board with an 8-channel expansion module, giving us 16 channels for higher information density. These 16 input channels are wired to 16 electrodes attached to the head with conductive electrode gel.
- Associate EEG + ACC waveform behavior with discrete action classes, using the BrainFlow API + OpenBCI SDK (a minimal streaming sketch appears after this list).
- Since we would be using BCI signals to predict characters/words from the user's brain, our action space needed to be small and predictable. Each of the following classes can be distinguished from the 16-channel EEG and accelerometer data with digital signal processing (a toy classification rule is sketched after this list):
- Baseline Class (no action)
- Class 0: Left facial muscle contraction
- Class 1: Right facial muscle contraction
- Class 2: Head Nod (z-axis)
- Class 3: Head Shake (x-axis)
- After a period of time, we have a sequence of 0s, 1s, 2s, and 3s from these classes, which we can then use for text generation / manipulation.
- Map Action classes into language / word manipulation
- The 0s and 1s in the sequence represent letters of the alphabet. We create a Huffman encoding that maps binary strings to characters, with the most commonly used characters getting the shortest binary strings for optimal information transfer (see the Huffman sketch below the list).
- We develop an autocomplete algorithm based on statistical frequency of words with the given prefix. A “2” in the sequence represents accepting the autocomplete.
- A “3” class represents erasing a word in case the user made a mistake.
- We have an autocorrect algorithm in case a decoded sequence is invalid: we use edit distance to find the character / word closest to the invalid sequence (autocomplete and autocorrect are sketched together after this list).
- After converting a sequence of numbers to characters, we now have a sentence!
- Now, we use ElevenLabs to generate speech and Luma AI to generate images from this sentence! We add an additional emotion input stream derived from the alpha/beta/gamma/theta/delta bands (frequency bands extracted from the EEG using bandpass filters + FFT; a band-power sketch follows the list).
- This entire process is automated using FastAPI / Flask endpoints to send and receive data between servers (a minimal endpoint example closes out the sketches below).
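The sketches below illustrate the major steps. First, reading the 16-channel EEG + accelerometer stream with the BrainFlow Python API; the serial port and one-second window are placeholders, and we use the Cyton + Daisy board ID that BrainFlow exposes for the 16-channel OpenBCI setup.

```python
import time

from brainflow.board_shim import BoardShim, BrainFlowInputParams, BoardIds

params = BrainFlowInputParams()
params.serial_port = "/dev/ttyUSB0"  # placeholder: port of your OpenBCI dongle

board_id = BoardIds.CYTON_DAISY_BOARD.value  # 8-channel Cyton + 8-channel Daisy = 16 channels
eeg_channels = BoardShim.get_eeg_channels(board_id)      # row indices of the 16 EEG channels
accel_channels = BoardShim.get_accel_channels(board_id)  # row indices of the 3-axis ACC
fs = BoardShim.get_sampling_rate(board_id)

board = BoardShim(board_id, params)
board.prepare_session()
board.start_stream()
time.sleep(1)                      # collect a one-second window
data = board.get_board_data()      # 2D array: rows are channels, columns are samples
board.stop_stream()
board.release_session()

eeg = data[eeg_channels]      # shape (16, n_samples)
accel = data[accel_channels]  # shape (3, n_samples)
```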
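Once a window is clean, the action classes can be separated with fairly simple rules. The thresholds and channel groupings below are purely illustrative (our actual pipeline used more involved DSP), but the idea is the same: accelerometer swings flag nods and shakes, while one-sided high-amplitude EMG artifact flags left/right facial contractions.

```python
import numpy as np

def classify_window(eeg, accel, left_channels, right_channels,
                    emg_thresh=50.0, nod_thresh=0.5, shake_thresh=0.5):
    """Toy rule-based classifier for one window (all thresholds are placeholders).

    Returns None (baseline), 0 (left contraction), 1 (right contraction),
    2 (head nod, z-axis), or 3 (head shake, x-axis).
    """
    # Head motion first: large accelerometer swings dominate everything else.
    if np.ptp(accel[2]) > nod_thresh:    # z-axis swing -> nod
        return 2
    if np.ptp(accel[0]) > shake_thresh:  # x-axis swing -> shake
        return 3
    # Facial contractions appear as high-amplitude EMG artifact on one side.
    left_rms = np.sqrt(np.mean(eeg[left_channels] ** 2))
    right_rms = np.sqrt(np.mean(eeg[right_channels] ** 2))
    if max(left_rms, right_rms) < emg_thresh:
        return None                      # baseline: no action detected
    return 0 if left_rms > right_rms else 1
```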
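The Huffman code that turns sequences of left/right contractions (0s and 1s) into characters can be built in a few lines. The letter frequencies below are rough English estimates, not the exact table we used.

```python
import heapq
from itertools import count

# Rough English letter frequencies (percent), including space; placeholder values.
FREQ = {
    ' ': 18.0, 'e': 12.7, 't': 9.1, 'a': 8.2, 'o': 7.5, 'i': 7.0, 'n': 6.7,
    's': 6.3, 'h': 6.1, 'r': 6.0, 'd': 4.3, 'l': 4.0, 'u': 2.8, 'c': 2.8,
    'm': 2.4, 'w': 2.4, 'f': 2.2, 'g': 2.0, 'y': 2.0, 'p': 1.9, 'b': 1.5,
    'v': 1.0, 'k': 0.8, 'j': 0.15, 'x': 0.15, 'q': 0.1, 'z': 0.07,
}

def build_huffman_codes(freq):
    """Return {char: bitstring}; more frequent characters get shorter codes."""
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(f, next(tie), {c: ""}) for c, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in left.items()}
        merged.update({c: "1" + code for c, code in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]

CODES = build_huffman_codes(FREQ)          # e.g. CODES['e'] is one of the shortest codes
DECODE = {code: char for char, code in CODES.items()}

def decode_actions(bits):
    """Greedily decode a string of '0'/'1' actions into characters (codes are prefix-free)."""
    text, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:
            text.append(DECODE[buf])
            buf = ""
    return "".join(text), buf  # buf holds any incomplete trailing code
```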
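Autocomplete and autocorrect are both lookups over a word-frequency table. The tiny vocabulary here is a stand-in for the real corpus-derived table; a "2" action replaces the current prefix with `autocomplete(prefix)`, and an invalid decode gets snapped into the vocabulary with `autocorrect(word)`.

```python
# Placeholder word-frequency table; in practice this comes from a corpus.
WORD_FREQ = {
    'hello': 120, 'help': 300, 'world': 90, 'water': 150,
    'want': 400, 'wonderful': 30, 'thanks': 200, 'the': 1000,
}

def autocomplete(prefix):
    """Most frequent known word starting with the typed prefix."""
    candidates = [w for w in WORD_FREQ if w.startswith(prefix)]
    return max(candidates, key=WORD_FREQ.get) if candidates else prefix

def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def autocorrect(word):
    """Snap an invalid decoded word to the closest word in the vocabulary."""
    return min(WORD_FREQ, key=lambda w: edit_distance(word, w))
```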
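The emotion stream comes from band powers. This sketch uses Welch's method from SciPy in place of whatever exact bandpass/FFT chain we ran, and the beta/alpha "arousal" heuristic is an assumption rather than our exact mapping.

```python
import numpy as np
from scipy.signal import welch

# Conventional (approximate) EEG band boundaries in Hz.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def band_powers(eeg, fs):
    """Mean spectral density per band from a (channels, samples) EEG window."""
    freqs, psd = welch(eeg, fs=fs, nperseg=min(eeg.shape[-1], 2 * fs), axis=-1)
    return {name: float(psd[..., (freqs >= lo) & (freqs < hi)].mean())
            for name, (lo, hi) in BANDS.items()}

def arousal_score(powers):
    """Crude beta/alpha ratio as an arousal proxy (illustrative only)."""
    return powers["beta"] / (powers["alpha"] + 1e-9)
```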
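Finally, a minimal FastAPI endpoint in the spirit of our glue servers; the route name and payload shape are illustrative, and the actual ElevenLabs / Luma calls are omitted.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Utterance(BaseModel):
    text: str       # decoded sentence from the EEG pipeline
    emotion: dict   # e.g. band powers: {"alpha": ..., "beta": ..., ...}

@app.post("/synthesize")  # hypothetical route name
def synthesize(utterance: Utterance):
    # Downstream calls to ElevenLabs (speech) and Luma (image) would be made here.
    return {"text": utterance.text, "emotion": utterance.emotion, "status": "queued"}
```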
Challenges we ran into
Our electrodes picked up an enormous amount of noise because we weren't collecting signals directly from the user's scalp; we placed the electrodes behind the user's ears instead (trading signal power for user convenience), so mapping the data to different action states was very difficult. We had to design some interesting DSP routines that still captured the peaks in the signal, and we spent a lot of time calibrating and figuring out which features carried the most importance.
We also had the task of prompting ElevenLabs with emotional data. Figuring out the variation in the alpha/gamma/beta/theta/delta information, and then mapping it to arousal states and different prompt definitions, was a complicated flow because real-world emotional states are reflected through complex voice inflections and intonations. ElevenLabs has dense documentation for tweaking its audio output, so we spent many hours tuning voice characterizations to match the associated emotion/excitement states.
Accomplishments that we're proud of
Not a single one of us had experience with BCIs prior to this hackathon. In fact, we spent the first 14 hours of TreeHacks building the BCI system and trying to make sense of what exactly the electrodes were relaying. Yet, after 36 hours, we were able to generate images and voices from almost imperceptible facial muscle contractions. We're super proud of the novelty of our idea: we spent a significant amount of time just thinking about how you could create words from brainwaves, and we're proud of the creativity involved in ideas like using a Huffman tree to encode every character as a binary string, or mapping continuous-time signals to discrete action classes through jaw clenching.
What we learned
We learned how to map complex continuous-time brainwave signals into 3-4 discrete action classes that can effectively generate every word in natural language, which is incredibly powerful. We also learned just how much signal processing and calibration real-world EEG hardware demands before its data becomes usable.
Next Steps
Imagine a world in which emotionally intelligent AI understands your core feelings without explicit prompts. The positive impacts of this have barely begun to be imagined: in a world hurtling toward a mental health crisis, such a system would allow people to discover and express themselves.
Built With
- brainflow
- elevenlabs
- fastapi
- luma
- openbci
- tailwind
- websocket

