MindOS
Technical Overview
https://www.loom.com/share/5348b74c2fd14f47992c22fadabc2f57
Inspiration
Imagine having a perfectly clear mind but being unable to communicate it to the world. For millions of people living with ALS, recovering from a stroke, or dealing with severe paralysis, the standard ways we interact with technology, like typing on a keyboard or speaking to a voice assistant, are often impossible. We realized that current tools fail exactly where they are needed most: for users who physically cannot use their hands or produce clear speech. This gap in assistive technology is what drove us to build MindOS. We wanted to create a solution that works even when a user is silent, providing a lifeline for communication and control without requiring brain implants or other invasive procedures.
What it does
MindOS is a silent speech AI agent for your computer, powered by micro muscle signals. With sensors placed along the jaw and throat, a user can silently think about certain actions without making a sound, and the system decodes that intent to control a computer. It enables reliable actions like silent web browsing, where a user can scroll, go back, or search. Unlike standard voice control, MindOS works when speech is difficult or impossible and supports hands-free usage. It also includes an inference frontend that lets users append new training data, so the software adapts to each new user rather than just guessing.
How we built it
We built MindOS as a modular pipeline that connects signals to digital actions.
Hardware & Signal Ingestion: On the hardware side, we used two Myoware muscle sensors attached to the jaw via electrodes. These sensors capture raw EMG data, which is passed to an Arduino Uno. We utilized PySerial to ingest this serial port data into our laptop for real-time processing.
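The ingestion step can be sketched as below. This is a minimal illustration, not the project's exact code: the port name, baud rate, and one-sample-per-line framing are assumptions about how the Arduino prints its `analogRead()` values.

```python
def parse_sample(raw: bytes):
    """Parse one serial line into an int EMG sample, or None if corrupted."""
    line = raw.strip()
    if not line:
        return None
    try:
        return int(line)   # assumes the Arduino prints analogRead() values (0-1023)
    except ValueError:
        return None        # skip partial or garbled lines

def stream_emg(port="/dev/ttyACM0", baud=115200):
    """Yield parsed EMG samples from the Arduino over the serial port."""
    import serial  # PySerial; imported lazily so parsing stays testable offline
    with serial.Serial(port, baud, timeout=1) as ser:
        while True:
            sample = parse_sample(ser.readline())
            if sample is not None:
                yield sample
```

Keeping the line parser separate from the serial loop makes the noisy-input handling easy to test without hardware attached.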
Signal Processing: We treated the muscle signals as time-series data, applying noise handling to filter out motion artifacts. We then processed the data with a classification-based Random Forest model to categorize English phonemes into four distinct biometric signatures based on EMG muscle activation.
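A hedged sketch of that classification step: window the EMG stream, extract a few standard time-domain features, and fit a 4-class Random Forest with scikit-learn. The specific features, window size, and synthetic training data here are illustrative stand-ins, not the project's actual dataset or feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def extract_features(window: np.ndarray) -> np.ndarray:
    """Common time-domain EMG features for one window of samples."""
    return np.array([
        np.mean(np.abs(window)),                 # mean absolute value
        np.sqrt(np.mean(window ** 2)),           # RMS amplitude
        np.sum(np.abs(np.diff(window))),         # waveform length
        np.sum(np.diff(np.sign(window)) != 0),   # zero crossings
    ])

# Stand-in for real labeled recordings: labels 0-3 are the four signatures.
rng = np.random.default_rng(0)
windows = rng.normal(size=(200, 64))
labels = rng.integers(0, 4, size=200)

X = np.array([extract_features(w) for w in windows])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
pred = clf.predict(extract_features(windows[0]).reshape(1, -1))
```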
AI Agents: Since raw signals are ambiguous, we implemented a multi-agent workflow to bridge the gap between signal and action:
- Context Agent: Because we were mapping limited signal categories to the complexity of human language, we built an intelligent agent to suggest the correct character or word based on context.
- Action Agent: Once the intent is understood, a second agent decides which specific browser action to take.
- Execution Agent: We used Playwright to drive the actual browser automation based on the Action Agent's decisions.
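The wiring between the three agents can be illustrated as below. The real Context and Action agents call an LLM, and the Execution Agent drives Playwright; here they are rule-based stand-ins so the hand-off between stages is runnable. The mappings and allow-list are hypothetical.

```python
ALLOWED_ACTIONS = {"scroll_down", "go_back", "search"}

def context_agent(signal_classes, preceding_text=""):
    """Map a sequence of 4-way signal classes to a candidate command word.
    Hypothetical lookup; the real agent scores candidates with an LLM."""
    vocab = {(0, 1): "scroll", (2, 3): "back", (0, 2): "search"}
    return vocab.get(tuple(signal_classes))

def action_agent(word):
    """Choose a concrete browser action, restricted to a safe allow-list."""
    mapping = {"scroll": "scroll_down", "back": "go_back", "search": "search"}
    action = mapping.get(word)
    return action if action in ALLOWED_ACTIONS else None

def execution_agent(action):
    """Dispatch to browser automation; here we return the Playwright call
    we would make on a `page` object instead of executing it."""
    calls = {
        "scroll_down": "page.mouse.wheel(0, 600)",
        "go_back": "page.go_back()",
        "search": "page.goto('https://www.google.com')",
    }
    return calls.get(action)
```

Restricting the Action Agent to an allow-list is what keeps the pipeline from hallucinating arbitrary browser commands, a point revisited under Challenges below.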
Challenges we ran into
Our biggest hurdle was the hardware limitation. We ideally wanted to build a 26-class classifier to map signals directly to the alphabet. However, since we were limited to two muscle sensors and fewer than 20 electropads, we couldn't capture enough distinct data points for that level of granularity. We were stuck with just 4 distinct signal categories trying to map to 26 letters.
To bridge this gap, we first implemented an exhaustive-but-pruned greedy search that enumerates possible letter sequences from the low-entropy signal stream and filters them using an English lexicon and frequency priors, dramatically shrinking the candidate space to linguistically plausible outputs. On top of that, we deployed a lightweight GPT-mini context agent that evaluates the remaining candidates against preceding text, grammar, and semantic coherence to select the most likely intended sequence in real time. This allows the system to ignore invalid combinations and lock onto the correct word based on probability and context.
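The enumerate-then-prune idea above can be sketched as follows: each of the four signal classes stands for a group of letters, candidate sequences are enumerated over those groups, and anything outside an English lexicon is discarded before the context agent ranks survivors. The alphabet partition and tiny lexicon here are assumptions for illustration only.

```python
from itertools import product

# Hypothetical partition of the alphabet into the four signal classes.
CLASS_LETTERS = {
    0: "abcdefg",
    1: "hijklm",
    2: "nopqrst",
    3: "uvwxyz",
}

LEXICON = {"back", "scroll", "search", "stop", "go"}

def candidates(signal_classes, lexicon=LEXICON):
    """Enumerate letter sequences for a class stream, keeping only real words."""
    pools = [CLASS_LETTERS[c] for c in signal_classes]
    return [w for w in ("".join(p) for p in product(*pools)) if w in lexicon]
```

Even this naive version collapses thousands of raw letter combinations to a handful of plausible words; a frequency prior over the lexicon then orders what remains.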
Finally, ensuring the agent didn't hallucinate actions was difficult. We solved this by using conservative AI prompts rather than giving the agent total freedom. The balance between speed and reliability was tough to find, but limiting the command set kept our demos high-confidence rather than chaotic.
Accomplishments that we're proud of
We shipped an end-to-end assistive interface that connects biological sensors to real computer actions within a hackathon timeframe. We are particularly proud of our real-time EMG decoding pipeline and the robust API boundary we built for fast iteration.
We achieved 96% accuracy on our model using a dataset of six hours of raw sub-vocal recordings, which we augmented to simulate variance and noise. Additionally, we successfully integrated a multi-layer LLM workflow (using GPT-4o) to drive decision-making, proving that we could control a browser with nothing but silent intent.
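The augmentation mentioned above can be sketched as below: jitter the amplitude and add Gaussian noise to recorded windows to simulate electrode variance. The scale range and noise level are illustrative guesses, not the values we actually used.

```python
import numpy as np

def augment(window, rng, noise_std=0.05, scale_range=(0.9, 1.1)):
    """Return a noisy, amplitude-scaled copy of one EMG window."""
    scale = rng.uniform(*scale_range)           # simulate contact/gain drift
    noise = rng.normal(0.0, noise_std, size=window.shape)
    return window * scale + noise

rng = np.random.default_rng(0)
original = np.ones(64)
augmented = augment(original, rng)
```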
What we learned
We learned that the hardest part isn't just modeling. It is making the experience reliable when dealing with the messiness of real-world sensors. Electrode placement, skin contact, and small movements can shift signals significantly.
We also discovered that constraining the interaction space dramatically improves trust; tool-based execution is essential for predictable agents. On a personal level, we learned the value of perseverance. Even when the hardware signals were noisy or the model failed to generalize, we pushed through to refine our pipeline until it worked.
What's next for MindOS
Next, we want to improve the calibration features so the system adapts quickly to new users. We plan to expand the vocabulary of commands while keeping reliability high, using our feedback logs to continually improve decoding accuracy.
For the future, we hope to implement multi-sensor fusion across the face to capture more accurate data, faster. We also aim to move processing on-device; running on an edge device reduces latency and helps make MindOS a private, secure, and viable daily-use product for the many people who need it.
Built With
- api
- arduino
- c++
- express.js
- fastapi
- openai
- playwright
- python
- scikit-learn
- tailwind
- typescript