pALS

Inspiration

A life without the ability to communicate is utterly miserable. Unfortunately, this is a reality that thousands of people, with dreams and ideas just like you and me, live with every day. Picture yourself without a mouth to speak or limbs to gesture: how would you maintain your humanity? We have made it our mission to give expression back to those who have lost it.

What it does

Our program processes gaze data from a webcam and maps it onto our speech interface. The interface presents a collection of ideas rather than a catalogue of words or letters, because we believe people communicate not just through words but through ideas and context. Once the user has constructed a sentence, they can either confirm it or restart the process. Once confirmed, the sentence is converted to speech and a caregiver is notified by email.

How we built it

pALS was built as a three-agent orchestration on a local-first pipeline that needs only an ordinary laptop webcam.

The Eye-Gaze Perception Agent runs in a FastAPI backend, where OpenCV pulls 720p frames at 30 fps and MediaPipe's FaceLandmarker emits 478 landmarks, including the iris ring, plus a 4×4 head-pose transformation matrix. For each frame we compute: per-eye iris position relative to the lateral and medial canthi (a bone-fixed reference, so head translation cancels out); head pose decomposed into yaw/pitch/roll/tx/ty/tz; iris diameter as a depth proxy; and Eye Aspect Ratio per eye for blink detection. Those 12 numbers expand to a 27-column ridge-regression design row that includes iris×head-pose interaction terms. Per-user calibration is a 13-point grid that collects ~8 samples per dot on a settle→capture→advance timer; we then solve the ridge system directly with numpy, run a MAD-based outlier-rejection pass, and refit. A 1€ filter (Casiez et al.) smooths the output screen coordinates, giving low jitter during fixations and low lag during saccades.

The Wafer Communication Agent turns idea paths into speech, using Wafer not as a chatbot but as a fast speech-expansion service: /api/suggest fans out parallel branches to predict the next likely tile mid-path, and /api/compose fans out parallel compose prompts and accepts the first valid structured response, which returns both a first-person sentence and a second-person confirmation clause.

The Caregiver Output Agent synthesizes audio with ElevenLabs and plays it through ffplay on the backend (gaze and blink events aren't trusted browser gestures, so browser playback would be silently blocked by autoplay rules), then composes a short caregiver email body and sends it via SMTP. The whole orchestration is exposed at GET /api/agents as a runtime manifest: three explicit agent boundaries with readiness flags and no heavyweight agent framework.
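The calibration step described above (direct ridge solve, MAD-based outlier pass, refit) could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the regularization strength `lam`, the rejection threshold `k`, and the choice to regularize every column (including any bias column) equally are all assumptions made for the example.

```python
import numpy as np


def fit_ridge(X, Y, lam=1e-3):
    """Solve the ridge normal equations (X^T X + lam*I) W = X^T Y directly.

    X: (n_samples, n_features) design matrix, assumed to already contain
       any bias and interaction columns.
    Y: (n_samples, 2) target screen coordinates.
    """
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ Y)


def mad_inlier_mask(errors, k=3.0):
    """Keep samples whose error is within k robust deviations of the median.

    The MAD is scaled by 1.4826 so it is comparable to a standard deviation
    for normally distributed errors.
    """
    med = np.median(errors)
    mad = np.median(np.abs(errors - med))
    return errors <= med + k * 1.4826 * mad


def calibrate(X, Y, lam=1e-3, k=3.0):
    """Fit, drop samples with unusually large residuals, then refit."""
    W = fit_ridge(X, Y, lam)
    errors = np.linalg.norm(X @ W - Y, axis=1)  # per-sample pixel error
    keep = mad_inlier_mask(errors, k)
    return fit_ridge(X[keep], Y[keep], lam), keep
```

With the calibration data stacked into `X` (one design row per captured sample) and `Y` (the dot each sample was captured at), `W, keep = calibrate(X, Y)` gives a weight matrix for predicting screen coordinates as `row @ W`, and `keep` shows which samples survived the outlier pass.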
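For readers unfamiliar with the 1€ filter mentioned above, here is a small sketch of one common formulation for a single coordinate. The default parameter values are illustrative assumptions, not the values pALS uses, and a fixed sample rate (matching the 30 fps capture) is assumed instead of per-sample timestamps.

```python
import math


class OneEuroFilter:
    """Adaptive low-pass filter: the cutoff rises with the signal's speed."""

    def __init__(self, rate_hz=30.0, min_cutoff=1.0, beta=0.007, d_cutoff=1.0):
        self.rate_hz = rate_hz        # assumed constant sampling rate
        self.min_cutoff = min_cutoff  # lower values: less jitter when still
        self.beta = beta              # higher values: less lag when moving
        self.d_cutoff = d_cutoff      # cutoff used to smooth the derivative
        self._x_prev = None
        self._dx_prev = 0.0

    def _alpha(self, cutoff):
        # Smoothing factor of a first-order low-pass at this cutoff frequency.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        te = 1.0 / self.rate_hz
        return 1.0 / (1.0 + tau / te)

    def __call__(self, x):
        if self._x_prev is None:
            # The first sample passes through unchanged.
            self._x_prev = x
            return x
        # Estimate speed from the previous filtered value, then smooth it.
        dx = (x - self._x_prev) * self.rate_hz
        a_d = self._alpha(self.d_cutoff)
        dx_hat = a_d * dx + (1.0 - a_d) * self._dx_prev
        # Fast movement raises the cutoff, which reduces lag.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff)
        x_hat = a * x + (1.0 - a) * self._x_prev
        self._x_prev = x_hat
        self._dx_prev = dx_hat
        return x_hat
```

For 2D gaze output, one would typically keep two instances, one for the x coordinate and one for y, and feed each new prediction through both.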

Challenges we ran into

We struggled with head movement overshadowing eye movement. Our first iris tracker used absolute pixel positions, which meant a 5 cm head shift moved the cursor in the direction of the shift even when the user's gaze had not changed. We fixed this by measuring the iris position relative to the corners of the eyelids. Another, bigger issue was cross-platform support.
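To illustrate the fix, the sketch below expresses an iris center in a coordinate frame defined by the two eye corners. It is a simplified 2D example under our own assumptions: the function name is hypothetical, and the inputs are plain pixel coordinates with no reference to specific landmark indices.

```python
import numpy as np


def iris_relative_to_canthi(iris, lateral, medial):
    """Return (u, v), the iris center in an eye-corner coordinate frame.

    u runs along the corner-to-corner axis (0 at the lateral corner,
    1 at the medial corner); v is the perpendicular offset. Both are
    measured in units of eye width.
    """
    iris, lateral, medial = (
        np.asarray(p, dtype=float) for p in (iris, lateral, medial)
    )
    axis = medial - lateral
    width = np.linalg.norm(axis)
    if width == 0.0:
        raise ValueError("eye corners must be distinct points")
    ex = axis / width                # unit vector along the eye
    ey = np.array([-ex[1], ex[0]])   # unit vector perpendicular to it
    offset = iris - lateral
    return float(offset @ ex) / width, float(offset @ ey) / width
```

Because the iris is measured from one corner and normalized by the distance between corners, shifting all three points by the same amount leaves `(u, v)` unchanged, which is the property that absolute pixel positions lack. For example, corners at (0, 0) and (10, 0) with the iris at (4, 1) give approximately (0.4, 0.1), and so do the same points translated by any offset.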

Accomplishments that we're proud of

We built a working gaze-tracking system that is independent of head movement, using just a stock laptop webcam.

What we learned

We have learned a lot about ALS patients, paralysis, and the realities of assistive communication. We have also learned that the pace of thought and communication is very important, and the speed of Wafer has proven to be quite useful.

What's next for pALS

We want our creation to be accessible to everyone in need. We plan to keep pALS completely open source and free to use. Our next steps are to improve the calibration software, potentially by integrating machine learning into the process; to implement ElevenLabs voice cloning; and to build a smart-home bridge so the Control Home branch can be used to dim the lights or call the elevator. We will continue to improve on what we built during Uncommon Hacks 2026.

Built With

agent-development-kit
elevenlabs
fastapi
python
react
text-to-speech
wafer

Updates

Ten Munkhbat started this project — May 17, 2026 11:05 AM EDT
