Heard Autism

Inspiration

There is a child in our neighborhood who has autism. He communicates, reacts, and feels, but the adults around him often have to guess whether he is happy, scared, overwhelmed, or something else entirely. Sometimes the guess is wrong, and the support that was already there never lands.

That gap is why we built Heard. Not to fix the child and not to replace real conversation, but to give caregivers a clearer signal when words and faces are hard to read.

Generic emotion AI makes this harder. Models trained on typical speech can sound confident while being wrong about atypical expression. We wanted the opposite: honest scores, abstention when unsure, and a path toward learning each child as an individual.

What it does

Heard is a voice-first emotion reader built for the USAII Make Support Obvious challenge.

Muhammad's stack (audio wearable path)

Microphone captures pitch, energy, rhythm, and how the voice changes over time, not the words themselves
A CNN + GRU model trained with face-guided learning on autism-relevant data (MobileNetV2 teacher on FER Autism, audio student at inference)
~81% test accuracy on five emotion classes with a speaker-independent split
Exported to ONNX for edge use, with a hardware sketch (Arduino + 8×8 LED matrix) to show emotion patterns

Cora's stack (research and fusion path)

wav2vec2 speech-emotion baseline, 64% speaker-independent accuracy on RAVDESS (chance is 12.5% with eight emotion classes)
Per-child calibration with a few samples per emotion, measured ~67% → ~77%, with the hardest speakers gaining the most (~+19 points)
Multimodal demo on video clips with sound, where the face branch (~66.4% on FER Autism, six classes) is fused with voice when both are present
Abstention when confidence is low instead of guessing
Live Colab so judges can try mic or upload without installing anything
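
The abstention idea above can be sketched in a few lines. This is only an illustration, not the code in either repo: the function name predict_or_abstain, the 0.6 threshold, and the example labels are assumptions made for this sketch.

```python
from typing import Dict, Optional


def predict_or_abstain(probs: Dict[str, float], threshold: float = 0.6) -> Optional[str]:
    """Return the top label, or None when confidence is below the threshold.

    The threshold value is an assumption for this sketch, not a tuned number.
    """
    if not probs:
        return None
    label, confidence = max(probs.items(), key=lambda item: item[1])
    # Abstain rather than report a low-confidence guess.
    return label if confidence >= threshold else None


if __name__ == "__main__":
    clear = {"happy": 0.82, "scared": 0.10, "overwhelmed": 0.08}
    unsure = {"happy": 0.40, "scared": 0.35, "overwhelmed": 0.25}
    for probs in (clear, unsure):
        label = predict_or_abstain(probs)
        print(label if label is not None else "I am not sure, please check in with them.")
```

Returning None instead of a label keeps the decision to abstain explicit, so a caregiver-facing display can show a "check in" prompt rather than a guess.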

Together this is one project with two runnable demos that share one idea: understand the child first, so support can actually arrive.

Try it

Team repo (Muhammad) https://github.com/MuhammadBinary/HeardAutism---USAII-Muhd-Cora-
Fusion + personalization repo (Cora) https://github.com/xqscora/usaii-autism-emotion
Colab https://colab.research.google.com/github/xqscora/usaii-autism-emotion/blob/master/demo_colab.ipynb

How we built it

Data (public only)

RAVDESS for labeled speech emotion
FER Autism and FER2013 for face emotion
ASDSpeech features for autism speech distribution (no emotion labels, useful for future domain work)

Muhammad

Built an MFCC, zero-crossing rate, and RMS feature pipeline
Trained a CNN + GRU audio model with auxiliary face supervision during training
Evaluated on a speaker-independent split and exported inference_model.onnx
Documented the wearable flow: mic → laptop/edge → ONNX → Arduino LED
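
As a rough illustration of two of the features named above, the sketch below computes zero-crossing rate and RMS energy per frame using only the Python standard library. It is not the pipeline from the repo (which lists librosa among its tools), MFCCs are omitted, and the function name frame_features and the frame sizes (400 samples with a hop of 160, which would be 25 ms and 10 ms at an assumed 16 kHz sample rate) are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def frame_features(
    samples: Sequence[float], frame_len: int = 400, hop: int = 160
) -> List[Tuple[float, float]]:
    """Return (zero_crossing_rate, rms) for each full frame of the signal."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        zcr = crossings / (frame_len - 1)
        # RMS energy: square root of the mean squared amplitude.
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        features.append((zcr, rms))
    return features


if __name__ == "__main__":
    # A synthetic 440 Hz tone at an assumed 16 kHz sample rate.
    tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
    for zcr, rms in frame_features(tone)[:3]:
        print(round(zcr, 4), round(rms, 4))
```

Both features describe how the voice sounds rather than what is said, which matches the goal of reading pitch, energy, and rhythm instead of words.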

Cora

Trained a wav2vec2 embedding + logistic regression SER baseline with held-out speakers
Wrote personalization and sweep scripts to show gains from K samples per emotion
Built demo_emotion.py, demo_face_emotion.py, and demo_multimodal.py for audio-only, face-only, and fused video input
Added Colab notebook with mic, upload, and optional webcam paths
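
One simple way to combine a voice branch and a face branch is late fusion, sketched below as a weighted average of the two probability distributions. Whether demo_multimodal.py fuses this way is not stated here, so treat the function name fuse, the equal default weight, and the zero-fill for labels that only one branch knows as assumptions.

```python
from typing import Dict, Optional

Probs = Dict[str, float]


def fuse(voice: Optional[Probs], face: Optional[Probs], voice_weight: float = 0.5) -> Optional[Probs]:
    """Weighted average of two label distributions, falling back to one branch if needed."""
    if voice is None and face is None:
        return None
    if voice is None:
        return dict(face)
    if face is None:
        return dict(voice)
    # Labels missing from one branch count as probability 0.0 there (an assumption).
    labels = set(voice) | set(face)
    mixed = {
        label: voice_weight * voice.get(label, 0.0) + (1.0 - voice_weight) * face.get(label, 0.0)
        for label in labels
    }
    total = sum(mixed.values())
    if total <= 0:
        return None
    # Renormalize so the fused scores sum to 1.
    return {label: score / total for label, score in mixed.items()}


if __name__ == "__main__":
    voice = {"happy": 0.6, "sad": 0.4}
    face = {"happy": 0.8, "sad": 0.2}
    print(fuse(voice, face))
    print(fuse(voice, None))  # no face in frame: voice-only result
```

The fallback branches mirror the "fused with voice when both are present" behavior: when only one modality is available, its scores pass through unchanged.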

Demo video

Combined slide walkthrough, Muhammad's audio story, Cora's personalization results, and a live multimodal clip (recording/Heard_USAII_demo_v1.mp4 in Cora's repo)

Challenges we ran into

Almost no public datasets label autistic children's emotional speech. We had to be upfront that emotion labels mostly come from neurotypical adults, while autism-specific voice data stays under-labeled.

Cross-modal training helped audio learn richer emotion cues, but deployment stays audio-only by design so the wearable stays simple and privacy-friendly.

Personalization needs a few labeled samples per child, which is realistic for a caregiver-assisted setup but not magic. We focused on showing that the speakers generic models fail on are exactly where personalization helps most.
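
To make the idea of calibrating from a few labeled samples concrete, here is a minimal nearest-centroid sketch. It is a stand-in technique chosen for brevity, not the method in the personalization scripts; the names centroid, personalize, and classify, the blending weight alpha, and the toy 2-D "embeddings" are all assumptions.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def personalize(
    generic: Dict[str, Vector], child_samples: Dict[str, List[Vector]], alpha: float = 0.5
) -> Dict[str, List[float]]:
    """Pull each generic emotion centroid toward the child's own K samples."""
    adapted = {}
    for label, generic_center in generic.items():
        samples = child_samples.get(label)
        if samples:
            child_center = centroid(samples)
            adapted[label] = [
                (1.0 - alpha) * g + alpha * c for g, c in zip(generic_center, child_center)
            ]
        else:
            # No samples for this emotion yet: keep the generic centroid.
            adapted[label] = list(generic_center)
    return adapted


def classify(embedding: Vector, centroids: Dict[str, Vector]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(embedding, centroids[label]))


if __name__ == "__main__":
    generic = {"happy": [1.0, 0.0], "sad": [0.0, 1.0]}
    # Toy case: this child's "happy" sits close to the generic "sad" region.
    child = {"happy": [[0.2, 0.9], [0.0, 1.1]]}
    adapted = personalize(generic, child, alpha=1.0)
    sample = [0.12, 1.0]
    print(classify(sample, generic), classify(sample, adapted))
```

In the toy case the generic centroids mislabel the child's sample while the adapted ones do not, which is the kind of speaker where a few labeled examples would be expected to matter most.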

Hardware was limited on one teammate's side (laptop and phone issues), so we leaned on Colab, ONNX, and a screen-recorded demo with voiceover instead of a live on-stage device.

Accomplishments that we're proud of

End-to-end audio emotion pipeline with ~81% speaker-independent test accuracy and ONNX export
Face-guided training that stays audio-only at inference
Personalization curve that improves with small K and helps the worst cases most
Multimodal fusion demo on real video with both face and voice
Abstention instead of confidently wrong labels
Runnable Colab for judges and teammates who cannot install locally

What we learned

Support fails when understanding fails first. A slightly uncertain but honest read beats a wrong label delivered with full confidence.

Better AI for this community is not always louder AI. Sometimes the most useful output is "I am not sure, please check in with them."

Every child expresses emotion differently. A generic model is a starting point. A model that learns this child is the direction we care about (we frame that long-term work as Cerome).

What's next for Heard

More autism-labeled speech if dataset requests come through
Deeper domain adaptation on ASDSpeech
Smaller edge board running the ONNX model with the LED wearable prototype
Per-child memory over time instead of one-shot calibration
Caregiver-in-the-loop UI so every read can be confirmed or corrected

Built With

arduino
jupyter-colab
librosa
onnx
opencv
python
pytorch
scikit-learn
tensorflow
transformers-(wav2vec2)

Submitted to

USAII® Global AI Hackathon 2026

Created by

I built the research and fusion path for Heard: wav2vec2 speech-emotion baseline with speaker-independent evaluation, per-child calibration scripts, multimodal demo (face + voice on video clips), abstention when confidence is low, and the live Colab notebook so judges can try mic/upload without installing anything. I also wrote the fusion repo docs and contributed to the team demo video.

cora zeng
Muhammad Mujahid Haruna

Updates

Muhammad Mujahid Haruna started this project — Jun 21, 2026 10:22 PM EDT
