Inspiration
Over 70 million people worldwide are autistic, and hundreds of millions more live with social anxiety. For many of them, face-to-face conversation is genuinely overwhelming. The simultaneous cognitive demands of tracking what to say next, reading the other person's emotional state, managing eye contact, and keeping up with the flow of a conversation can make even a simple exchange feel impossible.
We noticed that almost every tool designed to help with social communication is a training tool (something you use at home, in a clinic, or in a controlled setting). Nothing exists that helps you in the moment, when you're standing in front of someone and the conversation is already happening.
We wanted to build something that meets people where they actually struggle and aids them in real life, in real time. And when we got access to Snap Spectacles, we realized the hardware finally existed to do it.
What it does
SoftEyes is a real-time social companion that runs on Snap Spectacles AR glasses. It operates quietly in the wearer's field of vision during face-to-face conversations, providing three layers of gentle, unobtrusive support, all designed around published research on what genuinely helps autistic individuals in social settings.
Emotion detection and mirroring cues. SoftEyes reads the emotional state of the person in front of the wearer in real time using computer vision and DeepFace, and displays a soft prompt for how to respond (e.g. "Follow up on the university studies they mentioned" or "Ask what breed their dog is"). Many autistic people struggle to intuitively read and mirror emotion; this feature provides that signal exactly when it's needed, without scripting the entire conversation. It acts as a live assistant while leaving the wearer enough autonomy to build their own social skills, rather than becoming dependent on the glasses long term.
Eye contact guidance. A soft, calm glow is placed around the other person's eyes in AR. This gives the wearer a natural, comfortable place to look without forcing rigid eye contact, directly addressing one of the most commonly reported sources of anxiety in face-to-face interaction.
Every design decision was grounded in research. The UI uses pastel colours only. All elements fade gently in and out with no sudden appearances, no flashing and no jarring transitions. Information density is kept to an absolute minimum. Overlays are positioned spatially close to the person being spoken to, keeping the wearer's attention anchored in the conversation rather than pulled to a corner of their vision. The microphone pipeline applies noise cancellation and ambient audio suppression to help the wearer focus on the voice in front of them, not the room around them.
How we built it
SoftEyes is built in Lens Studio 5.15.4 for Snap Spectacles, with a Python backend running locally and exposed via ngrok.
Lens side (JavaScript / GLSL):
The core script, BackendEmotionDetector.js, uses Lens Studio's CameraModule to access the Spectacles outward-facing camera. It listens for FaceFoundEvent and FaceLostEvent to track the presence of the other person. On each camera frame, throttled to 0.1-second intervals, it encodes the frame as a Base64 JPEG using Base64.encodeTextureAsync() with LowQuality compression, then POSTs it to the backend via Lens Studio's InternetModule. The returned emotion label is rendered as a Text3D component anchored above the tracked face via a Head Binding in the scene hierarchy.
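The 0.1-second throttle matters: without it, every camera frame would hit the backend. The actual Lens script is JavaScript, but the gating idea can be sketched generically (shown here in Python, with an injectable clock so the behaviour is easy to verify):

```python
import time

class FrameThrottle:
    """Let at most one frame through every `interval` seconds."""
    def __init__(self, interval=0.1, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last = float("-inf")  # so the very first frame always passes

    def should_send(self):
        now = self.clock()
        if now - self._last >= self.interval:
            self._last = now
            return True
        return False

# Drive the throttle with a fake clock to see the gating deterministically.
t = [0.0]
throttle = FrameThrottle(interval=0.1, clock=lambda: t[0])
results = []
for step in (0.0, 0.03, 0.06, 0.11, 0.15, 0.25):
    t[0] = step
    results.append(throttle.should_send())
# results == [True, False, False, True, False, True]
```

In the Lens itself the same check wraps the encode-and-POST call, so the camera keeps running at full frame rate while the network only sees ~10 requests per second.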
Backend (Python / Flask / DeepFace):
emotion_server.py is a Flask server running on waitress (a production WSGI server, 4 threads) on port 5001. On receiving a request, it decodes the Base64 image, downscales it, and runs DeepFace.analyze() with the MTCNN face detector backend. DeepFace's 7 raw emotion outputs are mapped to 4 simplified categories: Joy, Sadness, Anger, and Neutral. Results are returned as JSON with emotion, confidence, and latency fields.
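The 7-to-4 collapse is a small pure function over DeepFace's score dictionary (DeepFace's emotion head scores angry, disgust, fear, happy, sad, surprise, and neutral). The exact grouping we used isn't spelled out above, so treat the fear/disgust/surprise assignments below as an illustrative sketch:

```python
# Illustrative grouping of DeepFace's seven emotion labels into SoftEyes'
# four categories -- fear, disgust and surprise could reasonably be
# assigned to other buckets.
SEVEN_TO_FOUR = {
    "happy": "Joy",
    "surprise": "Joy",
    "sad": "Sadness",
    "fear": "Sadness",
    "angry": "Anger",
    "disgust": "Anger",
    "neutral": "Neutral",
}

def simplify(emotion_scores):
    """Sum the 7 DeepFace scores into 4 buckets; return (label, confidence)."""
    buckets = {"Joy": 0.0, "Sadness": 0.0, "Anger": 0.0, "Neutral": 0.0}
    for label, score in emotion_scores.items():
        buckets[SEVEN_TO_FOUR[label]] += score
    best = max(buckets, key=buckets.get)
    return best, buckets[best]

label, conf = simplify({
    "happy": 62.0, "surprise": 5.0, "sad": 3.0, "fear": 2.0,
    "angry": 1.0, "disgust": 0.5, "neutral": 26.5,
})
# label == "Joy", conf == 67.0
```

Summing scores before picking the winner (rather than just remapping the dominant label) makes the four-way decision more stable when DeepFace splits its confidence across related emotions.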
Tunneling: ngrok exposes the local Flask server to the Spectacles device over HTTPS.
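Because the backend is plain HTTPS + JSON, it can be exercised from any machine without the glasses, which made the pivot much easier to debug. A minimal payload sketch follows; the "image" field name is an assumption for illustration, not necessarily the exact request shape:

```python
import base64
import json

def build_payload(jpeg_bytes):
    """Mimic what the Lens sends: the frame as a Base64 JPEG in a JSON body.
    The field name 'image' is hypothetical."""
    return json.dumps({"image": base64.b64encode(jpeg_bytes).decode("ascii")})

# Round-trip check with stand-in bytes (not a real JPEG).
fake_frame = b"\xff\xd8\xff\xe0 stand-in frame bytes"
payload = json.loads(build_payload(fake_frame))
recovered = base64.b64decode(payload["image"])
assert recovered == fake_frame
# A real test run would POST this body to the ngrok HTTPS URL,
# e.g. with requests.post(url, data=build_payload(frame)).
```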
UI / GLSL: Custom GLSL shaders handle the soft glow effect around the eyes and the pastel overlay elements, with shader-driven fade transitions to keep all visual changes smooth and non-jarring.
Challenges we ran into
Learning Lens Studio from scratch in a weekend. Neither of us had used Lens Studio before this hackathon. The scripting model, coordinate system, component architecture, face tracking API, camera access patterns, and deployment pipeline were all entirely new. We were learning the environment and building the product at the same time.
Blend shapes don't work on Spectacles. Our original emotion detection approach used EmotionDetector.js and read face mesh blend shape weights like MouthSmileLeft and BrowsDownLeft directly from Snap's face tracking. It worked well in Lens Studio preview. On actual Spectacles hardware, every blend shape returned near-zero values. The device's outward-facing camera, combined with real-world conversational distances and angles, produced data that the blend shape system couldn't interpret. This forced a full architectural pivot to the DeepFace backend approach mid-hackathon.
Designing against our own instincts. Every instinct in interface design pushes toward more information, more feedback, more visual confirmation. We had to fight that constantly. A person already overwhelmed in a conversation does not need a busy overlay. We went through multiple iterations stripping things back (slowing animations, replacing saturated colours with pastels, cutting elements that felt helpful in isolation but added cognitive load overall).
Accomplishments that we're proud of
We're proud that SoftEyes is genuinely research-grounded and not just intuitively designed. The choice of pastel colours, fade-in/fade-out transitions, zero flashing elements, minimal information density, spatial anchoring of overlays near the conversation partner, and light non-intrusive eye contact guidance all map directly to published research on sensory and cognitive accessibility for autistic individuals. We used evidence rather than guessing at what would be calming for our users.
We're proud of diagnosing and solving the blend shape failure on-device and pivoting to a working architecture under hackathon time pressure.
What we learned
We learned Lens Studio end-to-end: scene hierarchy, camera access via CameraModule, face tracking events, Head Binding, Text3D, InternetModule for HTTP, GLSL materials, and the full push-to-device deployment flow.
We learned that emotion detection in real-world conditions is a fundamentally different problem than emotion detection in controlled preview environments. Distance, camera angle, natural unposed expression, and hardware-level processing all compound in ways that no benchmark captures.
What's next for SoftEyes
Moving to on-device inference. The ngrok + DeepFace architecture works, but it adds latency and requires a nearby laptop. We want to import a pretrained ONNX emotion classifier via Lens Studio's SnapML integration and run inference directly on the Spectacles hardware with zero network round trips. We simply didn't have time to wire this up during the hackathon.
The consumer Spectacles hardware is coming in 2026. We want SoftEyes to be ready for it and we believe this is a category of assistive technology, not just a hackathon project. The same architecture could serve people in job interviews, medical appointments, or any high-stakes social situation where the gap between how someone presents and who they actually are is widest.