Inspiration
Dating is nerve-wracking. You're sitting across from someone, they say something, and your mind goes blank. What if you had a friend in your ear — a wingman — who could read the room and whisper the perfect thing to say next? That's Barney. An AI wingman that lives on your Meta Ray-Bans, listens to your date in real time, and coaches you through it.
What it does
Barney runs on Meta Ray-Bans and listens to your date's voice in real time. It detects their emotional state — calm, frustrated, amused, confused, excited, relieved — and pushes contextual coaching nudges straight to your display. Things like "He seems annoyed, try to relate with him and make him feel heard" or "They're joking, not complaining — laugh with them." No fumbling with your phone, no awkward pauses. Just subtle, real-time guidance while you stay present in the conversation.
How we built it
- Modulate's batch STT API (velma-2-stt-batch) for real-time voice emotion detection — extracts emotion labels like Calm, Amused, Relieved from speech
- Groq (Llama 3.3 70B) as the LLM to generate contextual dating nudges and infer emotions from transcript context when acoustic analysis returns Neutral
- FastAPI backend that captures audio, orchestrates Modulate + Groq calls, and streams results over WebSocket
- Vanilla JS frontend with a HUD-style overlay designed to mirror what the Ray-Ban display would show — dark glass nudge cards, emotion badges, and hands-free Voice Activity Detection
- Emotion fallback system — when Modulate's acoustic analysis says Neutral, Groq infers emotion from what was actually said, giving us accurate reads like Anxious, Confused, or Relieved
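The fallback in the last bullet boils down to a small decision function: trust the acoustic label unless it's Neutral, then defer to the LLM's read of the transcript. A minimal sketch (the function name and label strings are illustrative, not Modulate's or Groq's actual API):

```python
def resolve_emotion(acoustic_label: str, llm_label: str) -> str:
    """Prefer Modulate's acoustic emotion label; fall back to the LLM's
    transcript-based inference only when acoustics come back Neutral."""
    if acoustic_label and acoustic_label != "Neutral":
        return acoustic_label
    # Acoustics were inconclusive — use what was actually said.
    return llm_label or "Neutral"


# e.g. acoustic says Neutral but the transcript reads as anxious:
# resolve_emotion("Neutral", "Anxious") -> "Anxious"
# acoustic signal wins when it has something to say:
# resolve_emotion("Amused", "Anxious") -> "Amused"
```

Keeping this as a pure function makes the hybrid behavior easy to unit-test independently of either service.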
Challenges we ran into
- Modulate's streaming WebSocket API was completely broken — returned "Internal server error" after ~22 seconds regardless of what we tried. We pivoted to the batch REST API which worked perfectly.
- Modulate returned "Neutral" for 4 out of 6 test conversations. We solved this by having Groq also infer emotion from transcript context, then using Groq's label as a fallback — turns out combining acoustic analysis with LLM contextual inference gives much better results than either alone.
- Chrome silently blocks microphone access without a user gesture. It took real debugging to figure out why our hands-free flow wasn't working — we had to restructure the permission flow so capture starts from an explicit user action.
- Timing the nudges — fire too fast and it feels intrusive; too slow and the moment's passed. We tuned the delay until it felt natural.
Accomplishments that we're proud of
- Fully hands-free flow with VAD — no tapping, no swiping, just conversation. Exactly how it needs to work on Ray-Bans.
- The HUD overlay genuinely feels like what an AR coaching interface should look like — dark glass cards, color-coded by nudge type, all white text for readability.
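The hands-free flow above hinges on voice activity detection. Our prototype runs VAD in the browser, but the core idea is just an energy gate, sketched here in Python with an illustrative threshold (not our tuned value):

```python
import math


def rms(frame: list[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speech(frame: list[float], threshold: float = 0.02) -> bool:
    """Crude energy-gate VAD: a frame counts as speech when its RMS
    energy clears the threshold. Real deployments add hangover frames
    so trailing syllables aren't clipped."""
    return rms(frame) > threshold
```

In practice you'd run this per 10-30 ms frame and only ship audio to the STT service once a run of speech frames ends, which is what makes the no-tap flow feel seamless.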
What we learned
- Voice emotion detection is still early — even purpose-built APIs return Neutral more often than not for conversational speech. Hybrid acoustic + LLM inference is the way forward.
- Hands-free UX is hard. Every interaction has to work without touch, and timing is everything.
- For hardware demos, having a scripted fallback mode is essential — you can't trust live APIs on stage.
What's next for Barney - AI Wingman
- Native Meta Ray-Ban integration — move from the browser prototype to a real Ray-Ban app with direct mic access and display output
- Two-way emotion tracking — detect emotions from both speakers and coach based on the dynamic between them
- Adaptive personality — learn the user's conversation style over time and personalize nudges
- Beyond dating — job interviews, sales calls, difficult conversations — anywhere reading emotional cues matters
Built With
- fastapi