Inspiration
Dating is nerve-wracking. You're sitting across from someone, they say something, and your mind goes blank. What if you had a friend in your ear — a wingman — who could read the room and whisper the perfect thing to say next? That's Barney. An AI wingman that lives on your Meta Ray-Bans, listens to your date in real time, and coaches you through it.
What it does
Barney runs on Meta Ray-Bans and listens to your date's voice in real time. It detects their emotional state — calm, frustrated, amused, confused, excited, relieved — and pushes contextual coaching nudges straight to your display. Things like "He seems annoyed, try to relate with him and make him feel heard" or "They're joking, not complaining — laugh with them." No fumbling with your phone, no awkward pauses. Just subtle, real-time guidance while you stay present in the conversation.
How we built it
- Modulate's batch STT API (velma-2-stt-batch) for real-time voice emotion detection — extracts emotion labels like Calm, Amused, Relieved from speech
- Groq (Llama 3.3 70B) as the LLM to generate contextual dating nudges and infer emotions from transcript context when acoustic analysis returns Neutral
- FastAPI backend that captures audio, orchestrates Modulate + Groq calls, and streams results over WebSocket
- Vanilla JS frontend with a HUD-style overlay designed to mirror what the Ray-Ban display would show — dark glass nudge cards, emotion badges, and hands-free Voice Activity Detection
- Emotion fallback system — when Modulate's acoustic analysis says Neutral, Groq infers emotion from what was actually said, giving us accurate reads like Anxious, Confused, or Relieved
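The fallback in the last bullet boils down to a small decision function: trust the acoustic label unless it's Neutral, then defer to the LLM's read of the transcript. A minimal sketch (the function name and label strings are illustrative, not Modulate's or Groq's actual API):

```python
def resolve_emotion(acoustic_label: str, llm_label: str) -> str:
    """Prefer Modulate's acoustic emotion label; fall back to the LLM's
    transcript-based inference only when acoustics come back Neutral."""
    if acoustic_label and acoustic_label != "Neutral":
        return acoustic_label
    # Acoustics were inconclusive — use what was actually said.
    return llm_label or "Neutral"


# e.g. acoustic says Neutral but the transcript reads as anxious:
# resolve_emotion("Neutral", "Anxious") -> "Anxious"
# acoustic signal wins when it has something to say:
# resolve_emotion("Amused", "Anxious") -> "Amused"
```

Keeping this as a pure function makes the hybrid behavior easy to unit-test independently of either service.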
Challenges we ran into
- Modulate's streaming WebSocket API was completely broken — returned "Internal server error" after ~22 seconds regardless of what we tried. We pivoted to the batch REST API which worked perfectly.
- Modulate returned "Neutral" for 4 out of 6 test conversations. We solved this by having Groq also infer emotion from transcript context, then using Groq's label as a fallback — turns out combining acoustic analysis with LLM contextual inference gives much better results than either alone.
- Chrome silently blocks microphone access without a user gesture. It took real debugging to figure out why our hands-free flow wasn't working — we had to restructure the permission flow so capture starts from an explicit user action.
- Timing the nudges — fire too fast and it feels intrusive; too slow and the moment's passed. We tuned the delay until it felt natural.
Accomplishments that we're proud of
- Fully hands-free flow with VAD — no tapping, no swiping, just conversation. Exactly how it needs to work on Ray-Bans.
- The HUD overlay genuinely feels like what an AR coaching interface should look like — dark glass cards, color-coded by nudge type, all white text for readability.
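The hands-free flow above hinges on voice activity detection. Our prototype runs VAD in the browser, but the core idea is just an energy gate, sketched here in Python with an illustrative threshold (not our tuned value):

```python
import math


def rms(frame: list[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speech(frame: list[float], threshold: float = 0.02) -> bool:
    """Crude energy-gate VAD: a frame counts as speech when its RMS
    energy clears the threshold. Real deployments add hangover frames
    so trailing syllables aren't clipped."""
    return rms(frame) > threshold
```

In practice you'd run this per 10-30 ms frame and only ship audio to the STT service once a run of speech frames ends, which is what makes the no-tap flow feel seamless.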
What we learned
- Voice emotion detection is still early — even purpose-built APIs return Neutral more often than not for conversational speech. Hybrid acoustic + LLM inference is the way forward.
- Hands-free UX is hard. Every interaction has to work without touch, and timing is everything.
- For hardware demos, having a scripted fallback mode is essential — you can't trust live APIs on stage.
What's next for Barney - AI Wingman
- Native Meta Ray-Ban integration — move from the browser prototype to a real Ray-Ban app with direct mic access and display output
- Two-way emotion tracking — detect emotions from both speakers and coach based on the dynamic between them
- Adaptive personality — learn the user's conversation style over time and personalize nudges
- Beyond dating — job interviews, sales calls, difficult conversations — anywhere reading emotional cues matters
Built With
- fastapi