Wingman

Your real-time conversation coach, powered by Meta Ray-Ban smart glasses and AI.

Inspiration

We've all been there — stuck in a networking conversation, blanking on what to say next, or realizing too late that someone dropped a fascinating detail we never followed up on. Social interactions are high-stakes and ephemeral. What if you had a friend whispering in your ear at just the right moment — "ask about her time at Stripe" or "you've been talking for a while, let them jump in"? We wanted to build that friend. The rise of smart glasses with always-on audio and the power of modern LLMs made it feel like the perfect time to turn that idea into reality.

What it does

Wingman is an iOS app that pairs with Meta Ray-Ban smart glasses to act as a subtle, real-time conversation coach. Once you start a session:

  1. Live transcription — The app uses on-device Apple Speech Recognition to transcribe your conversation in real time via the phone's microphone.
  2. AI-powered coaching — Every 30 seconds to 2 minutes (configurable), the transcript is sent to Google Gemini 2.0 Flash, which decides whether a coaching nudge is warranted. If so, it returns a short, casual "whisper" — max 8 words — like a friend tipping you off.
  3. Discreet delivery — Suggestions appear as a toast notification on-screen and can optionally be spoken aloud via text-to-speech through your earpiece, accompanied by a subtle haptic tap so you never miss it.
  4. Session history — Every session's full transcript and coaching suggestions are saved, so you can review what was said and what the AI picked up on.
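The whisper-or-skip decision can be modeled as a tiny Codable payload. This is a sketch: the field names (`skip`, `suggestion`) and the `decodeWhisper` helper are illustrative assumptions, since the real schema is whatever the system prompt enforces.

```swift
import Foundation

// Hypothetical shape of the structured JSON the coaching model returns
// each cycle. Field names are illustrative, not the app's exact schema.
struct CoachingResponse: Codable {
    let skip: Bool          // true when the conversation is flowing and no nudge is needed
    let suggestion: String? // short whisper, e.g. "ask about her time at Stripe"
}

// Returns the whisper text, or nil when the coach should stay silent.
func decodeWhisper(from data: Data) -> String? {
    guard let response = try? JSONDecoder().decode(CoachingResponse.self, from: data),
          !response.skip,
          let text = response.suggestion, !text.isEmpty
    else { return nil }
    return text
}
```

Keeping "stay silent" as an explicit field in the schema (rather than an empty string) makes the skip path unambiguous for both the model and the client.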

The AI is opinionated about when to stay silent. If the conversation is flowing naturally, it skips. It only fires when there's something genuinely worth acting on — an ignored topic, a lopsided conversation, or a missed follow-up opportunity.

How we built it

  • SwiftUI for the entire front-end, targeting iOS 16+ on iPhone.
  • Meta Wearables DAT SDK (MWDATCore + MWDATCamera) to register with Meta AI, discover and connect to Ray-Ban smart glasses, and stream camera frames over Bluetooth.
  • Apple Speech framework (SFSpeechRecognizer + AVAudioEngine) for on-device, real-time speech-to-text transcription.
  • AVFoundation for audio session management and AVSpeechSynthesizer for text-to-speech delivery of coaching whispers.
  • Google Gemini 2.0 Flash via REST API for the AI coaching brain — structured JSON output with a carefully tuned system prompt that enforces brevity and relevance.
  • Keychain for secure storage of the user's Gemini API key.
  • Architecture follows an MVVM pattern: ConnectionViewModel manages the glasses pairing lifecycle, SessionViewModel orchestrates audio/transcription/coaching/TTS services, and CoachingService handles the Gemini integration with gating logic to avoid redundant calls.
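One plausible AVAudioSession setup for the three-way juggling act described below (mic capture, Bluetooth routing to the glasses, TTS playback) looks like this. Treat it as a starting point under assumptions, not the shipped configuration; the right options depend on the hardware route.

```swift
import AVFoundation

// Sketch: configure one shared session that can record (transcription)
// and play (TTS whispers) at the same time, with Bluetooth allowed so
// the glasses' audio route stays available.
func configureAudioSession() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(
        .playAndRecord,            // simultaneous mic input and audio output
        mode: .voiceChat,          // tuned for conversational audio
        options: [.allowBluetooth, // permit the glasses' Bluetooth route
                  .mixWithOthers]  // avoid stealing the route from other audio
    )
    try session.setActive(true)
}
```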

Challenges we ran into

  • Bluetooth audio routing — Juggling the phone mic for transcription, the glasses' Bluetooth connection, and TTS output simultaneously required careful AVAudioSession category and mode configuration. Getting all three to coexist without one stealing the audio route from another was a recurring headache.
  • Speech recognition session limits — Apple's SFSpeechRecognizer silently kills long-running recognition tasks. We had to implement automatic restart logic with debouncing to keep transcription alive across an entire conversation session.
  • Keeping sessions alive in the background — iOS aggressively suspends apps. We use beginBackgroundTask and declare bluetooth-peripheral, external-accessory, and audio background modes, but there's a constant tension between iOS power management and a coach that needs to stay listening.
  • Prompt engineering for brevity — Getting Gemini to consistently return 8-word-or-fewer suggestions that sound natural ("ask about the Stripe job") instead of formal ("you should inquire about their employment history") took many iterations of the system prompt.
  • Knowing when to stay silent — The hardest coaching problem isn't generating a suggestion; it's knowing when not to. We gate Gemini calls behind a minimum-delta check (at least 3 new transcript entries since the last suggestion) and instruct the model to return skip: true when the conversation is flowing well.
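The minimum-delta gate is simple to express as a pure value type. The type and method names here are a sketch mirroring the 3-entry rule above, not the app's exact code:

```swift
import Foundation

// Gate coaching calls: only hit the model when enough new transcript
// entries have accumulated since the last suggestion.
struct CoachingGate {
    private(set) var lastCoachedCount = 0
    let minimumDelta = 3 // mirrors the "at least 3 new entries" rule

    mutating func shouldRequestCoaching(transcriptCount: Int) -> Bool {
        guard transcriptCount - lastCoachedCount >= minimumDelta else { return false }
        lastCoachedCount = transcriptCount
        return true
    }
}
```

Gating client-side like this is cheap insurance: even before the model can return `skip: true`, redundant network calls never happen in the first place.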

Accomplishments that we're proud of

  • Integrating real hardware into a product that people can use every day.

  • The coaching genuinely feels like a subtle whisper from a friend — not a robotic assistant. The 8-word constraint forces the AI to be specific and actionable.
  • End-to-end latency from "something interesting was said" to "coaching whisper delivered" is under 2 seconds in good network conditions.
  • The app gracefully handles the full lifecycle of glasses connectivity — registration, device discovery, camera permissions, and session management — with clear UI states and feedback at every step.
  • Zero backend infrastructure. The entire app runs client-side with a direct Gemini API call, making it trivially deployable and keeping conversation data on-device.

What we learned

  • Smart glasses development is still early. The Meta Wearables SDK is powerful but sparsely documented, so a lot of integration work was trial-and-error with the SDK's behavior around Bluetooth state changes and camera permissions.
  • On-device speech recognition is impressive but fragile at scale — it works great for short bursts, but long-running sessions need careful lifecycle management.
  • The best AI coaching isn't about maximizing suggestions — it's about maximizing silence. Users trust the system more when it only speaks up with something genuinely useful.
  • SwiftUI + async/await made the concurrency model surprisingly clean for an app that juggles audio streams, Bluetooth connections, network calls, and timers simultaneously.

What's next for Wingman

  • Vision analysis — The glasses camera stream is already connected but unused for coaching. Next step: feed frames to a multimodal model to read body language, name tags, or presentation slides and surface even richer context.
  • Post-session summaries — Use Gemini to generate a recap of key topics, names mentioned, and follow-up action items after each conversation.
  • Persona modes — Let users switch between coaching styles: "Networking", "Date Night", "Job Interview", "Conference", each with a tailored system prompt.
  • On-device model — Replace the Gemini API call with a local LLM to eliminate latency and network dependency entirely, enabling true offline coaching.
  • Apple Watch companion — Deliver whispers as haptic taps and complications on the wrist for an even more discreet experience.

Built With

Swift, SwiftUI, Apple Speech, AVFoundation, Meta Wearables DAT SDK (MWDATCore, MWDATCamera), Google Gemini 2.0 Flash