Inspiration
We've all been there. You see someone across the room, your heart starts racing, and your brain goes completely blank. What if Cupid didn't need a bow anymore? What if he could just sit on your shoulder and tell you exactly what to say?
ShoulderCupid was born from a simple idea: real-time AI coaching delivered through hardware you actually wear. It isn't an app you awkwardly check mid-conversation; it's glasses that see what you see, sensors that feel what you feel, and a coach that whispers what you need to hear.
What it does
ShoulderCupid is a pair of ESP32-CAM smart glasses paired with Bluetooth earbuds that give you live AI dating coaching during real approaches and conversations.
- The glasses stream video to our backend, where a C++ Presage SDK extracts the target's heart rate, breathing rate, and emotional state from the camera feed alone
- An ultrasonic sensor on the frame tracks your distance to the person — triggering different coaching modes (Idle → Approach → Conversation)
- A heart rate sensor monitors your own nerves in real-time
- ElevenLabs Scribe transcribes the conversation live, Gemini 2.0 Flash generates context-aware coaching tips based on the transcript + emotions + distance, and ElevenLabs TTS whispers the advice back through your earbuds
Before a session, you pick your AI coach from a Tinder-style discovery feed, each with a unique personality, voice, and coaching style. Over a million possible combinations.
How we built it
Hardware
An ESP32-CAM AI Thinker module on the glasses handles video streaming. An HC-SR04 ultrasonic sensor for distance tracking, a MAX30102 heart rate sensor, and two servos are all mounted on your shoulder. The system runs on dual power: AAA batteries for the ESP32 and servos, and a 9V battery for the Arduino Uno that handles sensor I/O. The ESP32 and Arduino communicate over serial, and the ESP32 streams JPEG frames to our backend over WiFi.
We designed a custom 3D-printed Cupid figurine enclosure to house the shoulder-mounted electronics: a stoic, Greek-marble cherub that hides the camera and servo mechanism inside while keeping the sensors exposed. We used Tencent’s Hunyuan3D-2.1 to generate the model from a reference image. We cleaned up the mesh in Blender, refined the mechanical fitment in Fusion 360, and assembled the full housing with electronic component cutouts in Onshape before 3D printing. Generative AI cut our CAD workflow from hours to minutes.
Backend (Vultr)
Our entire backend runs on a Vultr Ubuntu VPS. This was a deliberate choice: the Presage C++ SDK for contactless vitals detection requires native Linux. Vultr hosts our Express.js API, Socket.io WebSocket server, and a Python vitals processing pipeline that spawns a Presage process per active session, feeding JPEG frames via stdin and reading JSON vitals from stdout.
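The per-session process model can be sketched roughly like this. The binary path, the length-prefixed frame framing, and the one-JSON-object-per-line output are our assumptions for illustration, not the real SDK contract:

```python
import json
import subprocess

class VitalsSession:
    """One dedicated vitals process per active session (illustrative sketch)."""

    def __init__(self, cmd=("/opt/presage/vitals",)):  # hypothetical binary path
        self.proc = subprocess.Popen(
            list(cmd), stdin=subprocess.PIPE, stdout=subprocess.PIPE
        )

    def push_frame(self, jpeg_bytes: bytes) -> None:
        # Length-prefix each JPEG so the child process knows where frames end.
        self.proc.stdin.write(len(jpeg_bytes).to_bytes(4, "big"))
        self.proc.stdin.write(jpeg_bytes)
        self.proc.stdin.flush()

    def read_vitals(self) -> dict:
        # Assume the SDK emits one JSON object per line on stdout.
        return json.loads(self.proc.stdout.readline())
```

Keeping one process per session isolates crashes: a hung Presage instance only kills its own session's vitals, not the whole pipeline.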
We set up DuckDNS for a free dynamic DNS domain pointing to our Vultr IP, with HTTPS via Let's Encrypt so the frontend can securely connect over WSS (WebSocket Secure). Automated CI/CD via GitHub Actions deploys on every push to main: rsync to Vultr plus PM2 for zero-downtime restarts.
Database (MongoDB Atlas)
All persistent data lives in MongoDB Atlas: user profiles with OAuth identities, AI coach personalities with unique system prompts and voice mappings, vector-embedded live session transcripts (every line timestamped with speaker and emotion tags), vitals timelines, and session analytics. Real-time writes during coaching sessions mean every interaction is queryable after the fact. Four core Mongoose models: User, Session, Coach, and Payment.
Authentication
Google OAuth 2.0 via Passport.js. Users sign in with their Google account, and we store their profile, preferences, coach roster, and session history.
Payments (Solana Pay)
No one has to know what you're paying for... 3 free sessions per month, then pay-per-use in USDC via Solana Pay. When you start a paid session, the app generates a Solana Pay transaction request and opens Phantom wallet for approval: one tap, confirmed on-chain, and the session starts immediately. No credit cards, no payment processor, instant settlement. Three pricing tiers (Budget, Standard, and Premium) based on coach quality and session length (5, 15, or 30 minutes). Every transaction is verified on-chain by the backend before granting access, and payment records are persisted to MongoDB with the session they unlock.
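The tier gate boils down to a lookup plus a payment check before a session unlocks. A minimal sketch, noting that the USDC prices here are placeholders, not our real pricing:

```python
# Tier table: session length is from the design; USDC amounts are illustrative.
TIERS = {
    "budget":   {"minutes": 5,  "usdc": 1.0},
    "standard": {"minutes": 15, "usdc": 3.0},
    "premium":  {"minutes": 30, "usdc": 5.0},
}

def grant_session(tier: str, paid_usdc: float, confirmed_on_chain: bool) -> int:
    """Return session length in minutes, or raise if payment doesn't check out."""
    spec = TIERS[tier]
    # Both conditions must hold: the transaction is confirmed on-chain
    # and the amount covers the tier price.
    if not confirmed_on_chain or paid_usdc < spec["usdc"]:
        raise PermissionError("payment not verified")
    return spec["minutes"]
```

In the real backend the `confirmed_on_chain` flag comes from verifying the Solana transaction signature before this check runs.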
Frontend (Vercel)
React + Vite + TypeScript with Tailwind CSS, auto-deployed on Vercel. API rewrites in vercel.json proxy all /api requests to our Vultr backend, so the frontend never exposes the backend IP. Socket.io client for real-time coaching UI — live transcript, vitals panel, session stats.
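The proxy setup amounts to a `rewrites` entry in `vercel.json` along these lines (the DuckDNS hostname below is a placeholder, not our real domain):

```json
{
  "rewrites": [
    {
      "source": "/api/:path*",
      "destination": "https://example.duckdns.org/api/:path*"
    }
  ]
}
```

Because Vercel applies the rewrite server-side, the browser only ever sees same-origin `/api` requests.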
The Coaching Brain (Gemini API)
Every time the transcript updates, the backend sends the latest lines along with the target’s emotional state, session mode (Idle/Approach/Conversation), and distance to Gemini via a persistent chat session. Each coach has a unique system prompt that defines their personality, tone, and sample phrases. A “Smooth Operator” coach gives different advice than a “Gentle Guide” even for the same situation. Gemini streams its response back, and we start TTS before the full response is done to cut latency. We also use Gemini 2.5 Flash during preflight to validate API connectivity before sessions start, and postflight to analyze the full session: generating a summary, conversation highlights, and actionable recommendations for next time.
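Packing the context into each chat turn might look like the following sketch; the field layout and payload shape are our illustration, not the exact production format:

```python
def build_coach_update(transcript_lines, target_emotion, mode, distance_cm):
    """Pack the latest transcript plus sensor context into one chat turn
    for the coaching LLM. Field names are illustrative."""
    convo = "\n".join(
        f"{line['speaker']}: {line['text']}" for line in transcript_lines
    )
    return (
        # Compact header so the model sees sensor state before the dialogue.
        f"[mode={mode} | target_emotion={target_emotion} | distance={distance_cm}cm]\n"
        f"{convo}\n"
        "Give one short, speakable coaching tip."
    )
```

Because the chat session is persistent, only the newest lines need to be sent each turn; earlier context stays in the model's history.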
TTS & STT (ElevenLabs)
ElevenLabs powers both sides of the audio pipeline. Scribe v2 runs client-side via the React SDK for real-time speech-to-text. It transcribes both the user and the target with partial transcript streaming, so the coaching loop can react before a sentence is even finished. Echo cancellation and noise suppression are built in. On the output side, Flash v2.5 converts coaching tips into natural speech in the coach’s assigned voice, picked from a curated pool of 30 voices matched by personality traits (smooth/confident, energetic/bold, calm/gentle). The MP3 audio streams back over Socket.io and plays through the user’s Bluetooth earbuds or speakers.
Vitals Detection (Presage SDK)
The ESP32-CAM streams JPEG frames over WiFi to our Vultr backend. Each active session spawns a dedicated Presage C++ process. Frames are fed via stdin, and the SDK outputs JSON vitals (heart rate, breathing rate, HRV, blink rate) from stdout. We map these vitals to emotional states: elevated HR + fast breathing = nervous, steady HR + slow breathing = calm, rising HR + normal breathing = excited. These emotions update on the frontend in real-time and get passed as context to the coaching AI, so advice adapts to how the target is actually feeling.
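The vitals-to-emotion mapping described above can be written as a small classifier. The exact thresholds below are illustrative; the real cutoffs were tuned by hand:

```python
def classify_emotion(hr: float, hr_trend: float, breaths_per_min: float) -> str:
    """Map contactless vitals to a coarse emotional label.

    hr_trend is the change in heart rate over the last window (bpm).
    Thresholds are illustrative, not the tuned production values.
    """
    fast_breathing = breaths_per_min > 20
    slow_breathing = breaths_per_min < 12
    if hr > 100 and fast_breathing:
        return "nervous"   # elevated HR + fast breathing
    if hr_trend > 0 and not fast_breathing and not slow_breathing:
        return "excited"   # rising HR + normal breathing
    if abs(hr_trend) < 2 and slow_breathing:
        return "calm"      # steady HR + slow breathing
    return "neutral"
```

The label, not the raw numbers, is what gets passed to the coaching prompt, which keeps the LLM context short and stable.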
Person Detection (Edge Impulse)
Edge Impulse handles person detection from the camera feed, triggering the transition from Idle to Approach mode when someone enters the frame. The model classifies whether a person is present and their approximate distance, which combined with the ultrasonic sensor reading drives the session's state machine between Idle, Approach, and Conversation modes.
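The state machine needs debouncing so a single noisy reading doesn't flip modes mid-conversation. A simplified sketch, with hold counts and the distance threshold as assumptions:

```python
class SessionStateMachine:
    """Idle -> Approach -> Conversation, debounced so modes don't flicker.

    Thresholds and hold counts are illustrative, not the tuned values.
    """

    def __init__(self, hold=3):
        self.state = "Idle"
        self.hold = hold          # consecutive agreeing readings to switch
        self._candidate = None
        self._count = 0

    def _target(self, person_detected: bool, distance_cm: float) -> str:
        if person_detected and distance_cm < 100:
            return "Conversation"
        if person_detected:
            return "Approach"
        return "Idle"

    def update(self, person_detected: bool, distance_cm: float) -> str:
        target = self._target(person_detected, distance_cm)
        if target == self.state:
            # Reading agrees with current mode; reset any pending switch.
            self._candidate, self._count = None, 0
        elif target == self._candidate:
            self._count += 1
            if self._count >= self.hold:
                self.state = target
                self._candidate, self._count = None, 0
        else:
            # New candidate mode; start counting from one.
            self._candidate, self._count = target, 1
        return self.state
```

Requiring several consecutive readings before switching is what keeps a single dropped detection from bouncing the session back to Idle.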
Challenges we ran into
- Presage SDK only runs on Linux — we couldn't develop locally on macOS. This forced us onto Vultr early, which ended up being a blessing for CI/CD but made debugging painful with remote-only testing
- Real-time audio loop latency — getting STT → LLM → TTS under a second required careful model selection (Gemini Flash over Pro, ElevenLabs Flash v2.5 over v2) and streaming responses before completion
- ESP32-CAM frame rate vs. vitals accuracy — the Presage SDK needs decent FPS for heart rate detection, but the ESP32 struggles to stream high-res over WiFi. We had to tune resolution and JPEG compression to find the sweet spot
- Coaching mode transitions — knowing when someone goes from "approaching" to "in conversation" based on distance + audio activity required state machine logic that doesn't flicker between modes
- HTTPS everywhere — Socket.io and ElevenLabs STT both require secure contexts in the browser, so we had to set up DuckDNS + Let's Encrypt on Vultr before anything worked end-to-end
- Dual power management — the ESP32-CAM draws too much current for the Arduino's 3.3V regulator, so we had to run separate battery supplies and tie the grounds together to avoid brownouts during WiFi transmission
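The streaming trick from the latency bullet (starting TTS before the LLM finishes) can be sketched as a generator that emits complete sentences as soon as they appear in the token stream. A simplified sketch with a naive punctuation-based splitter:

```python
import re

def sentences_from_stream(chunks):
    """Yield complete sentences as soon as they appear in a token stream,
    so TTS can start before the LLM response finishes. Simplified sketch."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        while True:
            # A sentence is "complete" once ending punctuation is followed
            # by whitespace; this is a naive heuristic, not production logic.
            m = re.search(r"[.!?]\s", buf)
            if not m:
                break
            yield buf[: m.end()].strip()
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()          # flush whatever remains at end of stream
```

Each yielded sentence can be handed to TTS immediately, which is what shaves the perceived latency under a second.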
What we learned
- Hardware constraints drive architecture decisions more than any framework choice — Vultr was picked because of a C++ SDK, not preference
- Real-time AI pipelines are all about model selection — the fastest model that's good enough wins
- MongoDB's flexible schema was a lifesaver when our data models changed daily during the hackathon
- Solana Pay is surprisingly straightforward to integrate — the hardest part was testing on devnet with fake USDC
- DuckDNS + Let's Encrypt is an underrated combo for hackathon projects that need HTTPS without buying a domain
What's next for ShoulderCupid
- Edge Impulse person detection on the ESP32 itself — triggering approach mode without server round-trips
- Post-session reports with AI-generated summaries, highlights, and improvement tips
- Coach learning — coaches that adapt to your style over multiple sessions using MongoDB session history
- Custom PCB — replace the breadboard wiring with an integrated board that fits inside the Cupid housing
- Miniaturization — smaller glasses frame, integrated PCB, longer battery life
Built With
- arduino
- autodesk-fusion-360
- blender
- c++
- dicebear
- duckdns
- elevenlabs
- esp32-cam
- express.js
- framer-motion
- github-actions
- google-gemini
- mongodb-atlas
- node.js
- oauth
- onshape
- passport.js
- phantom
- presage-sdk
- python
- react
- socket.io
- solana-pay
- tailwind-css
- typescript
- vercel
- vite
- vultr