Resonance - AI Voice Coach
Inspiration
We've all been there: sweaty palms before a big sales call, a racing heart before confronting an angry customer, or a mind that goes blank during a crucial negotiation. Traditional communication training relies on expensive coaches, awkward role-plays with colleagues, or simply "winging it" and hoping for the best.
We asked ourselves: What if you could practice any difficult conversation, anytime, with an AI that talks back naturally — and even interrupts you like a real person would?
The inspiration came from observing how pilots use flight simulators to handle emergencies they'll hopefully never face. Sales reps, customer service agents, and managers deserve the same — a safe space to fail, learn, and build confidence before the stakes are real.
What it does
Resonance is an AI-powered mobile app that simulates realistic voice conversations for high-stakes communication training.
Core Features:
- Real-time AI Voice Simulation — Talk naturally with AI personas powered by Gemini 2.5 Flash and ElevenLabs voice synthesis
- Natural Interruption (Barge-In) — Interrupt the AI mid-sentence just like real conversations, with <150ms response time
- Chaos Engine — Simulate real-world disruptions: background noise, connection drops, voice variations, and hardware failures
- Stress Mode — Handle multiple callers in queue with a stamina system that tests your endurance
- Voice Lab — Clone voices or choose from a library to practice with specific personality types
- Performance Analytics — Track pace (WPM), filler words, clarity, confidence scores, and emotional patterns
- Context Upload — Upload PDF/DOCX documents (product specs, scripts) for scenario-specific training
- Offline-First — All data stays local on your device with SQLite storage
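The pace metric above can be derived from a timed transcript. The sketch below is illustrative only. It assumes each transcript segment is an object with `text`, `startMs`, and `endMs` fields (a hypothetical shape, not Resonance's actual schema). It divides the word count by the time the user actually spent speaking, so pauses between turns do not drag the WPM down.

```javascript
// Hypothetical transcript shape: [{ text, startMs, endMs }, ...] for the user's turns.
// Words are whitespace-separated tokens; time counts only the spans where the user spoke.
function wordsPerMinute(segments) {
  let words = 0;
  let speakingMs = 0;
  for (const { text, startMs, endMs } of segments) {
    words += text.trim().split(/\s+/).filter(Boolean).length;
    speakingMs += Math.max(0, endMs - startMs);
  }
  if (speakingMs === 0) return 0; // no speech recorded: avoid dividing by zero
  return (words * 60000) / speakingMs;
}

const pace = wordsPerMinute([
  { text: "Thanks for taking the call today", startMs: 0, endMs: 3000 },
  { text: "I want to walk you through our pricing", startMs: 5000, endMs: 8000 },
]);
console.log(pace); // 140 (14 words in 6 seconds of speech)
```

Excluding the gaps is a design choice. Measuring over wall-clock session time would instead blend speaking pace with how long the user hesitates before answering.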
How we built it
Tech Stack:
- Expo SDK 50 + React Native — Cross-platform mobile development with managed workflow
- Expo Router — File-based navigation for clean architecture
- NativeWind (Tailwind CSS) — Rapid UI styling with utility classes
- Zustand — Lightweight state management
- Gemini 2.5 Flash — AI conversation engine for dynamic, context-aware responses
- ElevenLabs WebSocket API — Ultra-low latency text-to-speech streaming
- Custom VAD (Voice Activity Detection) — Signal Energy RMS-based detection with ambient noise calibration
- expo-sqlite — Local database for sessions, transcripts, and analytics
- expo-secure-store — Encrypted storage for user API keys
- Moti + Reanimated — Smooth animations including the signature "Sun" orb visualizer
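The core idea behind the custom VAD is to compare each frame's RMS energy against a calibrated ambient floor, and it fits in a few functions. This is a minimal sketch under stated assumptions, not the app's code:

- Frames are mono `Float32Array` samples in [-1, 1].
- The noise floor is the mean RMS of frames recorded while the user is silent, mirroring the splash-screen calibration described in the challenges section.
- `marginDb` and `hangoverFrames` are placeholder values, not tuned parameters.

```javascript
// Root-mean-square energy of one frame of PCM samples in [-1, 1].
function rms(frame) {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

// Mean RMS over frames captured while the user is silent (e.g. during a splash screen).
function calibrateNoiseFloor(ambientFrames) {
  const total = ambientFrames.reduce((acc, f) => acc + rms(f), 0);
  return total / ambientFrames.length;
}

// Hypothetical VAD: a frame is speech when its RMS exceeds the floor by `marginDb`.
// A short hangover keeps brief dips between syllables from ending the utterance.
function createVad(noiseFloor, { marginDb = 10, hangoverFrames = 3 } = {}) {
  const threshold = noiseFloor * Math.pow(10, marginDb / 20);
  let hangover = 0;
  return function isSpeech(frame) {
    if (rms(frame) > threshold) {
      hangover = hangoverFrames;
      return true;
    }
    if (hangover > 0) {
      hangover--;
      return true;
    }
    return false;
  };
}

// Usage with synthetic frames: a quiet hiss vs. a much louder tone.
const quiet = Float32Array.from({ length: 160 }, (_, i) => 0.01 * Math.sin(i));
const loud = Float32Array.from({ length: 160 }, (_, i) => 0.3 * Math.sin(i / 2));
const vad = createVad(calibrateNoiseFloor([quiet, quiet]));
console.log(vad(quiet), vad(loud)); // false true
```

The hangover counter matters in practice. An energy threshold alone tends to chop continuous speech into fragments at every short pause.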
Architecture Highlights:
- Layered architecture separating presentation, business logic, services, and data access
- WebSocket streaming for sub-800ms voice response latency
- Haptic feedback on successful interruptions for tactile confirmation
- Mock mode for full functionality without API calls (demo/testing)
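Handling an interruption with haptic confirmation amounts to tearing down the AI's turn in a fixed order. The sketch below makes these assumptions:

- `player` (with `stop()`) and `haptics` (with `confirm()`) are injected adapters. They are hypothetical stand-ins for whatever audio and haptics modules the app wraps.
- The in-flight model request is cancelled with the standard `AbortController`, so a late reply is discarded rather than spoken over the user.

```javascript
// Hypothetical adapters: `player.stop()` halts TTS playback and
// `haptics.confirm()` fires a short vibration; both are injected for testability.
function createTurnManager({ player, haptics }) {
  let state = "idle"; // "idle" | "aiSpeaking" | "listening"
  let inflight = null; // AbortController for the current AI request

  return {
    get state() {
      return state;
    },
    // Starts an AI turn; `requestReply` receives an AbortSignal and should honor it.
    // Resolves to the reply, or to null if the user barged in first.
    async startAiTurn(requestReply) {
      const controller = new AbortController();
      inflight = controller;
      state = "aiSpeaking";
      try {
        const reply = await requestReply(controller.signal);
        return controller.signal.aborted ? null : reply;
      } catch (err) {
        if (controller.signal.aborted) return null; // interrupted: discard quietly
        throw err;
      }
    },
    // Called when the VAD detects user speech while the AI is talking.
    bargeIn() {
      if (state !== "aiSpeaking") return false;
      if (inflight) inflight.abort(); // 1. stop generating
      player.stop(); // 2. cut the audio
      haptics.confirm(); // 3. confirm the interruption by touch
      inflight = null;
      state = "listening"; // 4. hand the floor back to the user
      return true;
    },
  };
}

// Usage with fakes: a reply that would arrive after 1 s, interrupted immediately.
const fakeReply = (signal) =>
  new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve("Let me walk you through pricing..."), 1000);
    signal.addEventListener("abort", () => {
      clearTimeout(timer);
      reject(new Error("aborted"));
    });
  });

const calls = [];
const turns = createTurnManager({
  player: { stop: () => calls.push("stop") },
  haptics: { confirm: () => calls.push("haptic") },
});
const pending = turns.startAiTurn(fakeReply);
turns.bargeIn();
pending.then((reply) => console.log(reply === null, calls.join(","), turns.state));
// true stop,haptic listening
```

Aborting generation before stopping audio means that no further chunks can be queued for playback once the player has been silenced.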
Challenges we ran into
1. Voice Activity Detection Accuracy: Our initial AI-based VAD was too slow and resource-intensive. We pivoted to a signal-energy (RMS) approach that calibrates the ambient noise floor during the splash screen, achieving <150ms detection while remaining battery-efficient.
2. Barge-In Timing: Making interruptions feel natural required careful audio pipeline management. We had to cancel ongoing TTS playback, stop the AI mid-thought, and transition seamlessly to listening mode, all within milliseconds.
3. Latency Optimization: Achieving conversational flow meant every millisecond counted. We used WebSocket streaming instead of REST calls, pre-buffered audio chunks, and optimized the Gemini prompt structure for faster responses.
4. Offline-First with Cloud AI: Balancing offline functionality with cloud-dependent AI services required graceful degradation, smart caching of context documents, and clear user feedback when features are limited.
5. Indonesian Language Support: Detecting Indonesian filler words ("eung", "anu", "uhm") required custom detection logic, since most NLP tools focus on English.
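A first version of the bilingual filler detection can be plain token matching against per-language word lists. The sketch below is illustrative:

- `FILLERS` and `countFillers` are hypothetical names.
- The Indonesian list holds only the three examples named above.
- The English list is a small assumed sample.

A real detector would also need to handle multi-word fillers and the spelling variants a speech-to-text engine produces.

```javascript
// Example filler lists; the Indonesian entries are the ones named in the challenge above.
const FILLERS = {
  id: new Set(["eung", "anu", "uhm"]),
  en: new Set(["um", "uh", "uhm", "er"]),
};

// Lowercase, replace anything that is not a letter (any script) or whitespace, then split.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\s]/gu, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// Returns per-filler counts, the total, and the share of tokens that were fillers.
function countFillers(text, lang) {
  const fillers = FILLERS[lang] ?? new Set();
  const tokens = tokenize(text);
  const counts = {};
  for (const t of tokens) {
    if (fillers.has(t)) counts[t] = (counts[t] ?? 0) + 1;
  }
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  return { counts, total, ratio: tokens.length ? total / tokens.length : 0 };
}

console.log(countFillers("Jadi, anu... eung, harganya anu sekitar lima juta.", "id"));
// { counts: { anu: 2, eung: 1 }, total: 3, ratio: 0.375 }
```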
Accomplishments that we're proud of
- Sub-800ms end-to-end latency — From user speech to AI voice response, making conversations feel genuinely natural
- Working Chaos Engine — Successfully simulating real-world disruptions that actually stress-test users
- Voice cloning integration — Users can practice with cloned voices of specific personality types
- Comprehensive analytics — Real-time metrics that provide actionable coaching feedback
- Privacy-first design — All user data stays on-device, API keys encrypted, no cloud sync required
- Bilingual support — Full Indonesian and English localization including filler word detection
- Beautiful UI — Cyber-professional dark theme with the signature golden "Sun" orb visualizer
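One of the Chaos Engine's disruptions, background noise, is commonly injected by scaling a noise clip to a target signal-to-noise ratio. Here SNR in dB is 20 * log10(rms(speech) / rms(noise)). The following is a minimal sketch, not Resonance's implementation. It assumes both clips are mono `Float32Array` PCM at the same sample rate, and `mixAtSnr` is a hypothetical helper.

```javascript
function rms(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Mix `noise` into `speech` so that 20 * log10(rms(speech) / rms(scaledNoise)) ~= snrDb.
// The noise clip is looped if shorter than the speech (the target is exact when the speech
// length is a multiple of the noise length); output samples are clipped to [-1, 1].
function mixAtSnr(speech, noise, snrDb) {
  const noiseRms = rms(noise);
  if (noiseRms === 0) return Float32Array.from(speech); // silent noise: nothing to mix
  const gain = rms(speech) / (noiseRms * Math.pow(10, snrDb / 20));
  const out = new Float32Array(speech.length);
  for (let i = 0; i < speech.length; i++) {
    const mixed = speech[i] + gain * noise[i % noise.length];
    out[i] = Math.max(-1, Math.min(1, mixed));
  }
  return out;
}

// Usage: constant "speech" plus an alternating noise clip, mixed at 20 dB SNR.
const speech = new Float32Array(100).fill(0.2);
const noise = Float32Array.from({ length: 10 }, (_, i) => (i % 2 ? -0.1 : 0.1));
const mixed = mixAtSnr(speech, noise, 20);
```

Lower `snrDb` values make the noise louder relative to the user's voice. A 0 dB mix puts the two at equal RMS level.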
What we learned
- VAD is harder than it looks — Ambient noise, microphone quality, and speaking styles vary wildly across devices and users
- Latency is everything for voice apps — Even 200ms extra delay breaks the conversational illusion
- State management complexity — Real-time audio + AI + UI animations required careful orchestration to avoid race conditions
- Mobile constraints are real — Memory management, battery optimization, and background audio handling need constant attention
- User feedback loops matter — Haptic feedback and visual indicators are crucial for users to understand system state during voice interactions
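A turn counter is one common way to tame race conditions like these. Every async result is tagged with the turn that requested it and is dropped if the conversation has moved on. The following is a minimal sketch of that pattern in plain JavaScript, using a hypothetical `createTurnGuard` helper. In an app like this the counter could sit next to the Zustand store, but the store wiring is not shown.

```javascript
// Hypothetical helper: each new turn bumps a counter, and results that
// arrive for an older turn are reported as stale instead of updating UI state.
function createTurnGuard() {
  let current = 0;
  return {
    nextTurn() {
      current += 1;
      return current;
    },
    // Awaits the promise, then checks whether its turn is still the latest one.
    async run(turnId, promise) {
      const value = await promise;
      return turnId === current ? { stale: false, value } : { stale: true };
    },
  };
}

// Usage: a slow reply for turn 1 lands after the user has already started turn 2.
const delay = (ms, v) => new Promise((r) => setTimeout(() => r(v), ms));
const guard = createTurnGuard();
const t1 = guard.nextTurn();
const slow = guard.run(t1, delay(50, "old reply"));
const t2 = guard.nextTurn();
const fast = guard.run(t2, delay(10, "new reply"));
Promise.all([slow, fast]).then(([a, b]) => console.log(a, b));
// { stale: true } { stale: false, value: 'new reply' }
```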
What's next for Resonance
Short-term:
- iOS release via TestFlight
- More scenario templates (job interviews, medical consultations, conflict resolution)
- Team/enterprise features for corporate training programs
- Integration with calendar apps for pre-meeting practice sessions
Long-term:
- Multi-party conversations (conference call simulations)
- AR mode with virtual avatar for body language training
- Emotion recognition from voice to provide empathy coaching
- API for third-party training content creators
- Gamification expansion with leagues, challenges, and social features
Vision: We believe everyone deserves access to world-class communication coaching. Resonance aims to democratize what was previously only available through expensive executive coaches — making confident communication accessible to anyone with a smartphone.
Built with ☕ and determination for the future of communication training.
Built With
- android
- eas
- elevenlabs-api
- expo-router
- expo.io
- google-gemini-api
- hermes
- javascript
- lottie
- moti
- nativewind
- react-native
- reanimated
- sqlite
- tailwind-css
- websocket
- zustand