WhisperPlay
Real-time “Silent Disco” with Live Language Toggle for Any Stream
Inspiration
Late-night streamers often mute their mics to avoid waking family, making chat the only way to follow the action. Non-English viewers, meanwhile, rely on slow, post-stream captions. We wanted to give every viewer instant, private audio—in any language—without forcing creators to change their setup.
What it does
WhisperPlay is a browser extension + OBS plug-in combo that:
- Captures the creator’s mic audio locally in OBS.
- Transmits it over an encrypted WebRTC data channel directly to viewers’ browsers.
- Lets each viewer toggle between:
• Original mic audio
• Live AI-translated dub (Whisper STT → DeepL → ElevenLabs TTS)
• Muted (classic “silent disco” mode)
End-to-end latency stays under 1.5 s, and no raw audio ever touches our servers.
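The viewer toggle boils down to routing two parallel audio paths (original mic PCM and the translated TTS track). In the real extension this would drive Web Audio `GainNode`s; here is a minimal sketch of the routing decision as a pure function (names like `gainsFor` are ours, not from the project):

```typescript
// Hypothetical sketch of the viewer-side toggle. In the extension, these
// gain values would be applied to two Web Audio GainNodes: one on the
// original mic path, one on the AI-dub path.

type ListenMode = "original" | "dub" | "muted";

interface PathGains {
  original: number; // gain on the creator's raw mic audio
  dub: number;      // gain on the AI-translated TTS track
}

function gainsFor(mode: ListenMode): PathGains {
  switch (mode) {
    case "original":
      return { original: 1, dub: 0 };
    case "dub":
      return { original: 0, dub: 1 };
    case "muted": // classic "silent disco" mode
      return { original: 0, dub: 0 };
  }
}
```

Keeping both paths alive and only swapping gains makes the toggle instant, since neither stream has to re-buffer.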
How we built it
| Component | Tech Stack |
|---|---|
| OBS Plug-in | C++ WebRTC module streaming raw PCM 48 kHz |
| Browser Extension | Manifest V3, Web Audio API, WebRTC data-channel |
| Speech-to-Text | Whisper tiny.en via onnxruntime-web |
| Translation | DeepL Free API |
| Text-to-Speech | ElevenLabs streaming API |
| UI | React + Material-UI toggle buttons with language flags |
| Hosting | GitHub Pages for landing page |
Architecture flow
Mic → OBS → WebRTC → Extension → {Original | Translated} → Headphones
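Since the plug-in ships raw 16-bit PCM over the data channel, the extension has to convert each incoming frame into the Float32 samples the Web Audio API expects. A minimal sketch of that conversion (helper name is ours, not the project's):

```typescript
// Convert a 16-bit signed PCM frame (as received over the WebRTC data
// channel) into the Float32 [-1, 1] samples the Web Audio API consumes.
function pcm16ToFloat32(frame: Int16Array): Float32Array {
  const out = new Float32Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    // Negative samples divide by 32768, positive by 32767, so both
    // extremes map exactly onto -1 and +1.
    out[i] = frame[i] / (frame[i] < 0 ? 32768 : 32767);
  }
  return out;
}

// In the extension, each ArrayBuffer arriving on the channel would be
// wrapped and handed to an AudioWorklet or AudioBuffer, roughly:
// channel.onmessage = (e) => play(pcm16ToFloat32(new Int16Array(e.data)));
```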
Challenges we ran into
- Sub-second TTS: ElevenLabs streaming reduced round-trip from 800 ms to ~450 ms.
- WebRTC in a browser extension: Manifest V3 service workers can't host WebRTC connections, so we moved the peer connection into an offscreen document.
- Rate-limit juggling: Added 5-caption cache + graceful fallback to subtitles when APIs throttle.
- Audio drift: Implemented micro-slewing buffer to keep original and translated tracks in sync.
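The micro-slewing idea mentioned under audio drift can be sketched as a small proportional controller: nudge the playback rate by a fraction of a percent depending on how far the jitter buffer has drifted from its target depth, so corrections stay below the audible threshold. The thresholds and gain below are illustrative, not WhisperPlay's actual values:

```typescript
// Hypothetical micro-slewing: choose a playback rate that slowly drains
// or refills the jitter buffer instead of dropping/duplicating samples.
function slewRate(bufferedMs: number, targetMs: number): number {
  const error = bufferedMs - targetMs; // positive = too much buffered audio
  const rate = 1 + error * 0.0001;     // gentle proportional correction
  // Clamp to ±2% so any pitch shift stays imperceptible.
  return Math.min(1.02, Math.max(0.98, rate));
}
```

Feeding this rate into the playback node keeps original and translated tracks converging on the same timeline without audible glitches.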
Accomplishments that we're proud of
- End-to-end latency < 1.5 s on a 20 Mbps connection.
- Zero external server reliance—all heavy lifting is client-side.
- Working demo captured in a single take: muted OBS canvas → live Spanish dub in browser.
- Code written entirely during the hackathon—no prior repo reuse.
What we learned
- WebRTC data channels can carry raw PCM if you slice packets to 10 ms frames.
- Whisper `tiny.en` on WebGPU is surprisingly fast (< 150 ms).
- ElevenLabs supports SSML `<break>` tags, perfect for breath sounds in gaming commentary.
- OBS plug-in dev docs are sparse; reading the OBS source saved hours.
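The 10 ms framing works out to 480 samples per frame at 48 kHz (mono). A sketch of how outgoing PCM might be sliced before being written to the data channel (our own helper, not the plug-in's actual code):

```typescript
// Slice a mono 48 kHz PCM buffer into 10 ms (480-sample) frames suitable
// for individual WebRTC data-channel messages. A trailing partial frame
// is returned separately so every sent message has a fixed size.
const SAMPLE_RATE = 48_000;
const FRAME_MS = 10;
const FRAME_SAMPLES = (SAMPLE_RATE * FRAME_MS) / 1000; // 480

function sliceFrames(pcm: Float32Array): { frames: Float32Array[]; rest: Float32Array } {
  const frames: Float32Array[] = [];
  let offset = 0;
  while (offset + FRAME_SAMPLES <= pcm.length) {
    frames.push(pcm.subarray(offset, offset + FRAME_SAMPLES));
    offset += FRAME_SAMPLES;
  }
  // Leftover samples wait for the next capture callback.
  return { frames, rest: pcm.subarray(offset) };
}
```

Fixed-size frames keep each message well under the data channel's safe payload size and make jitter-buffer bookkeeping trivial on the receiving end.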
What's next for WhisperPlay
- Multi-speaker separation – identify game audio vs. creator voice.
- Voice-clone creator consent flow – let streamers upload 60 s sample for personalized TTS.
- LTSC/RTMP ingest – integrate directly with Twitch so viewers don’t need an extension.
- Offline mode – bundle a lightweight `espeak-ng` WASM build for when APIs are unreachable.
