WhisperPlay
Real-time “Silent Disco” with Live Language Toggle for Any Stream
Inspiration
Late-night streamers often mute their mics to avoid waking family, making chat the only way to follow the action. Non-English viewers, meanwhile, rely on slow, post-stream captions. We wanted to give every viewer instant, private audio—in any language—without forcing creators to change their setup.
What it does
WhisperPlay is a browser extension + OBS plug-in combo that:
- Captures the creator’s mic audio locally in OBS.
- Transmits it over an encrypted WebRTC data channel directly to viewers’ browsers.
- Lets each viewer toggle between:
• Original mic audio
• Live AI-translated dub (Whisper STT → DeepL → ElevenLabs TTS)
• Muted (classic “silent disco” mode)
End-to-end latency stays under 1.5 s, and no raw audio ever touches our servers.
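The viewer toggle boils down to routing two parallel audio paths (original mic PCM and the translated TTS track). In the real extension this would drive Web Audio `GainNode`s; here is a minimal sketch of the routing decision as a pure function (names like `gainsFor` are ours, not from the project):

```typescript
// Hypothetical sketch of the viewer-side toggle. In the extension, these
// gain values would be applied to two Web Audio GainNodes: one on the
// original mic path, one on the AI-dub path.

type ListenMode = "original" | "dub" | "muted";

interface PathGains {
  original: number; // gain on the creator's raw mic audio
  dub: number;      // gain on the AI-translated TTS track
}

function gainsFor(mode: ListenMode): PathGains {
  switch (mode) {
    case "original":
      return { original: 1, dub: 0 };
    case "dub":
      return { original: 0, dub: 1 };
    case "muted": // classic "silent disco" mode
      return { original: 0, dub: 0 };
  }
}
```

Keeping both paths alive and only swapping gains makes the toggle instant, since neither stream has to re-buffer.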
How we built it
| Component | Tech Stack |
|---|---|
| OBS Plug-in | C++ WebRTC module streaming raw PCM 48 kHz |
| Browser Extension | Manifest V3, Web Audio API, WebRTC data-channel |
| Speech-to-Text | Whisper tiny.en via onnxruntime-web |
| Translation | DeepL Free API |
| Text-to-Speech | ElevenLabs streaming API |
| UI | React + Material-UI toggle buttons with language flags |
| Hosting | GitHub Pages for landing page |
Architecture flow
Mic → OBS → WebRTC → Extension → {Original | Translated} → Headphones
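Since the plug-in ships raw 16-bit PCM over the data channel, the extension has to convert each incoming frame into the Float32 samples the Web Audio API expects. A minimal sketch of that conversion (helper name is ours, not the project's):

```typescript
// Convert a 16-bit signed PCM frame (as received over the WebRTC data
// channel) into the Float32 [-1, 1] samples the Web Audio API consumes.
function pcm16ToFloat32(frame: Int16Array): Float32Array {
  const out = new Float32Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    // Negative samples divide by 32768, positive by 32767, so both
    // extremes map exactly onto -1 and +1.
    out[i] = frame[i] / (frame[i] < 0 ? 32768 : 32767);
  }
  return out;
}

// In the extension, each ArrayBuffer arriving on the channel would be
// wrapped and handed to an AudioWorklet or AudioBuffer, roughly:
// channel.onmessage = (e) => play(pcm16ToFloat32(new Int16Array(e.data)));
```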
Challenges we ran into
- Sub-second TTS: ElevenLabs streaming reduced round-trip from 800 ms to ~450 ms.
- WebRTC in a browser extension: Manifest V3 service workers can't host WebRTC connections, so we moved the peer connection into an offscreen document.
- Rate-limit juggling: Added 5-caption cache + graceful fallback to subtitles when APIs throttle.
- Audio drift: Implemented micro-slewing buffer to keep original and translated tracks in sync.
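The micro-slewing idea mentioned under audio drift can be sketched as a small proportional controller: nudge the playback rate by a fraction of a percent depending on how far the jitter buffer has drifted from its target depth, so corrections stay below the audible threshold. The thresholds and gain below are illustrative, not WhisperPlay's actual values:

```typescript
// Hypothetical micro-slewing: choose a playback rate that slowly drains
// or refills the jitter buffer instead of dropping/duplicating samples.
function slewRate(bufferedMs: number, targetMs: number): number {
  const error = bufferedMs - targetMs; // positive = too much buffered audio
  const rate = 1 + error * 0.0001;     // gentle proportional correction
  // Clamp to ±2% so any pitch shift stays imperceptible.
  return Math.min(1.02, Math.max(0.98, rate));
}
```

Feeding this rate into the playback node keeps original and translated tracks converging on the same timeline without audible glitches.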
Accomplishments that we're proud of
- End-to-end latency < 1.5 s on a 20 Mbps connection.
- Zero external server reliance—all heavy lifting is client-side.
- Working demo captured in a single take: muted OBS canvas → live Spanish dub in browser.
- Code written entirely during the hackathon—no prior repo reuse.
What we learned
- WebRTC data channels can carry raw PCM if you slice packets to 10 ms frames.
- Whisper `tiny.en` on WebGPU is surprisingly fast (< 150 ms).
- ElevenLabs supports SSML `<break>` tags, perfect for breath sounds in gaming commentary.
- OBS plug-in dev docs are sparse; reading the OBS source saved hours.
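The 10 ms framing works out to 480 samples per frame at 48 kHz (mono). A sketch of how outgoing PCM might be sliced before being written to the data channel (our own helper, not the plug-in's actual code):

```typescript
// Slice a mono 48 kHz PCM buffer into 10 ms (480-sample) frames suitable
// for individual WebRTC data-channel messages. A trailing partial frame
// is returned separately so every sent message has a fixed size.
const SAMPLE_RATE = 48_000;
const FRAME_MS = 10;
const FRAME_SAMPLES = (SAMPLE_RATE * FRAME_MS) / 1000; // 480

function sliceFrames(pcm: Float32Array): { frames: Float32Array[]; rest: Float32Array } {
  const frames: Float32Array[] = [];
  let offset = 0;
  while (offset + FRAME_SAMPLES <= pcm.length) {
    frames.push(pcm.subarray(offset, offset + FRAME_SAMPLES));
    offset += FRAME_SAMPLES;
  }
  // Leftover samples wait for the next capture callback.
  return { frames, rest: pcm.subarray(offset) };
}
```

Fixed-size frames keep each message well under the data channel's safe payload size and make jitter-buffer bookkeeping trivial on the receiving end.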
What's next for WhisperPlay
- Multi-speaker separation – identify game audio vs. creator voice.
- Voice-clone creator consent flow – let streamers upload 60 s sample for personalized TTS.
- LTSC/RTMP ingest – integrate directly with Twitch so viewers don’t need an extension.
- Offline mode – bundle a lightweight `espeak-ng` WASM build for when APIs are unreachable.
