WhisperPlay

Real-time “Silent Disco” with Live Language Toggle for Any Stream


Inspiration

Late-night streamers often mute their mics to avoid waking family, making chat the only way to follow the action. Non-English viewers, meanwhile, rely on slow, post-stream captions. We wanted to give every viewer instant, private audio—in any language—without forcing creators to change their setup.


What it does

WhisperPlay is a browser extension + OBS plug-in combo that:

  1. Captures the creator’s mic audio locally in OBS.
  2. Transmits it over an encrypted WebRTC data channel directly to viewers’ browsers.
  3. Lets each viewer toggle between:
    • Original mic audio
    • Live AI-translated dub (Whisper STT → DeepL → ElevenLabs TTS)
    • Muted (classic “silent disco” mode)

End-to-end latency stays under 1.5 s, and no raw audio ever touches our servers.
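The viewer-side toggle boils down to routing one of three sources to the output. A minimal sketch of that switch (the `AudioRouter` class and mode names are illustrative, not our actual extension code):

```javascript
// Hypothetical sketch of the viewer-side audio mode switch.
// The real extension feeds Web Audio nodes; here we just pick a chunk.
const MODES = ["original", "translated", "muted"];

class AudioRouter {
  constructor() {
    this.mode = "original";
  }

  setMode(mode) {
    if (!MODES.includes(mode)) throw new Error(`unknown mode: ${mode}`);
    this.mode = mode;
  }

  // Given the two decoded streams' current chunks, choose what reaches
  // the headphones. Returning null means silence ("silent disco" mode).
  select(originalChunk, translatedChunk) {
    switch (this.mode) {
      case "original":
        return originalChunk;
      case "translated":
        return translatedChunk;
      case "muted":
        return null;
    }
  }
}
```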


How we built it

| Component | Tech Stack |
| --- | --- |
| OBS Plug-in | C++ WebRTC module streaming raw 48 kHz PCM |
| Browser Extension | Manifest V3, Web Audio API, WebRTC data channel |
| Speech-to-Text | Whisper tiny.en via onnxruntime-web |
| Translation | DeepL Free API |
| Text-to-Speech | ElevenLabs streaming API |
| UI | React + Material-UI toggle buttons with language flags |
| Hosting | GitHub Pages for landing page |

Architecture flow

Mic → OBS → WebRTC → Extension → {Original | Translated} → Headphones
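On the translated branch, each speech chunk passes through three stages in sequence. A hedged sketch of that composition, with the stages stubbed out (the real pipeline calls Whisper via onnxruntime-web, the DeepL API, and ElevenLabs streaming, none of which are reproduced here; `dubChunk` is a hypothetical name):

```javascript
// Sketch of the per-chunk translated-audio path: STT → translate → TTS.
// `stt`, `translate`, and `tts` are injected so the pipeline shape can be
// shown (and tested) without the actual model/API calls.
async function dubChunk(pcmChunk, { stt, translate, tts }, targetLang) {
  const text = await stt(pcmChunk); // speech-to-text on the raw chunk
  if (!text) return null;           // silence: nothing to dub
  const translated = await translate(text, targetLang);
  return tts(translated);           // synthesized audio for playback
}
```

Keeping the stages as plain async functions also made it easy to swap in the subtitle fallback when an API throttles.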


Challenges we ran into

  • Sub-second TTS: switching to ElevenLabs' streaming endpoint cut the TTS round-trip from ~800 ms to ~450 ms.
  • WebRTC in a browser extension: Manifest V3 service workers can't hold WebRTC connections, so we moved the peer connection into an offscreen document.
  • Rate-limit juggling: added a 5-caption cache plus a graceful fallback to subtitles when the APIs throttle.
  • Audio drift: implemented a micro-slewing buffer to keep the original and translated tracks in sync.
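The micro-slewing idea in the last bullet: rather than dropping or inserting samples (audible clicks), nudge the translated track's playback rate by a fraction of a percent until the measured offset converges. A sketch of the rate computation (the clamp and gain values here are illustrative, not our tuned numbers):

```javascript
// Compute a playback-rate multiplier that gently corrects drift.
// driftMs > 0 → translated audio is behind → play slightly faster.
// The correction is clamped so the pitch shift stays inaudible.
function slewRate(driftMs, { maxSlew = 0.005, gain = 0.0005 } = {}) {
  const slew = Math.max(-maxSlew, Math.min(maxSlew, driftMs * gain));
  return 1 + slew; // e.g. ~1.0025 for 5 ms of drift, capped at 1.005
}
```

Applied every buffer, a few milliseconds of drift disappears within a second or two without any audible artifact.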

Accomplishments that we're proud of

  • End-to-end latency < 1.5 s on a 20 Mbps connection.
  • Zero external server reliance—all heavy lifting is client-side.
  • Working demo captured in a single take: muted OBS canvas → live Spanish dub in browser.
  • Code written entirely during the hackathon—no prior repo reuse.

What we learned

  • WebRTC data channels can carry raw PCM if you slice packets to 10 ms frames.
  • Whisper tiny.en on WebGPU is surprisingly fast (< 150 ms).
  • ElevenLabs supports SSML <break> tags—perfect for breath sounds in gaming commentary.
  • OBS plug-in dev docs are sparse; reading the OBS source saved hours.
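The first bullet in numbers: at 48 kHz mono 16-bit, a 10 ms frame is 480 samples, i.e. 960 bytes, comfortably small for a data-channel message. A sketch of the slicer (the sample-format assumptions are ours):

```javascript
// Slice a raw PCM byte buffer into fixed 10 ms frames for the data channel.
// Assumes 48 kHz, mono, 16-bit samples → 480 samples → 960 bytes per frame.
const SAMPLE_RATE = 48000;
const BYTES_PER_SAMPLE = 2;
const FRAME_MS = 10;
const FRAME_BYTES = (SAMPLE_RATE / 1000) * FRAME_MS * BYTES_PER_SAMPLE; // 960

function sliceFrames(pcm /* Uint8Array */) {
  const frames = [];
  for (let off = 0; off + FRAME_BYTES <= pcm.length; off += FRAME_BYTES) {
    frames.push(pcm.subarray(off, off + FRAME_BYTES)); // zero-copy views
  }
  // Any trailing partial frame is held back until more audio arrives.
  const rest = pcm.subarray(frames.length * FRAME_BYTES);
  return { frames, rest };
}
```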

What's next for WhisperPlay

  • Multi-speaker separation – identify game audio vs. creator voice.
  • Voice-clone creator consent flow – let streamers upload 60 s sample for personalized TTS.
  • HLS/RTMP ingest – integrate directly with Twitch so viewers don’t need an extension.
  • Offline mode – bundle lightweight espeak-ng WASM for when APIs are unreachable.

Built With

  • c++
  • chrome-extension
  • javascript
  • obs-studio
  • openai
  • webrtc
  • whisper