Inspiration

Presentations are boring. The picture-in-picture format hasn't changed in years: a tiny webcam feed sits in the corner while slides fill the screen. We wanted to make presenting fun, expressive, and physical. What if instead of a talking head, you had a talking hand? A puppet that performs alongside your slides, controlled entirely by your gestures, with a musical soundtrack you conduct in real time.

What it does

PitchPuppet takes a slide deck (PDF) and a "vibe" prompt, then generates an AR sock puppet that overlays your presentation. Your hands become the performer: open your puppet's mouth to trigger notes, move your wrist to control pitch, spread your fingers for volume, and pan the sound across the stereo field. Each hand controls its own independent audio stream. Claude generates the entire audiovisual configuration from a single text prompt: scale, key, tempo, instrument, effects, colors, and bloom.
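
To make the gesture mapping concrete, here is a minimal sketch of how one hand's 21 landmarks could be reduced to the four controls described above. It relies only on the standard MediaPipe Hands layout (index 0 is the wrist, 4 the thumb tip, 8 the index tip, 9 the middle-finger knuckle, 12 the middle tip, 20 the pinky tip, with x and y normalized to the frame). The `HandControls` type, the `handToControls` function, and every scaling choice are illustrative assumptions, not the project's actual code.

```typescript
// Hypothetical types and names; only the landmark indices come from MediaPipe Hands.
interface Landmark { x: number; y: number; z?: number }

interface HandControls {
  mouthOpen: number; // 0 = closed, 1 = wide open
  pitch: number;     // 0 = low, 1 = high
  volume: number;    // 0 = silent, 1 = full
  pan: number;       // -1 = left, 1 = right
}

const clamp01 = (v: number): number => Math.min(1, Math.max(0, v));
const dist = (a: Landmark, b: Landmark): number =>
  Math.hypot(a.x - b.x, a.y - b.y);

function handToControls(lm: Landmark[]): HandControls {
  if (lm.length !== 21) throw new Error("expected 21 hand landmarks");
  const wrist = lm[0];
  const thumbTip = lm[4];
  const indexTip = lm[8];
  const middleKnuckle = lm[9];
  const middleTip = lm[12];
  const pinkyTip = lm[20];

  // Normalize by hand size so the mapping does not depend on camera distance.
  const handSize = dist(wrist, middleKnuckle) || 1e-6;

  return {
    // Assumed: the puppet mouth is the gap between thumb and middle finger.
    mouthOpen: clamp01(dist(thumbTip, middleTip) / handSize),
    // Image y grows downward, so invert it: raising the wrist raises pitch.
    pitch: clamp01(1 - wrist.y),
    // Assumed: finger spread is the index-to-pinky tip distance.
    volume: clamp01(dist(indexTip, pinkyTip) / handSize),
    // Assumed: horizontal wrist position sets the stereo pan.
    pan: clamp01(wrist.x) * 2 - 1,
  };
}
```

Each detected hand would be passed through a function like this once per frame, and the resulting values fed to that hand's own synth and panner. If the webcam image is mirrored, the pan sign would need to be flipped.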

How we built it

  • MediaPipe Hands for real-time 21-landmark hand tracking via webcam
  • Three.js with UnrealBloomPass for the 3D neon puppet (hemisphere jaw, body, eyes, and a hand-shaped tongue)
  • Tone.js for browser-based audio synthesis with per-hand synths, panners, and shared effects chains
  • Claude API (claude-sonnet-4-6) to convert natural language vibe prompts into structured VibeConfig JSON controlling all audio and visual parameters
  • PDF.js for rendering uploaded slide decks as presentation backgrounds
  • Vite for fast dev iteration during a 5-hour build window
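
As a rough illustration of the VibeConfig step, the sketch below validates model output before it reaches the audio and visual layers. The field names follow the parameters listed earlier (scale, key, tempo, instrument, effects, colors, bloom), but the exact types, value ranges, defaults, and the `parseVibeConfig` helper are assumptions made for this example. The API call itself is omitted; the function only handles the returned text.

```typescript
// Hypothetical shape: the field list comes from the write-up, the types are guesses.
interface VibeConfig {
  scale: string;
  key: string;
  tempo: number;      // beats per minute
  instrument: string;
  effects: string[];
  colors: string[];   // e.g. hex strings
  bloom: number;      // bloom strength
}

// Fresh defaults on every call so callers never share array instances.
function defaultVibe(): VibeConfig {
  return {
    scale: "major", key: "C", tempo: 120, instrument: "synth",
    effects: [], colors: ["#ffffff"], bloom: 1,
  };
}

// Parse the model's text reply and fall back to defaults field by field.
function parseVibeConfig(raw: string): VibeConfig {
  const fallback = defaultVibe();
  let data: Record<string, unknown>;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return fallback;
    }
    data = parsed as Record<string, unknown>;
  } catch {
    return fallback; // not valid JSON at all
  }

  const str = (v: unknown, d: string): string =>
    typeof v === "string" && v.length > 0 ? v : d;
  const num = (v: unknown, d: number, lo: number, hi: number): number =>
    typeof v === "number" && Number.isFinite(v) ? Math.min(hi, Math.max(lo, v)) : d;
  const list = (v: unknown, d: string[]): string[] =>
    Array.isArray(v) && v.every((x) => typeof x === "string") ? (v as string[]) : d;

  return {
    scale: str(data.scale, fallback.scale),
    key: str(data.key, fallback.key),
    tempo: num(data.tempo, fallback.tempo, 40, 240), // assumed safe BPM range
    instrument: str(data.instrument, fallback.instrument),
    effects: list(data.effects, fallback.effects),
    colors: list(data.colors, fallback.colors),
    bloom: num(data.bloom, fallback.bloom, 0, 3),    // assumed strength range
  };
}
```

Clamping and defaulting at this boundary means a malformed or extreme reply degrades to a usable configuration instead of breaking the synth or renderer mid-presentation.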

Challenges we ran into

Getting the puppet geometry right was harder than expected. The tongue (a tiny hand shape) needed to flip orientation based on which physical hand was detected, which required threading MediaPipe handedness labels through the entire pipeline. Audio debugging was tricky because browsers keep audio suspended until a user gesture, so Tone.js produces no sound until it is started from a click or key press. Mapping hand gestures to musical parameters also needed hysteresis thresholds to prevent jittery note triggering. The bloom post-processing effect blurred our slide backgrounds until we learned to disable it selectively when slides are active.
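
The hysteresis idea can be sketched as a small state machine with two thresholds: the mouth has to open past a higher value to start a note and close below a lower value to end it, so tracking noise around a single cutoff cannot retrigger the note. The `createMouthTrigger` name and the 0.6 / 0.4 defaults are illustrative assumptions, not the values used in the project.

```typescript
type TriggerEvent = "attack" | "release" | null;

// Returns a per-hand function that turns a noisy openness value (0..1)
// into clean note-on / note-off events.
function createMouthTrigger(openAt = 0.6, closeAt = 0.4) {
  if (closeAt >= openAt) {
    throw new Error("closeAt must be lower than openAt");
  }
  let isOpen = false;

  return (openness: number): TriggerEvent => {
    if (!isOpen && openness >= openAt) {
      isOpen = true;
      return "attack";  // mouth just opened: start the note
    }
    if (isOpen && openness <= closeAt) {
      isOpen = false;
      return "release"; // mouth just closed: stop the note
    }
    return null;        // inside the dead band: keep the current state
  };
}

// One trigger per hand keeps the two audio streams independent.
const leftMouth = createMouthTrigger();
const events = [0.3, 0.61, 0.55, 0.59, 0.62, 0.39, 0.45].map(leftMouth);
// events: [null, "attack", null, null, null, "release", null]
```

With Tone.js, an "attack" event would typically map to a synth's `triggerAttack` and a "release" event to `triggerRelease`. A wider gap between the two thresholds gives more stability at the cost of a less sensitive mouth.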

Accomplishments that we're proud of

Two independent audio streams controlled by two hands simultaneously — each hand is its own instrument. The Claude integration that turns a phrase like "underwater cave" into a complete audiovisual experience (minor pentatonic in D, FM synth, reverb + delay + lowpass filter, deep blue palette with drift animation). The sock puppet mouth mechanics that feel genuinely responsive and playful.
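
As an illustration of how a configuration like "minor pentatonic in D" could become playable notes, the sketch below expands a root note and an interval pattern into MIDI numbers and picks one from a normalized pitch control. The interval tables and the MIDI-to-frequency formula are standard music theory; the function names, the choice of D4 as the root, and the two-octave range are assumptions for this example.

```typescript
// Semitone offsets from the root for a few common scales.
const SCALES: Record<string, number[]> = {
  "major": [0, 2, 4, 5, 7, 9, 11],
  "major pentatonic": [0, 2, 4, 7, 9],
  "minor pentatonic": [0, 3, 5, 7, 10],
};

// Expand a scale over several octaves, returning ascending MIDI note numbers.
function buildScale(rootMidi: number, intervals: number[], octaves: number): number[] {
  const notes: number[] = [];
  for (let octave = 0; octave < octaves; octave++) {
    for (const interval of intervals) {
      notes.push(rootMidi + 12 * octave + interval);
    }
  }
  return notes;
}

// Equal temperament with A4 (MIDI 69) tuned to 440 Hz.
function midiToFrequency(midi: number): number {
  return 440 * Math.pow(2, (midi - 69) / 12);
}

// Quantize a 0..1 pitch control to the nearest scale step at or below it,
// so every hand position lands on a note that fits the key.
function pickNote(pitch: number, notes: number[]): number {
  const clamped = Math.min(1, Math.max(0, pitch));
  const index = Math.min(notes.length - 1, Math.floor(clamped * notes.length));
  return notes[index];
}

const D4 = 62; // assumed root for "in D"
const caveNotes = buildScale(D4, SCALES["minor pentatonic"], 2);
// caveNotes: [62, 65, 67, 69, 72, 74, 77, 79, 81, 84]
```

Quantizing to the scale is what lets an imprecise wrist movement still sound musical. The frequency from `midiToFrequency(pickNote(pitch, caveNotes))` could then be handed to a synth as the note to play.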

What we learned

MediaPipe's hand tracking is remarkably good for real-time creative applications. Mapping physical gestures to audio parameters is an interaction design problem as much as a technical one — we iterated through several control schemes before landing on mouth-open-for-notes and finger-extension-for-volume. Using Claude as a creative configuration engine (not just a chatbot) is a powerful pattern for generative tools.

What's next for PitchPuppet

  • Electron overlay mode — transparent always-on-top window so the puppet appears over any app, not just in a browser tab
  • Multi-user mode — multiple performers conducting together via WebSocket sync
  • Recording and export — capture the puppet performance as a video with audio for sharing
  • More puppet styles — different characters generated from the vibe prompt
  • Gesture vocabulary expansion — pinch for effects, rotation for modulation, clap for transitions between slides
