Inspiration

I'm building a creator brand ("aycee") where my life plays out like a TV show. The problem: when you're solo streaming, there's no director calling the beats, hyping moments, or framing chaos in real time. So I built one.

What it does

Stream Director listens to a streamer's mic and decides, in real time, whether to flash an on-screen director cue: a quick reaction, a big "episode" title card for a scene change, or a "while you were away" catch-up summary after an ad break. It defaults to silence and only fires when something genuinely notable happens, so cues feel earned instead of spammy.

How I built it

  • Deepgram streams live speech-to-text from the mic
  • A rolling 60-second transcript buffer feeds into Claude every 15 seconds
  • Claude decides between three response types: a quick REACTION cue, a bigger SCENE/episode title card, or (after a simulated ad break) a CATCHUP summary
  • A lightweight Express server exposes the latest cue via a polling endpoint
  • A branded HTML/CSS overlay (designed for OBS browser sources) animates the cues in
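
As a rough illustration of the rolling transcript buffer described above, here is a minimal TypeScript sketch. It is not the project's actual code: the names `RollingTranscript`, `add`, and `snapshot` are hypothetical, and only the 60-second window comes from the write-up. Timestamps are passed in explicitly so the behavior is easy to check.

```typescript
// Hypothetical sketch of a rolling transcript window (names are assumptions).
interface Segment {
  text: string;
  at: number; // arrival time in milliseconds
}

class RollingTranscript {
  private segments: Segment[] = [];
  private readonly windowMs: number;

  constructor(windowMs: number = 60000) {
    this.windowMs = windowMs;
  }

  // Store a transcript chunk, then drop anything older than the window.
  add(text: string, now: number = Date.now()): void {
    const trimmed = text.trim();
    if (trimmed.length > 0) {
      this.segments.push({ text: trimmed, at: now });
    }
    this.prune(now);
  }

  // Everything said within the last windowMs, oldest first.
  snapshot(now: number = Date.now()): string {
    this.prune(now);
    return this.segments.map((s) => s.text).join(" ");
  }

  private prune(now: number): void {
    const cutoff = now - this.windowMs;
    this.segments = this.segments.filter((s) => s.at >= cutoff);
  }
}
```

In a setup like the one described, each finalized speech-to-text result would go through `add`, and a timer firing every 15 seconds would send `snapshot()` to the model.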

Challenges I ran into

The hardest part was tuning the AI to know when to stay silent. A naive version fires a cue on every beat, which is noisy and useless. I built in a hard cooldown plus an explicit "default to silence" instruction so the system only reacts to things that actually matter, which took several rounds of live testing and prompt tuning to get right.
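
One way a hard cooldown like this could be enforced in code is sketched below. This is an assumption-laden illustration, not the project's implementation: the `CooldownGate` class, the `Decision` shape, the `"SILENT"` marker, and the 45-second value used in the usage note are all hypothetical. The write-up only says that a cooldown exists and that the prompt tells the model to default to silence.

```typescript
// Hypothetical sketch: suppress cues that arrive too soon after the last one.
type CueType = "REACTION" | "SCENE" | "CATCHUP";

interface Decision {
  type: CueType | "SILENT"; // "SILENT" is an assumed marker for "no cue"
  text: string;
}

class CooldownGate {
  private lastFiredAt: number = Number.NEGATIVE_INFINITY;
  private readonly cooldownMs: number;

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }

  // Returns the cue to display, or null when the overlay should stay quiet.
  filter(decision: Decision, now: number = Date.now()): Decision | null {
    // Silence, or an empty cue, never reaches the overlay.
    if (decision.type === "SILENT" || decision.text.trim() === "") {
      return null;
    }
    // Still cooling down from the previous cue: drop this one.
    if (now - this.lastFiredAt < this.cooldownMs) {
      return null;
    }
    this.lastFiredAt = now;
    return decision;
  }
}
```

With `new CooldownGate(45000)`, a cue that arrives 19 seconds after the previous one is dropped even if the model wanted to fire it. Silent decisions do not reset the timer.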

Accomplishments that I'm proud of

Getting the AI to actually know when to stay quiet. Most reactive AI demos fire constantly and feel gimmicky. Getting Stream Director to sit silent through minutes of filler talk and only fire when something genuinely happened took real iteration. Seeing it correctly catch a topic change or a joke landing, live, on the first real test felt like a genuine "it works" moment.

I'm also proud that I built three distinct, working cue types (REACTION, SCENE, CATCHUP) solo in one weekend, each with its own reasoning logic and visual treatment. The entire pipeline, from mic to transcript to AI decision to live overlay, ran end-to-end multiple times with no manual triggers.

What I learned

The technical integration (Deepgram, Claude, Express) was the easy part. The real lesson was that prompting an LLM to make a judgment call, deciding when NOT to act, is a fundamentally different and harder problem than prompting it to generate content. Cooldowns, explicit "default to silence" instructions, and a lot of live testing were what actually made the system feel intelligent rather than noisy.

I also learned a lot about real-time audio pipelines: buffering live transcripts, managing polling-based UI updates, and debugging issues that only show up when multiple async pieces (mic input, API calls, browser rendering) are running simultaneously.
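
A common wrinkle with polling-based overlays is that the browser may fetch the same cue several times and replay its animation. The sketch below shows one possible way to avoid that by giving each cue an increasing id. It is a hypothetical illustration rather than the project's code: `CueStore`, `publish`, `poll`, and the `/cue` route in the comment are assumed names.

```typescript
// Hypothetical sketch: hand each cue to a polling client at most once.
interface Cue {
  id: number;
  type: "REACTION" | "SCENE" | "CATCHUP";
  text: string;
}

class CueStore {
  private latest: Cue | null = null;
  private nextId = 1;

  // Called when a new cue has been approved for display.
  publish(type: Cue["type"], text: string): Cue {
    const cue: Cue = { id: this.nextId, type, text };
    this.nextId += 1;
    this.latest = cue;
    return cue;
  }

  // Response body for one poll: the latest cue only if the client has not seen it.
  poll(lastSeenId: number): { cue: Cue | null } {
    if (this.latest !== null && this.latest.id > lastSeenId) {
      return { cue: this.latest };
    }
    return { cue: null };
  }
}

// In an Express app this could be wired up roughly as (route name assumed):
//   app.get("/cue", (req, res) => res.json(store.poll(Number(req.query.since) || 0)));
```

The overlay would remember the id of the last cue it animated and send it with each poll, so an unchanged cue comes back as `null` instead of being replayed.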

What's next for Stream Director

Hooking into real Twitch chat and ad-break APIs instead of the simulated versions used in this demo, and running it live on my own channel.
