GooseTake.fm

[Gallery images: "reddit post selection", "watch the show!"]

Inspiration

UWaterloo Reddit is a goldmine of unhinged takes — students venting about OSAP cuts, housing, co-op stress, and academic survival. We thought: what if the most chaotic minds on the planet weighed in? Trump's rally energy, Elon's robotic rationalism, and Gordon Ramsay's volcanic fury — all arguing about whether your midterm extension request is a disaster or the greatest thing ever done.

What it does

GooseTake.fm scrapes r/uwaterloo for hot posts and lets you pick a topic (or paste any Reddit URL). It then generates a live AI debate between Trump, Elon Musk, and Gordon Ramsay. Each line is synthesized with cloned celebrity voices via Fish Audio, and the clips play back in order while the current speaker's looping video is shown. The whole flow, from picking a topic to the debate playing, runs in under 10 seconds.
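The Reddit side could look roughly like this minimal Python sketch, written to match the FastAPI backend and using only the standard library. It assumes Reddit's public JSON shape: a listing keeps its posts under data.children[].data, and a post URL with .json appended returns a two-element array (the post, then its comment tree). The function names, the User-Agent string, and the choice to rank comments by score are hypothetical and not taken from the project's code.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen

# Hypothetical User-Agent; Reddit asks clients to send a descriptive one.
USER_AGENT = "goosetake-sketch/0.1"


def fetch_json(url: str):
    """GET a public Reddit JSON endpoint (no auth) and decode the body."""
    req = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def to_json_url(url: str) -> str:
    """Turn a pasted post URL into its .json endpoint, dropping query/fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    return urlunsplit(("https", parts.netloc, path, "", ""))


def listing_to_topics(listing: dict, limit: int = 10) -> list[dict]:
    """Pick topic candidates from e.g. https://www.reddit.com/r/uwaterloo/hot.json."""
    topics = []
    for child in listing["data"]["children"]:
        post = child["data"]
        if post.get("stickied"):  # skip pinned mod/megathread posts
            continue
        topics.append({
            "title": post["title"],
            "url": "https://www.reddit.com" + post["permalink"],
        })
    return topics[:limit]


def extract_post_and_comments(thread: list, max_comments: int = 5) -> dict:
    """Pull the post body and highest-scoring top-level comments from a thread."""
    post = thread[0]["data"]["children"][0]["data"]
    comments = [c["data"] for c in thread[1]["data"]["children"]
                if c.get("kind") == "t1"]  # "more" stubs are not comments
    comments.sort(key=lambda c: c.get("score", 0), reverse=True)
    return {
        "title": post.get("title", ""),
        "body": post.get("selftext", ""),
        "comments": [c.get("body", "") for c in comments[:max_comments]],
    }
```

For a pasted link, the pieces chain as extract_post_and_comments(fetch_json(to_json_url(link))). Short redd.it links would need to be resolved first; this sketch does not handle them.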

How we built it

Frontend: React + Vite + TypeScript + Tailwind CSS
Backend: FastAPI + Python, served with uvicorn
Script generation: Claude reads the actual post body and top comments, then writes a structured debate script with strict per-character voice rules
Voice synthesis: Fish Audio TTS API with per-speaker emotion tags and prosody tuning (speed, volume)
Reddit: public JSON endpoints, no auth needed, just r/uwaterloo/hot.json (unauthenticated requests are still rate-limited, and Reddit expects a descriptive User-Agent)
Video playback: Three looping muted videos, swapped on each audio clip's play event — no timestamp math needed
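The per-speaker emotion tags and prosody settings could live in one small table, so retuning one character never touches the others. This is a hypothetical sketch: the voice IDs are placeholders, the tags are examples rather than the ones the project settled on, and the request field names (reference_id, prosody) are modeled loosely on a Fish Audio TTS request and should be checked against its API reference before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceProfile:
    voice_id: str        # placeholder for the cloned voice model's ID
    emotion_tag: str     # prepended to every line, e.g. "(angry)"
    speed: float = 1.0   # prosody: speaking-speed multiplier
    volume: float = 0.0  # prosody: volume adjustment (assumed units)


# Hypothetical values; the real tags and prosody were tuned per voice by ear.
PROFILES = {
    "trump": VoiceProfile("TRUMP_VOICE_ID", "(excited)", speed=1.05),
    "elon": VoiceProfile("ELON_VOICE_ID", "(calm)", speed=0.95),
    "gordon": VoiceProfile("GORDON_VOICE_ID", "(angry)", speed=1.1, volume=2.0),
}


def build_tts_request(speaker: str, line: str) -> dict:
    """Assemble one TTS request body; field names are assumptions, not a documented schema."""
    profile = PROFILES[speaker]
    return {
        "text": f"{profile.emotion_tag} {line}",
        "reference_id": profile.voice_id,
        "prosody": {"speed": profile.speed, "volume": profile.volume},
    }
```

Keeping the tag in the profile rather than in Claude's script also makes it easy to swap a tag that causes voice bleed without regenerating the debate.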

Challenges we ran into

Fish Audio rate limits — fully concurrent synthesis hit 429s immediately. Solved with an asyncio semaphore capped at 3 concurrent requests.
Voice consistency — emotion tags like (hysterical) caused voice bleed on the first TTS chunk. Had to experiment with tag combinations per speaker and learn which tags play nicely with each voice model.
Prompt engineering for character voice — getting Claude to write lines that actually sound like Trump vs. Elon vs. Gordon (and not just generic aggressive/calm/angry) took significant iteration. The key was specificity: referencing actual post details, hard word limits per character, and explicit anti-repetition rules.
Natural script endings — fixing the line count at 9 made every debate feel forced. Switching to a range of 8–12 lines, plus an "end when it lands" instruction, made a huge difference.
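One way to enforce the line-count range, per-character word limits, and anti-repetition rule is to validate Claude's output before any audio is synthesized, and regenerate when it fails. This sketch assumes the script comes back as a JSON array of {"speaker", "text"} objects; that format, the speaker keys, and the word caps are hypothetical stand-ins for the project's actual rules.

```python
import json

SPEAKERS = {"trump", "elon", "gordon"}
# Hypothetical per-character word caps; the real limits were tuned by iteration.
WORD_LIMITS = {"trump": 40, "elon": 30, "gordon": 35}
MIN_LINES, MAX_LINES = 8, 12  # the range described above


def validate_script(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the script is usable."""
    try:
        lines = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(lines, list):
        return ["expected a JSON array of lines"]

    problems = []
    if not MIN_LINES <= len(lines) <= MAX_LINES:
        problems.append(f"expected {MIN_LINES}-{MAX_LINES} lines, got {len(lines)}")

    seen = set()
    for i, entry in enumerate(lines):
        speaker = entry.get("speaker") if isinstance(entry, dict) else None
        text = entry.get("text", "") if isinstance(entry, dict) else ""
        if speaker not in SPEAKERS:
            problems.append(f"line {i}: unknown speaker {speaker!r}")
            continue
        n_words = len(text.split())
        if n_words > WORD_LIMITS[speaker]:
            problems.append(
                f"line {i}: {speaker} uses {n_words} words (max {WORD_LIMITS[speaker]})"
            )
        # Crude anti-repetition check: same words after normalizing case/spacing.
        key = " ".join(text.lower().split())
        if key in seen:
            problems.append(f"line {i}: repeats an earlier line")
        seen.add(key)
    return problems
```

The validator only checks the bounds; where inside 8–12 lines the debate ends is still left to the model. A non-empty result could be fed back into the prompt as corrective feedback rather than retried blindly.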

Accomplishments that we're proud of

The full pipeline — Reddit scrape → Claude script → Fish Audio synthesis → synced video playback — works end to end in one click
Character voices are genuinely distinct and funny, not just "person talking"
The debate references actual post specifics (dollar amounts, names, situations) rather than vibing on the general topic, which makes it feel surprisingly grounded
Clean UI that doesn't get in the way of the demo

What we learned

Prompt engineering is 80% of the work when personality is the product
TTS emotion tags are powerful but fragile — the wrong tag on the wrong model breaks the voice entirely
Working with the voice model's natural delivery (rather than forcing it into unnatural emotion) produces much better results
Choosing between sequential and concurrent API calls is a real architectural decision, not just a performance detail
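The semaphore fix for the Fish Audio 429s fits in a few lines of asyncio. A minimal sketch, assuming synthesize is some async function that calls the TTS API for one line (hypothetical here, stubbed in the demo): every line is scheduled at once, but at most three requests are in flight, and asyncio.gather returns results in input order so the debate's playback order survives.

```python
import asyncio

MAX_CONCURRENT = 3  # the cap described in the rate-limit challenge


async def synthesize_all(lines, synthesize, max_concurrent=MAX_CONCURRENT):
    """Run synthesize(line) for every line with bounded concurrency."""
    sem = asyncio.Semaphore(max_concurrent)

    async def one(line):
        async with sem:  # waits here while max_concurrent calls are running
            return await synthesize(line)

    # gather preserves input order, regardless of which call finishes first
    return await asyncio.gather(*(one(line) for line in lines))


async def demo():
    """Stub TTS call that records how many requests overlap at once."""
    active = 0
    peak = 0

    async def fake_tts(line):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for network latency
        active -= 1
        return f"audio:{line}"

    out = await synthesize_all([f"l{i}" for i in range(10)], fake_tts)
    return out, peak
```

Setting max_concurrent=1 gives fully sequential synthesis, and dropping the semaphore entirely is the fully concurrent version that hit 429s; the cap is the knob between the two.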