Inspiration
I've heard people joke about it in stand-up sets and complain about it in movies — the senior executive who spent their whole weekend turning numbers into a deck. It always gets a laugh because everyone in the room has lived it. That stuck with me. The insight is usually already there in the data; the bottleneck is the translation. I wanted to build something where you could talk through what you found, the way you'd explain it to a colleague, and have it come out the other side as a proper boardroom presentation. Voice carries context and emphasis that a spreadsheet never will. That gap is what CaseStudy Forge was built to close.
What it does
CaseStudy Forge lets you upload a CSV, speak your findings aloud via the Gemini Live API, and get a full boardroom-ready report package in minutes. The AI reads your data against what you said, structures the story using the SCQA framework, generates animated charts, records a cinematic video with ElevenLabs narration, and bundles everything into a ZIP containing an MP4, DOCX, PPTX, interactive HTML, and MP3 summary — all from one voice-driven session.
How I built it (more like "directed it," since I did use vibe coding)
The backend is FastAPI on Python 3.12, using the Gemini Live API for real-time voice transcription and SCQA narrative generation. Charts are built with Matplotlib and Plotly, then converted to live Chart.js animations. Each slide is written as animated HTML and captured in real time with Playwright's Chromium screen recording — no frame-by-frame Python computation. ElevenLabs Turbo v2.5 handles narration, with gTTS as a fallback. ffmpeg handles audio sync (an adelay filter at mux time) and the final H.264 encode. Documents are generated with python-docx, python-pptx, and Jinja2.
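To give a flavour of the narrative step: this is a minimal sketch of what validating the model's SCQA output might look like. The field names, `SCQANarrative` class, and `parse_scqa` helper are illustrative, not the app's actual schema.

```python
import json
from dataclasses import dataclass

# Hypothetical shape of the SCQA payload the model is asked to return.
@dataclass
class SCQANarrative:
    situation: str      # where the business stands today
    complication: str   # what changed or went wrong in the data
    question: str       # the decision the board must make
    answer: str         # the recommendation the data supports

FIELDS = ("situation", "complication", "question", "answer")

def parse_scqa(raw: str) -> SCQANarrative:
    """Check the model's JSON response for the four required SCQA fields."""
    data = json.loads(raw)
    missing = [f for f in FIELDS if f not in data]
    if missing:
        raise ValueError(f"model response missing fields: {missing}")
    return SCQANarrative(**{f: data[f] for f in FIELDS})

raw = json.dumps({
    "situation": "Q3 revenue grew 4% across all regions.",
    "complication": "Churn in the enterprise segment doubled.",
    "question": "Should we reprice the enterprise tier?",
    "answer": "Yes: churners cluster in the lowest-usage decile.",
})
narrative = parse_scqa(raw)
```

Forcing the response into a fixed four-field structure like this is what keeps the generated deck reading as an argument rather than a summary.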
Challenges I ran into
The hardest one was Windows blocking Playwright's subprocess spawning inside uvicorn's event loop; I had to isolate Playwright in a separate ProactorEventLoop thread to get around it. Before that, the video renderer was taking 20+ minutes because of per-frame Ken Burns computation in moviepy — I scrapped that entirely and replaced it with Playwright screen recording, which runs in real time. Gemini's output was also frequently malformed JSON, which took a while to parse robustly. And ElevenLabs free-tier limits meant I had to build a full gTTS fallback so the pipeline could run through the demo without burning the quota early.
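The event-loop isolation looks roughly like this sketch: a dedicated thread runs its own loop (a ProactorEventLoop on Windows, where subprocess spawning requires it), and the server submits coroutines to it with `run_coroutine_threadsafe`. The `PlaywrightLoopThread` class and `fake_record` coroutine are illustrative names, not the project's actual code.

```python
import asyncio
import sys
import threading

class PlaywrightLoopThread:
    """Run a dedicated event loop in its own thread so subprocess
    spawning (which Playwright needs on Windows) works even when the
    main server loop cannot support it."""

    def __init__(self) -> None:
        # Windows subprocess support requires a ProactorEventLoop;
        # elsewhere the default loop is fine.
        if sys.platform == "win32":
            self.loop = asyncio.ProactorEventLoop()
        else:
            self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def submit(self, coro):
        """Schedule a coroutine on the isolated loop and block for its result."""
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return future.result()

async def fake_record(slide: str) -> str:
    # Stand-in for the actual Playwright recording coroutine.
    await asyncio.sleep(0)
    return f"recorded:{slide}"

runner = PlaywrightLoopThread()
result = runner.submit(fake_record("slide_01"))  # "recorded:slide_01"
```

The key point is that the Playwright coroutines never touch uvicorn's loop at all, so its selector-based loop on Windows never has to spawn the browser subprocess.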
Accomplishments that I'm proud of
The video renderer is the thing I'm most proud of. It started as a static PNG slideshow with a 20-minute render time and became a fully animated Chart.js presentation — growing bars, drawing pie charts, counting numbers, smooth CSS transitions — recorded in real time. The full pipeline, voice in and MP4, DOCX, PPTX, HTML, and MP3 out, runs end to end in under 10 minutes on a normal laptop. And the SCQA structure makes the AI output feel like something a real analyst wrote, not just a summary.
What I learned
I learned that voice input genuinely changes the quality of AI analysis — the context and emphasis you speak into a prompt is information that column headers can never carry. I also learned that Playwright is a surprisingly powerful tool for programmatic video, and that GPU-accelerated CSS recorded at playback speed beats any Python rendering library for smoothness. But the biggest thing I took away is that multimodal AI holds a lot of potential that I've only just begun to realise. Combining voice, data, and generated visuals in one pipeline made it click — the applications feel endless, inside and outside of offices.
What's next for CaseStudy Forge
The next thing I want to build is a proper Remotion-based video renderer for real motion graphics — animated chart axes, data callouts flying in, branded intros.
After that, a multi-CSV comparison mode so you can put two datasets side by side in the same report. The bigger dream is a collaborative layer where a whole room of people can annotate the voice session before generation, so the final report reflects what everyone agreed on, not just what one person remembered to say.