Inspiration

We tried building apps that analyze YouTube videos and quickly found it close to impossible. Video isn't programmable the way every other medium is. Anyone can query a database or search documents, but for video there's legit nothing; it's a black box. So we're building the API we wished existed.

Right now, video is 80% of internet traffic. On YouTube alone, 500 hours are uploaded every minute. But if you're a developer trying to build an app that understands what's inside those videos? There's nothing to build on.

What it does

Turns any video into structured data via an API. Send frame a YouTube URL or an upload and get back JSON with the transcript, products mentioned, topics, and key moments: everything inside the video as queryable data. Think Stripe for video intelligence.
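To make "queryable data" concrete, here is a minimal sketch of what working with a response could look like. The field names (`source_url`, `transcript`, `products`, `topics`, `key_moments`, `start_seconds`, `label`) and the sample payload are assumptions made up for illustration; they are not frame's actual schema or real output.

```python
import json
from dataclasses import dataclass, field


@dataclass
class KeyMoment:
    start_seconds: float
    label: str


@dataclass
class VideoAnalysis:
    source_url: str
    transcript: str
    products: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)
    key_moments: list[KeyMoment] = field(default_factory=list)


def parse_analysis(raw_json: str) -> VideoAnalysis:
    """Parse a JSON response body into typed objects (hypothetical schema)."""
    data = json.loads(raw_json)
    moments = [KeyMoment(**m) for m in data.get("key_moments", [])]
    return VideoAnalysis(
        source_url=data["source_url"],
        transcript=data["transcript"],
        products=data.get("products", []),
        topics=data.get("topics", []),
        key_moments=moments,
    )


def moments_matching(analysis: VideoAnalysis, keyword: str) -> list[KeyMoment]:
    """Return the key moments whose label mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [m for m in analysis.key_moments if needle in m.label.lower()]


# Made-up placeholder response, not real output from the API.
SAMPLE_RESPONSE = json.dumps({
    "source_url": "https://www.youtube.com/watch?v=EXAMPLE",
    "transcript": "Today we are unboxing the new camera...",
    "products": ["camera"],
    "topics": ["unboxing", "photography"],
    "key_moments": [
        {"start_seconds": 12.5, "label": "Unboxing begins"},
        {"start_seconds": 95.0, "label": "Camera test footage"},
    ],
})

analysis = parse_analysis(SAMPLE_RESPONSE)
hits = moments_matching(analysis, "camera")
print([(m.start_seconds, m.label) for m in hits])
```

Once the video is plain data like this, finding "every moment that mentions a camera" is a list comprehension rather than a computer vision project.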

How we built it

Python/FastAPI backend. Overshoot and a Hugging Face model (SmolVLM2) extract frames and audio. WoodWide handles analytics/metrics and decision recommendations. YouTube postings for transcription. Everything is orchestrated into one API call. Supabase PostgreSQL, plus Redis for caching. Deployed on Modal (high-performance AI infrastructure).
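The "one API call" idea can be sketched roughly as below: a single entry point fans out to the individual stages and merges their results. The three stage functions are hypothetical stubs that only sleep briefly; they do not call Overshoot, SmolVLM2, WoodWide, or YouTube, and the sketch assumes the stages are independent enough to run concurrently, which may not match the real pipeline.

```python
import asyncio


async def extract_frames(url: str) -> dict:
    # Stub standing in for the frame/audio extraction stage.
    await asyncio.sleep(0.01)
    return {"frames": []}


async def fetch_transcript(url: str) -> dict:
    # Stub standing in for the transcription stage.
    await asyncio.sleep(0.01)
    return {"transcript": ""}


async def compute_analytics(url: str) -> dict:
    # Stub standing in for the analytics/metrics stage.
    await asyncio.sleep(0.01)
    return {"metrics": {}}


async def analyze_video(url: str) -> dict:
    """Run all stages concurrently and merge their outputs into one payload."""
    parts = await asyncio.gather(
        extract_frames(url),
        fetch_transcript(url),
        compute_analytics(url),
    )
    merged = {"source_url": url}
    for part in parts:
        merged.update(part)
    return merged


result = asyncio.run(analyze_video("https://www.youtube.com/watch?v=EXAMPLE"))
print(sorted(result))
```

The caller sees one function and one response; which services sit behind it stays an implementation detail.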

Challenges we ran into

Processing was too slow, so we cut it from 4 min to 72 sec per video (about 3.3x faster). Costs were too high, so we brought them down from $1.20 to $0.08 per video (15x cheaper). Schema design took 8 iterations. We solved all of this through caching, parallel processing, and multi-model validation.
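As an illustration of the caching part, here is a minimal cache-aside sketch: results are stored under a key derived from the video URL, so a repeat request skips the expensive analysis entirely. The in-memory dict stands in for Redis, and `AnalysisCache`, `key_for`, and `slow_analyze` are hypothetical names for this example, not the project's code.

```python
import hashlib
from typing import Callable


class AnalysisCache:
    """Cache-aside wrapper around an expensive analysis function."""

    def __init__(self, analyze: Callable[[str], dict]):
        self._analyze = analyze
        self._store: dict[str, dict] = {}  # stands in for Redis in this sketch
        self.misses = 0

    @staticmethod
    def key_for(url: str) -> str:
        # Hash the URL so keys have a fixed length and a safe character set.
        return "analysis:" + hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str) -> dict:
        key = self.key_for(url)
        if key not in self._store:
            # Cache miss: run the analysis once and remember the result.
            self.misses += 1
            self._store[key] = self._analyze(url)
        return self._store[key]


def slow_analyze(url: str) -> dict:
    # Placeholder for the real (slow, paid) pipeline.
    return {"source_url": url, "topics": []}


cache = AnalysisCache(slow_analyze)
video_url = "https://www.youtube.com/watch?v=EXAMPLE"
first = cache.get(video_url)
second = cache.get(video_url)
print(cache.misses)
```

A production version would also need an expiry policy and a way to invalidate entries when the analysis schema changes.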

Accomplishments that we're proud of

Processed 2,347 YouTube videos with 100% accuracy through Overshoot and WoodWide. Got 64 people on the waitlist before launch.

Shipped fast, talked to customers, and iterated quickly.

That said, we want to emphasize that our greatest accomplishment was simply talking to developers before writing code. Speed matters more than perfection, especially during a hackathon. We also noticed that people will pay for abstraction; nobody wants to chain together 5 APIs themselves.

What's next for frame

Product validation: figuring out whether devs want analytics, what type of analytics, and whether they want APIs or an MCP for mp4 videos.
