🎬 Frames - Hackathon Submission
Inspiration
We noticed that AI chatbots like ChatGPT can answer almost any question, but text responses have limitations. Visual learners struggle with text-only explanations, and complex topics often need step-by-step visual demonstrations.
What if every AI answer could instantly become a video tutorial?
The idea came from watching students and learners:
- Struggle with text-heavy explanations
- Search YouTube for tutorials that may not match their specific question
- Need personalized visual explanations, not generic videos
We wanted to bridge the gap between AI text responses and visual learning by creating an instant video generation tool that turns any answer into an engaging, narrated explanation.
What it does
Frames is a ChatGPT-style assistant that turns its answers into videos.
Users can:
- Ask any question (math, coding, science, history, etc.)
- Get an AI response from Google Gemini
- Click "Generate Video Solution" on any answer
- Watch a 2-3 minute narrated video explaining the answer
Key Features:
- 🤖 Intelligent Chat Interface - Powered by Google Gemini
- 🎥 One-Click Video Generation - Transform any answer into a video
- 📝 Automatic Script Creation - AI breaks down answers into structured scenes
- 🎨 Beautiful UI - Modern, ChatGPT-inspired interface with dark mode
- ⚡ Real-time Feedback - Clear loading states during generation
Example Use Cases:
- Student asks: "How does binary search work?" → Gets video showing the algorithm step-by-step
- Developer asks: "What's the difference between async and await?" → Gets visual explanation with code examples
- Anyone asks: "Explain photosynthesis" → Gets animated tutorial showing the process
How we built it
Architecture:
Frontend:
- Next.js 16 with App Router for modern React architecture
- TypeScript for type safety
- Tailwind CSS for beautiful, responsive UI
- Custom hooks for dark mode and state management
Backend (API Routes):
- Next.js API Routes - Serverless endpoints for chat and video generation
- Google Gemini API - Powers the chat interface
- Google Veo 3.1 - Latest video generation model for creating videos
AI Pipeline:
Script Generation: Gemini analyzes the answer and creates a structured video script with:
- Scene breakdowns
- Visual descriptions
- Narration text
- Timing information
Video Generation: Veo 3.1 takes the script and generates:
- Video with matching visuals
- Built-in narration/audio
- Synchronized scenes
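The script-generation half of this pipeline can be sketched roughly as follows. The `Scene` shape and the `parseScript` helper are our illustrative assumptions, not the exact production code; in practice the JSON text would come from a Gemini call made with `responseMimeType: "application/json"`.

```typescript
// Sketch of the script-generation step (assumed Scene shape, not the exact
// production code). Gemini is asked for JSON, which we parse and validate.
interface Scene {
  visual: string;    // what should appear on screen
  narration: string; // what the voiceover says
  seconds: number;   // rough timing for the scene
}

// Parse and validate the JSON text Gemini returns in JSON mode.
// Throws if the payload is not an array of well-formed scenes.
function parseScript(jsonText: string): Scene[] {
  const data = JSON.parse(jsonText);
  if (!Array.isArray(data)) throw new Error("script must be an array of scenes");
  return data.map((s, i) => {
    if (typeof s.visual !== "string" || typeof s.narration !== "string") {
      throw new Error(`scene ${i} is missing visual or narration text`);
    }
    return { visual: s.visual, narration: s.narration, seconds: Number(s.seconds) || 5 };
  });
}
```

Validating up front means a malformed model response fails loudly at the script stage instead of producing a broken Veo prompt.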
Tech Stack:
Frontend: Next.js 16, React 19, TypeScript, Tailwind CSS
AI: Google Gemini API, Veo 3.1, LangChain
Development: ESLint, Prisma (schema in place for a future database)
Development Process:
- Started with chat interface integration (Gemini API)
- Built script generation system (text → structured video script)
- Integrated Veo 3.1 video generation (cutting-edge, just released!)
- Created seamless UI for video playback
- Polished with dark mode and error handling
Challenges we ran into
1. Veo API Integration 🎥
Challenge: Veo 3.1 is brand new with limited JavaScript/TypeScript documentation. The Python SDK documentation didn't translate directly.
Solution:
- Found the correct @google/genai package (different from @google/generative-ai)
- Discovered the correct API pattern through experimentation
- Implemented proper async polling for video generation completion
- Built robust error handling for API edge cases
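The async polling we settled on looks roughly like this. The helper itself is generic and testable; the commented usage below it assumes the `generateVideos` / `getVideosOperation` calls from `@google/genai`, and the model id is a guess, not something the SDK fixes.

```typescript
// Generic async polling: refresh a long-running operation until its `done`
// flag flips. This mirrors how Veo operations are tracked in @google/genai.
async function pollUntilDone<T extends { done?: boolean }>(
  op: T,
  refresh: (op: T) => Promise<T>,
  intervalMs = 10_000,
): Promise<T> {
  while (!op.done) {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    op = await refresh(op);
  }
  return op;
}

// Hedged usage sketch (model id and response shape are assumptions):
//
//   import { GoogleGenAI } from "@google/genai";
//   const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
//   let op = await ai.models.generateVideos({
//     model: "veo-3.1-generate-preview",
//     prompt: veoPrompt,
//   });
//   op = await pollUntilDone(op, (o) => ai.operations.getVideosOperation({ operation: o }));
```

Keeping the loop generic made it easy to unit-test the retry behavior with a fake refresh function instead of live API calls.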
2. Video Download & Authentication 🔐
Challenge: Veo returns video file URLs that require API key authentication, but the SDK download method wasn't working as expected.
Solution:
- Implemented fallback to direct URL fetch with API key
- Added proper authentication headers
- Built error detection for failed downloads
- Created comprehensive logging for debugging
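The fallback boils down to fetching the returned video URI ourselves with the API key attached. A minimal sketch (function names are ours; `x-goog-api-key` is the standard Google API key header):

```typescript
// Fallback download: fetch the returned video URI directly, authenticating
// with the API key via the standard `x-goog-api-key` header.
function authHeaders(apiKey: string): Record<string, string> {
  return { "x-goog-api-key": apiKey };
}

async function downloadVideo(uri: string, apiKey: string): Promise<ArrayBuffer> {
  const res = await fetch(uri, { headers: authHeaders(apiKey) });
  if (!res.ok) {
    // Surface the status so a 403 (bad key) reads differently from a 404.
    throw new Error(`video download failed: ${res.status} ${res.statusText}`);
  }
  return res.arrayBuffer();
}
```

Throwing on non-OK responses is what feeds the error detection and logging mentioned above.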
3. Prompt Engineering for Video Quality 📝
Challenge: Getting Veo to generate educational videos, not generic visuals. Needed to structure prompts to include both visual descriptions AND narration.
Solution:
- Created structured script format with explicit narration text
- Combined visual descriptions with narration in video prompts
- Iterated on prompt format to improve video relevance
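Concretely, each scene's prompt pairs the on-screen description with the line the generated narrator should speak. The template below is a sketch of the idea, not the exact production wording:

```typescript
// Compose one Veo prompt per scene: pair the visual description with the
// narration the built-in audio should speak. (Template is illustrative.)
function sceneToVeoPrompt(visual: string, narration: string): string {
  return [
    `Educational explainer scene: ${visual}.`,
    `A clear narrator voice says: "${narration}"`,
  ].join(" ");
}
```

Spelling out the narration as quoted speech, rather than leaving audio implicit, was the main lever for getting relevant voiceover.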
4. Long Generation Times ⏱️
Challenge: Video generation takes 60-90 seconds. Users need clear feedback during this wait.
Solution:
- Implemented polling with status updates
- Added clear loading states ("Generating Video Script...", "Waiting for video generation...")
- Created user-friendly error messages
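The status updates amount to labelling each pipeline stage and reporting the label before the stage runs. A minimal sketch (helper name and labels are ours):

```typescript
// Run pipeline stages in order, reporting a human-readable status label
// before each one so the UI can show e.g. "Generating Video Script...".
async function withStatus<T>(
  steps: Array<[string, () => Promise<T>]>,
  onStatus: (status: string) => void,
): Promise<T[]> {
  const results: T[] = [];
  for (const [label, step] of steps) {
    onStatus(label);
    results.push(await step());
  }
  onStatus("Done");
  return results;
}
```

In the app the `onStatus` callback would set React state; here it is just a function, which keeps the sequencing testable.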
5. TypeScript & SDK Types 🔧
Challenge: Video generation APIs weren't fully typed in TypeScript, requiring careful type handling.
Solution:
- Used type assertions where needed
- Built robust type checking and error handling
- Created flexible response parsers for different API response formats
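A "flexible response parser" here means probing the known places a field might live instead of trusting one hard-coded type. A sketch (the candidate paths are assumptions about possible response shapes):

```typescript
// Probe several plausible response shapes for the generated video URI
// rather than asserting a single SDK type. Paths are illustrative.
function extractVideoUri(response: unknown): string | undefined {
  const r = response as any;
  const candidates = [
    r?.response?.generatedVideos?.[0]?.video?.uri,
    r?.generatedVideos?.[0]?.video?.uri,
    r?.video?.uri,
  ];
  return candidates.find((u) => typeof u === "string");
}
```

Returning `undefined` instead of throwing lets the caller decide whether a missing URI is a retryable state or a hard error.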
Accomplishments that we're proud of
✨ Successfully integrated cutting-edge Veo 3.1 API - This is brand new technology and we got it working!
🎯 Created seamless user experience - From question to video in one click, with clear feedback throughout
🧠 Built intelligent script generation - Our AI doesn't just add visuals randomly; it creates structured, educational content
⚡ Delivered working MVP in tight timeframe - Full chat + video generation pipeline working end-to-end
🎨 Polished UI - Beautiful, modern interface that rivals ChatGPT's design
📚 Clear documentation - Comprehensive README and project story for judges and future developers
🔧 Robust error handling - Graceful failures with user-friendly messages
What we learned
Technical Learnings:
- Video generation APIs require different approaches than text/image APIs - polling, async operations, file handling
- New SDKs need careful documentation reading and experimentation - not everything works as expected from examples
- API authentication can vary significantly between endpoints - need flexible authentication strategies
- Prompt engineering for video is different from text - need both visual AND audio context
Process Learnings:
- Start simple, iterate - Built chat first, then added video generation
- Test early and often - Caught API issues before they became blockers
- Documentation matters - Created README and story while building, not after
- User feedback is critical - Loading states and errors make or break the experience
AI/ML Learnings:
- Multimodal AI (text → video) requires orchestration of multiple models
- Script generation is crucial - garbage in = garbage out for video generation
- Structured prompts produce better results than free-form descriptions
What's next for Frames
Short-term (Post-Hackathon):
- S3 Integration - Upload videos to cloud storage instead of base64 for scalability
- Better Error Handling - More user-friendly error messages and recovery options
- Progress Indicators - Real-time progress updates during video generation (Socket.IO)
- Video Quality Options - Different video styles (cartoon, realistic, whiteboard)
Medium-term:
- Video Library - Save and organize generated videos with search/filter
- Subject Templates - Pre-configured prompts for math, coding, science
- Export Options - Download videos, share links, embed codes
- Database Integration - Save chat history and user preferences
Long-term Vision:
- Multi-model Support - Switch between GPT-4, Claude, Gemini for different use cases
- Collaboration Features - Share videos, create playlists, team workspaces
- Educational Platform - Turn into a full learning platform with courses
- Mobile App - Native iOS/Android apps for on-the-go learning
- AI Tutor Mode - Proactive video suggestions based on learning gaps
- Analytics Dashboard - Track learning progress, topics covered, time spent
Potential Integrations:
- LMS Platforms (Canvas, Blackboard) - Generate course content
- Notion/Obsidian - Embed videos in notes
- Discord/Slack Bots - Video generation in chat
- Browser Extension - Generate videos from any webpage
Demo Highlights
🎬 Try asking:
- "How does binary search work?"
- "Explain the water cycle"
- "What's recursion in programming?"
- "How do solar panels work?"
Then click "Generate Video Solution" and watch the magic happen!
Team & Acknowledgments
Built with passion and dedication during [Hackathon Name] 🚀
Special thanks to:
- Google for Gemini and Veo 3.1 APIs
- Next.js team for an amazing framework
- The open-source community for tools and inspiration
Frames - Transforming AI Answers into Visual Learning Experiences 🎬✨
Built With
- frontend:-next.js-16
- langchain-development:-eslint
- prisma
- react-19
- tailwind-css-ai:-google-gemini-api
- typescript
- veo-3.1