🎬 Frames - Hackathon Submission
Inspiration
We noticed that AI chatbots like ChatGPT can answer almost any question, but text responses have limitations. Visual learners struggle with text-only explanations, and complex topics often need step-by-step visual demonstrations.
What if every AI answer could instantly become a video tutorial?
The idea came from watching students and learners:
- Struggle with text-heavy explanations
- Search YouTube for tutorials that may not match their specific question
- Need personalized visual explanations, not generic videos
We wanted to bridge the gap between AI text responses and visual learning by creating an instant video generation tool that turns any answer into an engaging, narrated explanation.
What it does
Frames is a ChatGPT-style assistant that turns its answers into videos.
Users can:
- Ask any question (math, coding, science, history, etc.)
- Get an AI response from Google Gemini
- Click "Generate Video Solution" on any answer
- Watch a 2-3 minute narrated video explaining the answer
Key Features:
- 🤖 Intelligent Chat Interface - Powered by Google Gemini
- 🎥 One-Click Video Generation - Transform any answer into a video
- 📝 Automatic Script Creation - AI breaks down answers into structured scenes
- 🎨 Beautiful UI - Modern, ChatGPT-inspired interface with dark mode
- ⚡ Real-time Feedback - Clear loading states during generation
Example Use Cases:
- Student asks: "How does binary search work?" → Gets video showing the algorithm step-by-step
- Developer asks: "What's the difference between async and await?" → Gets visual explanation with code examples
- Anyone asks: "Explain photosynthesis" → Gets animated tutorial showing the process
How we built it
Architecture:
Frontend:
- Next.js 16 with App Router for modern React architecture
- TypeScript for type safety
- Tailwind CSS for beautiful, responsive UI
- Custom hooks for dark mode and state management
Backend (API Routes):
- Next.js API Routes - Serverless endpoints for chat and video generation
- Google Gemini API - Powers the chat interface
- Google Veo 3.1 - Latest video generation model for creating videos
AI Pipeline:
Script Generation: Gemini analyzes the answer and creates a structured video script with:
- Scene breakdowns
- Visual descriptions
- Narration text
- Timing information
Video Generation: Veo 3.1 takes the script and generates:
- Video with matching visuals
- Built-in narration/audio
- Synchronized scenes
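The script-generation half of this pipeline can be sketched roughly as follows. The `Scene` shape and the `parseScript` helper are our illustrative assumptions, not the exact production code; in practice the JSON text would come from a Gemini call made with `responseMimeType: "application/json"`.

```typescript
// Sketch of the script-generation step (assumed Scene shape, not the exact
// production code). Gemini is asked for JSON, which we parse and validate.
interface Scene {
  visual: string;    // what should appear on screen
  narration: string; // what the voiceover says
  seconds: number;   // rough timing for the scene
}

// Parse and validate the JSON text Gemini returns in JSON mode.
// Throws if the payload is not an array of well-formed scenes.
function parseScript(jsonText: string): Scene[] {
  const data = JSON.parse(jsonText);
  if (!Array.isArray(data)) throw new Error("script must be an array of scenes");
  return data.map((s, i) => {
    if (typeof s.visual !== "string" || typeof s.narration !== "string") {
      throw new Error(`scene ${i} is missing visual or narration text`);
    }
    return { visual: s.visual, narration: s.narration, seconds: Number(s.seconds) || 5 };
  });
}
```

Validating up front means a malformed model response fails loudly at the script stage instead of producing a broken Veo prompt.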
Tech Stack:
Frontend: Next.js 16, React 19, TypeScript, Tailwind CSS
AI: Google Gemini API, Veo 3.1, LangChain
Development: ESLint, Prisma (schema in place for a future database)
Development Process:
- Started with chat interface integration (Gemini API)
- Built script generation system (text → structured video script)
- Integrated Veo 3.1 video generation (cutting-edge, just released!)
- Created seamless UI for video playback
- Polished with dark mode and error handling
Challenges we ran into
1. Veo API Integration 🎥
Challenge: Veo 3.1 is brand new with limited JavaScript/TypeScript documentation. The Python SDK documentation didn't translate directly.
Solution:
- Found the correct @google/genai package (different from @google/generative-ai)
- Discovered the correct API pattern through experimentation
- Implemented proper async polling for video generation completion
- Built robust error handling for API edge cases
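The async polling we settled on looks roughly like this. The helper itself is generic and testable; the commented usage below it assumes the `generateVideos` / `getVideosOperation` calls from `@google/genai`, and the model id is a guess, not something the SDK fixes.

```typescript
// Generic async polling: refresh a long-running operation until its `done`
// flag flips. This mirrors how Veo operations are tracked in @google/genai.
async function pollUntilDone<T extends { done?: boolean }>(
  op: T,
  refresh: (op: T) => Promise<T>,
  intervalMs = 10_000,
): Promise<T> {
  while (!op.done) {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    op = await refresh(op);
  }
  return op;
}

// Hedged usage sketch (model id and response shape are assumptions):
//
//   import { GoogleGenAI } from "@google/genai";
//   const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
//   let op = await ai.models.generateVideos({
//     model: "veo-3.1-generate-preview",
//     prompt: veoPrompt,
//   });
//   op = await pollUntilDone(op, (o) => ai.operations.getVideosOperation({ operation: o }));
```

Keeping the loop generic made it easy to unit-test the retry behavior with a fake refresh function instead of live API calls.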
2. Video Download & Authentication 🔐
Challenge: Veo returns video file URLs that require API key authentication, but the SDK download method wasn't working as expected.
Solution:
- Implemented fallback to direct URL fetch with API key
- Added proper authentication headers
- Built error detection for failed downloads
- Created comprehensive logging for debugging
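The fallback boils down to fetching the returned video URI ourselves with the API key attached. A minimal sketch (function names are ours; `x-goog-api-key` is the standard Google API key header):

```typescript
// Fallback download: fetch the returned video URI directly, authenticating
// with the API key via the standard `x-goog-api-key` header.
function authHeaders(apiKey: string): Record<string, string> {
  return { "x-goog-api-key": apiKey };
}

async function downloadVideo(uri: string, apiKey: string): Promise<ArrayBuffer> {
  const res = await fetch(uri, { headers: authHeaders(apiKey) });
  if (!res.ok) {
    // Surface the status so a 403 (bad key) reads differently from a 404.
    throw new Error(`video download failed: ${res.status} ${res.statusText}`);
  }
  return res.arrayBuffer();
}
```

Throwing on non-OK responses is what feeds the error detection and logging mentioned above.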
3. Prompt Engineering for Video Quality 📝
Challenge: Getting Veo to generate educational videos, not generic visuals. Needed to structure prompts to include both visual descriptions AND narration.
Solution:
- Created structured script format with explicit narration text
- Combined visual descriptions with narration in video prompts
- Iterated on prompt format to improve video relevance
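Concretely, each scene's prompt pairs the on-screen description with the line the generated narrator should speak. The template below is a sketch of the idea, not the exact production wording:

```typescript
// Compose one Veo prompt per scene: pair the visual description with the
// narration the built-in audio should speak. (Template is illustrative.)
function sceneToVeoPrompt(visual: string, narration: string): string {
  return [
    `Educational explainer scene: ${visual}.`,
    `A clear narrator voice says: "${narration}"`,
  ].join(" ");
}
```

Spelling out the narration as quoted speech, rather than leaving audio implicit, was the main lever for getting relevant voiceover.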
4. Long Generation Times ⏱️
Challenge: Video generation takes 60-90 seconds. Users need clear feedback during this wait.
Solution:
- Implemented polling with status updates
- Added clear loading states ("Generating Video Script...", "Waiting for video generation...")
- Created user-friendly error messages
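The status updates amount to labelling each pipeline stage and reporting the label before the stage runs. A minimal sketch (helper name and labels are ours):

```typescript
// Run pipeline stages in order, reporting a human-readable status label
// before each one so the UI can show e.g. "Generating Video Script...".
async function withStatus<T>(
  steps: Array<[string, () => Promise<T>]>,
  onStatus: (status: string) => void,
): Promise<T[]> {
  const results: T[] = [];
  for (const [label, step] of steps) {
    onStatus(label);
    results.push(await step());
  }
  onStatus("Done");
  return results;
}
```

In the app the `onStatus` callback would set React state; here it is just a function, which keeps the sequencing testable.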
5. TypeScript & SDK Types 🔧
Challenge: Video generation APIs weren't fully typed in TypeScript, requiring careful type handling.
Solution:
- Used type assertions where needed
- Built robust type checking and error handling
- Created flexible response parsers for different API response formats
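A "flexible response parser" here means probing the known places a field might live instead of trusting one hard-coded type. A sketch (the candidate paths are assumptions about possible response shapes):

```typescript
// Probe several plausible response shapes for the generated video URI
// rather than asserting a single SDK type. Paths are illustrative.
function extractVideoUri(response: unknown): string | undefined {
  const r = response as any;
  const candidates = [
    r?.response?.generatedVideos?.[0]?.video?.uri,
    r?.generatedVideos?.[0]?.video?.uri,
    r?.video?.uri,
  ];
  return candidates.find((u) => typeof u === "string");
}
```

Returning `undefined` instead of throwing lets the caller decide whether a missing URI is a retryable state or a hard error.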
Accomplishments that we're proud of
✨ Successfully integrated cutting-edge Veo 3.1 API - This is brand new technology and we got it working!
🎯 Created seamless user experience - From question to video in one click, with clear feedback throughout
🧠 Built intelligent script generation - Our AI doesn't just add visuals randomly; it creates structured, educational content
⚡ Delivered working MVP in tight timeframe - Full chat + video generation pipeline working end-to-end
🎨 Polished UI - Beautiful, modern interface that rivals ChatGPT's design
📚 Clear documentation - Comprehensive README and project story for judges and future developers
🔧 Robust error handling - Graceful failures with user-friendly messages
What we learned
Technical Learnings:
- Video generation APIs require different approaches than text/image APIs - polling, async operations, file handling
- New SDKs need careful documentation reading and experimentation - not everything works as expected from examples
- API authentication can vary significantly between endpoints - need flexible authentication strategies
- Prompt engineering for video is different from text - need both visual AND audio context
Process Learnings:
- Start simple, iterate - Built chat first, then added video generation
- Test early and often - Caught API issues before they became blockers
- Documentation matters - Created README and story while building, not after
- User feedback is critical - Loading states and errors make or break the experience
AI/ML Learnings:
- Multimodal AI (text → video) requires orchestration of multiple models
- Script generation is crucial - garbage in = garbage out for video generation
- Structured prompts produce better results than free-form descriptions
What's next for Frames
Short-term (Post-Hackathon):
- S3 Integration - Upload videos to cloud storage instead of base64 for scalability
- Better Error Handling - More user-friendly error messages and recovery options
- Progress Indicators - Real-time progress updates during video generation (Socket.IO)
- Video Quality Options - Different video styles (cartoon, realistic, whiteboard)
Medium-term:
- Video Library - Save and organize generated videos with search/filter
- Subject Templates - Pre-configured prompts for math, coding, science
- Export Options - Download videos, share links, embed codes
- Database Integration - Save chat history and user preferences
Long-term Vision:
- Multi-model Support - Switch between GPT-4, Claude, Gemini for different use cases
- Collaboration Features - Share videos, create playlists, team workspaces
- Educational Platform - Turn into a full learning platform with courses
- Mobile App - Native iOS/Android apps for on-the-go learning
- AI Tutor Mode - Proactive video suggestions based on learning gaps
- Analytics Dashboard - Track learning progress, topics covered, time spent
Potential Integrations:
- LMS Platforms (Canvas, Blackboard) - Generate course content
- Notion/Obsidian - Embed videos in notes
- Discord/Slack Bots - Video generation in chat
- Browser Extension - Generate videos from any webpage
Demo Highlights
🎬 Try asking:
- "How does binary search work?"
- "Explain the water cycle"
- "What's recursion in programming?"
- "How do solar panels work?"
Then click "Generate Video Solution" and watch the magic happen!
Team & Acknowledgments
Built with passion and dedication during [Hackathon Name] 🚀
Special thanks to:
- Google for Gemini and Veo 3.1 APIs
- Next.js team for an amazing framework
- The open-source community for tools and inspiration
Frames - Transforming AI Answers into Visual Learning Experiences 🎬✨
Built With
- frontend:-next.js-16
- langchain-development:-eslint
- prisma
- react-19
- tailwind-css-ai:-google-gemini-api
- typescript
- veo-3.1