Inspiration
Academic research papers are crucial but often inaccessible due to their dense, technical language. Inspired by science communication channels like Kurzgesagt and 3Blue1Brown, we saw an opportunity to use Google's Gemini and Veo APIs to democratize science communication by automatically transforming academic papers into engaging video abstracts in various styles.
What it does
SciVid.AI transforms abstract, sophisticated academic papers into engaging 1-minute videos. Our platform bridges the gap between dense scholarly text and visual storytelling, and offers users four distinct visual styles to match their audience:
- Cinematic: High-production value with dramatic visuals for broad appeal.
- Academic: Clean, diagram-heavy, and professional—ideal for research presentations.
- Anime: Vibrant, stylized animations that turn complex concepts into a visual narrative.
- Minimalist: Focuses on sleek typography and simple geometry for maximum clarity.
How we built it
SciVid.AI runs each paper through the following pipeline:
- Analyze the paper with Gemini to extract key insights and produce a detailed video script in JSON format
- Create anchor images with Gemini 3 Pro, based on the video script, to guide the video generation process
- Generate a 1-minute video with Veo from the video script and the anchor images
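The steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the `Scene` fields, the `{"scenes": [...]}` JSON layout, and the `analyze`, `make_anchor`, and `make_clip` callables are all hypothetical stand-ins for the Gemini and Veo calls, which are not shown here.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    """One entry of the video script (field names are assumptions)."""
    narration: str
    visual_prompt: str


def parse_script(script_json: str) -> List[Scene]:
    """Parse a JSON script of the assumed form {"scenes": [{...}, ...]}."""
    data = json.loads(script_json)
    return [Scene(s["narration"], s["visual_prompt"]) for s in data["scenes"]]


def run_pipeline(
    paper_text: str,
    style: str,
    analyze: Callable[[str, str], str],          # (paper, style) -> JSON script
    make_anchor: Callable[[Scene], bytes],       # scene -> anchor image bytes
    make_clip: Callable[[Scene, bytes], bytes],  # (scene, anchor) -> clip bytes
) -> List[bytes]:
    """Run the three stages in order and return one clip per scene."""
    scenes = parse_script(analyze(paper_text, style))
    clips = []
    for scene in scenes:
        anchor = make_anchor(scene)             # stage 2: anchor image
        clips.append(make_clip(scene, anchor))  # stage 3: video clip
    return clips
```

Passing the model calls in as plain callables keeps the sketch independent of any particular SDK and makes each stage easy to swap or mock.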
Challenges we ran into
We used Google's Veo for video generation, which caps each clip at 8 seconds. To produce a cohesive 1-minute video, we couldn't just "generate" the whole thing at once. We had to adapt our prompts to generate individual segments and make sure that they transition smoothly.
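To make the arithmetic concrete: with an 8-second cap, a 60-second video needs at least ceil(60 / 8) = 8 clips. The sketch below plans equal-length segments under that cap and carries the previous segment's description into each prompt, which is one possible way to encourage smooth transitions. The function names and the prompt wording are illustrative assumptions, not the exact prompts we used.

```python
import math
from typing import List, Tuple


def plan_segments(
    total_s: float = 60.0, max_clip_s: float = 8.0
) -> List[Tuple[float, float]]:
    """Split total_s into the fewest equal segments no longer than max_clip_s."""
    n = math.ceil(total_s / max_clip_s)
    length = total_s / n
    return [(i * length, (i + 1) * length) for i in range(n)]


def segment_prompts(descriptions: List[str]) -> List[str]:
    """Prefix each prompt (after the first) with the previous segment's description."""
    prompts = []
    for i, desc in enumerate(descriptions):
        if i == 0:
            prompts.append(desc)
        else:
            prev = descriptions[i - 1]
            prompts.append(f"Continue seamlessly from: {prev}. Now show: {desc}")
    return prompts
```

With the defaults, `plan_segments()` yields eight segments of 7.5 seconds each, so every clip stays under the cap and no clip is noticeably shorter than the rest.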
We spent considerable time auditing different image generation models. While tools like ImageFX produced beautiful art, they often struggled with the precision required for scientific diagrams. After rigorous testing, we realized that Gemini 3 Pro was the superior choice. Its multimodal reasoning and high-resolution output (up to 4K) allowed us to generate visuals that were not only aesthetic but technically grounded in the paper's data.
"Scientific visual storytelling" is a difficult niche for AI. We spent countless hours polishing and "chaining" prompts to move the AI away from generic stock-photo vibes and toward the specific visual styles (Cinematic, Academic, Anime, Minimalist) we promised. Finding the right balance of technical keywords and stylistic descriptors was a major time investment that ultimately defined the quality of our output.
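As a rough illustration of balancing stylistic descriptors against technical keywords, the sketch below assembles a prompt from a per-style preset. The descriptor strings are paraphrased from the style list in "What it does"; the `STYLE_DESCRIPTORS` table, the `build_prompt` helper, and the closing "avoid" phrase are hypothetical and only show the shape of the idea.

```python
from typing import Dict, List

# Hypothetical presets, paraphrased from the four styles described earlier.
STYLE_DESCRIPTORS: Dict[str, str] = {
    "Cinematic": "high production value, dramatic visuals",
    "Academic": "clean, diagram-heavy, professional",
    "Anime": "vibrant, stylized animation",
    "Minimalist": "sleek typography, simple geometry",
}


def build_prompt(style: str, subject: str, technical_keywords: List[str]) -> str:
    """Combine a subject, a style preset, and technical keywords into one prompt."""
    if style not in STYLE_DESCRIPTORS:
        raise ValueError(f"unknown style: {style}")
    keywords = ", ".join(technical_keywords)
    return (
        f"{subject}. Style: {STYLE_DESCRIPTORS[style]}. "
        f"Must depict: {keywords}. Avoid generic stock-photo imagery."
    )
```

Keeping the style text in one table means a wording change for a style applies to every segment prompt at once.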
Accomplishments that we're proud of
- Beautiful, intuitive UI with smooth animations and clean design
- Multiple video styles optimized for different research types
What we learned
- Valuable hands-on experience with prompt engineering
What's next for SciVid.AI
- Testing different AI models for the video and image generation process
- Addressing the occasional logical inconsistencies in the generated videos