Inspiration
The "intro" is the most important part of any video, yet it’s the hardest to get right. We noticed that creators and professionals spend hours fiddling with keyframes in After Effects or settling for cheesy, overused templates. We wanted to build a "magic wand" for video production—a tool where the distance between an idea and a high-fidelity presentation video is measured in seconds, not hours. Introo was born to democratize high-end motion graphics for everyone.
What it does
Introo is an AI-native workspace that transforms simple text prompts or slide decks into cinematic presentation videos.
- Prompt-to-Video: Users describe the vibe, and Introo generates the script, high-fidelity visuals (via Veo), and a matching voiceover.
- Dynamic Brand Sync: Upload a logo, and Introo automatically adapts the entire video's color palette and typography to match your brand identity.
- Multimodal Refinement: Unlike rigid editors, users can "talk" to their video to change the tone, swap background tracks, or adjust the pacing.
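To give a flavor of how brand syncing could work, here is a minimal sketch of deriving a palette from a logo's decoded pixels. The writeup doesn't describe Introo's actual approach; the `brand_palette` function, the bucket size, and the pre-decoded RGB input are all illustrative assumptions.

```python
from collections import Counter

def brand_palette(pixels, n=3, bucket=32):
    """Derive a small brand palette from decoded logo pixels.

    Quantizes each RGB channel into coarse buckets (so near-identical
    shades collapse together), then returns the centers of the n most
    common buckets as the dominant brand colors.
    """
    counts = Counter(
        tuple((c // bucket) * bucket + bucket // 2 for c in px)
        for px in pixels
    )
    return [color for color, _ in counts.most_common(n)]
```

A palette like this could then drive the video's background, accent, and typography colors.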
How we built it
We built Introo using a cutting-edge multimodal stack:
- The Engine: We leveraged the Google GenAI SDK, specifically using Veo 3.1 for generating the cinematic video base layers.
- The Voice: For the narration, we utilized Gemini 2.5 Flash (TTS), allowing us to use "steerable" audio prompts to match the voiceover's emotion to the video's energy.
- Frontend: A sleek, reactive interface built with Next.js 15 and Tailwind CSS, featuring a custom-built timeline component for real-time previews.
- Orchestration: We used a Python FastAPI backend to manage the long-running operations (LROs) required for high-resolution video rendering, ensuring a smooth user experience even during heavy processing.
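The LRO handling above boils down to polling with backoff until the render completes. Here is a generic sketch of that pattern; the `poll_until_done` helper, its parameters, and the `fetch_status` callback are illustrative assumptions rather than Introo's actual code.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

async def poll_until_done(
    fetch_status: Callable[[], Awaitable[tuple[bool, T]]],
    interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
) -> T:
    """Poll a long-running operation until it reports completion.

    fetch_status returns a (done, result) pair. The polling interval
    backs off exponentially so long video renders don't hammer the API.
    """
    elapsed = 0.0
    while elapsed < timeout:
        done, result = await fetch_status()
        if done:
            return result
        await asyncio.sleep(interval)
        elapsed += interval
        interval = min(interval * 2, max_interval)
    raise TimeoutError("operation did not finish in time")
```

In a FastAPI backend, a helper like this could run inside a background task while the frontend subscribes to progress updates.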
Challenges we ran into
The biggest hurdle was state management and latency. Generating high-quality video (Veo) and synchronized audio (Gemini TTS) happens asynchronously. Orchestrating these so the user isn't staring at a spinning loader, and ensuring the audio perfectly aligns with the visual transitions, required us to build custom "sync-buffer" logic on the frontend. We also wrestled with the transition to the new @google/genai SDK, which forced us to rethink how we handled long-running operations.
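Although the real sync-buffer lives in the frontend, the core idea can be sketched in the backend's language: release a segment for playback only once both its video clip and its audio track have arrived, in order. The `SyncBuffer` class and its API here are hypothetical illustrations of that pattern.

```python
from dataclasses import dataclass, field

@dataclass
class SyncBuffer:
    """Hold independently arriving video/audio segments and release
    them in order, only when both halves of a segment are present."""
    video: dict = field(default_factory=dict)
    audio: dict = field(default_factory=dict)
    next_index: int = 0

    def add_video(self, index, clip):
        self.video[index] = clip
        return self._drain()

    def add_audio(self, index, track):
        self.audio[index] = track
        return self._drain()

    def _drain(self):
        # Emit every consecutive segment whose video AND audio are ready.
        ready = []
        while self.next_index in self.video and self.next_index in self.audio:
            i = self.next_index
            ready.append((self.video.pop(i), self.audio.pop(i)))
            self.next_index += 1
        return ready
```

This keeps playback from ever advancing past a transition whose narration hasn't landed yet, at the cost of buffering whichever stream arrives first.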
Accomplishments that we're proud of
We are incredibly proud of our "One-Click Vibe Shift." We successfully implemented a feature where the AI doesn't just change a filter, but completely regenerates the visual and auditory assets to match a new style (e.g., from "Corporate Professional" to "Cyberpunk Tech") while keeping the core message intact. Seeing a 30-second video render with professional-grade transitions in under a minute felt like magic.
What we learned
This hackathon was a deep dive into the future of generative video. We learned that the "human-in-the-loop" model is superior to pure AI generation: users don't want the AI to do everything; they want it to do the heavy lifting while they keep creative control. We also mastered the art of polling asynchronous APIs and handling multimodal data streams.
What's next for Introo
We’re just scratching the surface. The roadmap for Introo includes:
- Direct Integration: Plugins for Google Slides and Canva so you can turn decks into videos without leaving your workflow.
- Real-time Translation: Using Gemini's multimodal capabilities to "dub" the presentation into 50+ languages while maintaining the original speaker's tone.
- Interactive Videos: Adding clickable "hotspots" within the AI-generated video for e-commerce and lead generation.
Built With
- next
- react
- tailwind
- python
- fastapi
- google-genai