Skeet: Move from raw footage to a story in seconds.
The Inspiration
Traditional video editing is a war of attrition. To produce a short, compelling video, a creator often sifts through hours of raw footage. That manual toil turns storytelling into a chore.
I wanted to change that. Think of Skeet as "Antigravity," but for video editing. Instead of just being a tool with a few AI features, Skeet acts as a partner that helps you plan and build your edit from the ground up.
How I Built It
Skeet is built as a full-stack application centered on an Agentic AI Architecture. I moved beyond simple wrappers by building a deep integration between Large Language Models and a high-performance, WebGL-based Non-Linear Editor (NLE).
- Intelligence Layer: I used the Mastra framework to orchestrate Gemini 3 Flash. This allowed me to build an autonomous agent that doesn't just "talk" about video but actually understands the narrative structure of raw footage. I leveraged Gemini 3's high-level reasoning to translate subjective creative requests into a precise sequence of technical operations (see the agent sketch after this list).
- The Media Brain (RAG): To help the agent "remember" what is in your footage, I implemented a Retrieval-Augmented Generation pipeline. Using Upstash Vector and Google Gemini Embeddings, I index every clip right after upload. This turns raw bytes into searchable "Energy Vectors," allowing the AI to query footage for specific lighting, moods, or actions (sketched below).
- The Studio Interface: The frontend is built with Next.js 15 and Tailwind CSS, using Zustand for complex multi-track timeline management. I implemented a custom rendering bridge that communicates with the backend via Pusher, so when the AI makes an edit, the change shows up on your screen within milliseconds (a stripped-down version appears below).
- Browser-Based Engine: To achieve real-time creative flow, I moved playback and compositing logic to the client side. Using WebGL, I built a custom NLE that handles multi-track rendering, transitions, and metadata overlays directly in the browser, providing instant feedback as the AI modifies the timeline (see the compositing loop below).
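To make the intelligence layer concrete, here is a minimal sketch of a Mastra agent that exposes one timeline operation as a typed tool. The tool name, schema fields, and model id are my assumptions for illustration, not Skeet's actual code:

```typescript
import { Agent } from "@mastra/core/agent";
import { createTool } from "@mastra/core/tools";
import { google } from "@ai-sdk/google";
import { z } from "zod";

// Hypothetical tool: lets the agent trim a clip to new in/out frames.
const trimClip = createTool({
  id: "trim-clip",
  description: "Trim a timeline clip to a new in/out point, in frames.",
  inputSchema: z.object({
    clipId: z.string(),
    inFrame: z.number().int().nonnegative(),
    outFrame: z.number().int().positive(),
  }),
  execute: async ({ context }) => {
    // The real app would mutate the project manifest and broadcast the
    // change here; persistence is omitted in this sketch.
    return { ok: true, clipId: context.clipId };
  },
});

export const editorAgent = new Agent({
  name: "skeet-editor",
  instructions:
    "You are a video editor. Translate creative requests into precise, schema-compliant timeline operations using your tools.",
  model: google("gemini-3-flash"), // exact model id string is assumed
  tools: { trimClip },
});
```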
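The indexing and retrieval halves of the Media Brain look roughly like this, assuming each clip already has an AI-generated text description to embed (the function names and embedding model are assumptions):

```typescript
import { Index } from "@upstash/vector";
import { GoogleGenerativeAI } from "@google/generative-ai";

const index = new Index({
  url: process.env.UPSTASH_VECTOR_REST_URL!,
  token: process.env.UPSTASH_VECTOR_REST_TOKEN!,
});

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const embedder = genAI.getGenerativeModel({ model: "text-embedding-004" });

// Post-upload step: embed the clip's description and store it with metadata.
export async function indexClip(clipId: string, description: string) {
  const { embedding } = await embedder.embedContent(description);
  await index.upsert({ id: clipId, vector: embedding.values, metadata: { description } });
}

// Retrieval step: find clips matching a natural-language query,
// e.g. "moody golden-hour shots of the skyline".
export async function searchClips(query: string, topK = 5) {
  const { embedding } = await embedder.embedContent(query);
  return index.query({ vector: embedding.values, topK, includeMetadata: true });
}
```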
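A stripped-down version of the rendering bridge: a Zustand store holds the timeline, and a Pusher channel per project (an assumption, as is the simplified clip shape) pushes the agent's edits straight into it:

```typescript
import { create } from "zustand";
import Pusher from "pusher-js";

// Simplified clip shape; the real schema carries more track/clip fields.
interface Clip { id: string; trackId: string; start: number; duration: number; src: string }

interface TimelineState {
  clips: Clip[];
  applyPatch: (clips: Clip[]) => void;
}

export const useTimeline = create<TimelineState>((set) => ({
  clips: [],
  applyPatch: (clips) => set({ clips }),
}));

// When the backend agent commits an edit, it broadcasts the new timeline
// over Pusher and the NLE re-renders from the updated store immediately.
export function bindTimelineChannel(projectId: string) {
  const pusher = new Pusher(process.env.NEXT_PUBLIC_PUSHER_KEY!, { cluster: "us2" });
  pusher
    .subscribe(`project-${projectId}`) // channel naming is an assumption
    .bind("timeline:updated", (payload: { clips: Clip[] }) =>
      useTimeline.getState().applyPatch(payload.clips)
    );
}
```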
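At the heart of the browser engine is a compositing loop along these lines. This sketch assumes a full-screen-quad shader program and vertex buffer are already bound, and skips shader setup entirely:

```typescript
// Draw each track's current video frame back-to-front with alpha blending.
function composite(
  gl: WebGL2RenderingContext,
  tracks: { video: HTMLVideoElement; texture: WebGLTexture }[]
) {
  gl.enable(gl.BLEND);
  gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA);
  gl.clear(gl.COLOR_BUFFER_BIT);
  for (const track of tracks) {
    gl.bindTexture(gl.TEXTURE_2D, track.texture);
    // Upload the video element's current frame as the texture contents.
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, track.video);
    gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4); // one full-screen quad per layer
  }
}
```

Running this inside a requestAnimationFrame loop is what gives the instant feedback described above.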
Challenges I Ran Into
- Agentic Tool Use: Teaching an LLM to interact accurately with a complex video timeline required strict schema definitions and iterative prompt engineering (see the schema sketch after this list). I used Gemini 3's Thinking capabilities to let the agent reason through complex multi-step edits before committing to a final operation.
- Timeline State Awareness: Getting the AI to perform like a professional editor came down to context management. The agent needed constant, frame-accurate awareness of my custom JSON timeline schema, understanding how layer hierarchies, track types, and temporal offsets interact in a unified project manifest.
- Deterministic Creativity: Mapping raw creative intent into valid, frame-accurate edit operations required a specialized bridge. I had to ensure the AI's creative "thoughts" always resulted in a schema-compliant state that my WebGL rendering engine could interpret without error, regardless of the complexity of the requested cut.
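A rough illustration of that schema-first bridge, with a deliberately tiny operation set (the real manifest is far richer); everything the model proposes must parse before the renderer ever sees it:

```typescript
import { z } from "zod";

// Two example operations; names and fields are illustrative assumptions.
const EditOp = z.discriminatedUnion("op", [
  z.object({
    op: z.literal("split"),
    clipId: z.string(),
    atFrame: z.number().int().nonnegative(),
  }),
  z.object({
    op: z.literal("move"),
    clipId: z.string(),
    toTrack: z.string(),
    toFrame: z.number().int().nonnegative(),
  }),
]);

// Gatekeeper: the AI's creative "thoughts" either validate into a
// schema-compliant edit list or are rejected and retried.
export function commitAiEdits(raw: unknown) {
  return z.array(EditOp).parse(raw); // throws on any schema violation
}
```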
Accomplishments That I’m Proud Of
- The AI "Editor-in-the-Loop": I'm incredibly proud of the seamless interaction between the user and the AI. Getting Gemini 3 to act not just as a chatbot but as a proactive editor that modifies a complex timeline from "vibe" requests was a major technical milestone for me.
- Multimodal Search & Retrieval: Building the RAG pipeline that allows you to search through footage by action or mood. Having the AI "watch" hours of footage and then accurately retrieve a precise segment based on a natural language query feels like the future of content creation.
- Real-Time WebGL Compositing: Implementing the NLE to run entirely in the browser using WebGL. This choice means the AI's complex compositing decisions render instantly on the client, with no server round-trip, for a truly interactive creative flow.
- Architectural Flow: Successfully building a unified, production-grade bridge between the marketing vision and the Studio application. Managing the entire stack solo, from infrastructure to AI agent orchestration, while maintaining a high level of polish is something I'm very happy with.
What I Learned
- The Future is Agentic: Building Skeet taught me that the true power of GenAI isn't in chat interfaces, but in autonomous tool use. By leveraging Gemini 3's high-level reasoning and custom "Thinking" configurations, I learned how to build a system that can bridge the gap between abstract creative intent and deterministic code.
- Mastering Multimodal Context: I gained a deep understanding of how to manage large-scale media data within an AI context. Learning to effectively index video segments into a vector store and then retrieving them for an LLM to "watch" was a challenging but rewarding technical hurdle.
- The Performance of WebGL: Building a video compositor from scratch taught me a lot about the performance characteristics of the modern web. Managing textures, shaders, and framebuffers in WebGL 2 allowed me to deliver a desktop-grade editing experience directly in the browser.
- Full-Stack Orchestration: Working on this solo required me to master the entire stack, from infrastructure and database design to frontend UX and AI agent design. I learned how to move fast without sacrificing the polish needed for a production-ready application.
What’s Next for Skeet
- Advanced Audio Orchestration: I plan to integrate deeper audio analysis, allowing the AI to automatically sync cuts to the beat of a track or generate AI voiceovers that match the mood of the edit.
- Collaborative Agentic Editing: I want to explore how multiple users can interact with the same agentic timeline, allowing the AI to act as a bridge between a director and a producer in real-time.
- Direct-to-Social Export: Building a specialized pipeline for direct export and scheduling to platforms like YouTube and Instagram, including AI-generated captions and optimized aspect ratio conforms.
- Extended Plugin System: I aim to build a plugin architecture for the WebGL compositor. This would allow other developers to create custom shaders and effects that the AI agent can learn to use for more creative variety.
LAST MINUTE: Video at https://www.loom.com/share/f0356e9d58674efe93c2fcb16a10778b