Inspiration- Our project was inspired by the lack of control in current one-shot text-to-video tools, where specific spatial instructions are often ignored. We observed a "black box" problem: creators type a prompt, hope for a usable result, and often start over from scratch if the output fails. Just as agentic tools like Cursor have changed coding by providing real-time, granular assistance, we wanted to move beyond static storyboards, which fail to capture motion or time, and give animators that same kind of agentic control over their craft.

What it does- Krafity.ai is a Canvas-to-Video platform that combines the precision of a whiteboard with the power of generative AI. Instead of just prompting, users "direct" the AI by drawing arrows, sketching layouts, and placing assets on an infinite canvas to dictate exact motion. The tool generates video in five-second segments and keeps them continuous from one segment to the next: the end of Shot A automatically becomes the start of Shot B. It also supports branching storylines, so creators can fork a scene and generate two versions side by side to see which works best.
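
The continuity and branching rules described above can be sketched as a small tree of shots. This is only an illustration under our own assumptions, not the data model Krafity.ai actually uses: the `Shot` class, its field names, and the image paths are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

SEGMENT_SECONDS = 5  # segment length stated in the writeup


@dataclass(eq=False)
class Shot:
    """One generated segment on the canvas (hypothetical structure)."""
    shot_id: str
    prompt: str
    start_frame: Optional[str]  # image used as the first frame; None for the opening shot
    last_frame: str             # image of the final frame of this clip
    parent: Optional["Shot"] = None
    children: list["Shot"] = field(default_factory=list)

    def extend(self, shot_id: str, prompt: str, last_frame: str) -> "Shot":
        """Add a follow-up shot that starts from this shot's last frame."""
        child = Shot(shot_id, prompt, start_frame=self.last_frame,
                     last_frame=last_frame, parent=self)
        self.children.append(child)
        return child

    def timeline(self) -> list["Shot"]:
        """Return the path from the opening shot to this one, in play order."""
        path, node = [], self
        while node is not None:
            path.append(node)
            node = node.parent
        return path[::-1]


# Shot A, then two alternative continuations forked from the same last frame.
shot_a = Shot("A", "robot walks in", start_frame=None, last_frame="a_last.png")
shot_b1 = shot_a.extend("B1", "camera pans left", last_frame="b1_last.png")
shot_b2 = shot_a.extend("B2", "camera zooms in", last_frame="b2_last.png")
```

Calling `extend` twice on the same shot is what a fork amounts to in this sketch: both branches share one starting frame, and `timeline()` recovers the ordered list of clips for whichever branch the creator keeps.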

How we built it- We built the project on Google Vertex AI using a multi-model generation pipeline. The frontend uses React and the Tldraw engine for the canvas, while the backend is powered by FastAPI, Python, and FFmpeg. Our "AI Director," powered by Gemini 2.5 Flash, performs vision analysis to interpret crude sketches and arrows as camera movements. This interpretation is combined with speech-to-text prompts from the Web Speech API and passed to the Veo 3.1 engine, which generates high-fidelity cinematic clips.
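
As one example of where FFmpeg could fit in a backend like this, the sketch below builds a command that saves the final frame of a clip as an image, which could then be placed back on the canvas as the start of the next shot. The writeup does not say how the last frame is actually extracted, so treat this as an assumption: the function name `last_frame_command` and the file names are hypothetical, and the flags follow a commonly used FFmpeg recipe.

```python
from pathlib import Path


def last_frame_command(clip: Path, out_image: Path) -> list[str]:
    """Build an ffmpeg argv that saves the final frame of `clip` as an image."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-sseof", "-1",  # input option: start reading one second before the end
        "-i", str(clip),
        "-update", "1",  # keep rewriting a single image, so the last frame wins
        "-q:v", "2",     # high JPEG quality
        str(out_image),
    ]


cmd = last_frame_command(Path("shot_a.mp4"), Path("shot_a_last.jpg"))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it, provided `ffmpeg` is installed and on the PATH. Building the argument list separately keeps the command easy to inspect and to test without running FFmpeg.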

Challenges we ran into- One of the primary challenges we faced was inconsistency: characters would morph and backgrounds would shift between sequential clips. We also had to overcome the limitations of traditional planning, since static drawings do not capture motion over time. Engineering a system that could "read" a crude sketch and "see" an arrow as a command for a specific camera move required tight integration between our vision models and the video generation engine.
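
To make the arrow-to-camera-move idea concrete, here is a deliberately simple geometric stand-in. It is not how the AI Director works (that step is done by Gemini's vision analysis, as described above); it is a hand-written heuristic that assumes the arrow's start and end points are already known in canvas coordinates. The function name, the move labels, and the `min_length` threshold are all hypothetical.

```python
import math


def arrow_to_camera_move(x0: float, y0: float, x1: float, y1: float,
                         min_length: float = 20.0) -> str:
    """Map an arrow drawn from (x0, y0) to (x1, y1) to a coarse camera move."""
    dx, dy = x1 - x0, y1 - y0
    # Very short strokes are treated as noise rather than a direction.
    if math.hypot(dx, dy) < min_length:
        return "static"
    # Mostly horizontal arrows become pans.
    if abs(dx) >= abs(dy):
        return "pan right" if dx > 0 else "pan left"
    # Canvas y grows downward, so a positive dy points down the screen.
    return "tilt down" if dy > 0 else "tilt up"


move = arrow_to_camera_move(0, 0, 100, 10)  # a long, nearly horizontal arrow
```

A fixed rule like this breaks down quickly on curved strokes, overlapping sketches, or arrows attached to a character rather than the frame, which is part of why a vision model is a better fit for interpreting free-form drawings.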

Accomplishments that we're proud of- We are proud of achieving controlled spatial composition, moving away from the random outputs of "God Prompts." Our team implemented a seamless loop in which the last frame of each new clip appears on the canvas automatically, enabling a continuous creative flow. By creating an iterative directing workflow, we have built a tool that supports exploratory, non-linear generation, something that standard one-shot video AI tools do not offer.

What we learned- Through this project, we learned the importance of context awareness: an AI must simultaneously "see" a sketch and "read" a prompt to understand human intent. We found that a multimodal approach, combining visual, voice, and text inputs, is far more effective for creative tasks than text-heavy interfaces. Most importantly, we learned how to use the "AI Director" to translate simple human gestures into complex cinematic instructions.

What's next for Krafity.ai- Moving forward, we plan to push beyond simple text-to-video by further refining our iterative directing capabilities. Our goal is to make Krafity.ai the industry standard for professional animators who need to storyboard their vision frame by frame with absolute precision. We are also looking into deeper integration of branching canvas features to allow for even more complex, multi-path narrative development.
