Inspiration

As a product manager for AI video models, I have spoken with hundreds of professional creators and found that they share a common set of pain points. For AI video production, they demand controllability, creative freedom, and automation. They gravitate toward node-based workflows (like Flora or ComfyUI) because every step is transparent and manageable. They also want an infinite canvas for open-ended creative exploration. Most importantly, they need automation for batch production. Existing products may satisfy two of these needs, but until now none has bridged all three.

What it does

OURO is the first node-space framework for AI video production. It is designed to solve these creator pain points from the ground up.

  • Infinite Canvas: The left-side workspace allows users to explore and build controllable workflows freely.
  • OURO Agent: Powered by Function Calling (FC) and context injection, the Agent has full control over the canvas. Think of OURO as a collaborative teammate: it works alongside you on the canvas, helps you generate images and videos, and ultimately handles the final assembly and editing.
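To make the Agent-canvas relationship concrete, here is a minimal sketch of how model-emitted function calls could be dispatched to canvas operations. It assumes a plain in-memory canvas and hypothetical tool names (`add_node`, `connect_nodes`). It is not OURO's actual tool set, and it stops at the dispatch step rather than calling a real model API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Canvas:
    """Hypothetical in-memory canvas: nodes keyed by id, edges as (source, target) pairs."""
    nodes: dict[str, dict[str, Any]] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)


def add_node(canvas: Canvas, node_id: str, kind: str, prompt: str = "") -> dict[str, Any]:
    """Hypothetical tool: create a node (e.g. an image or video step) on the canvas."""
    if node_id in canvas.nodes:
        return {"ok": False, "error": f"node {node_id!r} already exists"}
    canvas.nodes[node_id] = {"kind": kind, "prompt": prompt}
    return {"ok": True, "node_id": node_id}


def connect_nodes(canvas: Canvas, source: str, target: str) -> dict[str, Any]:
    """Hypothetical tool: wire the output of one node into another."""
    missing = [n for n in (source, target) if n not in canvas.nodes]
    if missing:
        return {"ok": False, "error": f"unknown node(s): {missing}"}
    canvas.edges.append((source, target))
    return {"ok": True}


# Registry mapping the tool name the model emits to a local handler.
TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "add_node": add_node,
    "connect_nodes": connect_nodes,
}


def dispatch(canvas: Canvas, name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Execute one model-emitted function call and return a JSON-able tool response.

    Errors are returned to the model as data instead of being raised,
    so a bad call does not crash the agent loop.
    """
    handler = TOOLS.get(name)
    if handler is None:
        return {"ok": False, "error": f"unknown tool {name!r}"}
    try:
        return handler(canvas, **args)
    except TypeError as exc:  # wrong, missing, or unexpected arguments
        return {"ok": False, "error": str(exc)}


if __name__ == "__main__":
    canvas = Canvas()
    dispatch(canvas, "add_node", {"node_id": "shot1", "kind": "image", "prompt": "neon alley"})
    dispatch(canvas, "add_node", {"node_id": "clip1", "kind": "video"})
    print(dispatch(canvas, "connect_nodes", {"source": "shot1", "target": "clip1"}))
```

Returning errors as tool responses instead of raising is a deliberate choice in this sketch. The model can read the error and retry, whereas an exception would end the session.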

How we built it

  1. Ideation: I mapped out the core logic and used Stitch to build the initial design system. I collaborated with Gemini 3 Pro to architect the tech stack and system framework.
  2. MVP: I started with the core context and canvas. Using Next.js and React Flow, I validated the canvas functionality. For the Agent, I used the Vertex AI API to test the feasibility of Gemini 3 Pro’s multimodal capabilities and FC.
  3. Full Development: Development proceeded in parallel for frontend and backend. I established modular design tokens first, then iterated outward from the Agent-Canvas core.
  4. Optimization: I continuously debugged and tuned performance to keep the user experience fluid.

Challenges we ran into

Building such a massive project solo was an exhausting but rewarding marathon. Here are the toughest hurdles:

  1. Malformed Function Calls: Mid-development, Gemini occasionally generated malformed FCs that crashed the app immediately. After two days of investigation, I found a workaround: restricting the model from outputting arrays within FC parameters bypassed this known stability issue.
  2. Gemini Empty Responses: When context exceeded 15k tokens, the model would sometimes return empty messages, especially after a tool response. I solved this with a three-pronged approach:
      • Ensuring the "thought signature" (the model's encoded chain-of-thought state) was correctly passed back.
      • Developing a Context Summarization mechanism where users can set the threshold at which the context gets summarized.
      • Correcting a role-assignment bug where tool outputs were being sent under the wrong role.
  3. Context Engineering: Unlike code editors such as Cursor, OURO’s context is heavily multimodal. I implemented Context Injection to send "canvas diffs" with every user message. I also equipped the Agent with specific tools to "read" the state of nodes and canvas layouts, so it always stays in sync with the user's edits.
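The no-arrays workaround for malformed function calls can be applied by declaring list-like inputs as delimited strings and splitting them server-side. The sketch below uses an OpenAPI-style schema dict, the general shape of Gemini function declarations, and a hypothetical `generate_images` tool. It also includes a guard that reports any parameter still typed as an array. Treat it as an illustration of the idea, not OURO's code.

```python
from typing import Any

# Hypothetical declaration: node ids are passed as one comma-separated string
# instead of an "array" parameter, per the no-arrays workaround.
GENERATE_IMAGES = {
    "name": "generate_images",
    "description": "Generate images for the given canvas nodes.",
    "parameters": {
        "type": "object",
        "properties": {
            "node_ids": {
                "type": "string",
                "description": "Comma-separated node ids, e.g. 'shot1,shot2'.",
            },
            "aspect_ratio": {"type": "string", "description": "e.g. '16:9'."},
        },
        "required": ["node_ids"],
    },
}


def find_array_params(schema: dict[str, Any], path: str = "parameters") -> list[str]:
    """Return the paths of any array-typed fields left in a parameter schema."""
    found = []
    # Case-insensitive, since schemas may spell types as "array" or "ARRAY".
    if str(schema.get("type", "")).lower() == "array":
        found.append(path)
    for key, sub in schema.get("properties", {}).items():
        found.extend(find_array_params(sub, f"{path}.{key}"))
    if isinstance(schema.get("items"), dict):
        found.extend(find_array_params(schema["items"], f"{path}[]"))
    return found


def split_ids(raw: str) -> list[str]:
    """Parse the delimited string server-side, tolerating spaces and empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]
```

Running `find_array_params` over every declaration at startup is one way to keep an array parameter from slipping back in. Because a plain string is less expressive than an array, the parameter's description has to state the format explicitly.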
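A user-configurable summarization threshold might work roughly as follows. Once the estimated token count of the history crosses the threshold, older turns are folded into one summary message and only the most recent turns are kept verbatim. This sketch makes several assumptions. Token counting is a crude characters-divided-by-four heuristic. `summarize` is a caller-supplied stand-in for a model call. The `Message` roles are hypothetical. None of this reproduces OURO's implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    role: str  # "user", "model", or "tool" (hypothetical roles for this sketch)
    text: str


def estimate_tokens(messages: list[Message]) -> int:
    """Rough heuristic: about four characters per token (not a real tokenizer)."""
    return sum(len(m.text) for m in messages) // 4


def compact_history(
    messages: list[Message],
    threshold_tokens: int,
    keep_recent: int,
    summarize: Callable[[list[Message]], str],
) -> list[Message]:
    """Fold older turns into one summary once the history exceeds the threshold."""
    keep_recent = max(keep_recent, 1)
    if len(messages) <= keep_recent or estimate_tokens(messages) <= threshold_tokens:
        return messages
    cut = len(messages) - keep_recent
    # Never start the kept tail on a tool response: move the cut back so the
    # model turn that issued the call stays next to its result.
    while cut > 0 and messages[cut].role == "tool":
        cut -= 1
    if cut == 0:
        return messages
    summary = Message(role="user", text="Summary of earlier work: " + summarize(messages[:cut]))
    return [summary] + messages[cut:]
```

Keeping a tool response next to the call that produced it means the summarized history never contains an orphaned tool output, which is one cheap guard against the kind of turn-structure errors described above.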
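A "canvas diff" sent with each user message could be computed by comparing two snapshots of the canvas. The sketch below assumes snapshots are plain dicts mapping node ids to node properties, and it renders the diff as a short text block prepended to the user's message. The snapshot shape and the header wording are hypothetical.

```python
from typing import Any

Snapshot = dict[str, dict[str, Any]]  # node id -> node properties (hypothetical shape)


def canvas_diff(before: Snapshot, after: Snapshot) -> list[str]:
    """Describe nodes added, removed, or edited between two canvas snapshots."""
    lines = []
    for node_id in sorted(after.keys() - before.keys()):
        lines.append(f"+ added {node_id}: {after[node_id]}")
    for node_id in sorted(before.keys() - after.keys()):
        lines.append(f"- removed {node_id}")
    for node_id in sorted(before.keys() & after.keys()):
        old, new = before[node_id], after[node_id]
        for key in sorted(old.keys() | new.keys()):
            if old.get(key) != new.get(key):
                lines.append(f"~ {node_id}.{key}: {old.get(key)!r} -> {new.get(key)!r}")
    return lines


def inject_context(user_text: str, before: Snapshot, after: Snapshot) -> str:
    """Prepend the diff to the user's message so the agent sees edits made by hand."""
    diff = canvas_diff(before, after)
    if not diff:
        return user_text
    return "[canvas changes since last turn]\n" + "\n".join(diff) + "\n\n" + user_text
```

Sending only the diff keeps each turn small. For anything beyond that, the Agent can fall back on its state-reading tools.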

Accomplishments that we're proud of

  1. Independently built a visually stunning and highly functional AI Video Agent within 35 days.
  2. Beta-tested with 11 professional creators; 8 have already expressed a strong willingness to pay for the final product.
  3. Successfully implemented a complex, multi-modal context engineering pipeline.
  4. Produced several high-quality AI short films using the OURO platform itself.

What we learned

  1. Deepened my mastery of the Google Gen AI SDK and explored the practical limits of Gemini 3 Pro.
  2. Gained a fundamental understanding of React Flow principles and state management in node-based UIs.
  3. Learned the nuances of building a custom context engine from scratch.

What's next for OURO

  1. Integration: Expand the ecosystem by integrating more AI video/image APIs and creative toolsets.
  2. Skills Expansion: Enhance the Agent’s "Skill Tree" to handle more diverse production scenarios.
  3. Advanced Editing: Implement a non-linear video editor (NLE) within the canvas, allowing OURO to perform high-end post-production.
  4. Cold Start: Launch on social media to build an early adopter community and iterate based on user feedback.

Built With

  • fast-api
  • gemini-3-pro
  • gemini-3-pro-image-preview
  • next.js
  • postgresql
  • python
  • react-flow
  • sora-2
  • veo-3.1
  • websocket