Inspiration

We were frustrated by how fragmented and rigid AI content creation tools are. Every tool does one thing and none of them talk to each other. Worse, when a new AI capability drops, you're stuck waiting for the platform to integrate it. We wanted to build something fundamentally different: an open, extensible canvas where anyone can plug in new AI capabilities as nodes, connect them together, and build workflows that no single tool could offer on its own.

What it does

Koda is a visual workflow editor for AI-powered content creation, built around an open plugin architecture. Instead of shipping a fixed set of features, Koda provides the canvas and the connection layer; plugins bring the capabilities.

Out of the box, Koda ships with plugins for:

  • Image generation using Nano Banana Pro (Gemini 3 Pro Image) with preset composition: combine character, style, camera angle, and lens presets visually
  • Video generation with first/last frame control and multi-reference inputs, using the Veo 3.1 and Veo 3.1 Fast models
  • Music and speech with ACE-Step and ElevenLabs TTS
  • Storyboarding powered by Gemini 2.5 Pro with extended thinking: an iterative AI agent reasons, drafts scene breakdowns, and spawns image nodes for each scene, all via streaming chat with visible thinking
  • Animation generation powered by Gemini 2.5 Flash for video analysis and media understanding: a conversational agent writes Theatre.js (3D motion graphics) or Remotion (React-based 2D) code, executes it in a sandboxed container via E2B, renders live previews, and iterates based on feedback
  • Product photography using Gemini 2.5 Pro with thinking to analyze products and generate professional product shot compositions

But the real power is in how plugins compose. A text node feeds a prompt into a storyboard agent, which spawns image nodes, whose outputs flow into video nodes; entire production pipelines emerge from simple connections. And because the plugin system is open, new capabilities can be added without touching Koda's core. Three plugin tiers make this accessible to everyone:

  • Simple plugins (no-code): define an input, an AI prompt template, and an output. Anyone can create one
  • Transform plugins: developer-built nodes that process or convert data between other nodes
  • Agent plugins: full conversational AI agents with Canvas API access, sandbox execution, and the ability to programmatically create entire node graphs

Plugins don't just live inside their nodes; they can reach out and build workflows on the canvas itself.
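To give a sense of how low the bar is for the simple tier, here's a hypothetical no-code plugin definition; the field names are illustrative assumptions, not Koda's actual manifest format:

```typescript
// Hypothetical shape of a simple (no-code) plugin definition.
// Field names are assumptions for illustration; Koda's manifest may differ.
interface SimplePluginDefinition {
  id: string;                 // unique plugin identifier
  name: string;               // display name shown on the node
  input: { type: "text" | "image"; label: string };
  promptTemplate: string;     // AI prompt with {{input}} placeholders
  output: { type: "text" | "image" };
}

const haikuPlugin: SimplePluginDefinition = {
  id: "haiku-writer",
  name: "Haiku Writer",
  input: { type: "text", label: "Topic" },
  promptTemplate: "Write a haiku about {{input}}.",
  output: { type: "text" },
};
```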

How we built it

  • Next.js 16 (App Router) as the framework with server-side streaming API routes
  • React Flow (@xyflow/react) for the infinite canvas with 18 custom node types and custom edges
  • Zustand for state management with 50-level undo/redo, clipboard operations, and localStorage persistence
  • Fal.ai as the unified backend for image, video, music, and audio generation across 20+ models
  • Mastra + Google Gemini for AI agent orchestration:
    • Gemini 2.5 Pro powers the storyboard agent with extended thinking (10,000-token thinking budget) for structured scene generation, and drives the product shot analysis with reasoning capabilities
    • Gemini 2.5 Flash handles video and media analysis in the animation pipeline, leveraging its native video understanding for scene breakdown, motion detection, and audio cue extraction
  • A Canvas API exposed to plugins, letting agent plugins create nodes, draw edges, and orchestrate entire workflows programmatically (see the sketch after this list)
  • Server-Sent Events (SSE) for real-time streaming of agent reasoning, generation progress, and thinking blocks
  • E2B sandboxes for secure execution of AI-generated animation code with live preview rendering via Puppeteer and FFmpeg
  • Tailwind CSS 4 + shadcn/ui for a polished dark-theme UI
  • Multi-backend storage supporting localStorage (demo), SQLite (self-hosted), and Turso + R2/S3 (cloud)
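To make the Canvas API concrete, here's a hypothetical slice of what an agent plugin might see; the method names and shapes are assumptions, not Koda's actual surface:

```typescript
// Hypothetical slice of the Canvas API handed to agent plugins;
// method names and signatures are illustrative assumptions.
interface CanvasAPI {
  createNode(
    type: string,
    data: Record<string, unknown>,
    position: { x: number; y: number }
  ): string; // returns the new node's id
  createEdge(sourceId: string, targetId: string): void;
  getNodes(): Array<{ id: string; type: string }>;
}

// A storyboard-style agent materializing scenes as connected image nodes.
function spawnScenes(canvas: CanvasAPI, scenePrompts: string[]): void {
  let previous: string | null = null;
  scenePrompts.forEach((prompt, i) => {
    const id = canvas.createNode("image", { prompt }, { x: i * 320, y: 0 });
    if (previous) canvas.createEdge(previous, id); // chain scenes left to right
    previous = id;
  });
}
```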

Challenges we ran into

  • Designing a plugin API that's both powerful and safe: agent plugins need deep canvas access (create nodes, draw edges, read state) but can't be allowed to corrupt the workflow. Getting the abstraction boundary right took several iterations
  • Streaming agent UX: showing AI reasoning in real time while maintaining a clean timeline UI was hard. We built a custom SSE protocol with text deltas, tool calls, reasoning-delta events, and thinking blocks that render progressively (sketched after this list)
  • Sandbox orchestration: running plugin-generated animation code safely required building a full container lifecycle (create, write files, install deps, execute, screenshot, render, destroy) with proper error recovery
  • Node layout optimization: when a plugin agent generates 8+ scene nodes, placing them on the canvas without overlap while maintaining visual flow required a custom grid-based layout algorithm (also sketched below)
  • State persistence: keeping the canvas snappy with instant Zustand updates while reliably persisting to localStorage (and later SQLite/Turso) without hitting quota limits or race conditions
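For the streaming protocol, here's a hypothetical sketch of the event union; the event names are assumptions modeled on the deltas described above, not the exact wire format:

```typescript
// Hypothetical event types for the custom SSE protocol; names are
// illustrative of the deltas described above, not the exact wire format.
type AgentStreamEvent =
  | { type: "text-delta"; text: string }        // incremental assistant text
  | { type: "reasoning-delta"; text: string }   // visible thinking tokens
  | { type: "tool-call"; name: string; args: unknown }
  | { type: "thinking-block"; id: string; done: boolean }
  | { type: "done" };

// Client side: fold each parsed event into the right region of the timeline UI.
function applyEvent(e: AgentStreamEvent): void {
  switch (e.type) {
    case "text-delta":      /* append to the answer region */ break;
    case "reasoning-delta": /* append to the visible thinking block */ break;
    case "tool-call":       /* render a tool chip in the timeline */ break;
    case "thinking-block":  /* open or close a collapsible block */ break;
    case "done":            /* finalize the message */ break;
  }
}
```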
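And for node layout, a minimal grid placement sketch; the real algorithm handles overlap checks and visual flow, so the dimensions and column count here are placeholder assumptions:

```typescript
// Minimal grid placement for agent-spawned scene nodes; node size and
// column count are placeholder assumptions for illustration.
const NODE_W = 300;
const NODE_H = 220;
const GAP = 60;

function gridLayout(count: number, columns = 4): Array<{ x: number; y: number }> {
  return Array.from({ length: count }, (_, i) => ({
    x: (i % columns) * (NODE_W + GAP),           // column position
    y: Math.floor(i / columns) * (NODE_H + GAP), // row position
  }));
}

// e.g. 8 scenes -> two rows of four, reading left to right
const positions = gridLayout(8);
```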

Accomplishments that we're proud of

  • The plugin architecture itself: three tiers (no-code, transform, agent) with a Canvas API that lets plugins create entire node graphs. It turns Koda from a tool into a platform
  • 18 node types working seamlessly together on one canvas: from simple text inputs to full agent plugins with sandboxed code execution
  • The storyboard plugin that thinks, drafts, refines, and then materializes an entire visual storyboard as connected nodes on the canvas: it demonstrates what agent plugins can do when given canvas access
  • The animation plugin: going from a text prompt to a fully rendered animation video through an AI agent that plans, writes code, executes it in a sandbox, and iterates based on feedback, all within a single node
  • Preset composition: combining character + style + camera + lens presets visually to build sophisticated prompts that would take paragraphs to write manually
  • Local-first architecture that works offline with localStorage and scales to the cloud with zero code changes (see the storage sketch below)
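As a rough illustration of how the storage swap works with zero code changes, here's a minimal sketch assuming a single async store contract that the localStorage, SQLite, and Turso + R2/S3 backends all satisfy; the interface and names are hypothetical:

```typescript
// Hypothetical storage contract behind the multi-backend setup;
// method names are assumptions, but the idea is one async interface
// that every backend (localStorage, SQLite, Turso + R2/S3) implements.
interface WorkflowStore {
  save(id: string, workflow: unknown): Promise<void>;
  load(id: string): Promise<unknown | null>;
}

// Demo backend: localStorage wrapped in the same async contract,
// so swapping in a cloud backend changes nothing at the call sites.
const localStore: WorkflowStore = {
  async save(id, workflow) {
    localStorage.setItem(`koda:${id}`, JSON.stringify(workflow));
  },
  async load(id) {
    const raw = localStorage.getItem(`koda:${id}`);
    return raw ? JSON.parse(raw) : null;
  },
};
```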

What we learned

  • The best platform is the one that gets out of the way: by making plugins first-class citizens with real canvas access, we stopped being the bottleneck for new capabilities. The plugin system became more important than any individual feature we shipped
  • Agents need transparency: users trust AI more when they can see it thinking. Streaming reasoning and todo progress transformed the UX from "waiting for a black box" to "collaborating with an assistant"
  • Node-based UIs are deceptively complex: handle positions, edge routing, viewport management, multi-selection, copy/paste with offset, undo/redo across connected graphs; each interaction has edge cases
  • Model adapters are essential: abstracting away model-specific API differences behind a unified interface let us add new models in minutes instead of hours (see the sketch after this list)
  • Sandbox security is non-negotiable: AI-generated code from plugins must run in isolated containers; the overhead is worth the safety
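To illustrate the model adapter idea, here's a minimal sketch; the interface, endpoint path, and response shape are assumptions for illustration (a real call would also need authentication):

```typescript
// Hypothetical model adapter: one normalized signature per capability,
// with per-model mapping hidden inside the adapter.
interface ImageModelAdapter {
  generate(prompt: string, options?: { aspectRatio?: string }): Promise<{ url: string }>;
}

// Adding a new hosted model means writing one small adapter,
// not touching every node that generates images.
// Endpoint and response shape are assumptions; auth is omitted.
function falAdapter(modelId: string): ImageModelAdapter {
  return {
    async generate(prompt, options) {
      const res = await fetch(`https://fal.run/${modelId}`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt, ...options }),
      });
      const data = await res.json();
      return { url: data.images?.[0]?.url }; // assumed response shape
    },
  };
}
```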

What's next for Koda

  • Plugin marketplace: a community hub where creators publish and share plugins, from brand kits to social media pipelines to product photography workflows. This is the big unlock
  • Plugin SDK and docs: making it trivial for developers to build and distribute new node types
  • Real-time collaboration: multiple users editing the same canvas simultaneously
  • User authentication with Supabase and persistent cloud workspaces
  • Template gallery: shareable workflow templates built from plugin compositions that others can fork and customize
  • AI Assistant node: a general-purpose creative brainstorming agent embedded directly on the canvas
  • More community-driven integrations: as the plugin ecosystem grows, Koda's capabilities grow with it
