Gatewai: The Agentic Visual Canvas for Generative AI

Note: The demo video submitted for this hackathon was edited and rendered using Gatewai itself.

Inspiration & The Problem

Current Generative AI tools are polarized:

  1. Simple Prompt Boxes: Easy to use, but lack control and composability.
  2. Node-Based Tools (e.g., ComfyUI, FalAI Workflows): Powerful, but suffer from steep learning curves and "spaghetti graph" complexity.

Gatewai bridges this gap. It is a next-generation "Vibeflow" engine that combines the intuition of a canvas with the power of a node graph. By leveraging Gemini 3's reasoning capabilities, Gatewai acts as a "Pair Programmer" for creativity, intelligently constructing complex multi-modal pipelines so users don't have to wire them manually.

What is Gatewai?

Gatewai is a next-generation node-based generative AI workflow engine. It empowers users to design sophisticated multi-modal workflows—combining Image, Video, and Text—through an intelligent visual canvas that works with you, not just for you.

Key Innovations

  • 🤖 Gatewai Agent (The "Pair Programmer"): Instead of manually connecting dozens of nodes, just tell the Agent: "Create a podcast generator." It intelligently constructs the graph using Gemini 3's reasoning, acting as a functional thought partner.
  • ⚡ Hybrid Execution Engine: Lightweight logic and UI interactions preview instantly in the browser (Client-Side), while heavy generative tasks are offloaded to the server (Server-Side).
  • 🎨 Pixel-Perfect Consistency: We achieved a "Unified Rendering Engine." Whether previewing a blur in the browser (WebGL) or processing 4K video on the server (Headless GL), the output is mathematically identical.
  • 🔌 Headless by Design: Workflows built in Gatewai can be executed via API, allowing developers to embed these pipelines into third-party apps without loading the UI.
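
A minimal sketch of how the hybrid routing described above could look. The node kinds, the `TARGET_BY_KIND` table, and the function names are hypothetical illustrations, not Gatewai's actual schema or API; the only thing taken from the description is the split between lightweight client-side work and heavy server-side generation.

```typescript
// Hypothetical node kinds; the real Gatewai schema is not shown in this writeup.
type NodeKind = "blur" | "crop" | "text" | "image-gen" | "video-gen" | "tts";
type ExecutionTarget = "client" | "server";

interface WorkflowNode {
  id: string;
  kind: NodeKind;
}

// Assumption: lightweight nodes preview in the browser,
// generative nodes are offloaded to the server.
const TARGET_BY_KIND: Record<NodeKind, ExecutionTarget> = {
  blur: "client",
  crop: "client",
  text: "client",
  "image-gen": "server",
  "video-gen": "server",
  tts: "server",
};

function resolveTarget(node: WorkflowNode): ExecutionTarget {
  return TARGET_BY_KIND[node.kind];
}

// Split a workflow into the part that can preview instantly
// and the part that has to be queued on the server.
function partitionByTarget(nodes: WorkflowNode[]): Record<ExecutionTarget, WorkflowNode[]> {
  const plan: Record<ExecutionTarget, WorkflowNode[]> = { client: [], server: [] };
  for (const node of nodes) {
    plan[resolveTarget(node)].push(node);
  }
  return plan;
}

// Usage: a three-node workflow with one generative step.
const plan = partitionByTarget([
  { id: "n1", kind: "text" },
  { id: "n2", kind: "image-gen" },
  { id: "n3", kind: "blur" },
]);
```

Keeping the routing decision in a single exhaustive table means the compiler flags any new node kind that has not been assigned a target.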

Powered by Gemini 3

Gatewai pushes the boundaries of the Gemini ecosystem:

Feature | Model / Tool Used | Application in Gatewai
Logic & Code | Gemini 3 Pro | Powering the "Agent" to reason through complex graph schemas and generate valid node connections.
Video | Veo 3.1 & Veo Fast | Generating high-fidelity video assets and handling first/last frame transitions.
Image | Nano Banana / Pro | Text-to-Image generation and "In-fill/Out-fill" image editing nodes.
Audio | Gemini 2.5 TTS & Flash | Text-to-Speech generation and Audio Understanding for multi-modal context.
Cognition | Thought Signatures | Used within the Agent to validate graph logic before execution.
Antigravity | Gemini 3 Pro / Flash | Used extensively for the development of the project.

How I Built It (The Stack)

Gatewai is built for speed, type safety, robustness and ease of use.

Architecture

  • Frontend: React, Vite, TailwindCSS.
  • Canvas Frontend: React Flow + PixiJS (Shared rendering logic).
  • Image Editor: Konva (Shared rendering logic).
  • Video Player / Editor: Remotion
  • Backend: Node.js (Hono + RPC) for low-latency API handling. Postgres for storage. Google Cloud Storage for user assets. Google Compute Engine for VM deployment.
  • Orchestration: BullMQ & Redis for managing asynchronous, long-running AI generation tasks.
  • Agent Sandbox: quickjs-emscripten (WASM) for safe execution of AI-generated code.

Challenges & Learnings

The "Coding Agent" Patcher

One of the main technical challenges was making the AI Agent respect the strict 20+ node schema. Direct JSON generation was too error-prone, and atomic updates on the canvas were a poor fit for big workflows. For these reasons, I built a Multi-Agent System:

  1. The Orchestrator: Plans the high-level workflow structure and offloads technical manipulations to the Patcher Agent.
  2. The Patcher (Coding Agent): A specialized Coding Agent that translates the plan into executable JavaScript code patches.
  3. The Sandbox: The agent runs these code patches inside a secure WASM container, validating the graph changes before applying them to the user's canvas. This "Code-as-Action" approach keeps the Agent robust: a faulty patch fails in the sandbox instead of crashing the canvas.
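
The validate-before-apply step can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the `Graph` shape, `KNOWN_TYPES`, `validate`, and `applyPatch` are hypothetical names, and a plain function stands in for the AI-generated JavaScript that Gatewai runs inside its WASM sandbox.

```typescript
// Hypothetical, simplified graph shape; the real 20+ node schema is not shown here.
interface GraphNode { id: string; type: string }
interface GraphEdge { from: string; to: string }
interface Graph { nodes: GraphNode[]; edges: GraphEdge[] }

// A patch mutates a draft graph. In Gatewai the patch is AI-generated
// JavaScript executed in a WASM sandbox; a plain function stands in for it here.
type Patch = (draft: Graph) => void;

interface PatchResult { ok: boolean; graph: Graph; errors: string[] }

const KNOWN_TYPES: string[] = ["text", "image-gen", "tts"]; // assumed subset

function validate(graph: Graph): string[] {
  const errors: string[] = [];
  const ids = graph.nodes.map((n) => n.id);
  graph.nodes.forEach((node, i) => {
    if (KNOWN_TYPES.indexOf(node.type) === -1) errors.push("unknown node type: " + node.type);
    if (ids.indexOf(node.id) !== i) errors.push("duplicate node id: " + node.id);
  });
  for (const edge of graph.edges) {
    if (ids.indexOf(edge.from) === -1 || ids.indexOf(edge.to) === -1) {
      errors.push("dangling edge: " + edge.from + " -> " + edge.to);
    }
  }
  return errors;
}

function applyPatch(current: Graph, patch: Patch): PatchResult {
  // Work on a deep copy so a bad patch never touches the user's canvas.
  const draft: Graph = JSON.parse(JSON.stringify(current));
  try {
    patch(draft);
  } catch (err) {
    return { ok: false, graph: current, errors: ["patch threw: " + String(err)] };
  }
  const errors = validate(draft);
  return errors.length === 0
    ? { ok: true, graph: draft, errors: [] }
    : { ok: false, graph: current, errors };
}

// Usage: one valid patch, one that leaves a dangling edge, one that throws.
const canvas: Graph = { nodes: [{ id: "script", type: "text" }], edges: [] };

const good = applyPatch(canvas, (g) => {
  g.nodes.push({ id: "voice", type: "tts" });
  g.edges.push({ from: "script", to: "voice" });
});
const bad = applyPatch(canvas, (g) => {
  g.edges.push({ from: "script", to: "missing" });
});
const crashed = applyPatch(canvas, () => {
  throw new Error("boom");
});
```

In all three cases the original `canvas` object is left untouched; only a patch that passes validation produces a new graph to commit.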

Securing the Sandbox

I initially used vm2 but identified context leakage risks. Migrating to quickjs-emscripten allowed me to run the Coding Agent's output in a totally isolated WebAssembly container, ensuring the platform remains secure even if the AI generates experimental code.

Unified Rendering (Browser vs. Server)

Replicating browser CSS filters on a backend server is difficult and fragile, so I avoided it. Instead, I implemented Dependency Injection for the graphics pipeline: the same PixiJS code runs on WebGL (Browser) and headless-gl (Server), so the preview matches the final export.
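
The dependency-injection idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: `GraphicsBackend`, `renderBrightness`, and the two backend objects are hypothetical, and a CPU pixel loop stands in for the PixiJS code that actually runs on a GL context. The point it shows is only the shape of the design: the filter is written once and receives its backend as an argument.

```typescript
// Hypothetical minimal backend surface; the real pipeline injects a GL context
// (WebGL in the browser, headless-gl on the server) into PixiJS.
interface GraphicsBackend {
  readonly label: string;
  // Returns an RGBA pixel buffer of width * height * 4 bytes.
  createBuffer(width: number, height: number): Uint8ClampedArray;
}

// Stand-ins for the two environments.
const browserBackend: GraphicsBackend = {
  label: "webgl",
  createBuffer: (w, h) => new Uint8ClampedArray(w * h * 4),
};
const serverBackend: GraphicsBackend = {
  label: "headless-gl",
  createBuffer: (w, h) => new Uint8ClampedArray(w * h * 4),
};

// The filter logic exists once and does not know which backend it runs on.
function renderBrightness(
  backend: GraphicsBackend,
  source: Uint8ClampedArray,
  width: number,
  height: number,
  factor: number,
): Uint8ClampedArray {
  const out = backend.createBuffer(width, height);
  for (let i = 0; i < source.length; i += 4) {
    out[i] = source[i] * factor;         // red (clamped to 0..255 by the array type)
    out[i + 1] = source[i + 1] * factor; // green
    out[i + 2] = source[i + 2] * factor; // blue
    out[i + 3] = source[i + 3];          // alpha is left unchanged
  }
  return out;
}

// Usage: one RGBA pixel rendered through both backends.
const sourcePixels = new Uint8ClampedArray([100, 150, 200, 255]);
const previewPixels = renderBrightness(browserBackend, sourcePixels, 1, 1, 1.5);
const exportPixels = renderBrightness(serverBackend, sourcePixels, 1, 1, 1.5);
```

Because both calls go through the same function, any difference between preview and export can only come from the injected backend, which keeps the surface to test small.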


Accomplishments

  • Graph-as-Code: Demonstrated that multi-modal workflows are essentially "visual code," where variables (like Character Style) can be hot-swapped to regenerate entire narratives instantly.
  • Full Video Compositing: Unlike most tools that stop at images, Gatewai handles timeline-based video layering.
  • Demo-on-Platform: The demo video you are watching was actually edited and rendered using Gatewai.
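
To make the Graph-as-Code idea concrete, here is a hedged sketch of variable hot-swapping. The `{{variable}}` placeholder syntax, the `PromptNode` shape, and both helper functions are hypothetical and chosen only for illustration; the writeup does not describe how Gatewai stores or resolves variables.

```typescript
// Hypothetical prompt node whose template references workflow variables
// through {{variableName}} placeholders (assumed syntax).
interface PromptNode {
  id: string;
  template: string;
}

type Variables = Record<string, string>;

// Substitute known variables; unknown placeholders are left as written.
function renderPrompt(node: PromptNode, vars: Variables): string {
  return node.template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : match,
  );
}

// Find the nodes that must be regenerated when one variable changes.
function nodesAffectedBy(nodes: PromptNode[], variable: string): string[] {
  const placeholder = "{{" + variable + "}}";
  return nodes.filter((n) => n.template.indexOf(placeholder) !== -1).map((n) => n.id);
}

// Usage: swapping characterStyle only touches the scenes that reference it.
const scenes: PromptNode[] = [
  { id: "scene1", template: "A {{characterStyle}} fox enters the forest" },
  { id: "scene2", template: "Wide shot of the forest at dusk" },
  { id: "scene3", template: "The {{characterStyle}} fox finds a lantern" },
];

const affected = nodesAffectedBy(scenes, "characterStyle");
const before = renderPrompt(scenes[0], { characterStyle: "watercolor" });
const after = renderPrompt(scenes[0], { characterStyle: "pixel-art" });
```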

What's Next

  • Workflow-as-Code Runtime: I believe the best way for an LLM to think abstractly is through code, much as spoken language and math improve human thinking. For that reason, a new custom syntax that lets agents execute workflows directly on a WASM interpreter would improve the capabilities of the Workflow Agent.
  • Multiplayer: Real-time collaborative graph editing (Google Docs style).
  • New Nodes: New nodes for media editing including SVG support, Iterator Nodes, PDF canvas editor, file parsers etc.
  • Marketplace: A hub for sharing custom nodes and "vibeflows."
  • Google Genie: I would love to see what users can do with Google Genie in Gatewai.
