Gatewai: The Agentic Visual Canvas for Generative AI
Note: The demo video submitted for this hackathon was edited and rendered using Gatewai itself.
Inspiration & The Problem
Current Generative AI tools are polarized:
- Simple Prompt Boxes: Easy to use, but lack control and composability.
- Node-Based Tools (e.g., ComfyUI, FalAI Workflows): Powerful, but suffer from steep learning curves and "spaghetti graph" complexity.
Gatewai bridges this gap. It is a next-generation "Vibeflow" engine that combines the intuition of a canvas with the power of a node graph. By leveraging Gemini 3's reasoning capabilities, Gatewai acts as a "Pair Programmer" for creativity, intelligently constructing complex multi-modal pipelines so users don't have to wire them manually.
What is Gatewai?
Gatewai is a next-generation node-based generative AI workflow engine. It empowers users to design sophisticated multi-modal workflows—combining Image, Video, and Text—through an intelligent visual canvas that works with you, not just for you.
Key Innovations
- 🤖 Gatewai Agent (The "Pair Programmer"): Instead of manually connecting dozens of nodes, just tell the Agent: "Create a podcast generator." It intelligently constructs the graph using Gemini 3's reasoning, acting as a functional thought partner.
- ⚡ Hybrid Execution Engine: Lightweight logic and UI interactions preview instantly in the browser (Client-Side), while heavy generative tasks are offloaded to the server (Server-Side).
- 🎨 Pixel-Perfect Consistency: We achieved a "Unified Rendering Engine." Whether previewing a blur in the browser (WebGL) or processing 4K video on the server (Headless GL), the output is mathematically identical.
- 🔌 Headless by Design: Workflows built in Gatewai can be executed via API, allowing developers to embed these pipelines into third-party apps without loading the UI.
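The hybrid execution idea above can be sketched roughly as follows. This is a minimal illustration, not Gatewai's actual engine: the `WorkflowNode` shape, the node type names, and the `SERVER_NODE_TYPES` classification are all assumptions made for the example. It orders nodes so that each one runs after its inputs, and tags each step as client-side or server-side.

```typescript
// Hypothetical node shape; Gatewai's real schema is not shown in this write-up.
type ExecutionTarget = "client" | "server";

interface WorkflowNode {
  id: string;
  type: string;
  inputs: string[]; // ids of upstream nodes
}

interface PlanStep {
  id: string;
  target: ExecutionTarget;
}

// Assumption: these node types are "heavy" and would be offloaded to the server.
const SERVER_NODE_TYPES = new Set<string>(["imageGen", "videoGen", "tts"]);

function targetFor(node: WorkflowNode): ExecutionTarget {
  return SERVER_NODE_TYPES.has(node.type) ? "server" : "client";
}

// Orders nodes so every node comes after its inputs (Kahn's algorithm)
// and tags each step with the place it would run.
function executionPlan(nodes: WorkflowNode[]): PlanStep[] {
  const byId = new Map<string, WorkflowNode>();
  for (const node of nodes) byId.set(node.id, node);

  const pending = new Map<string, number>(); // inputs not yet executed
  const dependents = new Map<string, string[]>(); // node id -> downstream ids
  for (const node of nodes) {
    pending.set(node.id, node.inputs.length);
    for (const input of node.inputs) {
      if (!byId.has(input)) {
        throw new Error(`node ${node.id} references unknown input ${input}`);
      }
      const list = dependents.get(input) ?? [];
      list.push(node.id);
      dependents.set(input, list);
    }
  }

  const ready = nodes.filter((n) => n.inputs.length === 0).map((n) => n.id);
  const plan: PlanStep[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    plan.push({ id, target: targetFor(byId.get(id) as WorkflowNode) });
    for (const dependent of dependents.get(id) ?? []) {
      const left = (pending.get(dependent) as number) - 1;
      pending.set(dependent, left);
      if (left === 0) ready.push(dependent);
    }
  }

  if (plan.length !== nodes.length) {
    throw new Error("workflow graph contains a cycle");
  }
  return plan;
}
```

With a plan like this, a UI could preview the client-side steps immediately and queue only the server-side steps as background jobs.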
Powered by Gemini 3
Gatewai pushes the boundaries of the Gemini ecosystem:
| Feature | Model / Tool Used | Application in Gatewai |
|---|---|---|
| Logic & Code | Gemini 3 Pro | Powering the "Agent" to reason through complex graph schemas and generate valid node connections. |
| Video | Veo 3.1 & Veo Fast | Generating high-fidelity video assets and handling first/last frame transitions. |
| Image | Nano Banana / Pro | Text-to-Image generation and "In-fill/Out-fill" image editing nodes. |
| Audio | Gemini 2.5 TTS & Flash | Text-to-Speech generation and Audio Understanding for multi-modal context. |
| Cognition | Thought Signatures | Used within the Agent to validate graph logic before execution. |
| Antigravity | Gemini 3 Pro / Flash | Used extensively for the development of the project. |
How I Built It (The Stack)
Gatewai is built for speed, type safety, robustness and ease of use.
Architecture
- Frontend: React, Vite, TailwindCSS.
- Canvas Frontend: React Flow + PixiJS (Shared rendering logic).
- Image Editor: Konva (Shared rendering logic).
- Video Player / Editor: Remotion
- Backend: Node.js (Hono + RPC) for low-latency API handling. Postgres for storage. Google Cloud Storage for user assets. Google Compute Engine for VM deployment.
- Orchestration: BullMQ & Redis for managing asynchronous, long-running AI generation tasks.
- Agent Sandbox: quickjs-emscripten (WASM) for safe execution of AI-generated code.
Challenges & Learnings
The "Coding Agent" Patcher
One of the technical challenges was making the AI Agent respect the strict 20+ node schema. Direct JSON generation was too error-prone, and atomic updates on the canvas did not scale well to big workflows. For these reasons, I built a Multi-Agent System:
- The Orchestrator: Plans the high-level workflow structure and offloads technical manipulations to the Patcher Agent.
- The Patcher (Coding Agent): This is a specialized Coding Agent that translates the plan into executable JavaScript code patches.
- The Sandbox: The agent runs these code patches inside a secure WASM container and validates the resulting graph changes before applying them to the user's canvas. This "Code-as-Action" approach makes the Agent robust: a faulty patch fails inside the sandbox instead of breaking the canvas.
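The validate-before-apply flow described above might look something like the sketch below. It is written under several assumptions: the `Graph` shape, the `KNOWN_NODE_TYPES` set, and the `applyPatch` helper are hypothetical, and the patch is a plain TypeScript function rather than AI-generated JavaScript running inside a WASM sandbox. The point is only the pattern: run the patch against a copy, validate the result, and commit it only if validation passes.

```typescript
// Hypothetical graph shape; the real 20+ node schema is not reproduced here.
interface GraphNode { id: string; type: string }
interface GraphEdge { from: string; to: string }
interface Graph { nodes: GraphNode[]; edges: GraphEdge[] }

// Assumed subset of allowed node types, for illustration only.
const KNOWN_NODE_TYPES = new Set<string>(["text", "imageGen", "videoGen", "tts"]);

// In this sketch a patch is ordinary code that mutates a draft graph.
type Patch = (draft: Graph) => void;

type PatchResult =
  | { ok: true; graph: Graph }
  | { ok: false; errors: string[] };

// Collects schema violations instead of throwing, so they can be fed back to the agent.
function validateGraph(graph: Graph): string[] {
  const errors: string[] = [];
  const ids = new Set<string>();
  for (const node of graph.nodes) {
    if (ids.has(node.id)) errors.push(`duplicate node id: ${node.id}`);
    ids.add(node.id);
    if (!KNOWN_NODE_TYPES.has(node.type)) errors.push(`unknown node type: ${node.type}`);
  }
  for (const edge of graph.edges) {
    if (!ids.has(edge.from) || !ids.has(edge.to)) {
      errors.push(`dangling edge: ${edge.from} -> ${edge.to}`);
    }
  }
  return errors;
}

// Runs the patch on a deep copy; the original graph is never mutated.
function applyPatch(graph: Graph, patch: Patch): PatchResult {
  const draft = JSON.parse(JSON.stringify(graph)) as Graph;
  try {
    patch(draft);
  } catch (err) {
    return { ok: false, errors: [`patch threw: ${String(err)}`] };
  }
  const errors = validateGraph(draft);
  return errors.length === 0 ? { ok: true, graph: draft } : { ok: false, errors };
}
```

Because a failed patch returns its errors rather than touching the canvas state, those errors could be handed back to the coding agent for another attempt.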
Securing the Sandbox
I initially used vm2 but identified context leakage risks. Migrating to quickjs-emscripten allowed me to run the Coding Agent's output in a totally isolated WebAssembly container, ensuring the platform remains secure even if the AI generates experimental code.
Unified Rendering (Browser vs. Server)
Replicating browser CSS filters on a backend server is impractical for this use case. I solved this by implementing Dependency Injection for the graphics pipeline. The same PixiJS code runs on WebGL (Browser) and headless-gl (Server), ensuring the preview exactly matches the final export.
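As a rough illustration of the dependency-injection idea, the sketch below keeps the filter logic in one shared function and injects the environment-specific surface. It does not use PixiJS, WebGL, or headless-gl: the `PixelSurface` and `SurfaceFactory` interfaces and the in-memory factories are stand-ins invented for this example, and `invert` is just a simple filter chosen to keep the code short.

```typescript
// Hypothetical abstraction: the only thing the shared pipeline knows about its environment.
interface PixelSurface {
  readonly width: number;
  readonly height: number;
  read(): Uint8ClampedArray; // RGBA bytes
  write(pixels: Uint8ClampedArray): void;
}

interface SurfaceFactory {
  readonly name: string;
  create(width: number, height: number, pixels: Uint8ClampedArray): PixelSurface;
}

// Shared filter code: written once, independent of where it runs.
function invert(surface: PixelSurface): void {
  const source = surface.read();
  const result = new Uint8ClampedArray(source.length);
  for (let i = 0; i < source.length; i += 4) {
    result[i] = 255 - source[i];         // red
    result[i + 1] = 255 - source[i + 1]; // green
    result[i + 2] = 255 - source[i + 2]; // blue
    result[i + 3] = source[i + 3];       // alpha is left unchanged
  }
  surface.write(result);
}

// In-memory stand-in for a real backend (browser or server). Illustration only.
function makeMemoryFactory(name: string): SurfaceFactory {
  return {
    name,
    create(width, height, pixels) {
      let buffer = new Uint8ClampedArray(pixels); // copy so the input is not mutated
      return {
        width,
        height,
        read: () => new Uint8ClampedArray(buffer),
        write: (next) => { buffer = new Uint8ClampedArray(next); },
      };
    },
  };
}

// The pipeline receives its backend as an argument instead of importing one.
function renderInverted(
  factory: SurfaceFactory,
  width: number,
  height: number,
  pixels: Uint8ClampedArray,
): Uint8ClampedArray {
  const surface = factory.create(width, height, pixels);
  invert(surface);
  return surface.read();
}

const browserLike = makeMemoryFactory("browser-stand-in");
const serverLike = makeMemoryFactory("server-stand-in");
```

Since `renderInverted` only depends on the injected factory, the same filter code path is exercised regardless of which backend is supplied.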
Accomplishments
- Graph-as-Code: Successfully proving that multi-modal workflows are essentially "visual code," where variables (like Character Style) can be hot-swapped to regenerate entire narratives instantly.
- Full Video Compositing: Unlike most tools that stop at images, Gatewai handles timeline-based video layering.
- Demo-on-Platform: The demo video you are watching was actually edited and rendered using Gatewai.
What's Next
- Workflow-as-Code Runtime: I believe code is the best medium for an LLM to think abstractly, much as spoken language and math improve human thinking. For that reason, a new custom syntax that lets agents execute workflows directly on a WASM interpreter would improve the capabilities of the Workflow Agent.
- Multiplayer: Real-time collaborative graph editing (Google Docs style).
- New Nodes: New nodes for media editing including SVG support, Iterator Nodes, PDF canvas editor, file parsers etc.
- Marketplace: A hub for sharing custom nodes and "vibeflows."
- Google Genie: I would love to see what users can do with Google Genie in Gatewai.
Built With
- better-auth
- biome
- bullmq
- docker
- ffmpeg
- gce
- gcs
- gemini
- hono
- javascript
- konva
- mcp
- nginx
- node.js
- pixi.js
- pnpm
- postgresql
- prisma
- react
- reactflow
- redis
- redux
- remotion
- shadcn
- sharp
- tailwind
- turbo
- typescript
- vite
- webgl