🎮 ZeroGraft — Agentic Godot
A fork of Godot Engine with the power of AI
Inspiration
Game development has a brutal paradox: the tools are more powerful than ever, but the learning curve keeps rising. A solo developer with a great game idea still needs to master GDScript, understand physics engines, create pixel art, configure tilesets, and wire up signals — before a single playable frame exists. We asked ourselves: what if the engine itself could be your collaborator?
We were inspired by the agentic coding movement — tools like Cursor, Windsurf and Antigravity that let AI write code alongside you. But game development isn't just code. It's art, physics, scene graphs, and spatial reasoning. No existing tool bridges that gap.
So we forked Godot Engine and gave it a brain.
What It Does
ZeroGraft lets you describe your game in plain English. Gemini 3 builds it — generating pixel art, writing GDScript, configuring physics, and assembling playable scenes in real-time inside the editor.
Core Capabilities
87 native Gemini function declarations — scene creation, node manipulation, GDScript generation, physics setup, and AI art generation. No prompt hacking — pure structured API.
Multi-agent orchestrator — Architecture, Character, Level, and QA agents coordinate complex builds. 50+ step game creation with loop detection and error recovery.
SpriteMancer — A 25-tool AI art pipeline that generates pixel-perfect characters, animations (idle, walk, run, attack, jump), 6 tileset types, VFX effects, and parallax backgrounds. It uses an 8-stage biomechanical scripting pipeline that separates reasoning (animation logic) from rendering (pixels), eliminating temporal hallucination in AI-generated sprites.
Dual Character System — Two-character relational animations with temporal causality enforcement. The responder's timeline is mathematically derived from the instigator's impact windows.
Vision debugging — Attach screenshots. Gemini sees your viewport to diagnose positioning bugs and verify visual changes.
Live streaming — Watch AI reasoning unfold token-by-token in a custom 2,400-line C++ editor panel embedded in Godot's native UI.
The result: A single person can type "Build me a platformer with a knight who double-jumps" and get a playable game with AI-generated pixel art, GDScript, physics, and a complete scene tree — all inside the Godot editor.
How We Built It
The system has three major components, all powered by Gemini 3:
1. Agentic Godot (C++ Engine Fork)
We forked Godot 4.3 and wrote custom C++ modules:
- A GodotBridge TCP server that exposes the entire editor API
- An AIPanel (2,400 lines of C++) for real-time streaming chat with Gemini
- A SpriteMancerDock with an embedded Chromium browser for the art pipeline
- Patches to Godot's source that expose internal APIs the bridge needs
2. AI Router (TypeScript)
A TypeScript service that acts as the brain:
- Defines 87 function declarations for Gemini 3
- Routes requests through either a single-agent TaskExecutor (for simple queries) or a MultiAgentSystem with an orchestrator and 4 specialized agents
- Extended Thinking handles complex multi-step planning
- Includes loop detection, error recovery, and context summarization for long conversations
3. SpriteMancer (Python + Next.js)
An 8-stage AI art pipeline:
| Stage | What It Does |
|---|---|
| 1 | Character DNA Extraction — Gemini 3 Pro analyzes a reference image into a semantic schema (archetype, body type, colors, equipment, anatomical constraints) |
| 2 | DNA Verification — visual evidence cross-check |
| 3 | Action Definition + Auto-Scaled Frame Budget — physics-based: weapon mass × difficulty tier × perspective |
| 5 | Biomechanical Scripting — Gemini generates frame-by-frame pose descriptions with phase tags: Anticipation → Contact → Recovery |
| 6 | Image Generation — Gemini 3 Pro generates a 4K spritesheet grid |
| 7 | OpenCV Post-Processing — thresholding → morphological noise removal → watershed segmentation → row-major contour sorting |
| 8 | Single-Frame Repair Loop — mask-based regeneration (never regenerates the entire grid) |
The Dual Character pipeline extends this to 10 stages, adding interaction constraints (reach advantage, speed advantage, mass ratio) and temporally-bound responder scripts.
Tech Stack
Godot 4.3 (C++ fork) · Gemini 3 Pro/Flash · TypeScript/Bun · Python/FastAPI · Next.js/React · Supabase · Upstash Redis · Docker Compose + Caddy
Challenges We Ran Into
Bridging C++ and AI. Godot's editor is pure C++. Building a real-time streaming chat panel that handles SSE token streams, function call results, and image previews — all within Godot's UI framework — required 2,400 lines of careful C++ with custom HTTP client code.
Temporal hallucination in sprites. Early attempts at animation generation produced frames where characters teleported between poses. The solution was the biomechanical scripting stage — separating what to draw (physics reasoning) from drawing it (image generation). This eliminated the hallucination problem.
Multi-agent coordination. Complex game-building requests (e.g., "build a coin collector game") require 15+ sequential tool calls across multiple agents. Handling errors mid-sequence, detecting infinite loops, and maintaining context across agent handoffs was a significant engineering challenge.
Dual character temporal causality. Making two characters' animations physically coherent required enforcing that the responder's timeline is never independent — it's always derived from the instigator's impact windows, with strict phase mapping and frame count matching.
87 function declarations. Designing a structured API surface this large required careful schema design so Gemini could reliably choose the right tool and fill parameters correctly. Each function needed precise descriptions, required/optional params, and enum constraints.
Accomplishments We're Proud Of
A single person can type "Build me a platformer with a knight who double-jumps" and get a playable game with AI-generated pixel art, GDScript, physics, and a complete scene tree — all inside the Godot editor.
The SpriteMancer pipeline produces consistent character identity across animation frames at ~$0.24 per animation.
The dual character system enforces physical causality in AI-generated combat animations — something no existing tool does.
What We Learned
Structured function calling >> prompt engineering for complex tool use. Gemini 3's native function declarations made 87 tools reliable in a way that prompt-based approaches never could.
Separation of reasoning and rendering is the key insight for AI-generated animation. Let the model think about physics first, then draw.
Extended Thinking is transformative for multi-step planning. Watching the model break down "build a platformer" into 15 ordered steps — and then execute them all — was the moment we knew this approach works.
Vision is underrated. Screenshot-based debugging (Gemini seeing your game viewport) catches spatial bugs that text-only AI never could.
What's Next
ZeroGraft is a working MVP. Here's where we're headed:
Near-term — Polish the core
- Stability and error handling — Harden the 87-function API with retries, graceful degradation, and clear error messages when Gemini calls fail mid-sequence
- Onboarding and templates — Pre-built starter projects (platformer, top-down RPG, puzzle) so new users get results in under 60 seconds
- Export pipeline — One-click export of SpriteMancer assets as Godot-ready
.tresresources with auto-configured AnimationPlayer nodes
Mid-term — Expand capabilities
- Multi-responder combat animations — One instigator vs. multiple responders with staggered temporal binding
- Counter-attack chaining — Responder becomes instigator mid-sequence, enabling full combat choreography in a single generation pass
- Parallel sprite generation — Generate both characters' spritesheets simultaneously, cutting dual-character generation time in half
- 3D asset support — Extend the agentic pipeline beyond 2D pixel art to Godot's 3D workflow (meshes, materials, animations)
How We Used Gemini
Gemini 3 is the sole AI backbone of ZeroGraft. Every intelligent capability runs through Gemini's API:
Gemini 3 Pro — The Brain
| Capability | How Gemini Is Used |
|---|---|
| Native Function Calling | 87 structured function declarations — Gemini selects the right tool and fills parameters for scene creation, GDScript generation, physics config, and art pipeline orchestration. Zero prompt hacking. |
| Extended Thinking | Complex requests like "build a platformer with enemies and parallax" trigger Gemini's thinking mode for multi-step planning with configurable thinking budgets (1K–8K tokens). The model decomposes tasks into 15+ ordered steps before executing. |
| Live Streaming | Token-by-token SSE streaming displays Gemini's reasoning in real-time inside our custom C++ editor panel. Users watch the AI think and act. |
| Multimodal Vision | Users attach viewport screenshots. Gemini analyzes the game's visual state to diagnose positioning bugs, verify sprite placement, and validate scene composition. |
| Image Generation | Gemini 3 Pro generates 4K spritesheet grids for characters and tilesets — the raw pixel output of our 8-stage art pipeline. |
| Biomechanical Scripting | Gemini reasons about animation physics (anticipation → contact → recovery phases) and generates frame-by-frame pose descriptions before any pixels are drawn. This separation of reasoning and rendering eliminates temporal hallucination. |
| Character DNA Extraction | Gemini analyzes reference images into semantic schemas — archetype, body proportions, color palette, equipment, anatomical constraints — enabling consistent identity across all generated frames. |
| Multi-Agent Orchestration | Four specialized Gemini-powered agents (Architecture, Character, Level, QA) coordinate through a state-machine orchestrator. Each agent has its own system prompt and tool subset, all calling Gemini 3 Pro. |
Gemini 3 Flash — Fast Analysis
- DNA Verification — Cross-checks extracted character DNA against the original reference image
- Animation QA — Rapid visual validation of generated spritesheets before export
- Error Diagnostics — Quick analysis of failed generation attempts to guide retry logic
Why Gemini Was Essential
No other model offers native function calling + extended thinking + streaming + vision + image generation in a single API. ZeroGraft's 87-function surface, real-time streaming UI, screenshot debugging, and AI art pipeline all depend on Gemini capabilities that don't exist elsewhere as a unified offering.
Built With
- Gemini 3 Pro
- Gemini 3 Flash
- Godot Engine 4.3 (C++ fork)
- TypeScript / Bun
- Python / FastAPI
- Next.js / React / TailwindCSS
- Supabase
- Upstash Redis
- Docker Compose + Caddy
- OpenCV
Log in or sign up for Devpost to join the conversation.