Project Story

About the project

Lumina Shader Graph started from a very specific idea: bring a “Unity Shader Graph–like” experience to the browser, enhanced with multimodal model assistance. Shaders are complex, highly visual GPU computations, so we needed a reasoning-capable model that could also handle visual context, which was key for the hackathon.

The inspiration came from two angles:

  • Shaders are the soul of modern graphics, but building them has always been a technical nightmare—both for beginners and for experienced game developers. A node-based app enables incremental visualization and helps a lot, but it still has limits when you don’t know the underlying technical details required to achieve a specific effect.
  • The opportunity to use Gemini 3, which is multimodal and a frontier model, as a true copilot—not only to explain or reason, but to create and edit (and even generate textures on demand) inside the same environment.

What I built (and how)

The project is split into two services designed to work together:

  • Frontend (Vite + React + TypeScript): a visual node editor in the browser with a canvas and live preview. From here we send the user prompt, chat history, and a graph snapshot to the backend.
  • Backend (FastAPI + Google ADK + Gemini 3): an agent that acts as an intent “router” and tool executor. Based on the user message, it classifies the request into modes like ARCHITECT (create), EDITOR (surgical edits), REFINER (diagnose/fix), or CONSULTANT (explain). It then returns deterministic JSON operations (e.g. add_node, remove_node, add_connection, update_node_data) so the frontend can apply changes unambiguously.
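The operation-based contract can be sketched roughly as follows. This is a minimal illustration, not the project's actual schema: the field names (`id`, `type`, `data`, `from`, `to`) and the exact operation payloads are assumptions.

```typescript
// Illustrative types for the deterministic JSON operations the backend
// returns. Field names here are assumptions, not the real schema.
type GraphNode = { id: string; type: string; data: Record<string, unknown> };
type Connection = { from: string; to: string };
type Graph = { nodes: GraphNode[]; connections: Connection[] };

type Operation =
  | { op: "add_node"; node: GraphNode }
  | { op: "remove_node"; id: string }
  | { op: "add_connection"; from: string; to: string }
  | { op: "update_node_data"; id: string; data: Record<string, unknown> };

// Apply operations in order; each step is unambiguous, so the same
// operation list always produces the same resulting graph.
function applyOperations(graph: Graph, ops: Operation[]): Graph {
  let g: Graph = { nodes: [...graph.nodes], connections: [...graph.connections] };
  for (const o of ops) {
    switch (o.op) {
      case "add_node":
        g.nodes.push(o.node);
        break;
      case "remove_node":
        g.nodes = g.nodes.filter((n) => n.id !== o.id);
        // Drop any connections touching the removed node.
        g.connections = g.connections.filter((c) => c.from !== o.id && c.to !== o.id);
        break;
      case "add_connection":
        g.connections.push({ from: o.from, to: o.to });
        break;
      case "update_node_data":
        g.nodes = g.nodes.map((n) =>
          n.id === o.id ? { ...n, data: { ...n.data, ...o.data } } : n
        );
        break;
    }
  }
  return g;
}
```

Because the frontend only ever applies a list of such operations, a model response either produces a valid, reproducible edit or is rejected as a whole; the model never mutates the graph directly.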

To improve efficiency, the backend normalizes the graph context into compact representations (instead of sending the full raw JSON), which reduces cost/tokens.
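The idea of a compact graph representation can be sketched like this, assuming a simple line-per-node text format; the actual normalization the backend uses is certainly different, but the token-saving principle is the same:

```typescript
// Sketch of graph normalization: instead of sending the raw JSON graph,
// serialize each node and edge to one compact line. The format is
// illustrative, not the project's actual representation.
type GraphNode = { id: string; type: string; data: Record<string, unknown> };
type Graph = { nodes: GraphNode[]; connections: { from: string; to: string }[] };

function compactGraph(g: Graph): string {
  const nodeLines = g.nodes.map((n) => {
    const params = Object.entries(n.data)
      .map(([k, v]) => `${k}=${JSON.stringify(v)}`)
      .join(" ");
    return `${n.id}:${n.type} ${params}`.trim();
  });
  const edgeLines = g.connections.map((c) => `${c.from}->${c.to}`);
  return [...nodeLines, ...edgeLines].join("\n");
}
```

A representation like this drops JSON punctuation and key repetition while keeping everything the model needs to reason about topology and parameters.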

Multimodal assets (images)

Another important piece was the multimodal workflow: the backend supports image inputs. Assets (including textures embedded as data:) are persisted in an Asset Store and referenced via assetId, avoiding sending base64 back to the model. When images need to be included in model requests, we resize them (max 768px on the longest side) to keep prompts lightweight while preserving the original resolution for rendering.
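The two asset-handling ideas can be sketched as below. This is a simplified illustration: the store shape, the `asset_` id scheme, and the `MAX_MODEL_SIDE` constant name are assumptions; only the 768px cap on the longest side comes from the description above.

```typescript
// Cap the longest side of an image at 768px for model requests, while the
// Asset Store keeps the original data URI at full resolution for rendering.
const MAX_MODEL_SIDE = 768;

// Compute the downscaled dimensions, preserving aspect ratio.
function modelDimensions(width: number, height: number): { width: number; height: number } {
  const longest = Math.max(width, height);
  if (longest <= MAX_MODEL_SIDE) return { width, height };
  const scale = MAX_MODEL_SIDE / longest;
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

// Minimal asset store: persist the original once, hand out an assetId so
// prompts reference the image by id instead of inlining base64.
const assetStore = new Map<string, string>();
let nextId = 0;

function persistAsset(dataUri: string): string {
  const assetId = `asset_${nextId++}`;
  assetStore.set(assetId, dataUri); // full resolution kept for rendering
  return assetId;
}
```

Referencing by `assetId` keeps repeated chat turns cheap: the heavy base64 payload is stored once and only the short id travels back and forth.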

What I learned

  • How to design an agentic assistant that doesn’t “make up” changes, but instead operates through verifiable actions (JSON operations) over a structure.
  • The importance of separating the problem into intent → tools → operations, especially when you want reproducible edits.
  • That context management (graph summarization/normalization) is critical for a model to perform well on long, technical tasks.
  • How to optimize token ingestion: reduce and structure context (normalization/compaction), avoid heavy payloads (base64) by using assetId, and send only the necessary signal to maintain quality without inflating cost/latency.
  • Best practices for multimodal workflows: persist, reference by ID, and control input sizes.

Challenges I faced

  • Translating natural language into structured changes: the hard part isn’t only “understanding” the prompt, but converting it into a minimal sequence of correct operations.
  • Maintaining determinism: ensuring the backend always returns applicable operations, ordered correctly, and without side effects.
  • Context vs. cost: balancing how much graph information to send so the model has enough signal, without blowing up tokens.
  • Nodes that break or don’t connect when user context is too low: in node graphs (and especially shaders) types, slots, and dependencies matter a lot. The more concrete the prompt is—and the more “shader graph language” the user provides (expected inputs/outputs, color space, UVs, normals, etc.)—the more likely the app is to produce one-shot results without manual fixes.
  • Assets and images: avoiding base64 in prompts and handling image size without breaking the preview flow.
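The "nodes that break or don't connect" problem comes down to typed sockets. A minimal sketch of pre-apply connection validation, assuming a hypothetical socket-type system and compatibility table (the real shader type rules are richer than this):

```typescript
// Sketch of connection validation: shader sockets are typed, so an edge is
// only valid if the output type is compatible with the input slot.
// The socket types and compatibility table below are illustrative.
type SocketType = "float" | "vec2" | "vec3" | "vec4" | "texture";

// e.g. a float can broadcast into any vector slot, but a texture cannot.
const compatible: Record<SocketType, SocketType[]> = {
  float: ["float", "vec2", "vec3", "vec4"],
  vec2: ["vec2"],
  vec3: ["vec3", "vec4"],
  vec4: ["vec4"],
  texture: ["texture"],
};

function canConnect(from: SocketType, to: SocketType): boolean {
  return compatible[from].includes(to);
}
```

Running a check like this over every `add_connection` before applying it lets the backend reject an invalid model proposal instead of producing a graph that silently fails to compile.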

Why it matters

Lumina Shader Graph demonstrates a useful pattern for creative tools: using generative models not only as “chat”, but as structured editors that produce reproducible changes. This enables rapid iteration, safer refactors, and a real bridge between human intent and real-time graph editing. It also helps democratize a complex area like computer graphics, letting developers focus on their game and ask Lumina for effects—reflections, shadows, transparency, emission, noise, and more—on demand.
