SiteSnap: The Self-Correcting Design Agent

💡 Inspiration

The inspiration for SiteSnap came from a specific pain point in the frontend workflow: the "Uncanny Valley" of AI code generation. We noticed that while LLMs are excellent at writing logic, they often fail at spatial reasoning. You upload a mockup, and the AI gives you code that functionally works but visually drifts—wrong padding, missing shadows, or mismatched fonts.

We wanted to build a tool that didn't just write code, but actually "looked" at it. We asked: Can we build an agent that runs its own code, sees the result, and fixes its own CSS mistakes?

⚙️ How We Built It

SiteSnap is a fully client-side React application that acts as an autonomous agent loop.

1. The Core Stack

  • Frontend: React, Tailwind CSS, and Lucide React.
  • AI Engine: We utilize `gemini-3-pro-preview` for its superior multimodal capabilities and large context window.
  • Runtime: A custom-built in-browser bundler using @babel/standalone.

2. The "Vibe Check" Loop

The heart of SiteSnap is the iterative refinement process we call the Vibe Check. Instead of a linear Prompt → Code flow, we implemented a feedback loop:

  1. Generation: Gemini analyzes the uploaded mockup and generates a multi-file React project structure (splitting `App.tsx`, `components/`, etc.).
  2. Live Rendering: We don't just display code; we compile it instantly in the browser. We built a virtual file system and a custom `require()` shim that allows the generated React code to run inside a sandboxed iframe without a backend server.
  3. Visual Diffing: The user can trigger a "Vibe Check." The agent re-analyzes the rendered output against the original design. We treat this as an optimization problem where we try to minimize a visual discrepancy function L.
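As a concrete formulation of L — an illustrative assumption on our part, not necessarily the exact metric the agent optimizes — one could use a weighted sum of per-element style differences between the render and the mockup:

$$L = \sum_{e \in E} \sum_{p \in P} w_p \,\bigl|\, p(e_{\text{render}}) - p(e_{\text{mockup}}) \,\bigr|$$

where $E$ is the set of matched elements, $P$ the set of visual properties (padding, color, font size, shadow), and $w_p$ a per-property weight.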

The agent then self-corrects, outputting only the specific file patches needed to reduce L.
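The steps above can be sketched as a small driver loop. Everything here is illustrative: `render` and `critique` are injected stand-ins for the real screenshot capture and the Gemini call, and the names are ours, not SiteSnap's.

```typescript
// Sketch of a generate → render → critique → patch loop.
// In SiteSnap, `render` would screenshot the sandboxed iframe and
// `critique` would ask Gemini to compare it against the mockup.

interface Patch { file: string; contents: string; }
type Files = Record<string, string>; // virtual file system: path -> source

async function vibeCheckLoop(
  files: Files,
  render: (files: Files) => Promise<string>,    // returns a screenshot (e.g. data URL)
  critique: (shot: string) => Promise<Patch[]>, // [] means "matches the design"
  maxIterations = 3
): Promise<Files> {
  for (let i = 0; i < maxIterations; i++) {
    const screenshot = await render(files);
    const patches = await critique(screenshot);
    if (patches.length === 0) break;            // discrepancy is low enough
    // Apply only the specific file patches the agent emitted
    for (const p of patches) files = { ...files, [p.file]: p.contents };
  }
  return files;
}
```

Capping the iteration count keeps a confused critic from looping forever on a design it can never match.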

3. Bespoke Asset Generation

We noticed that placeholder images ruin the immersion of a generated prototype. We implemented a sub-agent using Gemini 2.5 Flash (`gemini-2.5-flash-image`) that parses the generated code, identifies `<img src="placeholder">` tags, understands the context (e.g., "a modern tech office"), and generates bespoke assets on the fly to replace the placeholders.
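As an illustration of the parsing half of that sub-agent (the image-generation call itself is omitted, and the regex and shapes are our assumptions), a pass that collects placeholder tags and their `alt` context might look like:

```typescript
// Illustrative sketch: find placeholder <img> tags in generated code and
// turn each one into a prompt for an image-generation sub-agent.

interface AssetRequest { tag: string; prompt: string; }

function collectPlaceholderAssets(source: string): AssetRequest[] {
  const requests: AssetRequest[] = [];
  // Grab every <img ...> tag, regardless of attribute order
  const imgTags = source.match(/<img\b[^>]*>/g) ?? [];
  for (const tag of imgTags) {
    if (!/src=["']placeholder["']/.test(tag)) continue;
    // Use the alt text as the generation prompt when available
    const alt = tag.match(/alt=["']([^"']*)["']/);
    requests.push({ tag, prompt: alt ? alt[1] : "generic placeholder image" });
  }
  return requests;
}
```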

🚧 Challenges We Faced

The "Import Hallucination" Problem

One of the hardest technical hurdles was ensuring the generated code didn't crash the runtime. LLMs frequently use icons (like `<Rocket />`) without importing them.

  • Solution: We built an Auto-Healing Compiler. Before rendering, our engine statically scans the generated code (a lightweight regex pass rather than a full AST parse). If it detects a usage of an icon that isn't imported, it automatically injects the correct `import { Icon } from 'lucide-react'` statement.
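A minimal sketch of that healing pass, assuming a known set of icon names (the set and function names here are ours, for illustration only):

```typescript
// Detect lucide-react icons used in JSX but missing from the import
// statement, and inject an import line for them.

const KNOWN_ICONS = new Set(["Rocket", "Menu", "Star"]); // illustrative subset

function autoHealImports(source: string): string {
  // Find capitalized JSX tags, e.g. <Rocket /> or <Menu>
  const used = new Set<string>();
  for (const m of source.matchAll(/<([A-Z][A-Za-z0-9]*)[\s/>]/g)) {
    if (KNOWN_ICONS.has(m[1])) used.add(m[1]);
  }
  // Collect icons already imported from lucide-react
  const importMatch = source.match(/import\s*\{([^}]*)\}\s*from\s*['"]lucide-react['"]/);
  const imported = new Set(
    importMatch ? importMatch[1].split(",").map((s) => s.trim()).filter(Boolean) : []
  );
  const missing = [...used].filter((icon) => !imported.has(icon));
  if (missing.length === 0) return source;
  // Inject the healing import at the top of the file
  return `import { ${missing.join(", ")} } from 'lucide-react';\n` + source;
}
```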

The "Route" Collision

We discovered a persistent edge case where the model would confuse the Route component from react-router-dom with the Route icon from lucide-react, causing the app to crash.

  • Solution: We had to implement strict system prompting to force the model to alias the icon (e.g., `import { Route as RouteIcon }`) to prevent namespace collisions.

Browser-Based Bundling

Running a multi-file React app with routing entirely in the browser is difficult. We had to write a custom module resolver that mimics Node.js's resolution logic (handling `./`, `../`, and `index.tsx` lookups) so that the AI could create complex directory structures that actually work in the preview window.
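A stripped-down version of such a resolver over a virtual file system might look like this (candidate extensions and lookup order are assumptions, not SiteSnap's actual code):

```typescript
// Node-style relative resolution over an in-memory virtual file system.

type VFS = Record<string, string>; // path -> file contents

function resolveModule(vfs: VFS, fromFile: string, specifier: string): string {
  // Bare specifiers (e.g. "react") map to bundled packages; pass through
  if (!specifier.startsWith(".")) return specifier;
  // Resolve ./ and ../ segments relative to the importing file's directory
  const stack = fromFile.split("/").slice(0, -1);
  for (const part of specifier.split("/")) {
    if (part === ".") continue;
    else if (part === "..") stack.pop();
    else stack.push(part);
  }
  const path = stack.join("/");
  // Try the exact path, then extension lookups, then index files
  const candidates = [path, `${path}.tsx`, `${path}.ts`, `${path}/index.tsx`, `${path}/index.ts`];
  for (const c of candidates) {
    if (c in vfs) return c;
  }
  throw new Error(`Cannot resolve '${specifier}' from '${fromFile}'`);
}
```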

🧠 What We Learned

  • Code is fragile; Vibe is resilient: We learned that "perfect code" is less important than "resilient code." By building the Auto-Healer, we made the agent significantly more robust to minor AI errors.
  • Multimodality is a debugger: We found that giving the model the ability to "see" the rendered output (by feeding screenshots back into the context) improved CSS accuracy by an order of magnitude compared to text-only feedback.