Inspiration

Version control works well for code, but it breaks down completely for 3D scenes.

When a .blend file changes, Git just says:

Binary files differ

It doesn’t tell you:

  • what changed
  • where it changed
  • or why the scene now looks different

In real production workflows (games, architecture, VFX), artists constantly tweak lighting, camera position, and geometry together. A building might be scaled up while the camera moves back, making the final render look identical — but Git treats it as an opaque binary change.

We realized the real problem wasn’t diffing files. It was understanding intent and perception in 3D scenes.

So we asked:

What if scene diffs worked like pull requests for 3D?

That idea became MeshMerge.


What it does

MeshMerge is an AI-powered scene diff engine that explains why a 3D scene changed — not just what changed.

Given two .blend files, MeshMerge:

  1. Extracts structured scene data (objects, transforms, lights, camera)
  2. Computes deterministic structural diffs
  3. Performs a visual image diff
  4. Detects ambiguous situations (camera vs geometry, lighting vs material)
  5. Uses Gemini to reason about causal relationships
  6. Generates:
  • a Git-style changelog
  • a semantic JSON report
  • a visual heatmap
  • a PDF review report

It can identify cases like:

  • Object scaled but camera moved → visual size unchanged
  • Lighting changed but materials same → perceptual shift
  • Multiple objects moved together → parent transform
  • Scene looks different but no geometry changed → camera/lighting cause

MeshMerge essentially adds a reasoning layer to 3D version control.


How we built it

MeshMerge is a modular multimodal pipeline built with Python and Blender automation.

1. Scene Extraction

We use Blender’s Python API (bpy) to export structured scene data:

  • object transforms
  • bounding boxes
  • mesh stats
  • lights
  • camera

We also render viewport images for visual comparison.
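A minimal sketch of the kind of extraction script this step runs inside Blender; the field names are illustrative rather than our exact schema:

```python
# Run inside Blender's Python; field names are illustrative.
import json

import bpy
from mathutils import Vector

def extract_scene(out_path):
    scene = bpy.context.scene
    data = {"objects": [], "camera": None}
    for obj in scene.objects:
        entry = {
            "name": obj.name,
            "type": obj.type,
            "location": list(obj.location),
            "rotation_euler": list(obj.rotation_euler),
            "scale": list(obj.scale),
            # bound_box corners are local-space; lift them into world space
            "bbox_world": [list(obj.matrix_world @ Vector(c)) for c in obj.bound_box],
        }
        if obj.type == "MESH":
            entry["mesh_stats"] = {"verts": len(obj.data.vertices),
                                   "polys": len(obj.data.polygons)}
        if obj.type == "LIGHT":
            entry["light"] = {"kind": obj.data.type,
                              "energy": obj.data.energy,
                              "color": list(obj.data.color)}
        data["objects"].append(entry)
    cam = scene.camera
    if cam is not None:
        data["camera"] = {"location": list(cam.location),
                          "rotation_euler": list(cam.rotation_euler),
                          "lens_mm": cam.data.lens}
    with open(out_path, "w") as f:
        json.dump(data, f, indent=2)

extract_scene("scene_after.json")
```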

2. Deterministic Diff Engine

A structural diff compares:

  • transforms
  • scale
  • mesh stats
  • materials
  • object additions/removals

This produces a ground-truth change list.
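A simplified sketch of that comparison over two extracted JSON files; the epsilon and record shapes are illustrative:

```python
# Structural diff over two extracted scene JSON files.
import json

EPS = 1e-5  # ignore floating-point noise in transforms

def load_objects(path):
    with open(path) as f:
        return {o["name"]: o for o in json.load(f)["objects"]}

def diff_scenes(before_path, after_path):
    before, after = load_objects(before_path), load_objects(after_path)
    changes = []
    for name in sorted(before.keys() - after.keys()):
        changes.append({"object": name, "kind": "removed"})
    for name in sorted(after.keys() - before.keys()):
        changes.append({"object": name, "kind": "added"})
    for name in sorted(before.keys() & after.keys()):
        b, a = before[name], after[name]
        for field in ("location", "rotation_euler", "scale"):
            delta = [y - x for x, y in zip(b[field], a[field])]
            if any(abs(d) > EPS for d in delta):
                changes.append({"object": name, "kind": field, "delta": delta})
        if b.get("mesh_stats") != a.get("mesh_stats"):
            changes.append({"object": name, "kind": "mesh_stats",
                            "before": b.get("mesh_stats"),
                            "after": a.get("mesh_stats")})
    return changes  # ground-truth change list fed to later stages
```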

3. Visual Diff

We run an image diff using NumPy + Pillow to detect changed regions and generate heatmaps.

This tells us where the scene looks different.
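Condensed, the idea looks like this (it assumes both renders share a resolution; the 5% threshold is illustrative):

```python
# Per-pixel image diff -> changed-region mask + red heatmap.
import numpy as np
from PIL import Image

def visual_diff(before_path, after_path, heatmap_path, threshold=0.05):
    a = np.asarray(Image.open(before_path).convert("RGB"), dtype=np.float32) / 255.0
    b = np.asarray(Image.open(after_path).convert("RGB"), dtype=np.float32) / 255.0
    delta = np.abs(a - b).mean(axis=2)      # change magnitude per pixel
    changed = delta > threshold             # boolean mask of changed pixels
    scale = float(delta.max()) or 1.0       # avoid divide-by-zero on identical images
    heat = np.zeros((*delta.shape, 3), dtype=np.uint8)
    heat[..., 0] = (delta / scale * 255).astype(np.uint8)  # red channel = change
    Image.fromarray(heat).save(heatmap_path)
    return float(changed.mean())            # fraction of the frame that changed
```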

4. Vision Correlation

We project object bounds into screen space and correlate structural changes with visual regions to confirm which changes are perceptually visible.
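Inside Blender, the projection can be sketched with bpy_extras' world_to_camera_view helper; our production code approximates this heuristically, but the core idea looks like:

```python
# Run inside Blender: project bounding-box corners into normalized
# camera space to get each object's 2D screen rectangle.
import bpy
from bpy_extras.object_utils import world_to_camera_view
from mathutils import Vector

def screen_rect(obj, scene=None):
    scene = scene or bpy.context.scene
    cam = scene.camera
    # world_to_camera_view returns (x, y, depth); x and y land in [0, 1]
    # when the point is inside the camera frustum
    pts = [world_to_camera_view(scene, cam, obj.matrix_world @ Vector(c))
           for c in obj.bound_box]
    xs, ys = [p.x for p in pts], [p.y for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))

def rects_overlap(a, b):
    # Does an object's screen rect intersect a changed heatmap region?
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]
```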

5. Ambiguity Detection Layer

This is where things get interesting.

We detect cases where deterministic logic cannot decide the cause:

  • camera vs geometry
  • lighting vs material
  • perceptual-only changes
  • cascading transforms

These are passed as hypotheses.
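A sketch of how those cases become explicit hypotheses for the next stage; the thresholds and record shapes are illustrative:

```python
# Turn undecidable cases into explicit hypotheses for the reasoning layer.

def detect_ambiguities(structural_changes, visual_change_fraction, cam_dist_delta):
    hypotheses = []
    geo = [c for c in structural_changes if c["kind"] in ("location", "scale")]
    camera_moved = abs(cam_dist_delta) > 1e-4

    # Geometry changed AND the camera moved: the two may cancel visually
    if geo and camera_moved:
        hypotheses.append({
            "type": "camera_vs_geometry",
            "question": "Does the camera move compensate for the object transform?",
            "evidence": {"changes": geo, "cam_dist_delta": cam_dist_delta},
        })

    # Pixels changed but structure did not: lighting, material, or camera cause
    if visual_change_fraction > 0.02 and not structural_changes:
        hypotheses.append({
            "type": "perceptual_only",
            "question": "No geometry changed; is the cause lighting, material, or camera?",
        })
    return hypotheses
```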

6. Gemini Reasoning Engine

Gemini acts as a causal inference layer.

It receives:

  • structural diffs
  • visual regions
  • ambiguity hypotheses
  • camera depth metrics
  • scene JSON
  • before/after viewport images

It resolves contradictions and produces a structured semantic report explaining the scene changes.
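A minimal sketch of the call shape using the google-generativeai SDK; the model name, prompt wording, and output contract here are placeholders rather than our exact ones:

```python
# Structured context + before/after renders in, JSON report out.
import json

import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

def explain_changes(diff, hypotheses, before_img, after_img):
    prompt = (
        "You are reviewing a 3D scene change. Using the structural diff, the "
        "hypotheses, and the two renders, explain the most likely cause of "
        "each visible change. Admit uncertainty where evidence conflicts. "
        "Answer as JSON: changes[{object, cause, summary, confidence}].\n"
        f"STRUCTURAL_DIFF: {json.dumps(diff)}\n"
        f"HYPOTHESES: {json.dumps(hypotheses)}"
    )
    response = model.generate_content(
        [prompt, Image.open(before_img), Image.open(after_img)],
        generation_config={"response_mime_type": "application/json"},
    )
    return json.loads(response.text)
```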

7. Output Generation

Finally, we generate four artifacts (a code sketch of this step follows the list):

  • CHANGELOG.md (human-readable)
  • semantic_scene_report.json
  • annotated images
  • PDF report
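The writing step is simple once the semantic report exists; a minimal sketch, assuming hypothetical report fields like object, cause, summary, and confidence:

```python
# Writing the changelog and JSON artifacts; report fields are hypothetical.
import json
import os

def write_outputs(report, out_dir="out"):
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "semantic_scene_report.json"), "w") as f:
        json.dump(report, f, indent=2)
    lines = ["# Scene Changelog", ""]
    for c in report["changes"]:
        lines.append(f"- **{c['object']}**: {c['summary']} "
                     f"(cause: {c['cause']}, confidence: {c['confidence']})")
    with open(os.path.join(out_dir, "CHANGELOG.md"), "w") as f:
        f.write("\n".join(lines) + "\n")
```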

Challenges we ran into

1. 3D → 2D projection problem

Mapping object bounds to screen space without a full renderer was tricky. We built a heuristic projection system to approximate visual overlap.

2. Camera vs geometry ambiguity

A scaled object and a moved camera can cancel each other out visually. Detecting this required computing camera-distance deltas and reasoning about apparent size.
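Under a pinhole-camera approximation, the check reduces to a single ratio: apparent size scales with object size divided by camera distance.

```python
# If the ratio is ~1, the scale and the camera move cancel perceptually.

def apparent_size_ratio(size_before, dist_before, size_after, dist_after):
    return (size_after / dist_after) / (size_before / dist_before)

# Example: object scaled 2x while the camera backed off from 5 m to 10 m
print(apparent_size_ratio(1.0, 5.0, 2.0, 10.0))  # 1.0 -> structural, not perceptual
```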

3. Multimodal reasoning

We didn’t want Gemini to just summarize data — it needed to resolve contradictions. Designing the prompt + schema to force causal reasoning took several iterations.

4. Blender automation reliability

Running Blender headless across environments required careful scripting and path handling.
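A sketch of how the headless invocation can be wrapped from Python; --background and --python are Blender's real CLI flags, while the paths and script names are illustrative:

```python
# --background runs without a UI; --python executes our extraction script;
# arguments after "--" reach the script via sys.argv.
import subprocess
import sys

def run_extraction(blend_file, script="extract_scene.py", out_json="scene.json"):
    result = subprocess.run(
        ["blender", "--background", blend_file, "--python", script, "--", out_json],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        sys.exit(f"Blender failed:\n{result.stderr}")
    return out_json
```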

5. Report generation

Keeping long text contained in PDFs and ensuring visuals aligned correctly required custom layout handling.


Accomplishments that we're proud of

  • Built a full end-to-end pipeline from .blend → AI changelog
  • Created a system that explains perceptual vs structural changes
  • Successfully detected camera vs scale compensation scenarios
  • Generated human-readable scene diffs automatically
  • Produced visual + semantic reports for review workflows
  • Designed a clean, reproducible pipeline for judges to run locally

Most importantly:

We turned a binary file diff into an explainable scene narrative.


What we learned

  • Multimodal reasoning works best when deterministic systems feed structured context
  • LLMs are powerful when resolving ambiguity, not just summarizing
  • 3D version control is a real unsolved problem in creative pipelines
  • Clear schema design dramatically improves AI reliability
  • Visual + structural data together unlock much richer insights

We also learned how to design AI systems that:

  • admit uncertainty
  • explain decisions
  • and justify conclusions

What's next for MeshMerge

Short term

  • Depth-accurate projection instead of heuristic projection
  • Parent-child scene graph detection
  • More robust ambiguity classification
  • Side-by-side diff UI

Mid term

  • Blender plugin for real-time scene diff
  • GitHub integration for .blend pull requests
  • Timeline diff for animation sequences
  • Scene merge conflict detection

Long term vision

MeshMerge becomes:

“GitHub for 3D scenes”

Where artists can review scene changes like code diffs:

  • visual
  • semantic
  • explainable

Final Thought

MeshMerge isn’t just a diff tool.

It’s a scene reasoning engine.

It answers the question:

Why does this scene look different?

And that’s something traditional version control has never been able to do.
