Inspiration
Version control works perfectly for code — but completely breaks for 3D scenes.
When a .blend file changes, Git just says:
binary file changed
It doesn’t tell you:
- what changed
- where it changed
- or why the scene now looks different
In real production workflows (games, architecture, VFX), artists constantly tweak lighting, camera position, and geometry together. A building might be scaled up while the camera moves back, making the final render look identical — but Git treats it as an opaque binary change.
We realized the real problem wasn’t diffing files. It was understanding intent and perception in 3D scenes.
So we asked:
What if scene diffs worked like pull requests for 3D?
That idea became MeshMerge.
What it does
MeshMerge is an AI-powered scene diff engine that explains why a 3D scene changed — not just what changed.
Given two .blend files, MeshMerge:
- Extracts structured scene data (objects, transforms, lights, camera)
- Computes deterministic structural diffs
- Performs a visual image diff
- Detects ambiguous situations (camera vs geometry, lighting vs material)
- Uses Gemini to reason about causal relationships
- Generates:
  - a Git-style changelog
  - a semantic JSON report
  - a visual heatmap
  - a PDF review report
It can identify cases like:
- Object scaled but camera moved → visual size unchanged
- Lighting changed but materials same → perceptual shift
- Multiple objects moved together → parent transform
- Scene looks different but no geometry changed → camera/lighting cause
MeshMerge essentially adds a reasoning layer to 3D version control.
How we built it
MeshMerge is a modular multimodal pipeline built with Python and Blender automation.
1. Scene Extraction
We used Blender’s Python API (bpy) to export structured scene data:
- object transforms
- bounding boxes
- mesh stats
- lights
- camera
We also render viewport images for visual comparison.
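The extraction step can be sketched roughly as below. The helper and file layout are our own illustration; the `bpy` attributes (`location`, `rotation_euler`, `scale`) are the real Blender API, but the actual exporter captures more (bounding boxes, mesh stats, lights, camera).

```python
import json

def transform_record(name, location, rotation_euler, scale):
    """Serialize one object's transform as plain rounded floats
    so later diffs are deterministic and tolerance-friendly."""
    return {
        "name": name,
        "location": [round(v, 6) for v in location],
        "rotation_euler": [round(v, 6) for v in rotation_euler],
        "scale": [round(v, 6) for v in scale],
    }

def export_scene(path):
    """Run inside Blender (headless or not): dump every object's
    transform to a JSON snapshot for the diff engine."""
    import bpy  # only available inside Blender's bundled Python
    records = [
        transform_record(o.name, o.location, o.rotation_euler, o.scale)
        for o in bpy.data.objects
    ]
    with open(path, "w") as f:
        json.dump(records, f, indent=2)
```

Rounding in `transform_record` keeps snapshots stable across Blender runs, so two identical scenes always diff clean.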
2. Deterministic Diff Engine
A structural diff compares:
- transforms
- scale
- mesh stats
- materials
- object additions/removals
This produces a ground-truth change list.
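A minimal sketch of the structural diff over two transform snapshots (function and field names are ours; the real engine also covers mesh stats and materials):

```python
def diff_transforms(before, after, tol=1e-5):
    """Compare two {name: transform_dict} snapshots and report
    additions, removals, and per-field changes above a tolerance."""
    changes = []
    for name in sorted(set(before) | set(after)):
        if name not in before:
            changes.append({"object": name, "kind": "added"})
        elif name not in after:
            changes.append({"object": name, "kind": "removed"})
        else:
            for field in ("location", "rotation_euler", "scale"):
                a, b = before[name][field], after[name][field]
                if any(abs(x - y) > tol for x, y in zip(a, b)):
                    changes.append({"object": name, "kind": field,
                                    "from": a, "to": b})
    return changes
```

The tolerance matters: floating-point noise from re-saving a .blend file should never show up as a change.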
3. Visual Diff
We run an image diff using NumPy + Pillow to detect changed regions and generate heatmaps.
This tells us where the scene looks different.
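The core of the image diff is a few lines of NumPy (a simplified sketch; our pipeline also uses Pillow for loading and heatmap rendering, and the threshold here is illustrative):

```python
import numpy as np

def visual_diff(img_a, img_b, threshold=0.05):
    """Per-pixel absolute difference of two float RGB arrays in [0, 1].
    Returns a grayscale heatmap and the fraction of changed pixels."""
    delta = np.abs(img_a.astype(np.float32) - img_b.astype(np.float32))
    heat = delta.mean(axis=-1)              # collapse RGB to one channel
    changed = float((heat > threshold).mean())
    return heat, changed
```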
4. Vision Correlation
We project object bounds into screen space and correlate structural changes with visual regions to confirm which changes are perceptually visible.
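The heuristic projection boils down to a pinhole model: divide camera-space x and y by depth, scale by the focal ratio, and map to pixels. A sketch under assumed defaults (50mm focal length, 36mm sensor; parameter names are ours):

```python
def project_point(p_cam, focal=50.0, sensor=36.0, res=(1920, 1080)):
    """Heuristic pinhole projection: map a camera-space point
    (Blender convention: z < 0 is in front of the camera) to pixels."""
    x, y, z = p_cam
    if z >= 0:
        return None  # behind the camera, not visible
    u = (focal / sensor) * (x / -z)   # normalized image-plane coords
    v = (focal / sensor) * (y / -z)
    px = (u + 0.5) * res[0]           # shift origin to image corner
    py = (0.5 - v) * res[1]           # flip y: screen y grows downward
    return px, py
```

Projecting the eight corners of each bounding box this way gives a screen-space rectangle to intersect with the heatmap regions.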
5. Ambiguity Detection Layer
This is where things get interesting.
We detect cases where deterministic logic cannot decide the cause:
- camera vs geometry
- lighting vs material
- perceptual-only changes
- cascading transforms
These are passed as hypotheses.
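In spirit, the ambiguity layer is a set of rules over the deterministic outputs. A simplified sketch (the change kinds and thresholds are illustrative, not the tuned production values):

```python
def ambiguity_hypotheses(structural_changes, visual_changed_fraction):
    """Flag cases the deterministic layer cannot resolve on its own;
    each hypothesis is handed to the reasoning engine downstream."""
    kinds = {c["kind"] for c in structural_changes}
    hyps = []
    if "camera" in kinds and kinds & {"location", "scale"}:
        hyps.append("camera_vs_geometry")
    if "light" in kinds and "material" in kinds:
        hyps.append("lighting_vs_material")
    if not kinds and visual_changed_fraction > 0.01:
        hyps.append("perceptual_only")   # looks different, no geometry moved
    return hyps
```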
6. Gemini Reasoning Engine
Gemini acts as a causal inference layer.
It receives:
- structural diffs
- visual regions
- ambiguity hypotheses
- camera depth metrics
- scene JSON
- before/after viewport images
It resolves contradictions and produces a structured semantic report explaining the scene changes.
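We omit the API call itself, but the evidence bundle handed to the model might be assembled like this (field names are illustrative; the real schema is stricter):

```python
def build_reasoning_payload(structural_diff, visual_regions, hypotheses):
    """Bundle all evidence plus a required output schema into one
    structured prompt, so the model must resolve the hypotheses
    rather than re-summarize the raw diff."""
    return {
        "task": "Explain the causal relationship behind these scene changes.",
        "structural_diff": structural_diff,
        "visual_regions": visual_regions,
        "hypotheses": hypotheses,
        "required_output_schema": {
            "cause": "string",
            "confidence": "low|medium|high",
            "explanation": "string",
        },
    }
```

Forcing a schema with an explicit `confidence` field is what lets the system admit uncertainty instead of always asserting a cause.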
7. Output Generation
Finally we generate:
- CHANGELOG.md (human-readable)
- semantic_scene_report.json
- annotated images
- PDF report
Challenges we ran into
1. 3D → 2D projection problem. Mapping object bounds to screen space without a full renderer was tricky. We built a heuristic projection system to approximate visual overlap.
2. Camera vs geometry ambiguity. A scaled object and a moved camera can cancel each other visually. Detecting this required computing camera distance deltas and reasoning about apparent size.
3. Multimodal reasoning. We didn't want Gemini to just summarize data; it needed to resolve contradictions. Designing the prompt + schema to force causal reasoning took several iterations.
4. Blender automation reliability. Running Blender headless across environments required careful scripting and path handling.
5. Report generation. Keeping long text contained in PDFs and ensuring visuals aligned correctly required custom layout handling.
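The camera-vs-geometry cancellation in challenge 2 reduces to a simple ratio: apparent (angular) size scales roughly as object size over camera distance. A sketch of the check (our own helper, not the production code):

```python
def apparent_size_ratio(size_before, dist_before, size_after, dist_after):
    """Ratio of apparent sizes before and after. A value near 1.0 means
    a scale change was visually cancelled by a camera distance change."""
    return (size_after / dist_after) / (size_before / dist_before)
```

So an object scaled 2x while the camera backs off from 10 to 20 units yields a ratio of 1.0: structurally changed, perceptually identical.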
Accomplishments that we're proud of
- Built a full end-to-end pipeline from .blend → AI changelog
- Created a system that explains perceptual vs structural changes
- Successfully detected camera vs scale compensation scenarios
- Generated human-readable scene diffs automatically
- Produced visual + semantic reports for review workflows
- Designed a clean, reproducible pipeline for judges to run locally
Most importantly:
We turned a binary file diff into an explainable scene narrative.
What we learned
- Multimodal reasoning works best when deterministic systems feed structured context
- LLMs are powerful when resolving ambiguity, not just summarizing
- 3D version control is a real unsolved problem in creative pipelines
- Clear schema design dramatically improves AI reliability
- Visual + structural data together unlock much richer insights
We also learned how to design AI systems that:
- admit uncertainty
- explain decisions
- and justify conclusions
What's next for MeshMerge
Short term
- Depth-accurate projection instead of heuristic projection
- Parent-child scene graph detection
- More robust ambiguity classification
- Side-by-side diff UI
Mid term
- Blender plugin for real-time scene diff
- GitHub integration for .blend pull requests
- Timeline diff for animation sequences
- Scene merge conflict detection
Long term vision
MeshMerge becomes:
“GitHub for 3D scenes”
Where artists can review scene changes like code diffs:
- visual
- semantic
- explainable
Final Thought
MeshMerge isn’t just a diff tool.
It’s a scene reasoning engine.
It answers the question:
Why does this scene look different?
And that’s something traditional version control has never been able to do.