## Inspiration
Content creation for VR, game development, and architectural visualization is normally very time consuming, requiring skilled 3D artists to meticulously model environments from scratch. We looked at a standard 2D photograph and thought: what if we could step inside it? We wanted to democratize 3D asset creation by building a bridge between state-of-the-art 2D perception models and 3D spatial rendering. Our goal was to create a magic button that turns a flat memory into an immersive, fully editable 3D reality.
## What it does
ImageUnfold is an end-to-end automated pipeline that transforms a single 2D photograph into a complete, editable 3D scene in Blender. Instead of just generating a flat 3D photo, our system actually understands the room. It identifies individual objects (like sofas, tables, and plants), extracts them, generates unique textured 3D meshes for each item (WIP), calculates their exact spatial positions using depth maps, and automatically reconstructs the entire room, complete with floors, walls, and proper camera placement, directly inside Blender.
## How we built it
We built a four-phase pipeline using multiple AI models:
Perception: InstructBLIP, YOLO-World, and Depth-Anything-V2 for scene description, object detection, and depth mapping.
Spatial Geometry: A custom estimator to translate 2D data into 3D world coordinates.
Asset Generation: SAM 2 and Hunyuan3D-2 to mask objects and generate .glb 3D meshes.
Reconstruction: A custom Python script to automatically assemble the final scene in Blender.
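The spatial geometry step boils down to standard pinhole-camera back-projection: given a pixel (u, v), its estimated depth, and the camera intrinsics (focal lengths fx, fy and principal point (cx, cy)), the 3D camera-space point is recovered directly. A minimal sketch of that math (the function name and intrinsic values below are illustrative, not our actual estimator):

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with metric depth into a camera-space
    (x, y, z) point using the pinhole model:
        x = (u - cx) * z / fx,   y = (v - cy) * z / fy,   z = depth
    """
    z = depth
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

# Example: a 640x480 image with roughly a 60-degree horizontal field of
# view gives a focal length of about 554 px.
x, y, z = unproject(u=480, v=240, depth=2.0, fx=554.0, fy=554.0, cx=320.0, cy=240.0)
```

A pixel 160 px right of the principal point at 2 m depth lands about 0.58 m to the right of the camera axis; running this per detected object gives the world coordinates that the Blender script consumes.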
## Challenges we ran into
- Translating 2D pixel coordinates and depth maps into accurate 3D spatial math.
- Orchestrating six distinct AI models into a seamless, automated workflow.
- Handling object scaling and occlusion (calculating the hidden sides of objects).

## Accomplishments that we're proud of
- Our Blender import script, which seamlessly turns a JSON file and raw meshes into a fully populated 3D room.
- Implementing a robust fallback system (TripoSR) to prevent pipeline failures.
- Designing a modular architecture where individual AI models can be easily swapped or upgraded.

## What we learned
- The complex mathematics behind camera intrinsics and focal lengths.
- How to programmatically manipulate .glb files and leverage the Blender Python API.
- That high-quality 3D generation relies heavily on precise 2D segmentation.

## What's next for ImageUnfold
- Adding AI lighting and material estimation for photorealistic rendering.
- Supporting multi-photo inputs to eliminate blind spots on generated 3D objects.
- Building a lightweight web viewer to preview scenes before exporting to Blender.
- Implementing proper materials, since the 3D model generator struggled to produce them.
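To illustrate how the reconstruction step hands data to Blender: the import script reads a scene-description JSON and places each generated mesh. The sketch below uses a hypothetical JSON schema (`objects` with `mesh`, `position`, and `scale` fields); in the real pipeline the resulting records are applied via the Blender Python API, as noted in the comments.

```python
import json

# Hypothetical scene-description format; the actual JSON emitted by our
# pipeline may differ.
SCENE_JSON = """
{
  "objects": [
    {"mesh": "sofa.glb",  "position": [1.2, 0.0, 3.4], "scale": 1.5},
    {"mesh": "plant.glb", "position": [-0.8, 0.0, 2.1], "scale": 0.6}
  ]
}
"""

def load_placements(raw):
    """Parse the scene JSON into (mesh_path, location, uniform_scale) records.

    Inside Blender, each record would be applied roughly as:
        bpy.ops.import_scene.gltf(filepath=mesh_path)
        obj = bpy.context.selected_objects[0]
        obj.location = location
        obj.scale = (s, s, s)
    """
    scene = json.loads(raw)
    return [(o["mesh"], tuple(o["position"]), o["scale"])
            for o in scene["objects"]]

placements = load_placements(SCENE_JSON)
```

Keeping the placement data in a plain JSON file is what makes the architecture modular: any upstream model can be swapped as long as it writes the same records.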
