Parallax: AI-Powered 3D World Building
3D content creation has always been the domain of specialists. I asked myself: What if anyone could create 3D worlds just by describing them?
The vision was simple but ambitious: a tool where you could upload a photo of your living room and instantly see it recreated in 3D, then refine it by simply saying "make it cyberpunk" or "add a bookshelf by the window." With Gemini 3's multimodal capabilities and structured output, this is now a reality.
🚀 What Parallax Does
Parallax is an AI-powered 3D scene generator and editor with two core modes:
Generator Mode: Upload any image (a bedroom, office, outdoor scene) and Parallax analyzes it using Gemini 3's vision capabilities. It identifies objects, estimates spatial relationships, and generates a complete 3D scene using Three.js primitives. For complex scenes, it intelligently chunks the generation through a Planner/Builder flow, maintaining coherence while scaling.
Editor Mode (Refinement): Modify existing scenes through natural language. Say "make the chair metallic" or "add warm lighting" and watch changes happen in real-time. The AI understands materials, transforms, colors, and abstract concepts like "cyberpunk style." Unlike basic generators, Parallax merges changes into the existing scene rather than replacing it.
✨ Key Features
- Modular Orchestrator Pattern: Separates strategic planning (Architect) from tactical building (Carpenter).
- Ghosting UI: Immediate feedback on the scene plan before construction begins, making the process feel interactive and transparent.
- Multimodal Spatial Reasoning: Processes images + text simultaneously to understand layout, scale, and style.
- Thought Signature Continuity: Uses Gemini's reasoning signatures to keep logic and style consistent across multiple generation steps.
- Intelligent Merging: Refine scenes iteratively; the system knows which objects to update and which to keep.
- Real-time Preview: Interactive Three.js viewport with direct object manipulation.
🛠️ How it was Built
Technical Stack
- Core Engine: React 18 + TypeScript.
- 3D Rendering: Three.js with @react-three/fiber for a modern, component-driven viewport.
- AI Intelligence: Gemini 3 Pro via the @google/genai SDK.
- Styling: Vanilla CSS + Tailwind for a premium "Glassmorphism" UI.
The Generation Pipeline (Orchestrator Pattern)
- Planning Phase (The Architect): Gemini analyzes the prompt/image and creates a SceneBlueprint. This contains the strategic layout, theme, and object descriptions without direct coordinates.
- Building Phase (The Carpenter): The Builder takes the blueprint and materializes objects in chunks. It focuses on precise 3D math (position, rotation, scale) and material properties (metalness, roughness, emission).
- Scene Synthesis: The React layer merges these objects into the state, updating existing IDs or appending new ones as needed.
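The planning and building phases above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the field names on SceneBlueprint, the PlannedObject and SceneObject shapes, and the planChunks and buildScene helpers are all assumptions, and the Builder type is a stand-in for whatever function wraps the model call.

```typescript
// Hypothetical shapes: the write-up names SceneBlueprint but does not show its fields.
interface PlannedObject {
  id: string;
  description: string; // e.g. "wooden desk under the window"
}

interface SceneBlueprint {
  theme: string;
  objects: PlannedObject[]; // descriptions only, no coordinates at this stage
}

type Vec3 = [number, number, number];

interface SceneObject {
  id: string;
  primitive: "box" | "sphere" | "cylinder";
  position: Vec3;
  rotation: Vec3;
  scale: Vec3;
  material: { color: string; metalness: number; roughness: number };
}

// Stand-in for the Carpenter step; a real version would wrap a model request.
type Builder = (theme: string, chunk: PlannedObject[]) => Promise<SceneObject[]>;

// Split the planned objects into fixed-size chunks so each response stays small.
function planChunks(blueprint: SceneBlueprint, chunkSize: number): PlannedObject[][] {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: PlannedObject[][] = [];
  for (let i = 0; i < blueprint.objects.length; i += chunkSize) {
    chunks.push(blueprint.objects.slice(i, i + chunkSize));
  }
  return chunks;
}

// Materialize the blueprint one chunk at a time, preserving the planned order.
async function buildScene(
  blueprint: SceneBlueprint,
  build: Builder,
  chunkSize = 5
): Promise<SceneObject[]> {
  const scene: SceneObject[] = [];
  for (const chunk of planChunks(blueprint, chunkSize)) {
    const built = await build(blueprint.theme, chunk);
    scene.push(...built);
  }
  return scene;
}
```

Passing the builder in as a parameter keeps the orchestration testable without a network call; a fake builder that returns unit boxes is enough to exercise the chunking.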
Reasoning Continuity
By leveraging Thought Signatures, the Builder phase retains the context of the Architect's reasoning, so that when the scene is generated in separate parts, style and scale stay consistent from one chunk to the next.
🧠 Challenges and Breakthroughs
1. The Token Limit Wall
Generating 20+ precise 3D objects in one shot often led to JSON truncation. Solution: Advanced chunking. By breaking the generation into a Blueprint -> Chunk materialization flow, each response stays small enough to parse, so scene size is no longer capped by a single response's output limit.
2. Strategic Refinement vs. Full Reset
Early iterations would clear the whole scene for a simple color change.
Solution: Implemented an updateSceneContext hook and intelligent merging logic that allows Gemini to see exactly what's currently on the canvas and modify only the delta.
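A rough sketch of the merge step, under stated assumptions: the CanvasObject shape is trimmed down for illustration, and mergeScene, SceneDelta, and DEFAULTS are hypothetical names rather than the project's own. The idea is that the model returns only the objects it changed or added, keyed by ID, and everything else on the canvas is left alone.

```typescript
// Simplified, hypothetical object shape for illustration.
interface CanvasObject {
  id: string;
  color: string;
  metalness: number;
  position: [number, number, number];
}

// A delta entry must carry an id; every other field is optional.
type SceneDelta = Partial<CanvasObject> & { id: string };

// Fallback values for objects that appear in the delta but not on the canvas.
const DEFAULTS: Omit<CanvasObject, "id"> = {
  color: "#cccccc",
  metalness: 0,
  position: [0, 0, 0],
};

// Returns a new array: existing ids are patched in place (order preserved),
// unknown ids are appended. The input array and its objects are not mutated.
function mergeScene(current: CanvasObject[], delta: SceneDelta[]): CanvasObject[] {
  const byId = new Map<string, CanvasObject>();
  for (const obj of current) {
    byId.set(obj.id, obj);
  }
  for (const patch of delta) {
    const base: CanvasObject = byId.get(patch.id) ?? { ...DEFAULTS, id: patch.id };
    byId.set(patch.id, { ...base, ...patch });
  }
  return Array.from(byId.values());
}
```

Because the function returns a fresh array instead of mutating its input, the result can be handed straight to a React state setter. A request like "make the chair metallic" would then reduce to a one-entry delta such as `{ id: "chair", metalness: 1 }`.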
3. The "Box" Problem
Initially, everything looked like basic cubes. Solution: Complex object synthesis. Taught the AI to compose "Compound Objects" from primitives (legs, seats, and backs for chairs), making a scene feel like a world rather than a prototype.
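To make the compound-object idea concrete, here is a small sketch of a chair assembled from six box primitives. The Part and CompoundObject shapes, the makeChair helper, and all dimensions are assumptions chosen for illustration; in the app each part would presumably map to a Three.js mesh, but this sketch stays with plain data so it runs on its own.

```typescript
type Vec3 = [number, number, number];

// One primitive inside a compound object (hypothetical shape).
interface Part {
  name: string;
  primitive: "box";
  size: Vec3; // width, height, depth
  position: Vec3; // center of the part, relative to the chair origin on the floor
}

interface CompoundObject {
  id: string;
  parts: Part[];
}

// Build a chair from four legs, a seat, and a back. Units are arbitrary (meters assumed).
function makeChair(id: string, seatHeight = 0.45, seatSize = 0.4): CompoundObject {
  const legThickness = 0.04;
  const seatThickness = 0.04;
  const backHeight = 0.4;
  const legHeight = seatHeight - seatThickness; // legs end where the seat begins
  const offset = seatSize / 2 - legThickness / 2; // legs sit at the seat corners

  const corners: Array<[number, number]> = [[-1, -1], [1, -1], [-1, 1], [1, 1]];
  const legs = corners.map(([sx, sz], i): Part => ({
    name: `leg-${i}`,
    primitive: "box",
    size: [legThickness, legHeight, legThickness],
    position: [sx * offset, legHeight / 2, sz * offset],
  }));

  const seat: Part = {
    name: "seat",
    primitive: "box",
    size: [seatSize, seatThickness, seatSize],
    position: [0, seatHeight - seatThickness / 2, 0], // top surface at seatHeight
  };

  const back: Part = {
    name: "back",
    primitive: "box",
    size: [seatSize, backHeight, legThickness],
    position: [0, seatHeight + backHeight / 2, -offset], // rises from the rear edge
  };

  return { id, parts: [...legs, seat, back] };
}
```

Keeping part positions relative to a single origin means the whole chair can be moved, rotated, or scaled as one unit while its pieces stay in place.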
🔮 What's Next
- Texture Synthesis: Applying AI-generated textures for wood grain, fabric, and stone.
- AR Viewport: Visualizing your generated scenes in your physical room using mobile AR.
- Collaborative Editing: Multi-user world-building through shared conversation.
The ultimate vision: Parallax becomes the natural language interface for 3D creation, lowering the barrier from "years of training" to "just describe what you want."