Inspiration
The genesis of CLIXOR lies in the friction between creative vision and technical execution. 3D animation has traditionally been gated by the steep learning curve of software like Blender. We envisioned a world where a director could "speak" a scene into existence, using Gemini as a bridge. Our goal was to create a tool that understands the intent behind a prompt—recognizing that a request for "dramatic lighting" implies a specific configuration of $f$-stops, light intensities, and volumetric scattering.
What it does
CLIXOR is an AI-driven 3D Animation Studio that acts as a Technical Artist for any creator.
- Prompt to Scene: Users provide natural language descriptions to generate complex 3D models and environments.
- Vision-to-3D: By uploading an image, the system utilizes Gemini's multi-modal capabilities to analyze silhouettes and colors, then "sculpts" them in Blender using metaballs and remeshing.
- Live Studio Dashboard: A real-time interface displays the agent's internal reasoning (Thoughts), the generated Python code, and a live Scene Context that tracks every object in the 3D viewport.
How we built it
The project is architected as a modular three-tier intelligence pipeline:
- The Brain (FastAPI Orchestrator): Interfacing with the gemini-3-pro-preview model, this layer handles multi-modal inputs and maintains a "Scene Memory."
- The Nervous System (Blender Bridge): A specialized MCP server running inside Blender's process. It uses a thread-safe execution loop to prevent UI blocking.
- The Studio (React Dashboard): A bubbly, "Pixar-esque" UI built with Vite, designed to display the agent's internal "Thought" stream and the live "Scene Context" graph.
Challenges we ran into
The primary hurdle was the Synchronicity Problem. Because Blender’s Python API (bpy) is strictly single-threaded, any external call would normally freeze or crash the application. We overcame this by implementing a polling-based timer system that checks for new commands without interrupting the user's manual workflow. Additionally, we faced API Drift. With the release of Blender 5.0, many shader attributes were renamed (e.g., Transmission became Transmission Weight). We had to build a Heuristic Repair Layer that intercepts agent code and automatically remaps these attributes using regex:
Accomplishments that we're proud of
- Hyper-3D Generation: Successfully implementing a "Volume Sculpting" strategy where the AI uses a cloud of metaballs to create organic shapes rather than just simple primitives.
- Zero-Lag Integration: Creating a bridge that allows Blender to remain fully interactive while the AI is actively modifying the scene.
What we learned
Developing CLIXOR was a masterclass in Spatial Reasoning within LLMs. We learned that:
- Stateful Feedback is Critical: An agent cannot animate effectively in a vacuum. We implemented a "State Observer" that translates Blender's scene graph into a JSON schema.
- Multi-modal Depth Perception: Gemini 3 is remarkably capable of estimating 3D volumes from 2D pixels, which we leveraged to automate the blocking phase of 3D modeling.
What's next for CLIXOR
The future of CLIXOR is Multi-Agent Orchestration. We plan to introduce a "Director Agent" and a "Vision QA Agent" that work in tandem. The Director will plan the shots, while the QA Agent will "look" at the final render and provide feedback to the Technical Artist to fix lighting or clipping issues. We also aim to support full timeline animation, allowing users to describe entire sequences:
Log in or sign up for Devpost to join the conversation.