GenCAD: Extending AI Capabilities to the Physical World
Gemini Integration in GenCAD
GenCAD uses Gemini 3 to power its autonomous 3D modeling agent. Through the official google-genai SDK, the application accesses Google's latest experimental models, specifically Gemini 3 Pro Preview, to handle the complex logic required for programmatic 3D design.
Key Gemini 3 features central to this application include:
Advanced Intelligence: The agent relies on Gemini's reasoning to work through complex spatial logic and Constructive Solid Geometry (CSG) operations, which helps it generate precise OpenSCAD scripts.
Multimodal Visual Validation: A core feature of the validation loop is Gemini's vision capability. The agent captures screenshots of the rendered 3D models and feeds them back into the Gemini 3 model (via Base64 image encoding) to visually verify that the output matches the user's request, enabling a self-correcting design workflow.
High-Performance Async Architecture: The implementation uses the SDK's asynchronous features to handle long-running generation tasks efficiently, ensuring the UI remains responsive while the model processes complex geometry instructions.
Long Context Window: Each iteration has to analyze the customer requirements, the existing OpenSCAD model, the rendering log, and screenshots of the rendered model, so the long context window provided by Gemini is essential for the project.
This deep integration allows GenCAD to move beyond simple text generation, acting as a true multimodal design assistant that can reason, see, and iterate on its creations.
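As a rough illustration of the Base64 step in the visual validation loop, the sketch below encodes a screenshot and bundles it with the user's request. The function names and the dictionary layout are assumptions made for this example; they are not GenCAD's actual code and not the google-genai SDK's request schema.

```python
import base64


def encode_screenshot(png_bytes: bytes) -> str:
    """Return the screenshot as an ASCII Base64 string."""
    return base64.b64encode(png_bytes).decode("ascii")


def build_validation_payload(user_request: str, png_bytes: bytes) -> dict:
    """Bundle the request text and the encoded image (hypothetical layout)."""
    return {
        "prompt": f"Does this render match the request? Request: {user_request}",
        "image": {
            "mime_type": "image/png",
            "data": encode_screenshot(png_bytes),
        },
    }
```

A payload like this would then be translated into whatever request format the model client expects before being sent for visual verification.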
Background
I have dreamed of owning a 3D printer since college, but the high price and maintenance burden kept me away. In recent years, the 3D printer industry has developed rapidly, with significant improvements in affordability and ease of use. I finally decided to buy one during Thanksgiving in 2024.
The Idea
At first I printed other people's models, but I gradually found that they didn't always fit my needs, so I decided to try modeling myself. However, I had last done any modeling 10 years earlier, and my skills were rusty.
Around this time, I discovered modern AI coding tools and had the idea of using AI to generate models. After trying most of the AI modeling tools on the market, I found that almost all of them focus on "Image to 3D Model" functions, which are well suited to generating figures and toys. But one large category of models is completely ignored: parametric models.
What is a Parametric Model?
Parametric modeling is a way of defining geometry through code and parameters. Unlike traditional "What You See Is What You Get" modeling, parametric models use variables and mathematical expressions to describe the size, position, and shape of objects.
As a simple example, suppose you want to build a storage box. Traditional modeling requires manually dragging each face into position, while parametric modeling only needs a few parameters:
Length = 100mm
Width = 80mm
Height = 50mm
Wall Thickness = 2mm
The model is automatically generated based on these parameters. Want a larger box? Just modify the parameter values, and the model updates immediately.
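To make this concrete, here is a minimal sketch that turns the four parameters above into OpenSCAD source for an open-top box. The function name `storage_box_scad` and the choice to treat the dimensions as outer dimensions are assumptions for illustration, not GenCAD output.

```python
def storage_box_scad(length: float, width: float, height: float, wall: float) -> str:
    """Return OpenSCAD source for an open-top box with the given outer size (mm)."""
    if min(length, width) <= 2 * wall or height <= wall:
        raise ValueError("wall thickness leaves no room for the cavity")
    return "\n".join([
        f"length = {length};",
        f"width = {width};",
        f"height = {height};",
        f"wall = {wall};",
        "difference() {",
        "    cube([length, width, height]);",
        "    // Hollow out the inside, leaving the floor and four walls.",
        "    translate([wall, wall, wall])",
        "        cube([length - 2 * wall, width - 2 * wall, height]);",
        "}",
    ])


print(storage_box_scad(100, 80, 50, 2))
```

Calling the function with different numbers produces a different box without touching the geometry code, which is the point of the parametric approach.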
Advantages of Parametric Models
Compared to Image-to-3D or manual modeling, parametric models have several unique advantages:
Easy to modify and customize: Changing a parameter generates variants of different sizes without re-modeling. For example, with a parametric phone stand, switching to a new phone only requires adjusting the size parameters.
Precise and controllable: Each dimension is a precise value, very suitable for functional parts that need to fit with other objects, such as screw holes, snaps, casings, etc.
Reusable and shareable: Parametric models are essentially code, so they can be easily shared. Others can adjust the parameters to their own needs rather than just printing the model as is.
Version control friendly: As text files, they can be managed with tools like Git to track every modification.
Suitable for batch generation: By batch modifying parameters through scripts, a series of related models can be quickly generated.
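The batch generation point can be sketched as follows. The template, the file naming scheme, and the `batch_variants` helper are hypothetical; the example only shows how a script might sweep parameter values to produce a family of related OpenSCAD files.

```python
from itertools import product

# A tiny OpenSCAD template; only width and depth vary in this sketch.
TEMPLATE = """\
width = {width};
depth = {depth};
height = 40;
cube([width, depth, height]);
"""


def batch_variants(widths, depths):
    """Map a file name to OpenSCAD source for every width/depth combination."""
    return {
        f"block_{w}x{d}.scad": TEMPLATE.format(width=w, depth=d)
        for w, d in product(widths, depths)
    }


variants = batch_variants([60, 80], [40, 50])
for name in variants:
    print(name)
```

Because each variant is plain text, the generated files can also be committed to Git and diffed like any other source file.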
This type of model is very common in the 3D printing community. Practical items like storage boxes, stands, casings, connectors, etc., are all very suitable for parametric modeling.
So, I decided to develop a parametric AI modeling tool myself.
Initial Attempts
Model File Format Selection
The first step in developing the tool was selecting a modeling format. There are two main choices on the market:
- Scripting interfaces of existing 3D modeling software: Such as Fusion 360 and FreeCAD, which support modeling using Python scripts.
- Native modeling languages oriented towards programming: Represented by OpenSCAD.
After discussing the options in depth with AI, I found that the first category shares a common problem: these tools are built around traditional GUI modeling logic, so scripts have to drive GUI-style operations through object interfaces, which is not AI-friendly.
In addition, the most common 3D model sharing websites (such as MakerWorld) support direct sharing of OpenSCAD files, while other formats can only be converted to uneditable STL files. After comprehensive consideration, I decided to use OpenSCAD as the target format.
Tool Selection
For my first attempt, I didn't want to reinvent the wheel, so I used AI coding assistants to create models. In the process, I found two problems:
- The models generated were often unsatisfactory on the first try.
- There was a lack of sufficient tools to verify whether the model was correct.
To solve these problems, I developed an MCP (Model Context Protocol) server to help verify model quality.
However, after connecting the MCP, I encountered new problems:
- Uncontrollable calls: Even when the prompt repeatedly stressed using the MCP to verify the model, the assistant often declared the task complete right after modeling.
- Technical difficulties in screenshot validation: Due to the lack of a frontend visualization canvas, I encountered many technical obstacles when adjusting the camera and verifying the model from different angles through screenshots.
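For context on the multi-angle screenshot problem, the sketch below builds one OpenSCAD command line per camera view. It assumes the OpenSCAD command-line renderer is used for screenshots, which the text does not confirm; the helper name, view angles, camera distance, and image size are illustrative choices, and nothing is executed.

```python
def render_commands(scad_path: str, out_prefix: str, distance: float = 300.0):
    """Build one OpenSCAD command line per view; nothing is executed here."""
    # View name -> (rot_x, rot_y, rot_z) in degrees; the angles are illustrative.
    views = {
        "front": (90, 0, 0),
        "top": (0, 0, 0),
        "iso": (55, 0, 25),
    }
    commands = []
    for name, (rx, ry, rz) in views.items():
        # Camera format: translate_x,y,z,rot_x,y,z,distance
        camera = f"--camera=0,0,0,{rx},{ry},{rz},{distance}"
        commands.append([
            "openscad",
            "-o", f"{out_prefix}_{name}.png",
            camera,
            "--imgsize=800,600",
            "--viewall",
            "--autocenter",
            scad_path,
        ])
    return commands
```

Each list could be passed to `subprocess.run` on a machine with OpenSCAD installed. Even so, fixed angles like these do not replace an interactive canvas when the right viewpoint depends on the model.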
Ultimately, this approach produced limited results.
Agent Solution
I realized that:
- The entire modeling process is relatively fixed, which makes it well suited to management by an Agent.
- Building the Agent architecture myself gives me more control over its behavior.
- Many 3D modeling users are not programmers, and setting up a local environment is a barrier for them.
- An Agent with a UI can improve both user experience and technical feasibility.
So, I decided to build an Agent system.
Workflow
As shown in the demo, the Agent's workflow is as follows:
- Requirement Analysis: Parse the user's natural language requirements.
- Documentation Generation: Generate parametric design documents.
- Initial Modeling: Generate OpenSCAD models based on documents.
- Visual Validation: Render the model, take screenshots, and send them to the AI for multi-angle verification.
- Iterative Correction: Repeatedly adjust the model based on the verification results until valid code that meets the requirements is generated.
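The last three steps of the workflow can be sketched as an asynchronous loop. This is a minimal sketch under stated assumptions, not GenCAD's implementation: `design_loop`, `Verdict`, the three injected callables, and the `max_rounds` cap are all hypothetical names, and the requirement analysis and documentation steps are left out.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Verdict:
    ok: bool
    feedback: str = ""


async def design_loop(
    request: str,
    generate: Callable[[str, str], Awaitable[str]],
    render: Callable[[str], Awaitable[bytes]],
    validate: Callable[[str, bytes], Awaitable[Verdict]],
    max_rounds: int = 5,
) -> str:
    """Generate, render, and validate until the model passes or rounds run out."""
    feedback = ""
    for _ in range(max_rounds):
        scad = await generate(request, feedback)       # initial modeling / correction
        screenshot = await render(scad)                # render and capture
        verdict = await validate(request, screenshot)  # visual validation
        if verdict.ok:
            return scad
        feedback = verdict.feedback                    # feed issues into the next round
    raise RuntimeError(f"no valid model after {max_rounds} rounds")
```

Running the loop with `asyncio.run(design_loop(...))` and real model, renderer, and validator callables would keep each long-running step awaitable, and the round cap guards against a model that never converges.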