Structural.ai

Inspiration

For decades, the interface of architecture has remained stagnant. We moved from drafting tables to computer screens, but we stayed tethered to mice and keyboards, tools designed for spreadsheets rather than the fluid, three-dimensional nature of human space. I always wondered: how long will we be stuck behind a desk? In the age of AI, why can’t we build like we dream?

The inspiration for Structural.ai was to bridge the gap between "sketching" and "engineering" through spatial computing. I wanted to turn the architect into a conductor: someone who can stand in a room, "pinch" the air to define a massing concept, and have an AI partner instantly translate those gestures into a viable, engineered structure.

What it does

Structural.ai (ArchGestures) is a real-time generative engineering tool that replaces traditional inputs with computer vision and multimodal AI.

Gesture-Based Massing: Using a simple webcam, architects can use hand gestures to "draw" 3D volumes in space. A pinch gesture defines the scale and position of a conceptual building.

Structural Translation: It doesn't just create shapes; it "understands" them. One click transforms a conceptual box into a full engineering model featuring slabs, columns, shear walls, and facades based on standard commercial grids.

Gemini 3 Consultant: The system feeds geometric data (height, gross floor area, span, and stability) to Gemini 3 Flash, which is prompted to act as a senior structural engineer and returns feasibility reports and material strategies in seconds.
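
As a rough illustration of the structural translation and the geometric data described above, the sketch below expands a massing box into column positions, slab elevations, and summary metrics. It is not the project's actual engine. The 8.0 m grid comes from this write-up; the 4.0 m floor-to-floor height, the function name generateStructure, the field names, and the use of a slenderness ratio as a stand-in for "stability" are all assumptions made for the example. Shear walls and facades are left out to keep it short.

```javascript
// The 8.0 m grid is taken from the write-up. The 4.0 m floor-to-floor
// height is an assumed placeholder for this sketch.
const GRID_SPACING_M = 8.0;
const FLOOR_HEIGHT_M = 4.0;

// Hypothetical helper: expand a massing box {width, depth, height} (metres)
// into column positions, slab elevations, and summary metrics.
function generateStructure(box, grid = GRID_SPACING_M, floorHeight = FLOOR_HEIGHT_M) {
  // Snap the footprint to the nearest whole number of bays in each direction.
  const baysX = Math.max(1, Math.round(box.width / grid));
  const baysZ = Math.max(1, Math.round(box.depth / grid));
  const floors = Math.max(1, Math.floor(box.height / floorHeight));
  const spanX = box.width / baysX;
  const spanZ = box.depth / baysZ;

  // One column at every grid intersection, centred on the origin.
  const columns = [];
  for (let i = 0; i <= baysX; i++) {
    for (let j = 0; j <= baysZ; j++) {
      columns.push({ x: i * spanX - box.width / 2, z: j * spanZ - box.depth / 2 });
    }
  }

  // One slab per storey above ground; the top slab doubles as the roof.
  const slabs = [];
  for (let k = 1; k <= floors; k++) {
    slabs.push({ level: k, elevation: k * floorHeight });
  }

  const metrics = {
    height: box.height,
    floors,
    gfa: box.width * box.depth * floors, // gross floor area in square metres
    maxSpan: Math.max(spanX, spanZ),
    // Assumed stability proxy: height over the least plan dimension.
    slenderness: box.height / Math.min(box.width, box.depth),
  };
  return { columns, slabs, metrics };
}
```

For a 24 m by 16 m by 12 m box, this yields a 3 by 2 bay layout with 12 columns and three slabs. A real tool would likely also check the resulting spans against limits for the chosen material.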

How we built it

The project is built on a high-performance stack that bridges the physical and digital worlds:

The Brain (Gemini 3 Flash): We chose the new Gemini 3 Flash for its "Pro-grade" reasoning and speed. Its multimodal capabilities let it analyze complex structural metrics and return architectural insights with very low latency.

The Nerves (MediaPipe): We integrated Google’s MediaPipe Hands to track 21 hand landmarks in real-time, allowing for natural, fluid gesture control without specialized hardware.

The Body (Three.js): A powerful 3D engine handles the rendering of high-fidelity architectural models, simulated physics, and the futuristic HUD interface.

The Architecture: The frontend is a responsive web application utilizing Tailwind CSS for a minimalist, "Iron Man" style dashboard that keeps the focus on the design.
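
To make the gesture layer more concrete, here is a minimal sketch of pinch detection on top of the 21 landmarks that MediaPipe Hands reports per hand. The landmark indices (0 for the wrist, 4 for the thumb tip, 8 for the index fingertip, 9 for the middle-finger knuckle) follow the MediaPipe hand model. The function name detectPinch, the 0.25 threshold, and the choice to normalize by palm size are assumptions for illustration, not details taken from the project.

```javascript
// Landmark indices from the MediaPipe Hands model (21 points per hand).
const WRIST = 0;
const THUMB_TIP = 4;
const INDEX_TIP = 8;
const MIDDLE_MCP = 9;

// Euclidean distance between two normalized landmarks ({x, y, z}).
function landmarkDistance(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y, (a.z ?? 0) - (b.z ?? 0));
}

// Hypothetical helper: report whether one hand is pinching and where.
// `threshold` is an assumed ratio of fingertip gap to palm size.
function detectPinch(landmarks, threshold = 0.25) {
  if (!landmarks || landmarks.length !== 21) {
    return { pinching: false, point: null };
  }
  // Dividing by palm size keeps the test stable as the hand moves
  // closer to or further from the webcam.
  const palmSize = landmarkDistance(landmarks[WRIST], landmarks[MIDDLE_MCP]);
  if (palmSize === 0) {
    return { pinching: false, point: null };
  }
  const thumb = landmarks[THUMB_TIP];
  const index = landmarks[INDEX_TIP];
  const gap = landmarkDistance(thumb, index);

  return {
    pinching: gap / palmSize < threshold,
    // Midpoint between the fingertips, still in normalized image coordinates.
    point: { x: (thumb.x + index.x) / 2, y: (thumb.y + index.y) / 2 },
  };
}
```

In practice the result would be evaluated on every frame of tracking output, and a short debounce would likely be needed so that a flickering detection does not start and stop a box repeatedly.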

Challenges we ran into

Spatial Mapping: Translating 2D screen coordinates from a webcam into 3D world coordinates in a Three.js scene required complex vector projection. We had to ensure that when a user "pinches" the air, the box appears exactly where they expect it in 3D space.
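
One common way to do this kind of mapping is to cast a ray from the camera through the pinch point and intersect it with a working plane. The sketch below shows that idea without any dependencies; it is an assumed approach, not necessarily the projection used in the project. It assumes an unrotated perspective camera looking down the negative Z axis (the default orientation of a Three.js PerspectiveCamera), and the function name webcamToWorld, the camera object shape, and the mirror flag are hypothetical. In Three.js itself, Raycaster.setFromCamera together with Ray.intersectPlane can serve the same purpose.

```javascript
// Hypothetical helper: map a normalized webcam point (origin top-left,
// x and y in [0, 1]) to a world-space point on the plane z = planeZ.
// Assumes an unrotated pinhole camera looking down -Z.
function webcamToWorld(nx, ny, camera, planeZ = 0, mirror = true) {
  // Normalized device coordinates: x to the right, y up, both in [-1, 1].
  // Mirroring makes the scene behave like a mirror of the user.
  const ndcX = (mirror ? 1 - nx : nx) * 2 - 1;
  const ndcY = 1 - ny * 2;

  // Half-extents of the view frustum at unit distance from the camera.
  const halfHeight = Math.tan((camera.fovDeg * Math.PI) / 360);
  const halfWidth = halfHeight * camera.aspect;

  // Ray from the camera through the pinch point.
  const dir = { x: ndcX * halfWidth, y: ndcY * halfHeight, z: -1 };

  // Solve camera.position.z + t * dir.z = planeZ for the ray parameter t.
  const t = (planeZ - camera.position.z) / dir.z;
  if (t <= 0) return null; // the plane is behind the camera

  return {
    x: camera.position.x + dir.x * t,
    y: camera.position.y + dir.y * t,
    z: planeZ,
  };
}
```

Fixing the depth with a plane is a simplification: a single webcam gives no reliable absolute depth, so the box lands on a surface the designer can predict rather than at a guessed distance.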

Real-Time Data Parsing: Formatting raw geometric data into a prompt that an LLM can reason about effectively was a challenge. We had to build a custom "Structural Engine" to calculate spans and areas before sending the data to Gemini for analysis.
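
The sketch below shows one way to turn pre-computed metrics into a compact, labeled prompt. It is an illustration under assumptions, not the project's actual prompt: the function name buildStructuralPrompt, the field names, the rounding, and the wording of the instructions are all hypothetical. The call to the model is deliberately left out, since the request code is not described in this write-up.

```javascript
// Hypothetical helper: format pre-computed metrics into a prompt string.
// Field names and units are assumptions made for this sketch.
function buildStructuralPrompt(metrics) {
  const required = ["height", "gfa", "maxSpan", "slenderness"];
  for (const key of required) {
    if (!Number.isFinite(metrics[key])) {
      throw new Error(`Missing or non-numeric metric: ${key}`);
    }
  }

  // Round values and put units in the keys so the model does not
  // have to guess what each number means.
  const payload = {
    height_m: Number(metrics.height.toFixed(1)),
    gfa_m2: Number(metrics.gfa.toFixed(1)),
    max_span_m: Number(metrics.maxSpan.toFixed(2)),
    slenderness_ratio: Number(metrics.slenderness.toFixed(2)),
  };

  return [
    "You are a senior structural engineer reviewing a conceptual building mass.",
    "The metrics below are in SI units, given as JSON:",
    JSON.stringify(payload, null, 2),
    "Reply with: 1) a feasibility summary, 2) a suggested structural system, 3) a material strategy.",
  ].join("\n");
}

// Example with made-up numbers for a 24 m x 16 m, three-storey box.
const examplePrompt = buildStructuralPrompt({
  height: 12,
  gfa: 1152,
  maxSpan: 8,
  slenderness: 0.75,
});
```

Validating the numbers before building the prompt means a failed calculation surfaces as an error in the app instead of as a confident report about a missing value.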

Accomplishments that we're proud of

Zero-UI Interaction: We successfully built a functional CAD tool that requires almost no keyboard or mouse usage for the core creative phase.

The "Structural Engine": We moved beyond "AI art" to "AI engineering." Our tool doesn't just generate a picture of a building; it generates a model laid out on 8.0 m commercial grids, with calculated spans and areas behind it.

Speed: Thanks to Gemini 3 Flash, the structural consultation feels instantaneous, making the design process feel like a true conversation.

What we learned

AI as an Interface: We learned that the next frontier of AI isn't just better models, but better interaction patterns. Moving from text prompts to spatial gestures changes how a designer thinks.

Multimodal Utility: Seeing how Gemini 3 can interpret a list of raw numbers as a cohesive architectural narrative was a massive "Aha!" moment for the potential of agentic coding in niche industries like architecture, engineering, and construction (AEC).

What's next for Structural.ai

AR/VR Integration: Taking the tool out of the browser and into a headset (like Apple Vision Pro or Quest) to allow planners to "sculpt" buildings directly on-site.

Agentic Vision: Leveraging Gemini 3’s Agentic Vision to allow the AI to "look" at a real-world site photo and suggest building masses that align with the surrounding urban fabric.

Real-Time Sustainability: Integrating carbon-footprint analysis into the Gemini prompt to ensure every design is as green as it is stable.

Built With

gemini
generativeai
google
html5
javascript
mediapipe
three.js

Updates

Talha S started this project — Feb 05, 2026 11:43 AM EST
