WaveForge: AI-Powered Somatic 3D Design Studio
WaveForge utilizes Google Gemini 2.5 to transform natural language, images, and real-time facial expressions into parametric 3D sculptures ready for fabrication.
The Problem / Inspiration
Traditional CAD (Computer-Aided Design) software is notoriously difficult to learn, creating a barrier between human emotion and physical creation. We wanted to build a "Translator for the Abstract"—a tool that allows anyone to turn a fleeting memory, a specific feeling, or a facial expression into a tangible, physical object.
We were inspired by the concept of "Digital Twins" and "Data Physicalization." What does a "calm ocean" look like mathematically? What is the 3D topology of a happy memory? WaveForge bridges the gap between Generative AI hallucinations and precise, manufacturable 3D geometry.
What it does
WaveForge is a web-based generative design studio that combines complex wave-superposition mathematics with the reasoning capabilities of Google Gemini.
Text-to-Design (Gemini Powered): Users describe a concept (e.g., "Cyberpunk glitch interference"), and Gemini generates a precise JSON configuration of wave physics (frequencies, amplitudes, phases, colors) that visually matches the description.
Memory Encapsulation (Multimodal Vision): Users upload a photo. Gemini Vision analyzes the sentiment and visual flow of the image and translates it into a 3D topographical map, effectively "freezing" a memory into a physical shape.
Somatic Control: Using on-device computer vision (MediaPipe), users control the simulation with their face. Smile accelerates the time/speed of the waves. Frown injects noise and turbulence into the geometry. Jaw Open increases the amplitude (height).
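A minimal sketch of how facial signals might drive the simulation, assuming MediaPipe-style blendshape scores in [0, 1] (category names such as "mouthSmileLeft" follow MediaPipe's conventions; the parameter names and ranges here are illustrative, not WaveForge's exact code):

```typescript
// Hypothetical mapping of blendshape scores to simulation parameters.
interface SimParams {
  timeScale: number; // wave speed multiplier (smile)
  noise: number;     // turbulence injected into the geometry (frown)
  amplitude: number; // wave height multiplier (jaw open)
}

type Blendshapes = Record<string, number>; // each score in [0, 1]

const clamp01 = (v: number) => Math.min(1, Math.max(0, v));

function mapFaceToParams(bs: Blendshapes): SimParams {
  // Average left/right scores so a one-sided expression still registers.
  const smile = clamp01(((bs["mouthSmileLeft"] ?? 0) + (bs["mouthSmileRight"] ?? 0)) / 2);
  const frown = clamp01(((bs["browDownLeft"] ?? 0) + (bs["browDownRight"] ?? 0)) / 2);
  const jaw = clamp01(bs["jawOpen"] ?? 0);
  return {
    timeScale: 1 + 3 * smile, // full smile quadruples the wave speed
    noise: 0.5 * frown,       // full frown injects maximum turbulence
    amplitude: 1 + 2 * jaw,   // wide-open jaw triples the wave height
  };
}
```

A neutral face yields the identity parameters (timeScale 1, noise 0, amplitude 1), so the sculpture animates normally when no one is in frame.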
Fabrication Ready: The app is not just a visualizer. It runs real-time manufacturing analysis (Overhang Heatmaps via GLSL shaders) and exports Binary STL files for 3D printing, Heightmaps for CNC milling, and SVG slices for laser cutting.
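The overhang heatmap boils down to a per-face angle test. A CPU sketch of that test, assuming a Z-up coordinate system and the common ~45° printability threshold (both are assumptions; the real check runs in a GLSL fragment shader):

```typescript
type Vec3 = [number, number, number];

// Angle (in degrees) between a face normal and straight down (-Z).
// 0° means the face points straight down (worst overhang); 180° means it points up.
function overhangAngleDeg(normal: Vec3): number {
  const len = Math.hypot(normal[0], normal[1], normal[2]);
  const cos = -normal[2] / len; // dot(unit normal, [0, 0, -1])
  return (Math.acos(Math.min(1, Math.max(-1, cos))) * 180) / Math.PI;
}

// A face needs support material if it points downward more steeply than
// the printer can bridge (commonly within 45° of straight down).
function needsSupport(normal: Vec3, maxOverhangDeg = 45): boolean {
  return overhangAngleDeg(normal) < maxOverhangDeg;
}
```

The heatmap shader colors fragments by the same angle, so the user sees unsupported regions glow before exporting.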
AR Visualization: Users can push a button to view their generated sculpture in their living room using WebXR.
How we built it
WaveForge is a high-performance React application that uses Three.js (@react-three/fiber) as its 3D engine.
Google Gemini API: We used the Gemini 2.5 Flash model for its incredible speed and multimodal capabilities.
Structured Output: We heavily utilized responseSchema to force Gemini to output strict, executable JSON that maps 1:1 with our TypeScript simulation engine interfaces. This prevents "hallucinations" in the code and ensures every AI response results in a valid render.
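A sketch of that 1:1 contract, assuming illustrative field names (WaveForge's actual interfaces are richer). The schema object follows the type/properties/required shape that Gemini's structured output expects, and a defensive runtime guard double-checks the parsed response before it reaches the renderer:

```typescript
// The TypeScript interface the simulation engine consumes.
interface WaveConfig {
  frequency: number; // spatial frequency of the wave
  amplitude: number; // wave height
  phase: number;     // phase offset in radians
  color: string;     // hex color, e.g. "#00ffcc"
}

// Schema handed to the model as responseSchema, mirroring WaveConfig field-for-field.
const waveSchema = {
  type: "ARRAY",
  items: {
    type: "OBJECT",
    properties: {
      frequency: { type: "NUMBER" },
      amplitude: { type: "NUMBER" },
      phase: { type: "NUMBER" },
      color: { type: "STRING" },
    },
    required: ["frequency", "amplitude", "phase", "color"],
  },
};

// Runtime guard: even with a schema, validate before rendering.
function isWaveConfig(v: unknown): v is WaveConfig {
  const o = v as Record<string, unknown>;
  return (
    typeof o === "object" && o !== null &&
    typeof o.frequency === "number" &&
    typeof o.amplitude === "number" &&
    typeof o.phase === "number" &&
    typeof o.color === "string"
  );
}
```

Because every required field is also a required schema property, a response that parses is guaranteed to render.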
Vision Capabilities: We stream Base64 image data to Gemini to perform sentiment analysis on user photos, mapping emotional keywords to mathematical constants (e.g., "Anger" = High Frequency, High Roughness).
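That keyword-to-constant mapping can be as simple as a lookup table with a neutral fallback. The specific values below are hypothetical, chosen only to illustrate the "Anger = High Frequency, High Roughness" idea from above:

```typescript
// Hypothetical sentiment-to-physics lookup; WaveForge's real constants differ.
interface MoodParams {
  frequency: number; // spatial frequency of the surface waves
  roughness: number; // noise contribution in [0, 1]
}

const MOOD_MAP: Record<string, MoodParams> = {
  anger:   { frequency: 8.0, roughness: 0.9 }, // jagged, fast oscillation
  calm:    { frequency: 0.5, roughness: 0.05 }, // long, smooth swells
  joy:     { frequency: 3.0, roughness: 0.2 },
  sadness: { frequency: 1.0, roughness: 0.4 },
};

function moodToParams(keyword: string): MoodParams {
  // Unknown sentiment keywords fall back to a neutral surface.
  return MOOD_MAP[keyword.toLowerCase()] ?? { frequency: 1.5, roughness: 0.3 };
}
```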
Shaders: We wrote custom GLSL vertex shaders to handle the wave superposition and noise displacement on the GPU, allowing us to render millions of vertices at 60fps.
Computer Vision: We integrated MediaPipe Face Landmarker for real-time, privacy-first (client-side) facial tracking.
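The core of the displacement math is a sum of directional sinusoids. A CPU sketch of that superposition, mirroring what the vertex shader computes per vertex (the `Wave` fields here are illustrative; this is also the flavor of math that had to be re-implemented on the CPU for the STL export path, since exported geometry can't read GPU-displaced vertices):

```typescript
interface Wave {
  frequency: number; // spatial frequency along the travel direction
  amplitude: number; // height contribution of this component
  phase: number;     // phase offset in radians
  dirX: number;      // unit travel direction (x component)
  dirY: number;      // unit travel direction (y component)
  speed: number;     // phase velocity over time
}

// Surface height at (x, y) at time t: superposition of all wave components.
function sampleHeight(waves: Wave[], x: number, y: number, t: number): number {
  let z = 0;
  for (const w of waves) {
    const d = w.dirX * x + w.dirY * y; // project position onto the travel direction
    z += w.amplitude * Math.sin(w.frequency * d - w.speed * t + w.phase);
  }
  return z;
}
```

Because superposition is a plain sum, components compose linearly, which is what lets Gemini's per-wave JSON parameters combine into one coherent surface.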
WebXR: Implemented for Augmented Reality previewing of the models.
Challenges we ran into
Mapping Language to Math: It was challenging to teach the LLM how "calmness" translates to specific float values in a sine-wave equation. We solved this by providing a robust schema and specific range constraints in the system prompt.
Embodied Input: We also had to work out how facial expressions and body movement should change the geometry.
Performance: Running a 300k-vertex simulation, calculating normals for 3D-printing analysis, and running facial recognition simultaneously in the browser required heavy optimization. We moved all geometry calculations to the GPU (vertex shader) for rendering, but had to re-implement the math in a CPU worker for the binary STL export.
Race Conditions: Handling the asynchronous nature of AI streaming responses while maintaining a smooth 60fps animation loop was tricky.
Accomplishments that we're proud of
The "Somatic" Mode: It feels magical to smile at your computer and watch a 3D sculpture accelerate and smooth out in real time. It creates a new kind of human-computer interaction.
Native Binary STL Generation: We built a custom binary STL writer from scratch in TypeScript so we could export high-resolution meshes (up to 100MB) directly in the browser without relying on heavy external libraries.
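The binary STL format itself is compact: an 80-byte header, a little-endian uint32 triangle count, then 50 bytes per triangle (12 float32s for the normal and three vertices, plus a 2-byte attribute). A minimal sketch of a writer over that layout (WaveForge's production writer is more elaborate):

```typescript
type Triangle = {
  normal: [number, number, number];
  verts: [number, number, number][]; // exactly three vertices
};

function writeBinarySTL(tris: Triangle[]): ArrayBuffer {
  // 80-byte header + 4-byte count + 50 bytes per triangle.
  const buf = new ArrayBuffer(84 + 50 * tris.length);
  const view = new DataView(buf);
  view.setUint32(80, tris.length, true); // triangle count, little-endian
  let off = 84;
  for (const t of tris) {
    // 12 float32s: normal followed by the three vertices.
    for (const f of [...t.normal, ...t.verts.flat()]) {
      view.setFloat32(off, f, true);
      off += 4;
    }
    view.setUint16(off, 0, true); // attribute byte count (unused by most slicers)
    off += 2;
  }
  return buf;
}
```

Writing straight into an `ArrayBuffer` via `DataView` avoids intermediate strings entirely, which is what makes 100MB in-browser exports feasible.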
Seamless AI Integration: The transition from a text prompt to a fully realized 3D model happens in seconds, thanks to Gemini Flash's low latency.
What's next for WaveForge
Direct Printer Integration: Sending G-code directly from the browser to OctoPrint-enabled 3D printers.
Speculative Audio History: We are working on a "Seismograph" mode, where the app listens to a conversation and sculpts the timeline of the discussion into a physical object.
Community Gallery: A platform to share "Wave DNA" (QR codes) so others can remix your AI-generated designs.
Built With
- aistudio
- glsl
- google-gemini
- mediapipe
- react
- tailwind
- three.js
- typescript
- webgl
- webxr