Inspiration
Every designer knows the frustration: you have a vision in your head, but translating it into 3D software means hours of clicking through menus, adjusting parameters, and fighting with interfaces that were designed for mice and keyboards. We asked ourselves — what if you could just speak your design into existence? What if you could stand in your living room, look at an empty corner, and say "put a walnut bookshelf there" and watch it appear in real space?
Dio Design was born from the belief that the next generation of design tools won't live on a flat screen. They'll live in the space around you, respond to your voice, and be guided by an AI companion that understands what you're trying to create. We named our AI avatar "Dio" — a star-shaped character inspired by Disney's Wish — because great design starts with a wish.
What it does
Dio Design is a voice-controlled mixed-reality 3D design workspace. You wear a custom 3D-printed VR headset, hold a wireless controller, and speak naturally to Dio — your AI design companion who floats beside you in augmented reality.
Voice-Driven 3D Creation: Say "create a modern desk with tapered legs" and watch it materialize in AR space in front of you. Say "make it marble" and the material transforms instantly. Say "rotate it 45 degrees" and it spins. Every voice command is processed by Llama-3.3-70B running on Qualcomm Cloud AI 100 Ultra; the model generates three.js code that executes live in the AR scene.
Physical Controller: A custom-built wireless controller (Arduino UNO Q with Qualcomm Dragonwing processor) gives you physical controls — push-to-talk for voice commands, a joystick for scaling objects, and buttons for pick/select and undo. It communicates over WiFi UDP to the orchestration hub.
Designer Dashboard: The AI PC (Snapdragon X Elite) runs a real-time designer dashboard showing a complete version history of every edit, a live 3D preview with orbit controls, and export options. Every voice command and scene state is logged, versioned, and recoverable. Click any version to jump back to that state. Export any version as GLB for use in Blender or other professional tools.
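The dashboard's version history boils down to an append-only list of scene snapshots, each tagged with the voice command that produced it. A minimal sketch of the idea (class and field names here are illustrative, not the actual dashboard code):

```python
import copy
import time

class SceneHistory:
    """Append-only version history: every edit stores the voice
    command that caused it plus a full snapshot of the scene state."""

    def __init__(self):
        self.versions = []  # list of {"id", "command", "scene", "ts"}

    def commit(self, command, scene_state):
        """Record a new version and return its id."""
        version = {
            "id": len(self.versions),
            "command": command,
            "scene": copy.deepcopy(scene_state),  # snapshot, not a live reference
            "ts": time.time(),
        }
        self.versions.append(version)
        return version["id"]

    def checkout(self, version_id):
        """Jump back to any previous state (the dashboard's 'click any version')."""
        return copy.deepcopy(self.versions[version_id]["scene"])

history = SceneHistory()
v0 = history.commit("create a modern desk", {"objects": ["desk"]})
v1 = history.commit("make it marble", {"objects": ["desk"], "material": "marble"})
assert history.checkout(v0) == {"objects": ["desk"]}
assert history.checkout(v1)["material"] == "marble"
```

Deep-copying each snapshot is what makes "scrub through your entire creative history" safe: later edits can never mutate an earlier version.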
Dio Avatar: Dio isn't just a static icon — it's a fully procedural animated character with emotional states, blinking, eye tracking, personality quirks, and a completion celebration orbit. When you speak, Dio leans in and glows brighter. When thinking, Dio's energy shifts. When your command succeeds, Dio does a little victory lap around your creation. It makes the design process feel collaborative, not transactional.
How we built it
Four Qualcomm platforms working in concert:
Qualcomm Cloud AI 100 Ultra (Cirrascale Inference Cloud) — Powers all AI inference. Llama-3.3-70B processes voice commands and generates three.js code. The same infrastructure hosts Stable Diffusion (SDXL Turbo) for AI-generated textures and BAAI/bge embeddings. All inference runs on purpose-built AI silicon, not general-purpose GPUs.
Snapdragon X Elite AI PC (Copilot+ PC) — The orchestration hub. Runs a FastAPI server that routes voice commands to the cloud, manages scene version history, serves the designer dashboard, and relays controller input. It's the nervous system connecting every device.
Snapdragon 8 Elite (Samsung Galaxy S25 Ultra) — The viewport. Runs a WebXR application in Chrome that renders the full 3D scene, executes LLM-generated three.js code in real-time, captures voice via Web Speech API, and displays the Dio avatar. Supports both handheld AR mode (with hit-test surface placement) and stereo VR mode (with camera passthrough for the headset).
Qualcomm Dragonwing QRB2210 (Arduino UNO Q) — The physical controller. An STM32U585 MCU reads a joystick (KY-023), three buttons, and an MPU-6500 IMU at 50Hz. The Linux side runs a Python UDP sender that forwards sensor data wirelessly to the hub.
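The Linux-side sender on the UNO Q is conceptually tiny: sample the sensors, pack them as a datagram, and fire them at the hub 50 times a second. A hedged sketch (the hub address, packet schema, and sensor-reading stub are hypothetical stand-ins, not the actual firmware interface):

```python
import json
import socket
import time

HUB_ADDR = ("192.168.1.50", 9999)  # hypothetical hub IP and port
RATE_HZ = 50

def read_sensors():
    """Stand-in for reading the joystick, buttons, and IMU values that
    the MCU streams to the Linux side over the board's internal serial link."""
    return {"joy": [0.0, 0.0], "buttons": [0, 0, 0], "imu": [0.0, 0.0, 9.8]}

def encode_packet(sample):
    """Pack one sensor sample as a JSON datagram."""
    return json.dumps(sample).encode("utf-8")

def run_sender():
    """Fire-and-forget UDP loop: an occasional dropped packet is
    acceptable for control input, and latency stays minimal."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(encode_packet(read_sensors()), HUB_ADDR)
        time.sleep(1.0 / RATE_HZ)
```

UDP over WiFi fits this role better than TCP: there is no handshake or retransmission stall, and stale controller samples are better dropped than replayed late.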
The pipeline: $$\text{Voice} \xrightarrow{\text{WebSocket}} \text{Hub} \xrightarrow{\text{REST}} \text{Cloud AI 100} \xrightarrow{\text{three.js}} \text{Hub} \xrightarrow{\text{WebSocket}} \text{S25 AR}$$
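Stripped of transport details, one trip through that pipeline reduces to a single dispatch function on the hub. A sketch under stated assumptions — the function names are illustrative, and the REST call to the Cloud AI 100 is injected as a stub so the flow is self-contained:

```python
def route_voice_command(command, scene_history, generate_code):
    """One pipeline trip: voice text in, versioned three.js out.

    `generate_code` stands in for the REST call to Llama-3.3-70B on
    the Cloud AI 100; it is passed in so the flow can be exercised
    without network access.
    """
    threejs_code = generate_code(command)            # Hub -> Cloud AI 100
    version_id = len(scene_history)
    scene_history.append({"id": version_id,          # versioned on the hub
                          "command": command,
                          "code": threejs_code})
    return {"version": version_id, "code": threejs_code}  # Hub -> S25 AR viewer

# Usage with a stubbed model:
history = []
fake_llm = lambda cmd: f"// three.js for: {cmd}\nscene.add(new THREE.Mesh());"
result = route_voice_command("create a walnut bookshelf", history, fake_llm)
assert result["version"] == 0
```

In the real system the inbound and outbound hops are WebSocket messages and the middle hop is an authenticated REST request, but the hub's job is exactly this: relay, version, return.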
Custom hardware: We 3D-printed a VR headset (modified Secondsight open-source design, re-parameterized for the S25 Ultra's dimensions and 34mm/45mm biconvex PMMA lenses) and a controller enclosure ("Diora's Box" — an 18cm × 6cm × 7cm two-piece box with precision-cut holes for the joystick and buttons).
Software stack: FastAPI + httpx (hub server), three.js r162 with WebXR (AR viewer), vanilla HTML/CSS/JS (dashboard), Arduino/Zephyr (firmware), Python (UDP sender). No frontend frameworks, no build tools, no npm — everything runs from single files.
Challenges we ran into
Voice recognition inside WebXR sessions. Web Speech API silently fails when started after an immersive XR session begins — Chrome Android requires speech recognition to be initiated from a user gesture, and the gesture context is lost during the async XR session setup. We solved this by starting recognition before requestSession() in the same click handler, and later implemented push-to-talk via the hardware controller to bypass the issue entirely.
Windows ARM compatibility on the AI PC. The Snapdragon X Elite Copilot+ PC runs Windows on ARM, which broke aiohttp (requires C compilation). We pivoted to httpx — a pure Python HTTP client that works everywhere without compilation.
Locked-down development environment. The AI PC had no sudo access, making it impossible to install system packages or run Blender. This forced a major architectural pivot — we eliminated Blender entirely and moved all 3D rendering to three.js executing directly in the AR viewer. This turned out to be a better architecture: faster (no export pipeline), simpler (fewer moving parts), and more responsive (changes appear instantly).
LLM model quality. Our initial approach used Llama-3.1-8B, which produced poor three.js geometry — flat circles instead of spheres, malformed multi-part objects. Switching to Llama-3.3-70B on the same Qualcomm Cloud AI 100 infrastructure dramatically improved output quality, demonstrating the importance of model scale for spatial reasoning tasks.
Serial port discovery on UNO Q. The Arduino UNO Q's dual-processor architecture (Dragonwing Linux + STM32 MCU) made serial communication non-trivial. The MCU-to-Linux serial path isn't the standard /dev/ttyACM0 — finding the correct device path required probing multiple UART interfaces via ADB.
Accomplishments that we're proud of
End-to-end voice-to-AR in under 4 seconds. You speak, Dio thinks, and a 3D object appears in augmented reality — all powered by Qualcomm silicon at every step of the chain.
The Dio avatar. A fully procedural animated character with emotional blending, organic non-repeating motion (layered sine waves), spring-physics squash-and-stretch, random personality quirks, and voice-reactive behavior. It makes AI feel alive and collaborative, not robotic.
The version control system. Every single design edit is saved as a versioned scene state with the voice command that created it. The designer can scrub through their entire creative history, jump to any previous state, and export any version. This is how professional design tools should work.
True multi-device orchestration. Four different Qualcomm chips communicating in real-time over three different protocols (WebSocket, REST API, UDP) — and it all works together seamlessly.
Custom 3D-printed hardware. A functional VR headset and controller enclosure, designed parametrically in OpenSCAD and printed during the hackathon.
What we learned
Edge-cloud hybrid AI is the future. Not every AI task needs a cloud roundtrip. Simple commands (scale, rotate, change color) could run on the Snapdragon X Elite's NPU locally with near-zero latency, while complex generative tasks go to the Cloud AI 100. This tiered approach minimizes latency for common operations while preserving capability for hard ones.
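The tiering decision itself can be very cheap. A hypothetical sketch of the router — keyword matching here is a stand-in for a real on-device intent classifier running on the NPU:

```python
# Cheap parametric edits stay on the edge; generative requests go to
# the Cloud AI 100. The keyword list is illustrative, not exhaustive.
LOCAL_OPS = ("scale", "rotate", "move", "color")

def route(command):
    """Return which tier should handle a voice command."""
    first_word = command.lower().split()[0]
    return "edge" if first_word in LOCAL_OPS else "cloud"

assert route("rotate it 45 degrees") == "edge"
assert route("create a modern desk with tapered legs") == "cloud"
```

The point of the tier split is latency, not capability: a rotate command never needs a 70B model, so it should never pay for a cloud roundtrip.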
LLMs are better at describing than coding. Asking a language model to write 30 lines of correct three.js geometry is unreliable. Asking it to output structured JSON describing what to create is much more reliable. The lesson: let LLMs reason about intent, let deterministic code handle execution.
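The "describe, don't code" pattern looks roughly like this in practice — ask the model for structured JSON, validate it, and let deterministic code build the geometry. The schema below is a hypothetical illustration, not our production format:

```python
import json

# Hypothetical intent schema: each shape declares the parameters
# deterministic code needs to construct it.
PRIMITIVE_PARAMS = {
    "sphere": ["radius"],
    "box": ["width", "height", "depth"],
}

def validate_intent(raw_llm_output):
    """Parse and validate the model's JSON instead of executing raw code.

    A malformed description fails loudly here, rather than rendering
    a flat circle where a sphere should be.
    """
    intent = json.loads(raw_llm_output)
    shape = intent["shape"]
    if shape not in PRIMITIVE_PARAMS:
        raise ValueError(f"unknown shape: {shape}")
    missing = [p for p in PRIMITIVE_PARAMS[shape] if p not in intent]
    if missing:
        raise ValueError(f"missing params: {missing}")
    return intent

intent = validate_intent('{"shape": "sphere", "radius": 0.5, "material": "marble"}')
assert intent["radius"] == 0.5
```

The validated intent then drives a fixed, well-tested geometry builder, so the LLM's job shrinks to the part it is actually good at: mapping natural language to parameters.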
The best architecture is the simplest one that works. We started with Blender, MCP servers, OpenClaw, and a complex multi-stage pipeline. We ended with a single Python file, a single HTML file, and direct API calls. The simpler system was faster to build, easier to debug, and more responsive in the demo.
Voice is a natural interface for spatial design. Describing what you want ("a warm, cozy living room with a walnut coffee table") is faster and more intuitive than navigating menus and dragging vertices. The barrier isn't the technology — it's building an AI that understands spatial intent.
What's next for Dio Design
LoRA fine-tuning for spatial design. Fine-tune a specialized model on the Qualcomm Cloud AI 100 using a curated dataset of design commands paired with high-quality three.js outputs. This would dramatically improve the accuracy and detail of generated 3D objects.
AI-generated textures. Stable Diffusion (SDXL Turbo) is already available on our Cirrascale infrastructure. The next step is generating photorealistic PBR textures on demand — say "make it look like weathered oak" and an AI-generated wood texture is applied in real-time.
On-device inference with Snapdragon NPU. Run a quantized model locally on the AI PC's NPU for instant-response commands, falling back to the Cloud AI 100 for complex generative tasks. True edge-cloud hybrid intelligence.
Multi-user collaboration. Multiple headsets viewing and editing the same scene simultaneously, with Dio mediating conflicts and suggesting compositions.
Export to professional tools. Full glTF/GLB export with PBR materials, animations, and scene hierarchy — ready to import into Blender, Unity, or Unreal Engine for production refinement.
Push-to-talk hardware controller. Complete integration of the Arduino UNO Q wireless controller for fully hands-free, screen-free VR design — the designer never needs to touch the phone.