Balloon: Agentic NPC Generator

About the project

Balloon is an agentic NPC generator that brings text prompts to life in an immersive WebXR environment. With a single prompt (e.g., "A grumpy dwarf blacksmith named Bjorn"), Balloon leverages AI to procedurally generate a 3D character model and fully-voiced dialogue lines, seamlessly dropping them into a VR scene you can walk around in.

There are no pre-baked models or scripted dialogue trees. Every character is sculpted from scratch using primitive mesh commands, and every spoken line is generated on the fly from the character's designated personality.

What Inspired Balloon

The inspiration came from the classic bottleneck of game development and XR design: populating a world. Creating characters usually involves hours of 3D modeling, rigging, writing dialogue, and recording voice actors. We wanted to see what would happen if we reduced that entirely to a single text prompt. The magic of asking an AI for a character and, 30 seconds later, having them stand in front of you talking in spatial audio felt like the inevitable future of immersive storytelling.

How it was Built

Balloon was built with a strict focus on a lean, fast agent loop orchestrated locally on an M4 Mac, with Claude and ElevenLabs reached over their APIs:

  • Frontend: A lightweight, single-page application using Vanilla JS and A-Frame. It handles WebXR rendering, deep linking for saved characters, and spatial 3D audio playback without relying on heavy frameworks like React.
  • Backend: A FastAPI server that acts as the orchestrator.
  • The Agent Loop: We used the Anthropic SDK with Claude as the brain of the operation. Claude is given two tools:
    1. build_3d_model: Claude writes a Python script using the Blender API (bpy). The backend runs headless Blender to execute this script, combining primitives (spheres, cubes, cylinders) into a .glb model.
    2. synthesize_voice_lines: Claude writes dialogue (greeting, idle, farewell) and calls the ElevenLabs API to synthesize the lines using a curated voice palette that matches the generated character's traits.
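
As a rough illustration of the two tools above, the sketch below shows how they might be declared for the Anthropic Messages API and how the backend might build the headless Blender invocation. The overall tool shape (name, description, input_schema) follows the API's documented format, but the schema field names ("script", "voice", "greeting", and so on) and the helper blender_command are assumptions made for this example, not the project's actual code.

```python
from pathlib import Path

# Tool declarations: each has a name, a description, and a JSON Schema
# under "input_schema". The property names are illustrative assumptions.
TOOLS = [
    {
        "name": "build_3d_model",
        "description": (
            "Run a Blender (bpy) script that assembles a character from "
            "mesh primitives and exports it as a .glb file."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "script": {
                    "type": "string",
                    "description": "Python source that uses bpy",
                },
            },
            "required": ["script"],
        },
    },
    {
        "name": "synthesize_voice_lines",
        "description": "Synthesize greeting, idle, and farewell lines.",
        "input_schema": {
            "type": "object",
            "properties": {
                "voice": {
                    "type": "string",
                    "description": "Name from the curated voice palette",
                },
                "greeting": {"type": "string"},
                "idle": {"type": "string"},
                "farewell": {"type": "string"},
            },
            "required": ["voice", "greeting", "idle", "farewell"],
        },
    },
]


def blender_command(script_path: Path, blender_bin: str = "blender") -> list[str]:
    """Build the argv for running a script in Blender without a UI.

    --background and --python are standard Blender command-line options;
    the helper itself is hypothetical.
    """
    return [blender_bin, "--background", "--python", str(script_path)]
```

In a setup like this, TOOLS would be passed as the tools argument of client.messages.create(...), and the argv list would be handed to subprocess.run(..., check=True) once Claude's script has been written to disk.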

Challenges Faced

  • Constraining the 3D Generation: Getting an LLM to reliably write valid bpy code was the biggest hurdle. We had to constrain Claude through its system prompt to use only primitive mesh operations (bpy.ops.mesh.primitive_*), enforce a Y-up orientation with feet at the origin, and avoid arbitrary imports.
  • Handling Hardware / File System Execution: Running headless Blender synchronously required careful file system management. We implemented a numbered folder per character (e.g., 01-model, 02-model), incremented on each generation, to keep each 3D model and its associated MP3s cleanly grouped.
  • Browser Autoplay Policies: Browsers block audio autoplay until the user has interacted with the page. To prevent silent failures in VR, we had to rethink our audio strategy. We ultimately removed the auto-playing greeting on load and rendered interactive buttons to trigger audio, which ensures reliable playback once the user has interacted with the UI.
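
The first two challenges lend themselves to small backend helpers. The writeup describes enforcing the bpy rules through the system prompt only; the static check below is a complementary idea, not something the project is stated to do. Likewise, next_model_dir is a hypothetical helper showing one way the numbered folders could be allocated. The allow-list of imports is an assumption.

```python
import ast
import re
from pathlib import Path

ALLOWED_IMPORTS = {"bpy", "math"}  # assumption: modules a script may import
_MODEL_DIR = re.compile(r"^(\d+)-model$")


def _dotted_name(node: ast.AST) -> "str | None":
    """Return 'a.b.c' for a chain of attribute accesses, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_violations(source: str) -> list[str]:
    """List rule violations in a generated script without executing it."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    violations.append(f"import not allowed: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] not in ALLOWED_IMPORTS:
                violations.append(f"import not allowed: {node.module}")
        elif isinstance(node, ast.Call):
            name = _dotted_name(node.func)
            # Only mesh operators are checked; other bpy calls pass through.
            if (
                name
                and name.startswith("bpy.ops.mesh.")
                and not name.startswith("bpy.ops.mesh.primitive_")
            ):
                violations.append(f"mesh operator not allowed: {name}")
    return violations


def next_model_dir(root: Path) -> Path:
    """Create and return the next NN-model folder under root."""
    root.mkdir(parents=True, exist_ok=True)
    highest = 0
    for entry in root.iterdir():
        match = _MODEL_DIR.match(entry.name)
        if entry.is_dir() and match:
            highest = max(highest, int(match.group(1)))
    target = root / f"{highest + 1:02d}-model"
    target.mkdir()
    return target
```

A script that fails the check could be returned to Claude as a tool error so it can retry, and the folder returned by next_model_dir would hold both the exported .glb and the MP3s for that character.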

What I Learned

Through building Balloon, I learned that strict constraints actually make LLMs more creative and reliable. By forcing Claude to only use geometric primitives, it had to think abstractly to "sculpt" characters. Additionally, I learned a great deal about WebXR (A-Frame) spatial audio mechanics, and how to effectively coordinate parallel generation tasks (voice generation alongside 3D generation) inside a single agent tool-use loop.
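
One way to run the two generation steps side by side is sketched below. It assumes both steps are blocking calls (a Blender subprocess and an HTTP request) and pushes each onto a worker thread with asyncio.to_thread, which requires Python 3.9 or later. The two stub functions stand in for the real work and are purely hypothetical, as are the file names they return.

```python
import asyncio
import time


def build_model_stub(prompt: str) -> str:
    """Stand-in for the blocking headless Blender run."""
    time.sleep(0.1)
    return "model.glb"


def synthesize_voice_stub(prompt: str) -> list[str]:
    """Stand-in for the blocking text-to-speech requests."""
    time.sleep(0.1)
    return ["greeting.mp3", "idle.mp3", "farewell.mp3"]


async def generate_character(prompt: str) -> dict:
    """Run model building and voice synthesis concurrently."""
    # gather() waits for both worker threads and preserves argument order.
    model, audio = await asyncio.gather(
        asyncio.to_thread(build_model_stub, prompt),
        asyncio.to_thread(synthesize_voice_stub, prompt),
    )
    return {"model": model, "audio": audio}
```

Inside a FastAPI handler, generate_character could be awaited directly; from a plain script it would be started with asyncio.run(generate_character(prompt)).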
