Gallery:
- Persona Builder: Create your own companion
- Persona Builder: Customize your gaming companion
- Gaming Hub: Talk with your favorite companion
- Game Agent
- Game Agent: "Collect some wood for me"
- Game Agent: "Build a castle with two towers, red and blue, connected by a bridge"
What it does
Dory is an AI gaming companion platform that lets users create custom AI personas and play games with them via voice.
Here's how it works:
- Create a Persona (Main Gemini 3 usage): In a web frontend built with Google AI Studio, users go through an interactive flow to build a custom AI character. They pick a species, define visual details, choose a name, generate an avatar image (via Gemini), craft a personality, set a gaming style, and select a voice.
- Talk to Your Persona: Users connect via a LiveKit voice pipeline and have real-time voice conversations with their AI persona. The system chains VAD → STT → LLM (Gemini) → TTS for natural voice interaction, and each persona has its own voice and personality.
- Play Games Together: The AI persona uses Gemini to reason about and control a Minecraft bot that plays alongside the user. A memory system keeps it in character, consistent with the persona's personality and gaming style.
- Generate Structures (Main Gemini 3 usage + Wow Effect): The AI persona can generate structures on demand, using Gemini 3 as an "architect" that outputs JS code describing block placement. The code is much shorter than the equivalent JSON.
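To make the "JS code vs JSON" point concrete, here is a minimal sketch. The `StructureBuilder` class, its `fill` method, and the block names are illustrative assumptions, not Dory's actual interface. The idea is that the model emits a few high-level calls and the host expands them into individual block placements.

```typescript
// Hypothetical builder API: every name here is an illustrative assumption,
// not Dory's actual interface.
type Vec3 = [number, number, number];
type Block = { x: number; y: number; z: number; type: string };

class StructureBuilder {
  readonly blocks: Block[] = [];

  // Fill an axis-aligned box (inclusive corners) with one block type.
  fill(from: Vec3, to: Vec3, type: string): void {
    for (let x = Math.min(from[0], to[0]); x <= Math.max(from[0], to[0]); x++) {
      for (let y = Math.min(from[1], to[1]); y <= Math.max(from[1], to[1]); y++) {
        for (let z = Math.min(from[2], to[2]); z <= Math.max(from[2], to[2]); z++) {
          this.blocks.push({ x, y, z, type });
        }
      }
    }
  }
}

// Stand-in for model output: two towers joined by a bridge.
const generatedCode = `
b.fill([0, 0, 0], [2, 5, 2], "red_wool");
b.fill([8, 0, 0], [10, 5, 2], "blue_wool");
b.fill([3, 5, 1], [7, 5, 1], "oak_planks");
`;

// Run the generated code against a fresh builder.
// Note: new Function executes arbitrary code; a real system would need a sandbox.
function runGenerated(code: string): Block[] {
  const b = new StructureBuilder();
  new Function("b", code)(b);
  return b.blocks;
}

const placed = runGenerated(generatedCode);
const jsonLength = JSON.stringify(placed).length;
console.log(`${placed.length} blocks from ${generatedCode.length} chars of code`);
console.log(`equivalent JSON: ${jsonLength} chars`);
```

In this sketch three short lines of code expand into over a hundred placements, while the JSON form has to spell out every block.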
How we built it
Dory is a six-service monorepo built with TypeScript and Node.js, orchestrated by pnpm workspaces + Turborepo.
Core Architecture — Six Services:
- Web App (apps/web): A Next.js frontend designed with Google AI Studio. It uses a state machine pattern (StateMachine + WebSocketManager) to transition between three screens: Gatekeeper Chat, Persona Builder, and Gaming Hub.
- Gatekeeper Agent (port 4002): Routes users via WebSocket to either create personas or play games. Powered by Gemini.
- Persona Builder Agent (port 4003): Interactive persona creation through a conversational flow. Uses Gemini 3 for AI avatar/skin generation, Cloudflare R2 for image storage, Prisma + MongoDB for persistence, and ElevenLabs for per-persona voice selection.
- Voice Agent (port 4001): A LiveKit Agents SDK voice pipeline: Silero VAD → Deepgram Nova 3 STT → LLM (Gemini 3) → ElevenLabs Flash v2.5 TTS. It handles real-time event narration and syncs conversation memory to the game agent every 60 seconds.
- Game Agent (port 3000): A Mineflayer-powered Minecraft bot with 30+ tool-calling capabilities (movement, collection, crafting, building, combat, vision). Uses a multi-provider LLM client (Gemini 3) for reasoning. Features a multi-step planning engine that decomposes complex requests into plans and re-plans on failure.
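The Game Agent's "plan, execute, re-plan on failure" loop can be sketched roughly as follows. This is an illustration under assumptions, not Dory's actual code: `Planner`, `Executor`, `runWithReplanning`, and `maxReplans` are hypothetical names, and the stub planner stands in for an LLM call.

```typescript
// Hypothetical types: a plan is an ordered list of tool calls.
type Step = { tool: string; args: Record<string, unknown> };
type StepResult = { ok: boolean; error?: string };
type Planner = (goal: string, failureNote?: string) => Promise<Step[]>;
type Executor = (step: Step) => Promise<StepResult>;
type RunOutcome = { success: boolean; executed: Step[]; replans: number };

// Execute a plan step by step; on the first failure, ask the planner for a
// new plan (passing along what went wrong), up to maxReplans times.
async function runWithReplanning(
  goal: string,
  plan: Planner,
  execute: Executor,
  maxReplans = 2,
): Promise<RunOutcome> {
  const executed: Step[] = [];
  let replans = 0;
  let steps = await plan(goal);
  while (true) {
    let failure: string | undefined;
    for (const step of steps) {
      const result = await execute(step);
      if (!result.ok) {
        failure = `${step.tool} failed: ${result.error ?? "unknown error"}`;
        break;
      }
      executed.push(step);
    }
    if (failure === undefined) return { success: true, executed, replans };
    if (replans >= maxReplans) return { success: false, executed, replans };
    replans++;
    steps = await plan(goal, failure);
  }
}

// Stub planner (assumption): the first plan forgets to craft planks,
// the second plan fixes that after seeing the failure note.
const stubPlanner: Planner = async (_goal, failureNote) =>
  failureNote === undefined
    ? [
        { tool: "collect", args: { item: "oak_log" } },
        { tool: "craft", args: { item: "crafting_table" } },
      ]
    : [
        { tool: "craft", args: { item: "oak_planks" } },
        { tool: "craft", args: { item: "crafting_table" } },
      ];

// Stub executor (assumption): a crafting table requires planks first.
const obtained = new Set<string>();
const stubExecutor: Executor = async (step) => {
  const item = String(step.args["item"]);
  if (item === "crafting_table" && !obtained.has("oak_planks")) {
    return { ok: false, error: "missing oak_planks" };
  }
  obtained.add(item);
  return { ok: true };
};

const demo = runWithReplanning("make a crafting table", stubPlanner, stubExecutor);
demo.then((r) => console.log(`success=${r.success} replans=${r.replans}`));
```

Feeding the failure message back into the planner is the key design choice here: the second plan can react to what actually went wrong instead of repeating the same steps.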
Challenges we ran into
Getting the LLM to create complex multi-step plans up front instead of relying on a chain of subsequent tool calls, which proved unreliable.
Generating coherent buildings with spatial awareness. We minimized the generation output by using JS codegen instead of raw JSON.
Smart A2A (agent-to-agent) handling, with a priority system for events coming from the game (critical: death, disconnect; high: taking damage, structure finished).
Tailoring system prompts to correctly gather user interests, gameplay tendencies, etc., and giving them the right amount of impact on gameplay.
Controlling three different agents inside the same UI while keeping the UX simple for the user: handling WebSocket sessions and mixing them with LiveKit for voice, plus build-time custom prompts for personalities.
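The priority system for game events mentioned above could look something like this minimal sketch. The `critical` and `high` levels and their example events come from the list above; the `normal` level, the `item_pickup` event, and the `EventQueue` class are assumptions added for illustration.

```typescript
// "critical" and "high" are from the write-up; "normal" is an assumed catch-all.
type Priority = "critical" | "high" | "normal";
const RANK: Record<Priority, number> = { critical: 0, high: 1, normal: 2 };

type GameEvent = { kind: string; priority: Priority };

// Hypothetical queue: most urgent events come out first,
// and events of equal priority keep their arrival order.
class EventQueue {
  private items: { event: GameEvent; seq: number }[] = [];
  private nextSeq = 0;

  push(event: GameEvent): void {
    this.items.push({ event, seq: this.nextSeq++ });
    this.items.sort(
      (a, b) => RANK[a.event.priority] - RANK[b.event.priority] || a.seq - b.seq,
    );
  }

  pop(): GameEvent | undefined {
    return this.items.shift()?.event;
  }
}

const queue = new EventQueue();
queue.push({ kind: "item_pickup", priority: "normal" }); // assumed low-urgency event
queue.push({ kind: "take_damage", priority: "high" });
queue.push({ kind: "death", priority: "critical" });
queue.push({ kind: "structure_finished", priority: "high" });

// Drain the queue and record the order in which events would be handled.
const order: string[] = [];
let next: GameEvent | undefined;
while ((next = queue.pop()) !== undefined) {
  order.push(next.kind);
}
console.log(order.join(" > "));
```

Sorting on every push is fine for the handful of events a single game session produces; a binary heap would be the usual swap-in if the volume grew.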
What we learned
We were surprised by how good Google AI Studio is for designing UI and testing UX flows, which you can later organise into a bigger project by just taking the files.
The power of Gemini 3 to create structures inside Minecraft. We had bad experiences with other, less powerful models. This feature really shows a model's strength, since it builds a structure from a simple prompt like "Build me a modern house with big windows".
What's next for Dory AI
We genuinely loved every late night, every "did that just work?!" moment, and every time Dory surprised us with something we didn't explicitly program.
Now we want to put Dory in the hands of real players. The biggest open question for us is: do gamers actually want this? We believe they do, but the only way to know is to let people play with it, break it, and tell us what they wish it could do. We'd love to run an open beta and see how players interact with their AI companions over days and weeks, not just demos.
Beyond Minecraft, we're incredibly excited about in-game asset generation in existing games. Imagine telling your companion "build me a cozy cabin" in Valheim and watching it place logs and thatch piece by piece, or asking for a custom furniture layout in Stardew Valley that matches your farm's aesthetic, or spawning entire themed worlds in Roblox through a voice conversation.
Dory started as a hackathon project, but the vision is much bigger: an open platform where any game developer can drop in an AI companion that truly feels alive. We can't wait to see where it goes from here.
Built With
- aistudio
- gemini
- livekit
- typescript
- vercel-ai-sdk