Dungeons & Dragons Adventure — Gemini Live Agent Demo
Video demo: https://youtu.be/e1zc7FAKn3c
GitHub: https://github.com/WilliamK112/dungeons-and-dragons-adventure-voice-version
Live app: https://dungeons-and-dragons-adventure-voic.vercel.app
Face-based character personalization and user login (backed by a database with email verification) are new features that will be added to the live app soon. For now, they are available in the GitHub repository. In the live app, click “Quick Start” to skip login and account setup.
About the Project
Dungeons & Dragons Adventure is an interactive storytelling experience that brings the feeling of a live tabletop RPG session to the web.
Players create a party, take turn-based actions, and shape a branching fantasy narrative in real time. Instead of static text generation, the app tracks game context (party state, turn order, objective progress, and event history), so responses feel coherent, reactive, and role-aware.
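As a rough illustration of the tracked context described above, here is a minimal TypeScript sketch. The field names (`party`, `turnOrder`, `objectives`, `history`) and the `summarizeContext` helper are assumptions made for this example, not the project's actual schema.

```typescript
// Hypothetical shape of the per-session game context; all names are illustrative.
interface Character {
  name: string;
  role: string; // e.g. "Rogue", "Wizard"
  hp: number;
  agility: number;
}

interface Objective {
  id: string;
  description: string;
  done: boolean;
}

interface GameContext {
  party: Character[];      // party state
  turnOrder: string[];     // character names, in acting order
  objectives: Objective[]; // objective progress
  history: string[];       // event history, oldest first
}

// Render a compact text digest that could be prepended to a model prompt.
function summarizeContext(ctx: GameContext, recentEvents = 3): string {
  const party = ctx.party.map(c => `${c.name} (${c.role}, HP ${c.hp})`).join(", ");
  const open = ctx.objectives.filter(o => !o.done).map(o => o.description);
  // slice(-0) would return the whole array, so handle 0 explicitly.
  const recent = recentEvents > 0 ? ctx.history.slice(-recentEvents) : [];
  return [
    `Party: ${party}`,
    `Turn order: ${ctx.turnOrder.join(" > ")}`,
    `Open objectives: ${open.length ? open.join("; ") : "none"}`,
    `Recent events: ${recent.length ? recent.join(" | ") : "none"}`,
  ].join("\n");
}
```

Passing a digest like this with every request is one way to keep responses tied to the current state rather than to the last message alone.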
To make gameplay more tactical and immersive, I implemented an agility-based initiative system (instead of simple round-robin turns), progression with consequences, and objective-complete victory handling tied to actual gameplay. I also added multimodal scene generation and cinematic planning to make the experience more visual and demo-ready.
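The agility-based initiative idea can be sketched as follows. This is a simplified illustration, not the project's implementation: it assumes turn order is a plain descending sort on an `agility` stat with no dice roll, and the `Combatant` type, `initiativeOrder`, and `nextTurn` names are hypothetical.

```typescript
interface Combatant {
  name: string;
  agility: number;
}

// Order combatants by agility, highest first. Array.prototype.sort is stable
// (ES2019+), so characters with equal agility keep their original party order.
function initiativeOrder(party: Combatant[]): Combatant[] {
  return [...party].sort((a, b) => b.agility - a.agility);
}

// Return the index of the next combatant able to act, wrapping into a new
// round when the end of the order is reached. Returns -1 if nobody can act.
function nextTurn(
  order: Combatant[],
  current: number,
  isAlive: (c: Combatant) => boolean,
): number {
  for (let step = 1; step <= order.length; step++) {
    const i = (current + step) % order.length;
    if (isAlive(order[i])) return i;
  }
  return -1;
}
```

Compared with round-robin, the order here depends on character stats, so party composition changes who acts first.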
In short, this project explores how Gemini-powered agents can move beyond one-off prompts into structured, stateful, and replayable interactive experiences.
Inspiration
This project came from two interests: fantasy role-playing and real-time AI interaction.
I wanted to combine the adaptability of a great Dungeon Master with the responsiveness of Gemini, so the AI is not just a chatbot behind the scenes, but the core system driving world progression and narrative reactions.
The goal was to make something that feels like entering a living campaign, not clicking through a fixed script.
What I Built
I built Dungeons & Dragons Adventure — a Creative Storyteller agent that delivers a multimodal narrative experience using Gemini.
The experience includes:
- Interactive storytelling engine: turn-based D&D-style progression with branching player decisions and evolving narrative state.
- Multimodal generation pipeline: story text + generated scene imagery integrated into the same gameplay loop, with media tied to the current narrative context.
- User identity + continuity layer: authenticated accounts (register, verify, login, reset), persistent campaign saves, replay logs, and room/chat support.
- Face-to-character personalization module (optional): users can upload or capture a live photo, then Gemini generates game-character visuals that preserve facial identity for more immersive storytelling.
How I Built It
Stack
- Frontend: React + TypeScript + Vite
- Backend: Node.js + Express
- AI: Google GenAI SDK (Gemini)
- Cloud: Google Cloud Run
- Hosting: Vercel
System design
The narrative update loop is context-aware:
S_{t+1} = f(I_t, W_t, R_t, C_t)
Where:
• S_{t+1}: narrative state after the update
• I_t: player input
• W_t: world state
• R_t: character role/turn context
• C_t: prompt constraints
This keeps output creative while preserving continuity and gameplay structure.
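One way to read the update rule above in code is as a function that assembles a prompt from the four inputs and folds the model's reply back into the world state. The sketch below is an assumption-laden illustration: the `TurnInput` fields, `buildPrompt`, and `step` are hypothetical names, and the model call is injected as a plain function so that no specific SDK API is implied.

```typescript
// All names are illustrative; the generator is injected, so no SDK call is assumed.
interface TurnInput {
  playerInput: string;   // I_t
  worldState: string[];  // W_t: event log so far
  roleContext: string;   // R_t: who is acting, and in what role
  constraints: string[]; // C_t: tone and rule constraints for the prompt
}

type Generate = (prompt: string) => Promise<string>;

// Assemble one prompt from all four inputs.
function buildPrompt(t: TurnInput): string {
  return [
    "Constraints:",
    ...t.constraints.map(c => `- ${c}`),
    "World so far:",
    ...t.worldState.map(e => `- ${e}`),
    `Acting character: ${t.roleContext}`,
    `Player action: ${t.playerInput}`,
    "Narrate the outcome in 2-3 sentences.",
  ].join("\n");
}

// S_{t+1} = f(I_t, W_t, R_t, C_t): returns the narration plus the updated world log.
async function step(
  t: TurnInput,
  generate: Generate,
): Promise<{ narration: string; worldState: string[] }> {
  const narration = (await generate(buildPrompt(t))).trim();
  return {
    narration,
    worldState: [...t.worldState, `${t.roleContext}: ${t.playerInput}`, narration],
  };
}
```

Because `step` returns a new world log instead of mutating the old one, the result of one turn can be fed directly into the next.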
Challenges I Faced
- Maintaining fantasy tone consistency across long interactions
- Balancing creativity with structured game logic
- Turning a general-purpose LLM into a believable game-master flow
- Keeping setup simple while preserving technical depth for judges
- Building reliability features for real-world API/runtime variability
What I Learned
- Prompt engineering works best as system design, not isolated prompts
- AI UX quality depends on state management + clarity + fallback behavior
- Visual and interaction polish strongly affects perceived intelligence
- Interactive storytelling requires constant tradeoff management (freedom vs coherence)
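The fallback behavior mentioned above can be illustrated with a small retry wrapper. This is a generic sketch of retry with exponential backoff and a default value, not the project's code; `withRetry`, `backoffDelayMs`, and the option names are hypothetical.

```typescript
interface RetryOptions<T> {
  attempts: number;    // total tries, including the first
  baseDelayMs: number; // wait before the second try; doubles after each failure
  fallback: T;         // returned if every attempt fails
}

// Exponential backoff schedule: base, 2 * base, 4 * base, ...
function backoffDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** attempt;
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Run `call`, retrying on any thrown error; fall back to a safe default at the end.
async function withRetry<T>(call: () => Promise<T>, opts: RetryOptions<T>): Promise<T> {
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    try {
      return await call();
    } catch {
      // Wait before the next try, but not after the last one.
      if (attempt < opts.attempts - 1) {
        await sleep(backoffDelayMs(attempt, opts.baseDelayMs));
      }
    }
  }
  return opts.fallback;
}
```

In a storytelling loop, the fallback might be a short pre-written narration line, so a failed generation call does not stall the player's turn.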
Why This Project Matters
This project shows that Gemini-powered applications can go beyond “single prompt in, single answer out” utilities. It demonstrates how AI can power a living interactive product, one that is narrative, visual, social, and stateful over time.
What matters here is not only generation quality, but experience design:
- Immersion: Story text, scene visuals, and player decisions are tightly coupled so users feel like they are inside a world, not chatting with a tool.
- Emotional engagement: Character progression, risk/reward turns, and personalized visuals (including face-to-character transformation) increase emotional connection and player investment.
- Interactivity at product scale: The app is structured as a real gameplay loop with persistence, replayability, and authenticated user context.
- Reliability and continuity: Auth flows, campaign saves, resume support, and backend state management make the experience durable across sessions, which is critical for long-form storytelling.
- Proof of a new AI product category: It points toward AI experiences that behave like games, creative studios, and narrative platforms, not just assistants.
In short, this project argues that Gemini can be the core engine for immersive, emotionally resonant, multimodal software products.
Future Improvements
Persistent campaign memory
- Expand long-horizon memory so decisions from earlier chapters shape later story arcs more deeply.
- Add memory summarization/compression strategies for long sessions to keep context coherent and cost-efficient.
Structured quest/combat mechanics
- Introduce richer systems for quests, objectives, conditions, and outcome rules.
- Improve combat modeling (action economy, status effects, balancing) for clearer strategy and replay value.
Deeper party role interactions
- Strengthen class/role identity with unique abilities, synergies, and relationship dynamics.
- Add more role-specific narrative branches and consequence tracking.
Voice-based live narration
- Integrate more natural real-time voice storytelling and responsive audio delivery.
- Improve voice direction controls (tone, pacing, character style) for cinematic immersion.
Stronger long-session orchestration and state tracking
- Build more robust orchestration for extended sessions, including checkpoints, rollback safety, and conflict resolution.
- Add observability tooling for state transitions, generation events, and error recovery across complex user journeys.
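As a rough idea of what the checkpoint and rollback item could involve, here is a minimal sketch of an in-memory snapshot store. It is a hypothetical design for a planned feature, not existing project code; the `CheckpointStore` class and its methods are assumptions, and it only works for JSON-serializable state.

```typescript
// Hypothetical checkpoint store; structure and names are assumptions.
class CheckpointStore<S> {
  private snapshots: string[] = [];

  // Save a deep copy of the state via JSON and return the checkpoint id.
  save(state: S): number {
    this.snapshots.push(JSON.stringify(state));
    return this.snapshots.length - 1;
  }

  // Restore the given checkpoint and discard any later ones.
  rollback(id: number): S {
    if (id < 0 || id >= this.snapshots.length) {
      throw new RangeError(`no checkpoint ${id}`);
    }
    this.snapshots.length = id + 1;
    return JSON.parse(this.snapshots[id]) as S;
  }

  get count(): number {
    return this.snapshots.length;
  }
}
```

Storing serialized copies means later mutations of the live state cannot corrupt an earlier checkpoint, at the cost of extra memory for long sessions.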
Closing Reflection
I started with a simple vision: build an AI Dungeon Master that feels alive. What I ended up building is a working proof of concept that combines narrative intelligence, tactical interaction, and cinematic multimodal presentation in one cohesive experience.
This project became both a technical prototype and a creative statement. It shows that AI products can be more than helpful—they can be immersive, expressive, and emotionally engaging. In other words, AI experiences can be intelligent and magical.
Built With
- api
- cloud
- css
- express.js
- flow
- gemini
- genai
- html
- javascript
- live
- node.js
- react
- run
- sdk
- session
- typescript
- vercel
- vite
