Dungeons & Dragons Adventure — Gemini Live Agent


Dungeons & Dragons Adventure — Gemini Live Agent Demo

Video demo: https://youtu.be/e1zc7FAKn3c
GitHub: https://github.com/WilliamK112/dungeons-and-dragons-adventure-voice-version
Live app: https://dungeons-and-dragons-adventure-voic.vercel.app

Facial identity and user login, supported by a database with email verification, are new features that will be added to the live app soon. For now, please check them out on GitHub. Click “Quick Start” to skip login/account setup.

About the Project

Dungeons & Dragons Adventure is an interactive storytelling experience that brings the feeling of a live tabletop RPG session to the web.

Players create a party, take turn-based actions, and shape a branching fantasy narrative in real time. Instead of static text generation, the app tracks game context (party state, turn order, objective progress, and event history), so responses feel coherent, reactive, and role-aware.
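As a rough illustration of what tracking game context can look like, here is a minimal sketch. The type and field names (`GameContext`, `PartyMember`, `recordEvent`, `maxEvents`) are assumptions made for this example, not the project's actual data model.

```typescript
// Hypothetical shape of the tracked game context; all names are illustrative.
interface PartyMember {
  name: string;
  role: string;
  hp: number;
}

interface GameContext {
  party: PartyMember[];
  turnOrder: string[];       // member names in acting order
  objectiveProgress: number; // fraction of the current objective, 0 to 1
  eventHistory: string[];    // oldest first, most recent last
}

// Return a new context with the event appended, keeping at most
// `maxEvents` of the most recent entries. The input is not mutated.
function recordEvent(
  ctx: GameContext,
  event: string,
  maxEvents = 20
): GameContext {
  const all = [...ctx.eventHistory, event];
  const start = Math.max(0, all.length - maxEvents);
  return { ...ctx, eventHistory: all.slice(start) };
}
```

Capping the history is one simple way to keep the context passed to the model bounded; a real implementation might summarize older events instead of dropping them.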

To make gameplay more tactical and immersive, I implemented an agility-based initiative system (instead of simple round-robin turns), progression with consequences, and objective-complete victory handling tied to actual gameplay. I also added multimodal scene generation and cinematic planning to make the experience more visual and demo-ready.
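To show how agility-based initiative differs from round-robin turns, here is a small sketch. The `Combatant` type, the `initiativeOrder` function, and the name-based tie-break are assumptions for illustration; the project's actual rules may differ.

```typescript
// Hypothetical combatant record; only the fields needed for ordering.
interface Combatant {
  name: string;
  agility: number;
}

// Order combatants by agility, highest first. Ties are broken by name so
// the result is deterministic (the tie-break rule is an assumption).
function initiativeOrder(combatants: Combatant[]): Combatant[] {
  return [...combatants].sort(
    (a, b) => b.agility - a.agility || a.name.localeCompare(b.name)
  );
}

const order = initiativeOrder([
  { name: "Borin", agility: 8 },
  { name: "Aria", agility: 14 },
  { name: "Cyra", agility: 14 },
]);
console.log(order.map((c) => c.name).join(", ")); // prints "Aria, Cyra, Borin"
```

With round-robin turns the join order alone would decide who acts first; here a character's stats do, which is what makes the turn order a tactical consideration.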

In short, this project explores how Gemini-powered agents can move beyond one-off prompts into structured, stateful, and replayable interactive experiences.

Inspiration

This project came from two interests: fantasy role-playing and real-time AI interaction.

I wanted to combine the adaptability of a great Dungeon Master with the responsiveness of Gemini, so the AI is not just a chatbot behind the scenes, but the core system driving world progression and narrative reactions.

The goal was to make something that feels like entering a living campaign, not clicking through a fixed script.

What I Built

I built Dungeons & Dragons Adventure — a Creative Storyteller agent that delivers a multimodal narrative experience using Gemini.

The experience includes:
• Interactive storytelling engine: turn-based D&D-style progression with branching player decisions and evolving narrative state.
• Multimodal generation pipeline: story text + generated scene imagery integrated into the same gameplay loop, with media tied to the current narrative context.
• User identity + continuity layer: authenticated accounts (register, verify, login, reset), persistent campaign saves, replay logs, and room/chat support.
• Face-to-character personalization module (optional): users can upload or capture a live photo, then Gemini generates game-character visuals that preserve facial identity for more immersive storytelling.

How I Built It

Stack
• Frontend: React + TypeScript + Vite
• Backend: Node.js + Express
• AI: Google GenAI SDK (Gemini)
• Cloud: Google Cloud Run
• Hosting: Vercel

System design

The narrative update loop is context-aware:

S_{t+1} = f(I_t, W_t, R_t, C_t)

Where:
• I_t: player input
• W_t: world state
• R_t: character role/turn context
• C_t: prompt constraints

This keeps output creative while preserving continuity and gameplay structure.
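The update loop above can be sketched as a function that assembles a prompt from the four inputs and folds the model's reply back into the world state. Everything named here (`StepInput`, `buildPrompt`, `nextState`, the `Narrator` callback) is a hypothetical stand-in; in particular, `Narrator` abstracts the Gemini call rather than showing the real SDK API.

```typescript
// Hypothetical types mirroring I_t, W_t, R_t, and C_t from the formula.
interface WorldState {
  location: string;
  eventHistory: string[];
}

interface TurnContext {
  activeCharacter: string;
  role: string;
}

interface StepInput {
  playerInput: string;   // I_t
  world: WorldState;     // W_t
  turn: TurnContext;     // R_t
  constraints: string[]; // C_t
}

// Stand-in for the model call; a real app would wrap the Gemini SDK here.
type Narrator = (prompt: string) => Promise<string>;

// Combine all four inputs into one prompt so the model sees the full context.
function buildPrompt(step: StepInput): string {
  const recent = step.world.eventHistory.slice(-3).join(" | ") || "none";
  return [
    `Constraints: ${step.constraints.join("; ")}`,
    `Location: ${step.world.location}`,
    `Recent events: ${recent}`,
    `Acting character: ${step.turn.activeCharacter} (${step.turn.role})`,
    `Player action: ${step.playerInput}`,
  ].join("\n");
}

// One step of S_{t+1} = f(I_t, W_t, R_t, C_t): narrate, then record the result.
async function nextState(
  step: StepInput,
  narrate: Narrator
): Promise<{ narration: string; world: WorldState }> {
  const narration = await narrate(buildPrompt(step));
  const world: WorldState = {
    ...step.world,
    eventHistory: [...step.world.eventHistory, narration],
  };
  return { narration, world };
}
```

Keeping prompt assembly in a pure function like `buildPrompt` makes the context that reaches the model easy to inspect and test without calling the API.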

Challenges I Faced

• Maintaining fantasy tone consistency across long interactions
• Balancing creativity with structured game logic
• Turning a general-purpose LLM into a believable game-master flow
• Keeping setup simple while preserving technical depth for judges
• Building reliability features for real-world API/runtime variability

What I Learned

• Prompt engineering works best as system design, not isolated prompts
• AI UX quality depends on state management + clarity + fallback behavior
• Visual and interaction polish strongly affects perceived intelligence
• Interactive storytelling requires constant tradeoff management (freedom vs coherence)

Why This Project Matters

This project shows that Gemini-powered applications can go beyond “single prompt in, single answer out” utilities. It demonstrates how AI can power a living interactive product: one that is narrative, visual, social, and stateful over time.

What matters here is not only generation quality, but experience design:

• Immersion: Story text, scene visuals, and player decisions are tightly coupled so users feel like they are inside a world, not chatting with a tool.
• Emotional engagement: Character progression, risk/reward turns, and personalized visuals (including face-to-character transformation) increase emotional connection and player investment.
• Interactivity at product scale: The app is structured as a real gameplay loop with persistence, replayability, and authenticated user context.
• Reliability and continuity: Auth flows, campaign saves, resume support, and backend state management make the experience durable across sessions, which is critical for long-form storytelling.
• Proof of a new AI product category: It points toward AI experiences that behave like games, creative studios, and narrative platforms, not just assistants.

In short, this project argues that Gemini can be the core engine for immersive, emotionally resonant, multimodal software products.

Future Improvements

Persistent campaign memory

Expand long-horizon memory so decisions from earlier chapters shape later story arcs more deeply. Add memory summarization/compression strategies for long sessions to keep context coherent and cost-efficient.

Structured quest/combat mechanics

Introduce richer systems for quests, objectives, conditions, and outcome rules. Improve combat modeling (action economy, status effects, balancing) for clearer strategy and replay value.

Deeper party role interactions

Strengthen class/role identity with unique abilities, synergies, and relationship dynamics. Add more role-specific narrative branches and consequence tracking.

Voice-based live narration

Integrate more natural real-time voice storytelling and responsive audio delivery. Improve voice direction controls (tone, pacing, character style) for cinematic immersion.

Stronger long-session orchestration and state tracking

Build more robust orchestration for extended sessions, including checkpoints, rollback safety, and conflict resolution. Add observability tooling for state transitions, generation events, and error recovery across complex user journeys.

Closing Reflection

I started with a simple vision: build an AI Dungeon Master that feels alive. What I ended up building is a working proof of concept that combines narrative intelligence, tactical interaction, and cinematic multimodal presentation in one cohesive experience.

This project became both a technical prototype and a creative statement. It shows that AI products can be more than helpful—they can be immersive, expressive, and emotionally engaging. In other words, AI experiences can be intelligent and magical.

Built With

api
cloud
css
express.js
flow
gemini
genai
google
html
javascript
live
node.js
react
run
sdk
session
typescript
vercel
vite

Submitted to

Gemini Live Agent Challenge

Created by

I was the full-stack contributor for this project and owned the end-to-end build: product design, frontend UX (React/TypeScript), backend APIs (Node.js/Express), Gemini integration via Google GenAI SDK, cloud deployment (Google Cloud Run), production hosting (Vercel), plus demo/video/docs and final submission packaging.
I also implemented core gameplay systems (turn flow, agility-based initiative, objective-completion victory logic), multimodal experience features, and reliability/observability improvements (preflight checks, fallback handling, and diagnostics).

Ching-Wei Kang
Computer Science & Data Science 27er @ UW-Madison | Dean's List | Data Competition Winner | Social Media Leader

Updates

Ching-Wei Kang started this project — Mar 15, 2026 03:08 AM EDT
