Inspiration
GeMaster started with a simple frustration: most AI-driven RPG experiences are impressive for a few turns, then begin to drift. The model forgets the tone it established, contradicts its own world rules, or invents details that weaken immersion instead of deepening it.
We wanted to explore a different direction: an AI Game Master built not just for creativity, but for consistency. The goal was to lay the groundwork for a live RPG experience where the world, its rules, its characters, and its narrative logic could stay aligned long enough to feel genuinely playable.
What it does
GeMaster is a live multimodal AI Game Master for roleplaying sessions.
It begins with a Session 0, where the player defines the world, tone, rules, boundaries, and play style. From that, GeMaster builds a playable canon: world context, starter setup, locations, NPC anchors, and a story-ready opening state.
Once play begins, the system acts as a live GM through voice and text, while also generating supporting media such as maps, key NPC visuals, and an opening cinematic. What makes GeMaster distinct is that it is designed around consistency first: it tries to keep the world coherent under live interaction instead of letting the experience dissolve into generic improvisation.
How we built it
We built GeMaster with Gemini at the center of the experience and deployed it on Google Cloud. The project uses Gemini models for live interaction and multimodal generation, and we structured the agent flow with Google's ADK so Session 0, runtime orchestration, and media-linked world creation could work as one system rather than as disconnected features.
The frontend is built with Next.js and TypeScript, and the backend is a TypeScript/Node service deployed on Cloud Run. We use Firestore for session and world state, and Cloud Storage for generated artifacts such as maps, NPC assets, and video-related outputs.
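To make the session and world state concrete, here is a minimal TypeScript sketch of the kind of canon document such a system might persist in Firestore. The field names and shapes below are our illustration for this writeup, not the actual GeMaster schema, and the validation helper is a hypothetical example of a pre-play integrity check.

```typescript
// Illustrative (not the production schema): one NPC anchored to a scene.
interface NpcAnchor {
  id: string;
  name: string;
  role: string;        // e.g. "innkeeper", "rival mage"
  homeSceneId: string; // where this NPC is anchored by default
}

// Illustrative shape of a per-session playable canon built in Session 0.
interface SessionCanon {
  sessionId: string;
  tone: string;           // tone established in Session 0
  rules: string[];        // world rules the GM must not contradict
  boundaries: string[];   // player-declared hard limits
  scenes: Record<string, { id: string; name: string; summary: string }>;
  npcs: NpcAnchor[];
  currentSceneId: string; // the scene the GM currently "owns"
}

// A minimal integrity check before canon is accepted as playable:
// the current scene must exist, and every NPC must be anchored to a
// scene that is actually part of the canon.
function validateCanon(canon: SessionCanon): string[] {
  const errors: string[] = [];
  if (!canon.scenes[canon.currentSceneId]) {
    errors.push(`current scene "${canon.currentSceneId}" is not in canon`);
  }
  for (const npc of canon.npcs) {
    if (!canon.scenes[npc.homeSceneId]) {
      errors.push(`NPC "${npc.name}" anchored to unknown scene "${npc.homeSceneId}"`);
    }
  }
  return errors;
}
```

Keeping the canon in one structured document like this is what lets runtime checks reject generated content that contradicts it, instead of relying on the prompt alone.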
On top of the core stack, we built a custom consistency layer. Session 0 establishes canon. Runtime logic tries to preserve that canon during play. We also separated out-of-character warnings from the GM's in-world narration so the system can enforce constraints without breaking immersion. Throughout development, we tried to shape the product in a way that feels native to Google's ecosystem and aligned with where multimodal agent experiences are heading.
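The OOC/narration split described above can be sketched as a two-channel turn envelope. This is a simplified illustration under our own naming, not GeMaster's actual enforcement code: a boundary violation is redirected on the out-of-character channel while the in-world voice stays intact.

```typescript
// Illustrative sketch: the GM's output is split into an in-world
// narrative channel and an out-of-character channel, so rule
// enforcement never leaks system language into narration.
interface GmTurn {
  narrative: string;   // spoken in the GM's dramatic voice
  ooc: string | null;  // explicit out-of-character guidance, if any
}

// Hypothetical check: if player input hits a Session 0 boundary,
// redirect OOC while keeping the scene itself unresolved but intact.
function enforceBoundary(playerInput: string, boundaries: string[]): GmTurn | null {
  const hit = boundaries.find(b =>
    playerInput.toLowerCase().includes(b.toLowerCase())
  );
  if (!hit) return null; // no violation: normal narration proceeds
  return {
    narrative: "The moment hangs unresolved; the scene waits for you.",
    ooc: `That crosses a boundary set in Session 0 ("${hit}"). Let's take a different direction.`,
  };
}
```

The point of the split is that the player always gets clear, rule-aware feedback, but it arrives beside the narration rather than inside it.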
Challenges we ran into
Our hardest problem was not generation quality, but live reliability.
A text demo can hide a lot. A live GM cannot. Real-time turn-taking, speech timing, recovery behavior, media generation pressure, and player input flow all had to feel stable enough to support the illusion of a real session. Small timing issues quickly became experience issues.
The second major challenge was making consistency more than a prompt-level promise. We did not want "consistency" to mean that the model sounded coherent in isolated moments. We wanted the system to stay grounded in the approved world state, respect boundaries, and recover cleanly when the player pushed against the edges of the setting.
We also discovered that scene ownership is surprisingly fragile in live roleplay. The moment a system treats a passing reference to another place or character as an immediate transition, the illusion of a live table breaks, and the experience starts to feel like teleportation rather than roleplay.
That pushed us toward a stricter consistency architecture built around explicit scene ownership, transition gating, speaker control, and separate OOC enforcement so the GM could stay in-world without silently teleporting, swapping speakers, or leaking system language into narration.
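Transition gating can be illustrated with a small state machine. The types and function below are a sketch under our own assumptions, not the production implementation: a mere mention of another location never moves the scene, and even an explicit travel intent only takes effect once it is confirmed.

```typescript
// Illustrative: a player utterance is classified either as a passing
// mention or as an explicit travel intent toward a target scene.
type Intent =
  | { kind: "mention" }
  | { kind: "travel"; targetSceneId: string; confirmed: boolean };

interface SceneState {
  currentSceneId: string;
  activeSpeakerId: string; // only one speaker "owns" the floor at a time
}

function applyIntent(state: SceneState, intent: Intent): SceneState {
  // Passing references ("I wonder what's happening back home")
  // are narrated in place; they never teleport the party.
  if (intent.kind === "mention") return state;
  // Travel requires an explicit confirmation step before the scene changes.
  if (!intent.confirmed) return state;
  return { ...state, currentSceneId: intent.targetSceneId };
}
```

Gating the transition behind a confirmed intent is what keeps the "live table" feel: the GM can acknowledge a reference without silently relocating the scene.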
We also faced a UX challenge: when the system needs to redirect or reject something, that feedback should be clear without damaging the GM's dramatic voice. That led us to separate explicit OOC guidance from narrative delivery.
Accomplishments that we're proud of
We are proud that GeMaster treats consistency as a product feature, not a cosmetic layer.
Session 0 is not just flavor text. It establishes a usable canon that shapes what happens next. During play, GeMaster tries to keep rules, setting logic, character identity, and scene continuity aligned instead of defaulting to vague, drifting improvisation.
We are also proud of the multimodal structure around that core. The project supports live GM interaction alongside world-supporting media such as maps, critical NPC visuals, and an opening scene video that strengthens the sense of place. We are especially proud of the OOC/narrative separation, which lets the system stay clear and rule-aware without breaking the dramatic experience.
Finally, we are proud that this is a real deployed project on Google Cloud with a live interaction path, not a static prototype or edited concept demo.
What we learned
We learned that in live agent experiences, recovery behavior matters almost as much as raw model capability. A system can be impressive in isolated moments and still feel unreliable if turn flow, timing, or feedback breaks under pressure.
We also learned that consistency cannot be added as a final polish step. It has to be built into the structure of the product: Session 0, runtime orchestration, media grounding, and user-facing feedback all need to reinforce the same world model.
And we learned an important product lesson during the final stretch: a narrower but stable demo path is more valuable than a broader but fragile one. In real-time AI systems, cutting risk is part of good engineering.
What's next for GeMaster
In the near term, we want GeMaster to feel less like a promising demo and more like a truly playable system. That starts with hardening the consistency layer while widening the space of possible play: richer template pools for NPCs, locations, and quest structures; stronger runtime canon checks; and a better UI and UX that make live play feel like stepping into a world rather than operating a tool. Alongside that, we want to add more systemic state through relationship and faction graphs, rumor and evidence chains, and event-driven world signals, so player choices shape not only the next reply, but who knows what, who trusts whom, and which opportunities remain available.
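One possible shape for the systemic state described above (purely illustrative, since this is future work): a clamped trust graph between characters and factions, plus a "who knows what" rumor ledger.

```typescript
// Illustrative: directed trust scores between characters/factions.
interface RelationshipGraph {
  trust: Map<string, Map<string, number>>; // trust[from][to] in [-1, 1]
}

// Adjust trust in response to a player choice, clamped to [-1, 1].
function adjustTrust(g: RelationshipGraph, from: string, to: string, delta: number): void {
  const row = g.trust.get(from) ?? new Map<string, number>();
  const next = Math.max(-1, Math.min(1, (row.get(to) ?? 0) + delta));
  row.set(to, next);
  g.trust.set(from, row);
}

// Illustrative rumor ledger: a fact id mapped to the set of NPCs who know it.
type RumorLedger = Map<string, Set<string>>;

function spreadRumor(ledger: RumorLedger, factId: string, npcId: string): void {
  const knowers = ledger.get(factId) ?? new Set<string>();
  knowers.add(npcId);
  ledger.set(factId, knowers);
}
```

State like this is what would let player choices shape not only the next reply, but which NPCs trust the party and which information has actually spread.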
Beyond that, we see GeMaster evolving from a narrative consistency layer into a more deterministic runtime model. Today, much of the binding comes from structured canon and runtime enforcement. In the future, we want to explore using code-capable Gemini workflows during Session 0 to generate software-level binders and game logic directly from player-defined rules and constraints. That would move GeMaster closer to a true live AI-native game runtime, rather than a system that relies mainly on narrative agreement.
We also want to turn GeMaster into a platform where players can create, publish, share, and reuse their worlds and Session 0 setups. Long term, we believe this can grow into a broader engine for LLM-native game development: a foundation that becomes more valuable as visual models, code models, asset pipelines, and simulation tools continue to improve.
Our ambition is not only to build a compelling AI Game Master, but to help define the underlying structure of what AI-driven game experiences can become.
Built With
- firestore
- gemini
- generation
- google-adk
- google-cloud
- google-cloud-run
- google-genai-sdk
- multimodal-image
- next.js
- node.js
- react
- realtime-voice-interaction
- typescript
- vertex-ai