DM Assistant

Screenshot of Goals and Tools

Inspiration

Being a dungeon master is hard work. You prepare all the maps, books, and characters, and then you have to run the session: responding to players' actions, refereeing the rules, and keeping the players engaged in the story. After that, you must keep copious notes on what happened so that the new party pet or favorite NPC can reappear later. It’s a lot, but with these LLM tools, we can help the tired dungeon master. “It’s me, I’m the tired dungeon master.”

What it does

Our agent takes an audio-only recording of a tabletop session (be it DnD, Pathfinder, Blades in the Dark, etc.) and identifies the critical moments of play: the intrigue of a character sneaking a bribe into a watchman’s pocket, or the failure of a character falling off a ledge and into a shark-filled lake. It tracks the NPCs the players meet, along with their motivations and context, and even records the lore the players are exposed to. It returns all of this to the DM as a text summary and in an intuitive graph format, creating a living database that evolves from session to session.
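To make the "living database" idea concrete, here is a minimal sketch of one way such a graph could be stored. It is an illustration, not the project's actual schema: the names `Node`, `CampaignGraph`, `upsert`, and `link` are hypothetical, and the node kinds and relation labels are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entity in the campaign: an NPC, a player character, a piece of lore."""
    name: str
    kind: str  # assumed labels, e.g. "npc", "pc", "lore"
    notes: list[str] = field(default_factory=list)


@dataclass
class CampaignGraph:
    """Hypothetical in-memory graph that persists across sessions."""
    nodes: dict[str, Node] = field(default_factory=dict)
    # Each edge is a (source, relation, target) triple.
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def upsert(self, name: str, kind: str, note: str) -> Node:
        """Create the node if it is new, otherwise append the note to it."""
        node = self.nodes.setdefault(name, Node(name, kind))
        if note not in node.notes:
            node.notes.append(note)
        return node

    def link(self, source: str, relation: str, target: str) -> None:
        """Record a relationship between two nodes that already exist."""
        for name in (source, target):
            if name not in self.nodes:
                raise KeyError(f"unknown node: {name}")
        self.edges.add((source, relation, target))
```

In this sketch, a watchman met in session one and seen again in session two stays a single node whose notes grow over time, which is what lets the database evolve rather than restart each session.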

How we built it

We built it using a host of Gemini agents. The first takes the transcribed audio (which can be generated by Gemini or by another tool such as AssemblyAI) and breaks it into chapters, as though it were a movie. The resulting summary is returned to the DM as a text file. The chapters are then fed into a second agent, which identifies the information in each chapter that should be represented in the database. Before making its decisions, this agent is given a snapshot of the current state of the DB, which takes advantage of the large context window.
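The two-stage flow described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not our actual agent code: `ModelFn` is a hypothetical stand-in for whatever function sends a prompt to Gemini and returns its text reply, and the prompt wording, the JSON reply shapes, and the `npcs`/`lore` keys are all invented for the example.

```python
import json
from typing import Callable

# Hypothetical stand-in for a model call: takes a prompt, returns the reply text.
ModelFn = Callable[[str], str]


def split_into_chapters(transcript: str, model: ModelFn) -> list[dict]:
    """Stage 1: ask the model to break the transcript into chapters."""
    prompt = (
        "Split this tabletop session transcript into chapters, as though it "
        "were a movie. Reply with a JSON list of objects that have 'title' "
        "and 'summary' keys.\n\n" + transcript
    )
    return json.loads(model(prompt))


def extract_updates(chapter: dict, db_snapshot: dict, model: ModelFn) -> dict:
    """Stage 2: show the model the current DB state plus one chapter."""
    prompt = (
        "Current database state:\n" + json.dumps(db_snapshot)
        + "\n\nChapter:\n" + json.dumps(chapter)
        + "\n\nReply with a JSON object that has 'npcs' and 'lore' lists "
        "containing only entries that are new or changed."
    )
    return json.loads(model(prompt))


def run_pipeline(transcript: str, db: dict, model: ModelFn) -> tuple[str, dict]:
    """Return the text summary for the DM and the updated database."""
    chapters = split_into_chapters(transcript, model)
    summary = "\n\n".join(f"{c['title']}\n{c['summary']}" for c in chapters)
    for chapter in chapters:
        # The snapshot is re-sent for every chapter so that later chapters
        # see what earlier ones already added.
        updates = extract_updates(chapter, db, model)
        db["npcs"].extend(updates.get("npcs", []))
        db["lore"].extend(updates.get("lore", []))
    return summary, db
```

Passing the model in as a plain function keeps the sketch independent of any particular SDK and makes it easy to swap in a canned reply while testing prompts.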

Challenges we ran into

Getting the agent to talk to its tools was a little complicated. The data stores took a while to cache, so the model couldn’t retrieve the data for the initial step. We were able to show that it works on ai.google.studio, though.

Accomplishments that we're proud of

The summary is detailed and recaps the entire session with excellent resolution. The graph format is intuitive and opens the door for expansion.

What we learned

Google Agent Builder allows for a lot of customization and testing. The flexible tool system can be easily slotted into an API workflow. I love the prompt history and comparison methods.