Inspiration
I love watching Critical Role, a long-running YouTube series built around long D&D campaigns. I noticed that keeping track of combat events is hard and no fun in the heat of the moment, so I wanted to make something to lift that load. That was the core idea and inspiration.
What it does
It helps create and organize D&D campaigns and characters, but the real meat of the app is the real-time audio analysis, which picks up key phrases related to D&D events such as damage, healing, initiative, and spell casting. From these it builds an event history along with self-managed state, so that neither the players nor the DM have to track this on their own anymore. It also has a voice assistant to help with any questions.
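To make the idea of an event history with self-managed state concrete, here is a minimal sketch in Python. The `Event`, `Character`, and `Session` classes and their fields are hypothetical names chosen for illustration, not PickAxe's actual data model, and only damage and healing are covered.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str    # "damage" or "healing" in this sketch
    target: str  # name of the affected character
    amount: int


@dataclass
class Character:
    name: str
    max_hp: int
    hp: int


@dataclass
class Session:
    characters: dict
    history: list = field(default_factory=list)

    def apply(self, event: Event) -> None:
        """Update the target's hit points and record the event."""
        target = self.characters[event.target]
        if event.kind == "damage":
            # Hit points never drop below zero.
            target.hp = max(0, target.hp - event.amount)
        elif event.kind == "healing":
            # Healing never exceeds the character's maximum.
            target.hp = min(target.max_hp, target.hp + event.amount)
        else:
            raise ValueError(f"unsupported event kind: {event.kind}")
        self.history.append(event)


session = Session(characters={"Lyra": Character("Lyra", max_hp=40, hp=40)})
session.apply(Event("damage", "Lyra", 12))
session.apply(Event("healing", "Lyra", 20))
```

Each detected event both changes the character's state and lands in `history`, so the table can review what happened without anyone keeping notes by hand.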
How I built it
The app is built using JS/React on the frontend with a Python backend.
- All audio is transcribed using Eleven Labs Realtime Scribe V2.
- All narration is done by the Eleven Labs Text-to-Speech API.
- Transcripts are analyzed by Gemini to identify and create 'events'.
- Image creation for campaign art and character art is done using Nano Banana.
Challenges I ran into
The hardest challenges came from the large stream of ongoing transcription that a D&D session produces. I often ran into duplicated events, incorrect transcription leading to missed events, and unusual character or spell names not being picked up. Even deciding how often to run analysis in the backend was a challenge, since each pass needs enough context to be reasonable without losing the real-time feel. I solved these with two pieces: a buffer that checks for duplicate events, and layered analysis that includes a transcription correction step before the main analysis. The correction step takes the user's characters and D&D spells into account to fix anything that looks wrong. I settled on transcribing 25 seconds of audio before sending each chunk to the backend.
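The two fixes can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the correction step here uses fuzzy string matching from Python's standard `difflib` in place of the model-based correction the app uses, it only handles single-word names, and the 50-second duplicate window (two 25-second chunks) is a guess. `correct_names` and `RecentEventBuffer` are hypothetical names.

```python
import difflib
from collections import deque


def correct_names(transcript, known_names, cutoff=0.8):
    """Replace words that closely match a known character or spell name."""
    lookup = {name.lower(): name for name in known_names}
    corrected = []
    for word in transcript.split():
        match = difflib.get_close_matches(
            word.lower(), list(lookup), n=1, cutoff=cutoff
        )
        corrected.append(lookup[match[0]] if match else word)
    return " ".join(corrected)


class RecentEventBuffer:
    """Remembers recent event keys so repeats can be dropped."""

    def __init__(self, window_seconds=50.0):
        # Assumed window: two 25-second transcription chunks.
        self.window_seconds = window_seconds
        self._seen = deque()  # (timestamp, key) pairs, oldest first

    def is_duplicate(self, key, now):
        # Forget entries that have aged out of the window.
        while self._seen and now - self._seen[0][0] > self.window_seconds:
            self._seen.popleft()
        if any(seen_key == key for _, seen_key in self._seen):
            return True
        self._seen.append((now, key))
        return False


fixed = correct_names("Lyra casts firebal at the goblin", ["Lyra", "Fireball"])

buffer = RecentEventBuffer()
key = ("damage", "Lyra", 12)
first = buffer.is_duplicate(key, now=0.0)    # new event
second = buffer.is_duplicate(key, now=25.0)  # same event seen in the next chunk
```

Passing `now` in explicitly keeps the buffer easy to test; in a running backend it would come from a clock such as `time.monotonic()`.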
Accomplishments that I am proud of
I am proud that I was able to combine multiple AI models into a cohesive solution for an area I'm passionate about. I am especially proud of the real-time analysis and event handling I built, which is designed to be extended to cover an even wider array of events that can happen in a D&D session.
What I learned
I learned a lot about what Eleven Labs has to offer for audio analysis and speech, which I found quite powerful and useful. I also gained an in-depth understanding of the Google Cloud Platform, as I used it in many areas: analysis, image storage, database and authentication, and even deployment.
What's next for PickAxe
Next I want to extend the functionality to include the following for a more expansive experience:
- Monster/NPC creation
- Item inventory
- Map creation and analysis during sessions
- Experience point tracking and leveling up
Built With
- cloudrun
- cloudstore
- elevenlabstts
- firestore
- gemini
- javascript
- nanobanana
- python
- react
- realtimescribev2