Valkyrie - Voice Activated LLM Keypress Engine
We decided to build a voice-activated, LLM-based keypresser for people who struggle to press keys on time in fast-paced scenarios like work or games. While many voice-activated controls exist, most require you to say the exact key you want to press, or require manual mapping. Valkyrie instead uses an LLM (GPT-4o) to automate the mapping, simplifying things for the end user.
Voice commands are sent to the LLM along with a frame grab of the current application. The LLM processes this information and returns the exact keypresses required to execute the action in the target application.
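Pairing a spoken command with a frame grab can be sketched as building a vision-model chat request. This is a minimal illustration using the OpenAI chat message format for image input; the instruction wording and function name are our assumptions, not the project's actual prompt.

```python
import base64


def build_vision_prompt(command: str, frame_png: bytes) -> list[dict]:
    """Build chat messages pairing a spoken command with a screenshot.

    Uses the OpenAI chat API's content-list format for vision models.
    The system instruction here is a hypothetical example, not the
    project's real prompt.
    """
    b64 = base64.b64encode(frame_png).decode("ascii")
    return [
        {
            "role": "system",
            "content": "Reply with a JSON array of key names that perform the user's action.",
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": f"Action: {command}"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64}"},
                },
            ],
        },
    ]
```

The message list would then be passed to the chat completions endpoint; keeping the screenshot as a data URL avoids hosting the frame anywhere.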
We have two modes: direct control, which converts simple commands like "dodge", "move", and "roll" into keypresses based on the application's control mapping, and LLM control, which takes broader instructions and asks the LLM to work out the controls before generating the keypresses to execute.
Inspiration
We were inspired by the number of people with disabilities or mobility issues who enjoy gaming but struggle to do so. We tested this on games, but this could easily be extended to other software as well.
Challenges
Lots! We struggled to get the LLM to format its output consistently, and we had to work around the models' limitations: vision LLMs can understand images, but not as well as humans. We also ran into token limits, which prevented us from sending frames as a continuous stream.
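One way to cope with inconsistent LLM output is defensive parsing: validate the reply and press nothing rather than something wrong. A minimal sketch, assuming (as a convention we invented, not the project's confirmed format) that the model is prompted to answer with a JSON array of key names:

```python
import json


def parse_keypress_response(raw: str) -> list[str]:
    """Parse an LLM reply into an ordered list of key names.

    Assumes the model was asked to reply with a JSON array such as
    '["shift", "w"]'. Anything unparseable or mis-shaped yields an
    empty list, so a bad reply results in no keypress rather than a
    wrong one.
    """
    try:
        keys = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(keys, list):
        return []
    # Keep only plain string key names; drop anything else the model emitted.
    return [k.lower() for k in keys if isinstance(k, str)]
```

Failing closed like this matters here because an executed keypress is irreversible in a live game.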