Inspiration
Our inspiration was to uplift society by helping kids who are physically challenged. Using a speech-to-text API removes the physical actions needed to deliver an instruction, which makes the game more user friendly.
What it does
It is a voice controller for the snake in the classic Snake game of the 90s. The user commands the snake to move UP, LEFT, DOWN, or RIGHT by voice. It can help specially abled kids exercise their minds and refresh themselves without being held back by their physical disadvantages. It could also potentially be used as a neurocognitive test to help identify patients with degenerative mental diseases.
How we built it
We created the game using the pygame library for Python 3.x. We used the sounddevice library to capture audio from the microphone and save it as a .wav file. We then sent the .wav file to Deepgram's prerecorded speech-to-text API, which returned the transcript as a JSON string. From that transcript we picked out the commands (UP, DOWN, LEFT, RIGHT) and used them to move the snake in the desired direction.
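As a rough illustration of the last step, here is a minimal sketch of pulling direction commands out of a transcript. It is not our exact code. The function name extract_commands and the sample payload are made up for this example, and it assumes the transcript sits at results, channels[0], alternatives[0], transcript in the returned JSON, which should be checked against the actual Deepgram response.

```python
import json

VALID_COMMANDS = {"up", "down", "left", "right"}


def extract_commands(response_json):
    """Return the direction words found in the transcript, in spoken order."""
    response = json.loads(response_json)
    # Assumed layout of the response; verify against the real API output.
    transcript = response["results"]["channels"][0]["alternatives"][0]["transcript"]
    # Drop surrounding punctuation and normalise case before matching.
    words = (word.strip(".,!?").lower() for word in transcript.split())
    return [word.upper() for word in words if word in VALID_COMMANDS]


# Hypothetical sample payload, shaped like the assumed response layout.
sample = json.dumps(
    {"results": {"channels": [{"alternatives": [{"transcript": "Up, left and then down."}]}]}}
)
print(extract_commands(sample))  # ['UP', 'LEFT', 'DOWN']
```

Filtering against a fixed set of words means filler such as "and then" is simply ignored instead of stopping the snake.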
Challenges we ran into
Many libraries had to be installed via pip before we could use the speech recognition API. It was our first time working with asynchronous functions in Python, which caused some issues. We also weren't able to build real-time transcript generation, because that involves threading and parallel computing concepts that we are not very familiar with.
Accomplishments that we're proud of
Since we couldn't build a real-time transcript generator, we came up with another idea: collect multiple instructions in a list, then pop the first instruction on each pass of the main loop. We did not give up until we saw our program finally yielding results, even given the online mode of the hackathon.
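The collect-then-pop idea can be sketched roughly as below. This is a simplified stand-in and not the game itself: there is no pygame here, the function run_commands and the starting position are hypothetical, and a collections.deque is used in place of a plain list so that popping the first item stays cheap.

```python
from collections import deque

# Direction vectors in screen coordinates (y grows downward, as in pygame).
DIRECTIONS = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}


def run_commands(commands, start=(5, 5), max_steps=10):
    """Simulate the main loop: consume one queued command per pass."""
    pending = deque(commands)
    x, y = start
    dx, dy = DIRECTIONS["RIGHT"]  # assumed initial heading
    path = []
    for _ in range(max_steps):
        if pending:
            # Take the oldest instruction and turn the snake.
            dx, dy = DIRECTIONS[pending.popleft()]
        # With no instruction left, the snake keeps its current heading.
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


print(run_commands(["UP", "LEFT", "DOWN"], max_steps=4))
# [(5, 4), (4, 4), (4, 5), (4, 6)]
```

Once the queue is empty the snake simply continues in its last direction, which matches how the classic game behaves between key presses.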
What we learned
We had to learn how to use the API, and now we can comfortably build more complex projects with it. We learnt about asynchronous functions in Python and how to implement the "infamous snake game" using pygame.
What's next for Audio Position Controller
(We couldn't implement these given the time constraints of the hackathon, but we'd love to.)
- We can build an emulator in which more games with common keys can be run via audio instructions.
- Currently the only way to end the game is by crashing into the wall. A GUI button for ending the game could be added.
- Since recording and snake motion are sequential, they have to happen one at a time, which causes lag. Running the two processes concurrently could reduce the lag significantly.
- The "Speak Now:" and "Don't Speak Now:" alerts appear in the terminal output. Adding a GUI element (maybe Mute/Unmute icons) to show them on the main screen would let the user know when to speak and when not to.
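For the concurrency item in the list above, one possible direction is a background thread that feeds commands into a thread-safe queue while the main loop polls it without blocking. The sketch below only illustrates that pattern under assumptions: listen fakes the record-and-transcribe step with a fixed list, game_loop stands in for the pygame loop, and all names and delays are hypothetical.

```python
import queue
import threading
import time


def listen(commands_out, fake_transcripts):
    """Background worker: stands in for recording audio and transcribing it."""
    for command in fake_transcripts:
        time.sleep(0.02)  # pretend that recording and transcription take time
        commands_out.put(command)
    commands_out.put(None)  # sentinel: no more commands will arrive


def game_loop(commands_in, frame_delay=0.01):
    """Main loop: polls for commands without ever waiting on the audio side."""
    applied = []
    while True:
        try:
            command = commands_in.get_nowait()
        except queue.Empty:
            # Nothing new yet: a real game would move and draw the snake here.
            time.sleep(frame_delay)
            continue
        if command is None:
            break
        applied.append(command)
    return applied


commands = queue.Queue()
worker = threading.Thread(target=listen, args=(commands, ["UP", "LEFT", "DOWN"]), daemon=True)
worker.start()
applied = game_loop(commands)
worker.join()
print(applied)  # ['UP', 'LEFT', 'DOWN']
```

Because queue.Queue handles its own locking, the game loop never has to pause while audio is being captured, which is where the lag comes from in the sequential version.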