Sprite-ify

Example generated video animation featuring Kurisu Makise from Steins;Gate

Inspiration

Inspired by the many videos on YouTube of people putting game character sprites over audio...

Here are a few:
https://www.youtube.com/watch?v=GUZwiRQCTy0
https://www.youtube.com/watch?v=xm8wcF1jGaw
https://www.youtube.com/watch?v=opHV1XR4MTE

...And many more animated voiceovers!

What it does

Sprite-ify takes in an audio clip and converts it into a full-fledged sprite animation.

How we built it

We used Google Cloud Platform's Natural Language API and Speech-to-Text API for machine-learning text analysis, along with Google Cloud Storage for computing and storing results. We used these to split the audio into chunks at sentence breaks and obtain a sentiment score for each chunk. We then selected a sprite for each chunk matching its sentiment (e.g. happy, sad, neutral, embarrassed) and used OpenCV to create a video from the sprites and chunk durations. Finally, we programmatically combined the original audio with the newly created video and returned the output as an animation!
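As a rough illustration of the sprite-selection step, here is a minimal sketch in Python. It assumes each chunk already carries a sentiment score in the range -1.0 to 1.0 (the range the Natural Language API reports) plus start and end times in seconds. The `Chunk` class, the 0.25 threshold, the 24 fps frame rate, and the function names are all hypothetical choices for this example, not the values used in the hack. It also only covers happy, sad, and neutral, since a single score cannot distinguish a sprite like embarrassed.

```python
from dataclasses import dataclass

FPS = 24  # assumed frame rate, not taken from the project


@dataclass
class Chunk:
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    score: float  # sentiment score in [-1.0, 1.0]


def pick_sprite(score, threshold=0.25):
    """Map a sentiment score to a sprite label (threshold is an assumption)."""
    if score >= threshold:
        return "happy"
    if score <= -threshold:
        return "sad"
    return "neutral"


def frame_plan(chunks, fps=FPS):
    """Return (sprite_label, frame_count) pairs, one per chunk.

    Frame counts are derived from the cumulative end time of each chunk
    rather than from each chunk's own duration, so rounding errors do not
    accumulate and the video stays aligned with the audio. Any gap before
    a chunk is filled with that chunk's sprite.
    """
    plan = []
    frames_written = 0
    for chunk in chunks:
        target = round(chunk.end * fps)
        count = max(target - frames_written, 0)
        plan.append((pick_sprite(chunk.score), count))
        frames_written += count
    return plan


chunks = [
    Chunk("I love this!", 0.0, 1.5, 0.8),
    Chunk("This is awful.", 1.5, 3.2, -0.6),
    Chunk("Okay.", 3.2, 4.0, 0.0),
]
plan = frame_plan(chunks)
```

Each pair in `plan` says which sprite image to write and for how many consecutive frames, which is the information a video writer such as OpenCV's needs to lay down the frames in order.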

What type of Hack is this (or what Tracks does it fit)?

It can be a little difficult to decide which tracks this hack could fit, in particular the Game Development track. In its current demo form, this hack cannot be used directly for the development of games. However, we believe that the technology involved can be applied to aid in the development of Visual Novels. In particular, analyzing text for emotions / sentiments to automatically choose applicable sprites can be an immense time-saver during development. Normally, developers do this manually, and a single game can contain several thousand statements. This hack can save a large amount of time in that process. Thus, we would like to argue that the hack is applicable to the Game Development track.

Additionally, this hack is applicable to the "Best Use of Google Cloud" track. As stated above, we use the GCP Natural Language API, Google Cloud Storage, and Speech-to-Text in our pipeline to apply sprites at the correct timing. Of course, this hack is also applicable to the general tracks for which every hack is eligible.

Challenges we ran into

Originally, we tried to use silence to detect sentence breaks. However, background noise in many videos made this hard to do. To counter this, we used Google Cloud's voice recognition API to analyze when new sentences started. This was our first time leveraging Google Cloud's Machine Learning APIs, as well as our first time working on a video creation tool.
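One way sentence breaks could be recovered from a transcript, rather than from silence, is sketched below. It assumes the Speech-to-Text request was made with word time offsets and automatic punctuation enabled, so that each recognized word comes with a start time, an end time, and any trailing punctuation. That setup is an assumption for this example and is not confirmed by the description above. The input format (a list of `(word, start, end)` tuples) and the function names are hypothetical.

```python
SENTENCE_END = (".", "?", "!")


def _join(items):
    """Collapse a run of (word, start, end) tuples into one sentence tuple."""
    text = " ".join(word for word, _, _ in items)
    return (text, items[0][1], items[-1][2])


def split_sentences(words):
    """Group timestamped words into (sentence, start, end) tuples.

    A sentence ends at any word whose text ends in '.', '?' or '!'.
    Trailing words without closing punctuation form a final sentence.
    """
    sentences = []
    current = []
    for word, start, end in words:
        current.append((word, start, end))
        if word.endswith(SENTENCE_END):
            sentences.append(_join(current))
            current = []
    if current:
        sentences.append(_join(current))
    return sentences


words = [
    ("Hello", 0.0, 0.4), ("there.", 0.4, 0.9),
    ("How", 1.2, 1.4), ("are", 1.4, 1.6), ("you?", 1.6, 2.0),
    ("Fine", 2.5, 2.9),
]
sentences = split_sentences(words)
```

Because the boundaries come from the recognized text and its timestamps, background noise between sentences does not affect where the chunks are cut, unlike a silence threshold.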