Inspiration

In an increasingly automated world, more and more speech will be generated by AI. We propose a way to incorporate personality into a speech generation algorithm so that the generated speech sounds more human. We set out to build a system that generates speech specific to a topic while also capturing an individual's personality.

What it does

First, you choose a celebrity or famous person. Our website offers a variety of personalities--anywhere from Kanye West and Cardi B to Shakespeare and Nietzsche. Then you type a topic, and the site outputs generated sentences in which the celebrity talks about that topic.

How we built it

We built a large text corpus for each celebrity featured on our website. A corpus can combine the person's written works, tweets, and song lyrics. After gathering the text, we trained a character-level recurrent neural network (RNN) on each celebrity's corpus. When the user selects a celebrity, our website loads the model trained on that celebrity's writings and uses it to generate text. Generation is seeded with the user's input, and we also intersperse words similar to that input throughout the generated text. To find similar words, we used GloVe, a model that learns word embeddings, which represent words as numerical vectors. After generating the celebrity's text, we used Google Cloud's Text-to-Speech API to synthesize speech from it. Unfortunately, the available voices are limited to very neutral-sounding tones (no Kanye drawl or Cardi B interjections), but hopefully more voices will be available in the future!
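As a rough illustration of the similar-word lookup described above, here is a minimal sketch that ranks words by cosine similarity between their embedding vectors. It is not our actual code: the tiny 3-dimensional vectors are made-up stand-ins for real GloVe embeddings (which typically have 50 to 300 dimensions), and the names `TOY_EMBEDDINGS`, `cosine_similarity`, and `most_similar` are hypothetical.

```python
import math

# Made-up 3-dimensional vectors standing in for real GloVe embeddings.
# Real GloVe vectors would be loaded from a pretrained file instead.
TOY_EMBEDDINGS = {
    "music": [0.9, 0.1, 0.0],
    "song": [0.8, 0.2, 0.1],
    "album": [0.85, 0.15, 0.05],
    "philosophy": [0.0, 0.9, 0.3],
    "truth": [0.1, 0.8, 0.4],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, embeddings, top_n=2):
    """Return the top_n words whose vectors are closest to `word`.

    Words missing from the vocabulary yield an empty list.
    """
    if word not in embeddings:
        return []
    target = embeddings[word]
    scored = [
        (cosine_similarity(target, vector), other)
        for other, vector in embeddings.items()
        if other != word
    ]
    # Highest similarity first.
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]


if __name__ == "__main__":
    print(most_similar("music", TOY_EMBEDDINGS))
```

With these toy vectors, words related to "music" rank above the philosophy-related ones. In a setup like ours, the returned words could then be mixed into the seed text before it is handed to the character-level model.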

Challenges we ran into

A major challenge was deploying our RNN model on a Flask web server, as the model ran into synchronization issues when making predictions. To solve this, we ended up scrapping the TensorFlow backend entirely in favor of Theano, a different machine learning framework. Since none of us were very familiar with Theano, it took a few hours of growing pains to make the change.

Accomplishments that we're proud of

We're happy that we successfully built a model that can generate novel English sentences and phrases. We are also happy with our website's striking yet simple UI!

What we learned

We learned how to deploy a deep learning model on a website. We also learned how to scrape and parse text data for training our model.

What's next for What Would Kanye Say

We would love to see more focus on incorporating human personality into speech generation--for example, training synthesized voices with more "character." We believe that the future of conversational AI lies in creating something with a truly unique personality--something that is relatable and not just a robot.
