What Would Kanye Say

Inspiration

In a world that is becoming increasingly automated, more and more speech will be generated by AI. We propose a method of incorporating personality into a speech generation algorithm to make the generated speech more human-like. We set out to develop a system that creates speech that is specific to a topic, yet also captures an individual's personality.

What it does

First, you choose your celebrity or famous person. Our website features a variety of personalities, from Kanye West and Cardi B to Shakespeare and Nietzsche. Then, you type in a topic, and we output sentences generated in that celebrity's style about the topic you typed.

How we built it

We created a large corpus of text for each of the celebrities featured on our website. Depending on the person, the corpus combines written works, tweets, and song lyrics. After gathering the text, we trained a character-level recurrent neural network on each celebrity's corpus. When the user selects a celebrity, our website loads the model that was trained on that celebrity's writings and uses it to generate text. We seed the generation with the user's input and intersperse words similar to that input throughout the generated text. To find similar words, we used GloVe, a model that learns embeddings, which are numerical vector representations of words. After generating the celebrity's text, we used Google Cloud's Text-to-Speech API to synthesize speech for that text. Unfortunately, the number of available voices is limited to very neutral-sounding tones (no Kanye drawl or Cardi B interjections), but hopefully there will be more voices available in the future!
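To illustrate the similar-word step, here is a minimal sketch of a nearest-neighbor lookup over word embeddings using cosine similarity. It is not our actual code: the function names, the `top_n` parameter, and the tiny 3-dimensional vectors are made up for illustration. Real GloVe vectors are much larger and are loaded from the pretrained text files, where each line holds a word followed by its vector components.

```python
import math


def parse_glove_lines(lines):
    """Parse lines in the GloVe text format: a word followed by its floats."""
    embeddings = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(word, embeddings, top_n=3):
    """Return up to top_n words whose vectors are closest to `word`."""
    if word not in embeddings:
        return []  # unknown topic word: nothing to suggest
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:top_n]]


# Toy vectors invented for this example; they are NOT real GloVe values.
TOY_EMBEDDINGS = parse_glove_lines([
    "music 0.9 0.1 0.0",
    "song 0.8 0.2 0.1",
    "album 0.7 0.3 0.0",
    "philosophy 0.0 0.1 0.9",
])

print(most_similar("music", TOY_EMBEDDINGS, top_n=2))
```

With a full vocabulary, scanning every word per query gets slow, so a real deployment would likely precompute normalized vectors or use a vectorized library instead of this pure-Python loop.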

Challenges we ran into

A major challenge was deploying our RNN model on a Flask web server, as the model ran into synchronization issues when trying to make predictions. To solve this, we ended up completely scrapping the TensorFlow backend in favor of Theano, a different machine learning framework. Since none of us were very familiar with Theano, it took a few hours of growing pains to make the change happen.
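We fixed the problem by switching backends. For comparison, a different and commonly used pattern for serving a model from a multi-threaded web server is to serialize prediction calls behind a lock. The sketch below shows that pattern only; we did not try it, and we cannot say whether it would have resolved our particular issue. `SerializedModel` and `EchoModel` are hypothetical names, and `EchoModel` is a stand-in for a real trained model.

```python
import threading


class SerializedModel:
    """Wrap a model so that only one thread runs predict() at a time."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def predict(self, inputs):
        # Request handlers running in other threads block here
        # until the current prediction finishes.
        with self._lock:
            return self._model.predict(inputs)


class EchoModel:
    """Hypothetical stand-in for a trained text model."""

    def predict(self, inputs):
        return [text.upper() for text in inputs]


# One shared wrapper would be created at startup and reused by every request.
shared_model = SerializedModel(EchoModel())
print(shared_model.predict(["what would kanye say"]))
```

The trade-off is throughput: requests queue up behind the lock, which is usually acceptable for a demo-sized site.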

Accomplishments that we're proud of

We're happy that we successfully created a model that could create novel English sentences and phrases. We are also happy with our website's striking yet simple UI!

What we learned

We learned how to deploy a deep learning model on a website. We also learned how to scrape and parse text data for training our model.

What's next for What Would Kanye Say

We would love to see more focus on incorporating human personality into speech generation--for example, training synthesized voices with more "character." We believe that the future of conversational AI lies in creating something with a truly unique personality--something that is relatable and not just a robot.

Built With

flask
glove
keras
python
word2vec

Submitted to

LA Hacks 2019

Created by

Developed Long Short-Term Memory (LSTM) recurrent models to generate text. Scraped and pre-processed text to be standardized for model input.

Zane Durante
Undergraduate Machine Learning Researcher at USC
I developed the GloVe similar-word-finder functionality. I worked with Jeffrey on the backend server logic (including deploying the deep learning model on the server and implementing text-to-speech via the Google Cloud API). Finally, I designed the UI for the website.

Ben Ma
I worked on the backend for the web app and implemented some of the site's pages. I also worked on the routing and request handling.

Jeffrey Xie

Updates

Zane Durante started this project — Mar 30, 2019 11:01 PM EDT
