Inspiration
Coming into this weekend, we had a few vague ideas of what we wanted to include:
- machine learning
- neural networks
- text mining
- lots of laughs
However, the full idea for our project did not come together until Saturday morning. The primary inspiration was our team member Jackson's research project on sentiment analysis with R, which led him to suggest analyzing tweets. That was when we stumbled upon the @kidswritejokes Twitter account, a place dedicated to nonsensical, failed, but undeniably funny jokes written by children. In the end, we decided to design a website that generates kids' jokes, using text mining and a neural network to accomplish the task.
What it does
Put simply, it generates jokes similar to those on the @kidswritejokes Twitter account, emulating the nonsensical humor of kids' jokes that almost make sense but just miss the mark. For the user interface, we created a website where you select the "type" of kid joke you would like and then press a Generate Joke button. The site then generates and displays a joke for you to laugh at and enjoy.
Our website also includes access to some of the original jokes that were the inspiration for our project.
How we built it
The process began by analyzing the jokes in R. We first stripped out unimportant filler words (stop_words) and then looked at which words occurred most frequently. Next, we ran a harmonic mean function on the data to estimate the ideal number of topics for our LDA (Latent Dirichlet Allocation) model. The LDA showed us how the "types" of jokes related to the words most commonly used in them. After examining the different groupings, we settled on several main categories to offer as "types" in our generator. We then used this information to weight the data we used to train our text generator. For example, one grouping appeared to contain words we associated with more "traditional" jokes, so during training we weighted every joke containing these words more heavily than the jokes that did not.
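The frequency analysis and keyword weighting described above were done in R. As a rough illustration only (our own Python sketch, not the team's script), the code below drops stop words, counts the remaining words, and gives a heavier training weight to any joke that contains a keyword from a chosen category. The small stop-word set, the keyword set, and the boost value of 2.0 are hypothetical placeholders; the `stop_words` list used in the original R analysis is much larger.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word set standing in for the much larger
# stop_words list used in the original R analysis.
STOP_WORDS = {
    "a", "an", "the", "is", "to", "of", "and", "why", "what",
    "did", "do", "you", "it", "was", "because", "with",
}


def tokenize(text):
    """Lowercase a joke and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(jokes, n=5):
    """Count non-stop-word tokens across all jokes and return the n most common."""
    counts = Counter(
        word
        for joke in jokes
        for word in tokenize(joke)
        if word not in STOP_WORDS
    )
    return counts.most_common(n)


def joke_weight(joke, keywords, boost=2.0):
    """Return `boost` if the joke contains any category keyword, else 1.0.

    `keywords` would come from one LDA topic (e.g. the "traditional" jokes);
    the boost value here is an assumption for illustration.
    """
    return boost if set(tokenize(joke)) & keywords else 1.0
```

These weights are what the generator sketch further down would consume as per-joke training weights.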
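The harmonic-mean step is easy to misread, so here is a minimal Python sketch under one assumption: that it follows the common approach of scoring each candidate topic count k by the harmonic mean of the likelihoods recorded while sampling an LDA model with k topics, then keeping the k with the highest score. Producing those log-likelihoods (for example from a Gibbs-sampled LDA fit) is outside this sketch, and the function names are hypothetical. The work is done in log space because the raw likelihoods are far too small to represent as floats.

```python
import math


def log_harmonic_mean(log_likelihoods):
    """Log of the harmonic mean of likelihoods supplied as log-likelihoods.

    HM = n / sum_i exp(-ll_i), so log HM = log(n) - logsumexp(-ll_i).
    The log-sum-exp is shifted by its maximum to avoid overflow.
    """
    neg = [-ll for ll in log_likelihoods]
    m = max(neg)
    log_sum = m + math.log(sum(math.exp(x - m) for x in neg))
    return math.log(len(log_likelihoods)) - log_sum


def best_topic_count(ll_by_k):
    """Return the topic count k whose harmonic-mean estimate is largest.

    `ll_by_k` maps each candidate k to the log-likelihoods sampled for it.
    """
    return max(ll_by_k, key=lambda k: log_harmonic_mean(ll_by_k[k]))
```

Note that a harmonic mean is dominated by its smallest likelihood, so a single poor sample pulls a candidate's score down sharply.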
We used the markovify module in Python to generate new kids' jokes, guided by the trends we had found in our R analysis. Markovify is a Markov chain text generator: it models serial dependence, that is, which words tend to follow which. We trained it on kids' jokes as well as "dad jokes". We included the dad jokes at a lower weight to enlarge our training data, since they have a sentence structure similar to kids' jokes. After training, the generator produced intelligible jokes that were true to the source material. We then embedded these jokes in our HTML, which is hosted at badkidsjokes.com.
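To show what "weighted training" means for a Markov chain generator, here is a from-scratch Python sketch of a word-level Markov chain in which each joke contributes its weight to every word transition it contains. It is not markovify's code or API (markovify documents its own `combine` helper for merging models with relative weights); the names `build_chain` and `generate`, the sample jokes, and the 0.5 dad-joke weight are hypothetical.

```python
import random
from collections import defaultdict

BEGIN, END = "__BEGIN__", "__END__"


def build_chain(weighted_corpus, state_size=2):
    """Build a word-level Markov chain from (text, weight) pairs.

    Each transition (previous `state_size` words -> next word) accumulates the
    weight of every text it appears in, so heavily weighted jokes have more
    influence on generation.
    """
    chain = defaultdict(lambda: defaultdict(float))
    for text, weight in weighted_corpus:
        words = [BEGIN] * state_size + text.split() + [END]
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            chain[state][words[i + state_size]] += weight
    return chain


def generate(chain, state_size=2, rng=random, max_words=30):
    """Walk the chain from the start state, sampling each next word by weight.

    `state_size` must match the value used in build_chain.
    """
    state = (BEGIN,) * state_size
    output = []
    for _ in range(max_words):
        successors = chain.get(state)
        if not successors:
            break
        candidates, weights = zip(*successors.items())
        word = rng.choices(candidates, weights=weights, k=1)[0]
        if word == END:
            break
        output.append(word)
        state = state[1:] + (word,)
    return " ".join(output)


if __name__ == "__main__":
    # Hypothetical sample data; the real project used jokes collected from Twitter.
    kids_jokes = ["why did the cow cross the road because it was a cow"]
    dad_jokes = ["why did the scarecrow win an award because he was outstanding in his field"]
    # Dad jokes get a lower, assumed weight so kids' jokes dominate the chain.
    corpus = [(j, 1.0) for j in kids_jokes] + [(j, 0.5) for j in dad_jokes]
    print(generate(build_chain(corpus), rng=random.Random(42)))
```

A larger `state_size` keeps output closer to the training jokes, while a smaller one produces stranger recombinations, which is one knob for trading coherence against "almost makes sense" humor.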
Challenges we ran into
As with any programming project, debugging presented a challenge.
Originally, we used textgenrnn instead of markovify. We were running our Python code on Google Colaboratory, which would have let us train faster. However, when we tried to download the already-trained network onto our local machines, it did not work, which prompted us to switch to markovify.
Even after switching to markovify, we struggled to find a large enough data set. The jokes the program produced were either unintelligible or unoriginal. To remedy this, we also trained the model on over a thousand dad jokes taken from @dadsaysjokes on Twitter. This additional, more intelligible data seemed to fix the issue, and we ended up with jokes fairly close to our source material.
Accomplishments that we're proud of
Our program creates content that closely resembles our source material, thereby accomplishing our goal. We are also proud that we designed a website after coming into the weekend knowing no HTML.
What we learned
We learned a lot about data analytics and sentiment analysis with R, and we all learned a lot about HTML and JavaScript. We also expanded our knowledge of Python and learned about neural networks.
What's next for Bad Kids Joke Generator
We hope to extend this project so that it can replicate tweets from other Twitter accounts. To do so, we would simply run our R script on the new tweets to identify the ones we most want to emulate, then again use that information to weight the data used to train the generator.
