GreatMindsTweetAlike

Polyhack2019

Inspiration

The inspiration for the web app was learning to use natural language processing to distinguish between public figures' communication styles, and potentially to draw conclusions about how different leaders communicate.

What it does

The user inputs a typical tweet from their account. The app assumes a tweet-length writing sample, with hashtags or links if desired. The sample is run through a neural network trained on the 500 most recent tweets each from Donald Trump and Barack Obama. The program outputs the politician whose language most closely matches the user's sample, along with the classifier's confidence in that prediction. Confidence scores will likely be low, since the neural network expects only tweets from Obama or Trump. However, the sample can come from the user's own tweets or from the tweets of any famous individual.
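The input/output contract described above (tweet in, best-matching person and confidence out) can be illustrated locally. This is not the MonkeyLearn model the app actually used; it is a minimal sketch that swaps in a multinomial naive Bayes classifier, and the labels `politicianA`/`politicianB` and the toy training tweets are hypothetical placeholders, not real data.

```javascript
// Tokenize a tweet: lowercase words, keeping a leading # or @ so hashtags
// and mentions count as distinct features.
function tokenize(text) {
  return text.toLowerCase().match(/[#@]?[a-z0-9']+/g) || [];
}

// Train a multinomial naive Bayes model from { label: [tweet, ...] }.
function train(examplesByLabel) {
  const model = { labels: {}, vocab: new Set() };
  for (const [label, tweets] of Object.entries(examplesByLabel)) {
    const counts = new Map();
    let total = 0;
    for (const tweet of tweets) {
      for (const tok of tokenize(tweet)) {
        counts.set(tok, (counts.get(tok) || 0) + 1);
        model.vocab.add(tok);
        total += 1;
      }
    }
    model.labels[label] = { counts, total, docs: tweets.length };
  }
  return model;
}

// Return the most likely label and its normalized probability (confidence).
function classify(model, text) {
  const labels = Object.keys(model.labels);
  const totalDocs = labels.reduce((s, l) => s + model.labels[l].docs, 0);
  const v = model.vocab.size;
  const logScores = labels.map((label) => {
    const { counts, total, docs } = model.labels[label];
    let score = Math.log(docs / totalDocs); // class prior
    for (const tok of tokenize(text)) {
      // Add-one (Laplace) smoothing so an unseen word cannot zero out a class.
      score += Math.log(((counts.get(tok) || 0) + 1) / (total + v));
    }
    return score;
  });
  // Softmax over the log scores; subtracting the max avoids underflow.
  const max = Math.max(...logScores);
  const exps = logScores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  let best = 0;
  for (let i = 1; i < labels.length; i++) if (exps[i] > exps[best]) best = i;
  return { label: labels[best], confidence: exps[best] / sum };
}

// Usage with made-up toy tweets (hypothetical labels and text).
const model = train({
  politicianA: ["great rally tonight #maga", "the fake news media"],
  politicianB: ["climate action now #actonclimate", "thank you volunteers"],
});
const result = classify(model, "fake news at the rally");
console.log(result.label, (result.confidence * 100).toFixed(1) + "%");
```

Note that in this sketch the two confidences always sum to 1, so with only two classes the top confidence can never fall below 50%; a "low" score here means one close to 50%.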

How we built it

Our main technologies were HTML, CSS, JavaScript, and MonkeyLearn, an API we used to build our neural network classifier. The classifier was first set up to distinguish between 10 famous celebrities, athletes, and politicians. Under the limits of the software's free plan, however, that many classes made it impossible to reliably tell the individuals apart. Next, the front end was built with HTML and CSS. Finally, the two parts were connected using JavaScript.
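The JavaScript glue between the front end and the classifier might look roughly like the sketch below. It assumes MonkeyLearn's v3 classify endpoint as we understand it (`POST /v3/classifiers/{model_id}/classify/` with a `Token` authorization header, a `{ data: [...] }` body, and a response listing `classifications` with `tag_name` and `confidence`); check the current API reference before relying on it. `MODEL_ID`, `API_KEY`, `classifyTweet`, and `formatResult` are hypothetical names. Also note that an API key embedded in browser JavaScript is visible to anyone who loads the page, so a production version would route this call through a server.

```javascript
// Hypothetical placeholders: substitute your own classifier ID and API key.
const MODEL_ID = "cl_XXXXXXXX";
const API_KEY = "YOUR_API_KEY";

// Send one tweet-length sample to the classifier and return the top tag.
// fetchImpl is injectable so the function can be exercised without a network.
async function classifyTweet(text, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(
    `https://api.monkeylearn.com/v3/classifiers/${MODEL_ID}/classify/`,
    {
      method: "POST",
      headers: {
        Authorization: `Token ${API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ data: [text] }),
    }
  );
  if (!res.ok) throw new Error(`Classifier request failed: ${res.status}`);
  // One result object per input text; we sent exactly one.
  const [result] = await res.json();
  // Keep the classification with the highest confidence.
  const best = result.classifications.reduce((a, b) =>
    b.confidence > a.confidence ? b : a
  );
  return { label: best.tag_name, confidence: best.confidence };
}

// Turn the prediction into the text shown on the page.
function formatResult({ label, confidence }) {
  return `${label} (${(confidence * 100).toFixed(1)}% confidence)`;
}

// Example wiring in the page (element IDs are hypothetical):
// document.querySelector("#submit").addEventListener("click", async () => {
//   const text = document.querySelector("#tweet").value;
//   document.querySelector("#output").textContent =
//     formatResult(await classifyTweet(text));
// });
```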

Challenges we ran into

One of the biggest problems we faced as a team was the Twitter API. Originally, the app was meant to take in a user's Twitter username and judge the user based on their latest or most popular tweet. In addition, the bottom of the screen was meant to show a real-time comparison of the user's Twitter feed with the feed of the celebrity they most closely matched. However, we discovered very late that using the Twitter API requires a separate server, and the API itself was difficult to work with. In fact, we found it recommended that JavaScript not be used with the Twitter API at all, a problem we only ran into after our code was built and all four of us had gained some familiarity with JavaScript.

We used MonkeyLearn to build the classifier. Although the original plan was to classify 10 different celebrities, MonkeyLearn's free plan limited us to 1,000 training examples, and we were unwilling to pay $300 for 2,000 more. Split across 10 people, that leaves only 100 training examples per individual, which is far too few for a neural network, so the project was cut to two politicians. We also found that many celebrities tweeted videos or images and tagged others in them. Future projects could use these tags to build relationship models and analyze the images to infer personal interests.

Because three members of our team were freshmen and one was a high school senior, our limited coding experience caused many problems. Most of us were just beginning to learn JavaScript and HTML, and this steep learning curve showed up in many ways throughout the 24 hours. Nevertheless, we were proud of the work we completed and of how much we learned.
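The MonkeyLearn training budget mentioned in this section works out per person as follows. Ten people leaves 100 examples each; two people leaves 500 each, consistent with the 500 most recent tweets per politician used for training; and even the paid upgrade, assuming its 2,000 extra examples add to the free 1,000, would have given only 300 per person across ten people.

```latex
\frac{1000}{10} = 100 \ \text{examples per person}, \qquad
\frac{1000}{2} = 500 \ \text{examples per person}, \qquad
\frac{1000 + 2000}{10} = 300 \ \text{examples per person}
```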

Accomplishments that we're proud of

We were proud of learning to use JavaScript and HTML, and of building a website that works as long as the user makes a good-faith effort to enter a tweet-like sample. For three of us, this was our first hackathon, and we were also proud of completing it and working together toward this goal.

Future Projects

It would be interesting to build a neural network by hand to avoid the limit on training examples. This would allow more examples to be used, increasing accuracy for the two current politicians and making it possible to add more celebrities. Unfortunately, because the project was chosen during Polyhack itself, and despite a few initial attempts to research neural-network approaches to language processing, we decided that building our own network from scratch was impractical in such a short time. It would also be interesting to look at who is tagged in posts and to analyze whether these relationship circles could accurately predict which celebrity wrote a tweet. Finally, it would be interesting to compare the ratio of pictures to words in tweets across celebrities' communication styles. Perhaps such data could reveal differences between political parties or occupations.
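As a starting point for the tagging and picture-to-word ideas above, a per-account summary could be computed like this. It is a sketch only: the simplified tweet record `{ text, mentions, mediaCount }` and the function `summarizeAccount` are hypothetical, and real mention and media data would have to be extracted from whatever the Twitter API returns.

```javascript
// Hypothetical simplified tweet record:
// { text: string, mentions: string[], mediaCount: number }
function summarizeAccount(tweets) {
  let words = 0;
  let media = 0;
  const mentionCounts = new Map();
  for (const t of tweets) {
    words += (t.text.match(/\S+/g) || []).length; // whitespace-separated words
    media += t.mediaCount;
    for (const m of t.mentions) {
      mentionCounts.set(m, (mentionCounts.get(m) || 0) + 1);
    }
  }
  // Most frequently tagged accounts: a crude "relationship circle".
  const topMentions = [...mentionCounts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, 5)
    .map(([handle, count]) => ({ handle, count }));
  return {
    mediaPerWord: words === 0 ? 0 : media / words, // pictures/videos per word
    topMentions,
  };
}

// Usage with made-up records:
const summary = summarizeAccount([
  { text: "Great game tonight", mentions: ["@teammate"], mediaCount: 1 },
  { text: "Thanks for coming out", mentions: ["@teammate", "@coach"], mediaCount: 2 },
]);
console.log(summary);
```

Comparing these summaries across accounts (for example, grouped by party or occupation) is one way the questions above could be explored.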
