On the surface, it seems like whenever some headline comes about about a new product or service, markets react. Our original goal was to use Google's AutoML Natural Language classifier to create a model that could determine whether or not a certain index would go up or down based on the headlines from that day. However, API rate limits prevented us from getting enough headlines to create a reasonable dataset for our ML. Because of this, we decided to pivot to the next most interesting thing that could be toying with the market: Trump's tweets.

What it does

Our site allows users to "take control" of Donald J. Trump's Twitter account and use it to wreak havoc on the S&P 500. Type in your proposed tweet and observe how our ML model anticipates the index will be affected.

How we built it

First, we created a tool that would allow us to generate large quantities of training data for our model. This involved creating a classification system for the change in an index for a given day, and a Twitter scraper that downloads the President's tweets. Next, we paired each of our tweets with the classification from the day that the tweet was published, rebalanced our training data, and uploaded it to Google AutoML for training. Once the training is done, our React.js frontend and Node.js backend allow the user to use the model to predict changes and visualize the result.

Challenges we ran into

We initially struggled with API rate limits and approval wait times for getting enough data to train our models, but we were able to get around this by using a combination of APIs and web-scraping. Determining the best way to balance our training data was also difficult, we initially overrepresented certain classifications in the market, which resulted in our models predicting the same outcome, no matter the text. Setting caps on the number of training samples for each category helped us resolve this, however we have a lot to learn. Finally, all four of us are new to JavaScript, Node.js, React.js, and machine learning so we were challenged by the fact that we needed to teach ourselves and each other about the tools we were using as we were using them, all while making sure that we are making progress at an appropriate pace.

Accomplishments that we're proud of

  • Completing a halfway-decent looking frontend with React.js. We are all new to JavaScript and React is something that we all "wanted to learn one day", so we were happy that we were able to get something out of the door using it.
  • Two of us are totally new to programming and we are really happy that we were able gain some exposure to, and independence in Git, Python, and JavaScript, which will certainly help us with our projects in the future.

What we learned

Quality is just as important as quantity when collecting training data. We knew that the more data we could collect, the better our predictions would be, but we learned more about the quality that is required of that data and some strategies that we can use to clean out useless noise.

What's next for

The site currently only shows how a single tweet will affect a single change on the S&P 500. We believe the user experience will be more fun and impactful if changes made by each tweet can be persisted so that users can simulate themselves tweeting throughout the day. Our ML models can definitely be improved. Getting higher API limits and permissions will allow us to collect a more meaningful number of Trump tweets and it may also allow us to return to our original goal of using news headlines to make predictions.

+ 5 more
Share this project: