Prolonged COVID-19 restrictions have caused immense damage to the economy and to local markets alike. Shifts in this economic landscape have led many individuals to seek alternative sources of income to make up for losses caused by a lack of work or opportunity. One area that has boomed despite local market downturns is individual investment in the stock market. At first glance, stock market trends seem logical and fluid, but in fact they are the opposite. Beat earnings expectations? New products on the market? It doesn't matter, because at the end of the day, a stock's value is inflated by speculation and hype. Many see the allure of rapidly rising ticker charts and booming social media trends, and hear talk around town of someone making millions in a single day (cough, GameStop, cough), but more often than not, individual investors lose money when market trends spiral. It is nearly impossible to time the market. Our team saw these challenges and wanted to create a platform that accounts for social media trends that may signal early market changes, so that small-time investors can make smart decisions ahead of the curve.

What it does

McTavish St. Bets is a platform that aims to help small-time investors gain insight into when to buy, sell, or hold a particular stock in the Dow 30 index. The platform uses recent stock price history, together with tweets from the same time period, to estimate the stock's future value. We assume there is a correlation between tweet sentiment towards a company and its future valuation.
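The write-up does not say how a predicted price is turned into a buy, sell, or hold call. One simple possibility is sketched below, assuming a fixed relative-change band: the `recommend` function and its 1% default threshold are hypothetical, not the platform's actual rule.

```python
from typing import Literal

Signal = Literal["buy", "sell", "hold"]


def recommend(current_price: float, predicted_price: float, threshold: float = 0.01) -> Signal:
    """Map a price forecast to a buy/sell/hold signal.

    `threshold` is a hypothetical relative-change band: predicted moves
    smaller than this (in either direction) are treated as noise -> "hold".
    """
    if current_price <= 0:
        raise ValueError("current_price must be positive")
    change = (predicted_price - current_price) / current_price
    if change > threshold:
        return "buy"
    if change < -threshold:
        return "sell"
    return "hold"


# Hypothetical prices for illustration, not model output.
print(recommend(100.0, 103.0))  # buy  (+3%)
print(recommend(100.0, 99.5))   # hold (-0.5%, inside the band)
print(recommend(100.0, 95.0))   # sell (-5%)
```

Widening `threshold` makes the signal more conservative (more "hold" calls), which may suit a model validated on very little data.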

How we built it

The platform was built using a client-server architecture and is hosted on a remote computer made available to the team. The front end was developed with React.js and Bootstrap for quick and efficient styling, while the back end was written in Python with Flask. The team constructed the dataset from a mix of tweets and article headlines. The public Twitter API was used to collect tweets by popularity, which were then ranked against one another using an engagement scoring function. Tweets were embedded with a BERT-based natural language processing model trained for sentiment analysis. Time series prediction was done with a neural stochastic differential equation (SDE) that also incorporates text information. To incorporate this text data, the tweets' latent representations were combined using weights from the engagement scoring function. This combined representation is then fed directly to the network at each time point of the series, in an attempt to guide the model's predictions.
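The idea of feeding a text representation to an SDE at every time point can be illustrated with Euler-Maruyama simulation, the standard way to step an SDE forward. This is a minimal sketch under stated assumptions: `linear_drift` is a hypothetical fixed linear map standing in for the team's trained drift network, the diffusion is a constant, and the 2-d context vectors are made-up stand-ins for pooled tweet embeddings. No training is shown.

```python
import math
import random
from typing import Callable, List, Sequence

# drift(x, c_t) -> float, where c_t is the text vector for time step t
Drift = Callable[[float, Sequence[float]], float]


def euler_maruyama(
    x0: float,
    text_context: Sequence[Sequence[float]],
    drift: Drift,
    diffusion: float,
    dt: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Simulate dX = f(X, c_t) dt + g dW with the Euler-Maruyama scheme.

    text_context[t] is the pooled tweet vector for step t; it is passed
    to the drift at every time point so the text can steer the path.
    """
    rng = random.Random(seed)
    x = x0
    path = [x0]
    for c_t in text_context:
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + drift(x, c_t) * dt + diffusion * dw
        path.append(x)
    return path


def linear_drift(x: float, c_t: Sequence[float]) -> float:
    """Hypothetical stand-in for a learned drift network: a fixed linear map."""
    text_weights = [0.5, -0.2]  # placeholder, untrained weights
    return -0.01 * x + sum(w * c for w, c in zip(text_weights, c_t))


# Hypothetical 2-d text vectors for three daily steps (not real data).
context = [[0.1, 0.3], [0.2, -0.1], [0.0, 0.4]]
print(euler_maruyama(1.0, context, linear_drift, diffusion=0.0))
```

With `diffusion=0.0` the path is deterministic; with a positive diffusion, one option is to simulate many paths with different seeds and average their final values to obtain a point forecast.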

Challenges we ran into

Obtaining data to train the neural SDE proved difficult. The free Twitter API only provides high-engagement tweets from the last seven days; obtaining older tweets requires an enterprise account costing thousands of dollars per month. Unfortunately, we didn't feel we had enough data to train an end-to-end model that learns a single representation for each day's tweets. Instead, we use a weighted average tweet representation, weighting each tweet by an importance score computed from its retweets and likes. The lack of data extends to validation as well: we could only validate our model's buy/sell/hold predictions against this Friday's stock price. Finally, without more historical data, we can only model the market's behaviour this week, which has been fairly atypical of normal market conditions. More data for the trajectory modeling would have been invaluable.
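The weighted average tweet representation described above can be sketched as follows. The exact importance function is not given in the write-up, so `engagement_score` here is a hypothetical choice (log of 1 + 2 × retweets + likes), and the tiny 2-d vectors stand in for the BERT embeddings.

```python
import math
from typing import List, Sequence


def engagement_score(retweets: int, likes: int) -> float:
    """Hypothetical engagement score: retweets count double, log-damped
    so a single viral tweet does not drown out the rest."""
    return math.log1p(2 * retweets + likes)


def pooled_embedding(
    embeddings: Sequence[Sequence[float]],
    retweets: Sequence[int],
    likes: Sequence[int],
) -> List[float]:
    """Engagement-weighted average of per-tweet embedding vectors."""
    if not embeddings or not (len(embeddings) == len(retweets) == len(likes)):
        raise ValueError("need matching, non-empty embeddings/retweets/likes")
    weights = [engagement_score(rt, lk) for rt, lk in zip(retweets, likes)]
    total = sum(weights)
    if total == 0.0:  # no engagement at all: fall back to an unweighted mean
        weights = [1.0] * len(embeddings)
        total = float(len(embeddings))
    dim = len(embeddings[0])
    return [sum(w * e[i] for w, e in zip(weights, embeddings)) / total for i in range(dim)]


# Two toy "embeddings": the first tweet has engagement, the second has none.
print(pooled_embedding([[1.0, 0.0], [0.0, 1.0]], retweets=[10, 0], likes=[5, 0]))
# [1.0, 0.0]
```

A fixed pooling rule like this needs no training data of its own, which is why it is a reasonable fallback when there is too little data to learn a per-day representation end to end.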

Accomplishments that we're proud of

  • We used several APIs to put together a dataset, trained a model, and deployed it within a web application.
  • We built several animations using features introduced in the latest CSS revision.
  • We commissioned a McGill-themed banner in keeping with /r/wallstreetbets culture. Credit to Jillian Cardinell for the help!
  • Some janky NLP

What we learned

We learned to use several new APIs and tools, including the Twitter API and web scrapers.

What's next for McTavish St. Bets

We plan to obtain much more historical data by building up a dataset over several months using Twitter's 7-day API. We would also like to scale the framework to a reinforcement learning approach, which is data-hungry.
