In today's world, finance is increasingly about technology and data science. In that vein, this project aims to be a proof of concept for analyzing social media to gauge public opinion of a stock.

This idea stems from the efficient market hypothesis.

The efficient market hypothesis (EMH) is an investment theory that states it is impossible to "beat the market" because stock market efficiency causes existing share prices to always incorporate and reflect all relevant information. According to the EMH, stocks always trade at their fair value on stock exchanges.


One interpretation of this idea is the concept of market inefficiencies: because of human error, the value an individual places on a stock is not always its fair value. Human perception of a stock is what ultimately determines the price it trades at.

What inspired me to take on this project was the tweet by Kylie Jenner that crashed Snapchat's stock.

After a single tweet, Snapchat's stock fell dramatically. This suggests that social media can have a large impact on the stock market. By analyzing these trends and monitoring public opinion of companies, we may be able to build a predictive model that exploits market inefficiencies and anticipates changes in the market before they happen.

Because computers can pull data automatically from these sources and can analyze them in real-time, this could revolutionize the finance industry as a whole.

What it does

Pulls Twitter and stock information
Clusters tweets based on influence
Builds both unsupervised and supervised machine learning models with decent accuracy on validation sets
Saves the output dataset and model
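To make the "clusters tweets based on influence" step concrete, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the `influence_score` formula (log-scaled followers plus log-scaled retweets), the field names, the sample tweets, and the choice of a hand-written 1-D k-means are all assumptions made for illustration.

```python
import math

def influence_score(followers: int, retweets: int) -> float:
    """Hypothetical influence measure: log-scaled audience plus log-scaled reach."""
    return math.log1p(followers) + math.log1p(retweets)

def kmeans_1d(values, k=2, iterations=20):
    """Plain 1-D k-means (k >= 2), centroids initialised evenly between min and max."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids

# Made-up tweets; the real dataset and its columns are not shown in this write-up.
tweets = [
    {"user": "a", "followers": 120, "retweets": 1},
    {"user": "b", "followers": 95, "retweets": 0},
    {"user": "c", "followers": 2_500_000, "retweets": 40_000},
    {"user": "d", "followers": 300, "retweets": 3},
    {"user": "e", "followers": 900_000, "retweets": 12_000},
]

scores = [influence_score(t["followers"], t["retweets"]) for t in tweets]
labels, centroids = kmeans_1d(scores, k=2)

for tweet, label in zip(tweets, labels):
    print(tweet["user"], "high influence" if label == 1 else "low influence")
```

Because the centroids start at the minimum and maximum score, label 1 is always the higher-influence group when k=2. A library implementation such as scikit-learn's `KMeans` would be the more usual choice on a real dataset.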

How I built it

I used several data science libraries in Python. pandas carried me through all the CSV formatting, which would otherwise have been a huge issue.
There is a walkthrough of exactly how I did it on my project website, hosted on GitHub Pages.
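As a rough illustration of the CSV work involved, the sketch below lines up tweets with daily stock prices by date and labels each trading day as up or down. It uses the standard `csv` module rather than pandas so that it runs without extra dependencies. The column names, the sample rows, and the `join_by_date` helper are hypothetical and do not come from the project.

```python
import csv
import io
from collections import defaultdict

# Made-up sample data standing in for the real tweet and price CSV files.
TWEETS_CSV = """date,text
2020-01-01,not opening the app anymore
2020-01-02,still love it though
2020-01-02,the redesign is confusing
"""

PRICES_CSV = """date,open,close
2020-01-01,10.00,9.50
2020-01-02,9.50,9.80
2020-01-03,9.80,9.80
"""

def join_by_date(tweets_csv: str, prices_csv: str):
    """Return one row per trading day with its tweet count and price direction."""
    tweets_per_day = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tweets_csv)):
        tweets_per_day[row["date"]].append(row["text"])

    joined = []
    for row in csv.DictReader(io.StringIO(prices_csv)):
        day = row["date"]
        joined.append({
            "date": day,
            # Days with no tweets get a count of zero instead of being dropped.
            "tweet_count": len(tweets_per_day.get(day, [])),
            # Label used as a supervised target: did the stock close above its open?
            "went_up": float(row["close"]) > float(row["open"]),
        })
    return joined

rows = join_by_date(TWEETS_CSV, PRICES_CSV)
for r in rows:
    print(r)
```

With pandas, the same join is typically a `groupby` on the tweet dates followed by a `merge` with the price table.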

Challenges I ran into

Twitter APIs are not as friendly as I would like
The deep neural net took some messing around before it produced strong outputs

Accomplishments that I'm proud of

73.44% accuracy on validation set
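For context on what a validation accuracy figure measures, here is a small sketch of the calculation, together with a majority-class baseline that is useful for comparison. The labels and predictions are made up for illustration and are not the project's data. The baseline check is a suggestion, not something the project is stated to have done.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)

# Made-up validation labels (1 = stock went up, 0 = stock went down).
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

model_acc = accuracy(y_true, y_pred)
baseline_acc = majority_baseline(y_true)
print(f"model: {model_acc:.2%}, majority baseline: {baseline_acc:.2%}")
```

A model is only informative to the extent that it beats the majority baseline on the same validation set.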

What I learned

Pulling Twitter data, TensorFlow, scikit-learn, and Pelican.

What's next for Twitter sentiment analysis for stock prediction

For future expansions of this project, I would like to vastly increase the size of the dataset, experiment with other dimensions such as graph-theory-based evaluation of the network, explore using more than one social media source, and play with this concept on a larger scale.
