Project gallery (image captions): MSE vs. epochs; hyperparameter success vs. dropout; latest model prediction vs. actual; MSE vs. neural network structure; MSE vs. data fed in (the x-axis labels are misleading, please ignore them); loss functions on training and predicted data; loss function while training; MSE vs. decay; predicted values from the previous model.
I wanted to see to what extent news articles affect stock market prices. Turns out, tweets are a slightly better indicator!
What it does
It uses the 'sentiment' of tweets and news articles about a company from 2007-2016 to predict the next day's stock market prices. It essentially gauges how people are feeling about the company and uses that to predict the values.
How I built it
I made an ML model that built my final ML model: a search over hyperparameters that selects the best-performing configuration. The final predictor uses Naive Bayes for sentiment combined with a deep-learning-style model.
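The "model that builds the final model" idea can be sketched as a random hyperparameter search. This is a minimal illustration, not the actual search code: the search space values and the stand-in scoring function are made up, and the real project would train a TensorFlow/Keras net inside `train_and_score`.

```python
import random

# Hypothetical search space; the write-up mentions dropout and decay as
# tuned hyperparameters, the specific values here are illustrative.
SEARCH_SPACE = {
    "hidden_units": [32, 64, 128],
    "dropout": [0.0, 0.2, 0.5],
    "decay": [0.9, 0.95, 0.99],
}

def train_and_score(params):
    """Stand-in for training the real model and returning validation MSE.
    A fake score keeps the search loop runnable for demonstration."""
    return params["dropout"] + 1.0 / params["hidden_units"]

def random_search(n_trials=10, seed=0):
    """Try random hyperparameter combinations; keep the lowest-MSE one."""
    rng = random.Random(seed)
    best_params, best_mse = None, float("inf")
    for _ in range(n_trials):
        params = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        mse = train_and_score(params)
        if mse < best_mse:
            best_params, best_mse = params, mse
    return best_params, best_mse
```

In practice each trial would be a full training run, which is why the write-up below reports hours of training time for the search plus 300 iterations.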
The project is written entirely in Python. I use NumPy and pandas for data manipulation, TensorFlow for efficient training, and Keras to make the TensorFlow code easier to write.
It gathers news articles and tweets from 2007 onward and analyses their sentiment. Based on the overall sentiment score, it tries to predict stock market prices (either the opening and closing indices for the S&P 500, or an individual stock).
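The core feature/target alignment — today's sentiment predicting tomorrow's price — can be sketched as below. The daily sentiment numbers and prices are made up for illustration, and a least-squares fit stands in for the real deep net; what matters is shifting the target by one day.

```python
import numpy as np

# Toy daily data: each row is (mean tweet sentiment, mean news sentiment)
# for one day, in [-1, 1]. The real project derives these from scraped
# tweets and articles; these values are fabricated for the sketch.
sentiment = np.array([
    [0.4, 0.2], [-0.3, -0.1], [0.1, 0.5], [0.6, 0.4],
    [-0.5, -0.4], [0.2, 0.1], [0.3, 0.3], [-0.1, 0.0],
])
close = np.array([100.4, 99.6, 100.5, 101.2, 98.9, 100.2, 100.7, 99.9])

# Predict the NEXT day's close from today's sentiment: drop the last
# feature row and the first target value so day t aligns with day t+1.
X = np.column_stack([sentiment[:-1], np.ones(len(sentiment) - 1)])
y = close[1:]

# Linear least-squares stand-in for the neural net.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pred_tomorrow = np.array([0.25, 0.15, 1.0]) @ coef  # today's sentiment + bias
```

The same alignment applies whether the target is an index level or an individual stock's close.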
Another interesting thing it does: after the first month of training data, it predicts values for a day and checks them against the actual values. If they are correct, it does nothing; if they are wrong, it re-runs that day with increased weight.
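That check-and-reweight step amounts to boosting the sample weight of mispredicted days. A minimal sketch, assuming a relative-error tolerance and a boost factor (the write-up does not give the actual values, so `tolerance` and `boost` here are guesses):

```python
import numpy as np

def update_weights(weights, predicted, actual, tolerance=0.01, boost=1.5):
    """Boost the weight of days whose prediction was off by more than
    `tolerance` relative error, then renormalize, so the next training
    pass focuses on the mispredicted days."""
    rel_error = np.abs(predicted - actual) / np.abs(actual)
    wrong = rel_error > tolerance
    new_weights = weights.copy()
    new_weights[wrong] *= boost          # emphasize mispredicted days
    return new_weights / new_weights.sum()
```

In Keras these per-sample weights could be passed to training via the `sample_weight` argument of `Model.fit`.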
It also 'decays' the weights periodically: after every year's worth of training data, it decays the weights of all data from previous years and re-runs them. This makes the model take ~35 minutes longer to run, but increases accuracy by ~10%.
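The yearly decay can be sketched as multiplying the weights of all earlier years by a factor less than one. The 0.8 factor is an assumption for illustration; the write-up only says the weights are decayed yearly.

```python
import numpy as np

def decay_old_years(weights, years, current_year, decay=0.8):
    """After finishing a year of training data, shrink the weights of all
    samples from earlier years so recent sentiment counts more when the
    model is re-run. `decay=0.8` is a hypothetical value."""
    weights = weights.copy()
    older = years < current_year
    weights[older] *= decay
    return weights
```

Applied repeatedly, this gives an exponential falloff: data from k years back ends up weighted by decay**k, which matches the trade-off described (longer runtime from re-running old years, better accuracy from emphasizing recent ones).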
Challenges I ran into
A lot of them. My first couple of models would not go above 50% accuracy no matter what I tried. My current model takes approximately 3 hours to train for 300 iterations over the neural net plus hyperparameter optimization (on an 8th-gen i5). For some months I simply did not have enough data to work with; for others I had too much. My first dataset (which took almost 2 hours to scrape) was 10 GB and took forever to train on. My new dataset is much smaller and 'leaner', and is overall pretty fast to run on compared to the previous one. My sentiment analyser was also pretty bad at first, with an accuracy of 50% (essentially a glorified coin toss by an ML algorithm).
Accomplishments that I'm proud of
My MSE for sentiment analysis is on the order of 10^-6. My predictions are also very close to the real values and lie within a 35% confidence interval (a VERY thin band). I wrote an ML model that finds the best ML model to use to predict values. Check the images for my model vs. actual values!
What I learned
A whole bunch of ML and deep learning (and why they're not easy to apply everywhere). Why stock markets are so hard and annoying to predict. Twitter is a better signal than news for figuring out whether prices will rise or fall. Doing a hackathon by yourself is hard. Humans can stay up for more than 2 days and still be fine.