Inspiration

Stock markets are complicated dynamical systems that are sensitive to a multitude of factors. As math and science enthusiasts, my team has always been deeply interested in the behaviour of dynamical systems and in predicting their properties. Traditionally, only statistical methods and time series analysis were applied to modelling the stock market. However, with the advent of AI and machine learning, stock market prediction has improved in accuracy and efficiency. Having investigated time series analysis and stochastics before, my team decided to investigate the use of neural networks to predict the stock returns of various companies. Since this has already been done to quite an extent, we decided to improve on it by adding automated feature extraction. We are also working on a novel reinforcement learning model that will hopefully learn to trade on its own.

Overview

Our model pipeline is as follows:

  1. Acquire the stock price data - this is the primary data for our model.
  2. Preprocess the data - denoise the data with a wavelet transform and build the train and test datasets for the stacked autoencoders.
  3. Train the stacked autoencoder - this gives us our feature extractor.
  4. Process the data - this gives us the features for our model, along with the train and test datasets.
  5. Use the LSTM neural network to learn from the training data.
  6. Test the model with the testing set - this gives us a gauge of how good our model is.
  7. Make stock price predictions.
  8. Use reinforcement learning to come up with a trading agent.
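
Steps 4 to 6 above can be sketched as turning a series of values into fixed-length input windows with next-step targets, then splitting them chronologically so the test set always comes after the training set. This is only a minimal illustration: the function names, the window length of 4 and the 80/20 split are assumptions, not values taken from our project.

```python
def make_windows(series, window=4):
    """Build (input window, next value) pairs from a 1-D series.

    The window length is an illustrative assumption.
    """
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


def chronological_split(inputs, targets, train_fraction=0.8):
    """Split without shuffling, so no future data leaks into training.

    The 80/20 ratio is an illustrative assumption.
    """
    cut = int(len(inputs) * train_fraction)
    train = (inputs[:cut], targets[:cut])
    test = (inputs[cut:], targets[cut:])
    return train, test


if __name__ == "__main__":
    prices = [float(p) for p in range(10)]  # stand-in for real price data
    x, y = make_windows(prices, window=4)
    (x_train, y_train), (x_test, y_test) = chronological_split(x, y)
    print(len(x_train), "training windows,", len(x_test), "test windows")
```

Keeping the split chronological matters for time series: a random shuffle would let the model train on windows that overlap the period it is later tested on.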

Our main innovations include: using a wavelet transform to denoise the stock data, using stacked autoencoders for automated feature extraction, training an LSTM neural network on the features with Tikhonov regularisation and dropout, using online learning to keep training our model as it is used, and using a novel reinforcement learning method to create a trading bot.
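
For reference, Tikhonov regularisation in its most common form adds a squared L2 penalty on the weights to the training loss. The formula below is a sketch under two assumptions: that the base loss is mean squared error (which is how our results are reported) and that the penalty is applied to all weights with a single strength. The symbols are introduced here for illustration: the model weights are written as theta, the prediction for sample i as y-hat, and the regularisation strength as lambda.

```latex
\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i(\theta) \right)^2 + \lambda \lVert \theta \rVert_2^2
```

A larger lambda shrinks the weights more strongly, which trades some fit on the training data for better robustness on unseen data.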

Following this, we have scraped sentiment data about stocks from news and Twitter, and are using it to develop a reinforcement learning model.

This is visually represented in the following flowchart.

[Figure: flowchart of the model pipeline]

How we built it

We scraped over 25,000 sets of stock price data from Yahoo Finance. We then denoised the stock data using a discrete wavelet transform coupled with thresholding. Subsequently, we trained the stacked autoencoder for over 1,000 epochs and extracted features from our stock price data. We used these features to train our LSTM model. We took deliberate steps to prevent overfitting, including Tikhonov regularisation and dropout, to make our model more robust. Hyperparameters were tuned by grid search, and the model was optimised for performance.
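
The denoising step can be illustrated with a small pure-Python sketch: decompose the series with a discrete wavelet transform, shrink the detail coefficients towards zero, and reconstruct. The Haar wavelet, soft thresholding, the number of levels and the threshold value are all assumptions chosen to keep the example short; they are not necessarily the settings we used.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one level of the Haar transform."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient towards zero by `threshold`."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(prices, levels=2, threshold=0.5):
    """Multi-level Haar denoising; levels and threshold are illustrative."""
    if len(prices) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    approx = list(prices)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(soft_threshold(detail, threshold))
    # Rebuild from the coarsest level back to the original resolution.
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx


if __name__ == "__main__":
    noisy = [10.0, 10.4, 9.8, 10.1, 10.3, 10.2, 10.6, 10.5]
    print(denoise(noisy, levels=2, threshold=0.2))
```

With a threshold of zero the series is reconstructed unchanged, and larger thresholds remove progressively more of the small, high-frequency fluctuations.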

Our Twitter data is pulled with the Twitter API using the stock's hashtag (for example, #AAPL). The sentiment value is then calculated using TextBlob, and a simple mean is calculated for the neural network.

Using Aylien's News API, we were able to scrape the web with more specific criteria and greater accuracy. The criteria are:

  • Articles are limited to the categories of finance, business and economic news
  • Articles refer to the stock's company (Apple Inc.) through its specific DBpedia link
  • Articles related to the company (Apple) are matched with specific keywords such as 'Google' and 'Sony'
  • Only the top 150 Alexa websites are included

For our news articles, we decided to run them through a filter function we wrote, which is designed to mimic how humans read articles. The sentiment score is not calculated from every word in the article because, typically, people only read an article for up to 15 seconds. To account for that, we place a higher weight on the title, as around 80% of the people who read the title do not read the body of the article. There are also various underlying patterns in how people read an article, such as the length of the words. After accounting for those factors, the sentiment value of the filtered article is calculated with TextBlob, and an exponentially weighted moving average is calculated for the neural network.
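
The title weighting and the exponentially weighted moving average can be sketched as follows. The sketch assumes polarity scores in the range -1 to 1 have already been computed (TextBlob's polarity uses that range). The function names, the title weight of 0.8 and the smoothing factor of 0.3 are hypothetical values for illustration, not the ones used in our filter.

```python
def article_score(title_polarity, body_polarity, title_weight=0.8):
    """Combine title and body polarity, favouring the title.

    title_weight=0.8 is an illustrative assumption.
    """
    return title_weight * title_polarity + (1.0 - title_weight) * body_polarity


def ewma(values, alpha=0.3):
    """Exponentially weighted moving average of a sequence.

    alpha is the weight given to the newest value (illustrative default).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(value)  # seed with the first observation
        else:
            smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


if __name__ == "__main__":
    # (title polarity, body polarity) for articles in date order; made-up numbers.
    articles = [(0.6, 0.1), (-0.2, 0.0), (0.4, 0.3)]
    scores = [article_score(title, body) for title, body in articles]
    print(ewma(scores))
```

Compared with a simple mean, the moving average lets recent articles count for more while older ones fade out gradually.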

Finally, we used the Twitter and news sentiment, coupled with our predicted stock prices, to build a reinforcement learning model that improves over iterations, achieving an annual return of 200%.

Challenges we ran into

The main issue was getting the news and Twitter APIs to work and return sentiment values. In particular, some of the articles or tweets scraped by the APIs were irrelevant and sometimes even misleading. Hence, we had to put safeguards in place to isolate the tweets and news articles that actually have an impact on the viewpoint of the reader.

The Twitter API was also limited to 10 days of data, while we could only get a few years of news using the news API, severely limiting the amount of data we could collect.

Main Results

The main results are summarised in the following graphs:

[Graph]

Avg MSE: 0.01431

[Graph]

Avg MSE: 0.0945

[Graph]

Avg MSE: 2.11

[Graph]

Avg MSE: 9.49

For our reinforcement learning portfolio:

[Graph]

We achieved a 200% yearly return using reinforcement learning on Microsoft stock.

What we learned

We have learned that AI is very powerful at learning highly nonlinear trends, and that its impact on quantitative trading models will not diminish anytime soon. Furthermore, we have also learnt the importance of having quality data for our machine learning models, as it can improve accuracy by orders of magnitude.

What's next for AlphaAI

We aim to make our reinforcement learning algorithm much more sophisticated by accounting for various market factors such as dividend payouts and transaction fees, and by opening the agent up to different types of orders. If possible, we would also like to investigate the impact of market microstructure on building good trading bots.
