Stock markets are complicated dynamical systems that are sensitive to a multitude of factors. As math and science enthusiasts, my team has always been deeply interested in the behaviour of dynamical systems and in predicting their properties. Traditionally, only statistical and time series methods were applied to modelling the stock market. However, with the advent of AI and machine learning, stock market prediction has improved in accuracy and efficiency. Having investigated time series analysis and stochastics before, my team decided to investigate the use of neural networks to predict the stock returns of various companies. Seeing that this has already been done to quite an extent, we decided to improve on it by adding automated feature extraction. Furthermore, we are also working to develop a deep reinforcement learning model that will hopefully learn to trade on its own.
Our model pipeline is as follows:
- Acquire the stock price data - this is the primary data for our model.
- Preprocess the data - denoise the data and build the train and test datasets for the stacked autoencoders.
- Train the stacked autoencoder - this will give us our feature extractor.
- Process the data - this will give us the features for our model, along with the train and test datasets.
- Use the neural network to learn from the training data.
- Test the model with the testing set - this gives us a gauge of how good our model is.
- Make useful stock price predictions.
- Use reinforcement learning to come up with a trading agent.
This is visually represented in the following flowchart.
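To make the dataset-building step of the pipeline concrete, here is a minimal sketch of how a price series could be cut into training samples and split into train and test sets. The function names, the window length, the next-step-return target and the 80/20 split are all assumptions chosen for illustration, not details of our actual pipeline.

```python
def make_windows(prices, window):
    """Turn a price series into (input window, next-step return) pairs.

    Hypothetical helper: the window length and the return target are
    illustrative assumptions.
    """
    samples = []
    for i in range(len(prices) - window):
        history = prices[i:i + window]
        # Target: simple return from the last price in the window to the next price.
        target = prices[i + window] / prices[i + window - 1] - 1.0
        samples.append((history, target))
    return samples


def chronological_split(samples, train_fraction=0.8):
    """Split without shuffling, so every test target is later than every training target."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Made-up prices, for illustration only.
prices = [100.0, 101.0, 99.5, 102.0, 103.5, 102.5,
          104.0, 105.5, 104.5, 106.0, 107.0, 106.5]
samples = make_windows(prices, window=2)
train, test = chronological_split(samples)
```

The split is chronological rather than random because shuffling a time series would let the model train on prices that come after the ones it is later tested on.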
Our main innovations include using a wavelet transform to denoise the stock data, using stacked autoencoders for automated feature extraction, and using an LSTM neural network with L2 regularisation and dropout to train on the features.
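As a rough illustration of wavelet denoising, the sketch below applies a single-level Haar transform, soft-thresholds the detail coefficients, and reconstructs the series. The choice of the Haar wavelet, the single decomposition level, the threshold value and all function names are assumptions made for this example. In practice a wavelet library would normally be used instead of hand-written transforms.

```python
import math


def haar_forward(signal):
    """One level of the Haar transform: scaled pairwise sums and differences."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`; small ones become zero."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(signal, threshold):
    """Haar-denoise an even-length series (assumed wavelet and threshold)."""
    if len(signal) % 2 != 0:
        raise ValueError("this sketch expects an even-length signal")
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, threshold))


# Made-up prices, for illustration only.
prices = [100.0, 100.4, 101.0, 100.6, 102.0, 105.0, 105.2, 104.9]
smoothed = denoise(prices, threshold=0.5)
```

With these inputs the small wiggles between neighbouring prices are flattened, while the larger move from 102.0 to 105.0 survives in reduced form. This is the behaviour we want from a denoiser: it removes small fluctuations and keeps the large moves.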
Following this, we scraped sentiment data about stocks from news and Twitter, and we are using it to develop a reinforcement learning model. The data is scraped using the News API by Aylien and the Twitter API. The scraped data is then fed through a model which simulates the amount of information people take in while browsing web articles. The modified data is then given a sentiment score based on its language nuances.
How we built it
For our sentiment values, we looked into how people perceive information on the internet, then modified our input articles to better reflect how these articles actually affect people's perception of the mentioned stocks: most people tend to skim web articles, leaving 80% of the text unread.
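The skimming idea can be sketched as follows. This is a deliberately crude stand-in, not our actual reading model or sentiment scorer: it assumes the 20% that gets read is simply the leading 20% of the words, and it scores text with a tiny hypothetical word list. The names `skim`, `sentiment_score`, `POSITIVE` and `NEGATIVE` are invented for this example.

```python
import re

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"surge", "beat", "growth", "record", "upgrade"}
NEGATIVE = {"plunge", "miss", "lawsuit", "loss", "downgrade"}


def skim(article, read_fraction=0.2):
    """Keep only the leading share of words (assumption: readers stop early)."""
    words = article.split()
    if not words:
        return ""
    keep = max(1, round(len(words) * read_fraction))
    return " ".join(words[:keep])


def sentiment_score(text):
    """Score in [-1, 1]: (positive - negative) / (positive + negative) lexicon hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Made-up article: good news up front, bad news buried at the end.
article = (
    "Acme shares surge on record growth. "
    + "Analysts discussed the details at length. " * 4
    + "A lawsuit and a downgrade may follow."
)
full_score = sentiment_score(article)
skimmed_score = sentiment_score(skim(article))
```

Here the skimmed text contains only the upbeat opening, so it scores more positively than the full article, whose closing sentence mentions a lawsuit and a downgrade. This is the effect the modification is meant to capture: what a skimming reader takes away can differ from what the whole article says.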