As an aspiring quant, I have always wanted to experiment with stock market data and analyse it with AI to gain exposure to the industry. I recently got my hands on some order book data that I had been wanting to play around with for a while, so I used this hackathon as a platform to predict some aspect of the order book with AI and trade on it.


OrderBookAI uses machine learning methods to predict the midprice movements of the order books and uses that prediction to build a trading strategy. The pipeline works as follows:

  1. Obtain order book data (in our case the data is preloaded)
  2. Feature Engineering and Preprocessing
  3. Train Random Forest model
  4. Test model predictions
  5. Use the model predictions to develop a trading strategy

The following graph summarises the order book data.

[Figure: summary of the order book data]

How I built it

Analysing order book data is often very hard because of the frequent jumps in the data. To extract more information from the order books, I engineered features that have predictive value. The following features were engineered:

  • Bid-Ask Spread
  • Midprice
  • Volume Imbalance
  • Smart price
  • Difference between current ask price and previous ask price
  • Difference between current bid price and previous bid price
  • The log return of the price of the instrument
  • The 'rate/volume' of the previous trade
  • Realised spread
  • Midprice momentum
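A few of the features above can be sketched from two consecutive top-of-book snapshots. This is a minimal illustration rather than the project's actual code: the `Quote` type and `top_of_book_features` function are hypothetical names, and the formulas for volume imbalance and smart price are common textbook definitions that I am assuming here, since the write-up does not spell them out.

```python
import math
from dataclasses import dataclass


@dataclass
class Quote:
    """Hypothetical top-of-book snapshot: best prices and their sizes."""
    bid: float
    ask: float
    bid_size: float
    ask_size: float


def top_of_book_features(prev: Quote, cur: Quote) -> dict:
    """Compute a handful of the listed features from two snapshots."""
    mid = (cur.bid + cur.ask) / 2
    prev_mid = (prev.bid + prev.ask) / 2
    depth = cur.bid_size + cur.ask_size
    return {
        "spread": cur.ask - cur.bid,
        "midprice": mid,
        # Assumed definition: signed share of resting size on the bid side.
        "volume_imbalance": (cur.bid_size - cur.ask_size) / depth,
        # Assumed definition: each price weighted by the opposite side's size.
        "smart_price": (cur.bid * cur.ask_size + cur.ask * cur.bid_size) / depth,
        "ask_diff": cur.ask - prev.ask,
        "bid_diff": cur.bid - prev.bid,
        # Log return of the midprice between the two snapshots.
        "log_return": math.log(mid / prev_mid),
    }
```

With more size resting on the bid than on the ask, the smart price under this definition sits above the midprice, which is why it is often used as a short-horizon directional hint.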

Subsequently, I used a Random Forest classifier to predict whether the midprice will increase, decrease or stay constant over the next time steps. The model was fit on over 3 million data points and achieved an overall test set accuracy of 97.0% in classifying the direction of the midprice movement. By class, P(midprice moves up correctly) = 0.705, P(midprice moves down correctly) = 0.762 and P(midprice stays constant correctly) = 0.973. The gap between the overall figure and the up and down figures suggests that the constant class dominates the test set.
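To make the three-class setup and the per-class figures concrete, here is a small sketch of how direction labels and per-class scores could be computed. It is not the project's code: `direction_labels`, `per_class_recall` and `accuracy` are hypothetical helpers, the `horizon` and `tol` parameters are assumptions, and reading the P(...) values as per-class recall is my interpretation of the write-up.

```python
from collections import Counter


def direction_labels(mids, horizon=1, tol=0.0):
    """Label each step 1 (up), -1 (down) or 0 (constant) by the midprice
    change over the next `horizon` steps. `tol` is an assumed dead band."""
    labels = []
    for t in range(len(mids) - horizon):
        change = mids[t + horizon] - mids[t]
        if change > tol:
            labels.append(1)
        elif change < -tol:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


def accuracy(y_true, y_pred):
    """Overall share of correct predictions across all classes."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

Overall accuracy is the class-frequency-weighted average of the per-class recalls, so when one class is far more common than the others the headline number mostly reflects that class. Reporting the per-class values alongside it, as done above, gives a fairer picture.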

Using this prediction, I developed a trading strategy that looks at past values of the data and the prediction to buy or short the stock.

As an additional challenge, I attempted to build a market making strategy that uses reinforcement learning (TD learning). However, the strategy struggled to find a mapping from the feature space to the policy space. Hence, the results were limited. I hope to improve on this after the hackathon.
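For readers unfamiliar with TD learning, the core idea can be sketched as a tabular TD(0) value update. This is a generic textbook form and not the market making agent from the project: the state names, the `td0_update` helper and the `alpha` and `gamma` defaults are all assumptions chosen for illustration.

```python
from collections import defaultdict


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.99):
    """Apply one TD(0) update to a tabular value function.

    values: mapping from state to estimated value (missing states count as 0)
    alpha:  learning rate (assumed default)
    gamma:  discount factor (assumed default)
    Returns the TD error used for the update.
    """
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


# Usage: a toy transition between two hypothetical inventory states.
values = defaultdict(float)
td0_update(values, "long", reward=1.0, next_state="flat")
```

A table like this only works when the state space is small and discrete. With continuous order book features, the value function has to be approximated, which is one plausible reason the mapping from features to a policy was hard to learn.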

Challenges I ran into

The biggest challenge was choosing how to use the prediction in the trading strategy. Simple strategies, such as buying and selling based on the predicted midprice direction alone, did not work because they crossed the bid-ask spread too frequently. However, when the prediction was used in conjunction with other features, the results were much better.
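The cost of crossing the spread can be illustrated with a toy backtest in which every buy pays the ask and every sell receives the bid. This is a simplified sketch under my own assumptions (unit position sizes, immediate fills at the quoted prices, no fees), and `spread_crossing_pnl` is a hypothetical helper rather than the strategy used in the project.

```python
def spread_crossing_pnl(quotes, signals):
    """PnL of following target positions while always crossing the spread.

    quotes:  list of (bid, ask) tuples, one per time step
    signals: target position per step, each in {-1, 0, 1}
    Any position left open after the last signal is closed at the last quote.
    """
    position = 0
    cash = 0.0
    for (bid, ask), target in zip(quotes, signals):
        trade = target - position
        if trade > 0:
            cash -= trade * ask      # buying lifts the ask
        elif trade < 0:
            cash += -trade * bid     # selling hits the bid
        position = target
    bid, ask = quotes[-1]
    if position > 0:
        cash += position * bid       # close a long at the bid
    elif position < 0:
        cash -= -position * ask      # close a short at the ask
    return cash


# The midprice rises from 100 to 102, yet the round trip only breaks even.
quotes = [(99.0, 101.0), (100.0, 102.0), (101.0, 103.0)]
print(spread_crossing_pnl(quotes, [1, 1, 0]))
```

In this example a correct directional call earns nothing because the move is no larger than the spread. A signal that flips often on a flat market loses half the spread on every unit traded, which matches the behaviour described above.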

Another challenge was finding features that worked. I drew the engineered features from recent research papers on the subject, but because of the secretive nature of the industry, the available information was quite limited.


The following summarises the results of the strategy:

[Figure: summary of the strategy results]

Limitations and Outlook

Although the performance of the strategy looks promising, the main issue that has not been addressed is the modelling of market impact and transaction costs, which play a critical role in the PnL of any strategy, especially in the high frequency realm. Furthermore, latency slippage was not considered, and it would affect the PnL significantly.

Next, I aim to model market impact and slippage and incorporate them into the strategy. I would also like to work further on the reinforcement learning approach to market making.

