Much of our project focused on exploratory data analysis. Our submission to the benchmark test is in our GitHub repo as benchmark_qrt.csv. The repo also contains the Jupyter notebooks from our exploratory analysis.

We tried a transformer model, since transformers apply well to time-based data. However, processing the dataset took too long, and we could not finish the computation in time.

We also experimented with dynamic time warping (DTW), because we realised that individual stocks lag behind the sector average by different amounts. We believed that adjusting for this lag, or using the DTW output as a feature, could improve our model. An image is attached showing how we used a dynamic time warping algorithm to map a stock onto the market average. We used the approach found at https://tech.gorilla.co/how-can-we-quantify-similarity-between-time-series-ed1d0b633ca0 .
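To make the idea concrete, here is a minimal sketch of textbook DTW in NumPy. It is not the code from our notebooks or from the linked article; the function name dtw and the toy series sector and stock are made up for illustration, and the stock is simply the sector curve delayed by one step.

```python
import numpy as np

def dtw(a, b):
    """Classic O(n*m) dynamic time warping between two 1-D series.

    Returns the total alignment cost and the warping path as a list of
    (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # repeat b[j-1]
                                 cost[i, j - 1],      # repeat a[i-1]
                                 cost[i - 1, j - 1])  # advance both
    # Walk back from the end to recover the alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return cost[n, m], path

# Toy data (hypothetical): the stock follows the sector one step late.
sector = np.array([0, 1, 2, 1, 0, 0])
stock = np.array([0, 0, 1, 2, 1, 0])

dist, path = dtw(stock, sector)
# Average index offset along the path: a rough per-stock lag estimate.
avg_lag = float(np.mean([i - j for i, j in path]))
```

Both dist and avg_lag could be tried as per-stock features. For long series the double loop is slow, so a windowed or library implementation would be the more practical choice.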

We also looked into whether a latent factor model could extract sector-level features. For this, we used KNN with 11 latent factors. It showed limited success, but we did not have time to tune the model for higher accuracy.

We experimented with several machine learning techniques, including ensemble methods built from a variety of algorithms such as SVM, logistic regression and normal regression. In our benchmark submission, we used 10 shifts and engineered features from mean/std/var statistics conditional on sector, date and industry. The model that gave us the best results was gradient boosting.
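The conditional statistics can be sketched roughly as follows in pandas. This is an illustration under assumptions, not our notebook code: the column names DATE, SECTOR, INDUSTRY and RET_1 to RET_10 are assumed, the data is random, the helper add_group_stats is hypothetical, and grouping on all three keys jointly is only one reading of "conditional on sector, date and industry".

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (column names are assumptions).
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame({
    "DATE": rng.integers(0, 5, n_rows),
    "SECTOR": rng.integers(0, 3, n_rows),
    "INDUSTRY": rng.integers(0, 6, n_rows),
})
shifts = range(1, 11)  # 10 shifts: RET_1 ... RET_10
for s in shifts:
    df[f"RET_{s}"] = rng.normal(0.0, 0.02, n_rows)

def add_group_stats(frame, shifts,
                    group_cols=("SECTOR", "DATE", "INDUSTRY"),
                    stats=("mean", "std", "var")):
    """Add per-group statistics of each lagged return as new columns."""
    out = frame.copy()
    for s in shifts:
        col = f"RET_{s}"
        grouped = out.groupby(list(group_cols))[col]
        for stat in stats:
            # transform broadcasts the group statistic back to every row
            out[f"{col}_{stat}"] = grouped.transform(stat)
    return out

features = add_group_stats(df, shifts)
```

Groups containing a single row produce NaN for std and var, so those values need filling or a coarser grouping before the table is passed to a gradient boosting model.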

Jupyter notebooks for each of these approaches are in the GitHub repo.
