For the Citi x CMU Hackathon, data was provided from a single day of trading. Our mission revolved around "market spoofs", orders placed in order to illegally influence the market. Our first aim was to be able to identify and describe the different types of spoofing found in data. After identification and interpretation, we aimed to develop a technical solution that is accurate at detecting spoofs.
We utilized PyCaret, a database of ready-made machine learning classification algorithms, to help us build a spoof classifier. Our initial approach was to tune a model based upon the data provided to us, but upon further research, we realized we needed to add auxiliary data in order to represent the relationships between different orders.
To do this, we successfully wrote a program in Python to "treat" the data, appending information about the volume and mean order times to each data point. This significantly improved our precision, recall, and F1 and ensured our model would be successful if deployed in the real world.

Log in or sign up for Devpost to join the conversation.