Anomaly Detection with Logistic Regression

Our notebook 1_Logistic_Regression.ipynb is available in Google Colab.

Summary

To classify the anomalies in the data, we split the data into chunks of 10 time steps, i.e. 90 minutes. This time window is much longer than the time scales of the thermal and mechanical processes involved. Hence, the data within one chunk should be sufficient to explain why and how an anomaly arose.
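The chunking step could be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the function name `make_chunks`, the `stride` parameter, and the choice of overlapping windows (stride 1) are assumptions, and each chunk is flattened into one feature vector so that it can be passed to a linear model.

```python
import numpy as np

def make_chunks(data, window=10, stride=1):
    """Cut a (n_steps, n_features) array into flattened windows.

    Returns the chunks, shape (n_chunks, window * n_features), and the
    index of the last time step of each chunk.
    `window=10` follows the text; `stride=1` is an assumption.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim != 2:
        raise ValueError("data must have shape (n_steps, n_features)")
    n_steps = data.shape[0]
    if n_steps < window:
        raise ValueError("not enough time steps for a single chunk")

    # Start index of every window that still fits into the series.
    starts = np.arange(0, n_steps - window + 1, stride)
    # Flatten each window so that one chunk becomes one row.
    chunks = np.stack([data[s:s + window].ravel() for s in starts])
    end_idx = starts + window - 1
    return chunks, end_idx
```

With `stride=window` the same function would produce non-overlapping chunks instead.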

A chunk of data belongs to the anomaly class if an anomaly happens at its end. The challenge is that there are only 9 anomalies. However, avoiding them is very beneficial. Therefore, we can accept some false positives as long as we find the anomalies.
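The labelling rule, and one common way to express the tolerance for false positives, might look like the sketch below. Both helper names are hypothetical. The weighting uses the usual "balanced" heuristic, n_samples / (n_classes * class_count), which makes each rare anomaly count far more than a normal chunk; whether the notebook weights the classes this way is an assumption.

```python
import numpy as np

def label_chunks(anomaly_flags, end_idx):
    """Label a chunk 1 if an anomaly is flagged at its last time step."""
    flags = np.asarray(anomaly_flags).astype(int)
    return flags[np.asarray(end_idx)]

def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency (two classes)."""
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=2)
    if (counts == 0).any():
        raise ValueError("both classes must be present")
    # n_samples / (n_classes * class_count)
    return {cls: len(labels) / (2 * counts[cls]) for cls in (0, 1)}
```

A dictionary of this form can be passed as `class_weight` to scikit-learn's `LogisticRegression`, which shifts the fit towards catching the few anomalies at the cost of more false alarms.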

First, we ran a principal component analysis (PCA) on the chunked data. The anomalies clearly cluster within two regions.

PCA of chunked data
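A projection like the one in this plot could be computed along the following lines. This is only a sketch: it performs PCA through a singular value decomposition in NumPy, the function name `pca_project` is hypothetical, and the notebook may well use a library implementation and additional scaling instead.

```python
import numpy as np

def pca_project(chunks, n_components=2):
    """Project flattened chunks onto their first principal components.

    Returns the scores, shape (n_chunks, n_components), and the fraction
    of variance explained by each retained component.
    """
    X = np.asarray(chunks, dtype=float)
    # PCA requires centred data.
    X = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:n_components].T
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]
```

Plotting the two score columns against each other and colouring the points by chunk label would show whether the anomalous chunks separate from the normal ones.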