Anomaly Detection with Logistic Regression

Our notebook 1_Logistic_Regression.ipynb is available in Google Colab.


Summary

To classify the anomalies in the data, we split the data into chunks of 10 time steps, i.e. 90 minutes. This time window is much longer than the time scales of the thermal and mechanical processes involved. Hence, the data within each window should be sufficient to explain why and how an anomaly arose.

A chunk of data belongs to the anomaly class if an anomaly occurs at its end. The challenge is that there are only 9 anomalies. However, avoiding them is very beneficial, so we can accept some false positives in exchange for detecting them.
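The chunking and labeling described above could be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the function name `make_chunks`, the `stride` parameter, the use of overlapping windows, and the toy data are all assumptions.

```python
import numpy as np

def make_chunks(values, is_anomaly, window=10, stride=1):
    """Cut a (time, features) array into flattened windows.

    A window is labeled 1 if the anomaly flag is set at its last time step.
    Overlapping windows (stride=1) are an assumption; use stride=window
    for non-overlapping chunks.
    """
    values = np.asarray(values, dtype=float)
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    starts = range(0, len(values) - window + 1, stride)
    # Each row of X holds window * n_features values.
    X = np.stack([values[s:s + window].ravel() for s in starts])
    y = np.array([is_anomaly[s + window - 1] for s in starts], dtype=int)
    return X, y

# Hypothetical toy series: 12 time steps, 2 sensors, one anomaly at step 10.
values = np.arange(24, dtype=float).reshape(12, 2)
flags = np.zeros(12, dtype=bool)
flags[10] = True

X, y = make_chunks(values, flags)
print(X.shape, y)
```

With only 9 positive chunks, the resulting labels are heavily imbalanced, which is why the false positive rate is tracked alongside the number of detected anomalies.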

First, we performed a principal component analysis (PCA) on the chunked data. The anomalies clearly cluster within two regions.

PCA of chunked data
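A two-component projection like the one in the figure can be computed with a singular value decomposition. The sketch below uses NumPy only and random stand-in data; the helper name `pca_project`, and the choice to center but not rescale the features, are assumptions rather than details taken from the notebook.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto their leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # coordinates in PC space
    explained = S**2 / np.sum(S**2)              # explained variance ratio
    return scores, explained[:n_components]

# Stand-in for the chunk matrix: 50 chunks with 20 values each.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 20))

scores, ratio = pca_project(X_demo)
print(scores.shape, ratio)
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by chunk label, is one way to check whether the anomalous chunks group together.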

Since the problem is clearly not linearly separable, we use a set of radial basis functions (RBFs) to generate features for logistic regression.
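To illustrate the idea, the sketch below maps points to Gaussian RBF features and fits a logistic regression on them. It is an assumption-laden toy: the Gaussian form, the width `gamma`, the randomly chosen centers, and the plain gradient-descent fit (used here in place of a library solver) are illustrative choices, not the notebook's actual setup.

```python
import numpy as np

def rbf_features(X, centers, gamma=1.0):
    """Gaussian RBF features: exp(-gamma * ||x - c||^2) for each center c."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def add_bias(Phi):
    """Append a constant column so the model has an intercept."""
    return np.hstack([Phi, np.ones((len(Phi), 1))])

def predict_proba(Phi, w):
    return 1.0 / (1.0 + np.exp(-add_bias(Phi) @ w))

def log_loss(Phi, y, w):
    p = np.clip(predict_proba(Phi, w), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logistic(Phi, y, lr=0.5, n_iter=2000):
    """Unregularized logistic regression via batch gradient descent."""
    A = add_bias(Phi)
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ w))
        w -= lr * A.T @ (p - y) / len(y)   # gradient of the mean log loss
    return w

# Toy problem that no straight line can separate: positives inside a disc.
rng = np.random.default_rng(0)
X_toy = rng.uniform(-1, 1, size=(200, 2))
y_toy = (np.linalg.norm(X_toy, axis=1) < 0.5).astype(int)

centers = X_toy[rng.choice(len(X_toy), size=20, replace=False)]
Phi = rbf_features(X_toy, centers, gamma=4.0)
w = fit_logistic(Phi, y_toy)
print("log loss:", log_loss(Phi, y_toy, w))
```

Each RBF feature measures closeness to one center, so a linear decision rule in feature space corresponds to a curved boundary in the original space.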

With this approach, we identify 5 out of 9 anomalies (about 56%) at a false positive rate of only 8.4%.
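For reference, the detection rate and the false positive rate follow from the confusion counts as TP / (TP + FN) and FP / (FP + TN). The sketch below shows the arithmetic; the number of normal chunks is not stated here, so the 1000 negatives (84 of them flagged) are a made-up count chosen only to reproduce an 8.4% rate.

```python
def detection_rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr

# 9 anomalies, 5 detected; 1000 hypothetical normal chunks, 84 flagged.
y_true = [1] * 9 + [0] * 1000
y_pred = [1] * 5 + [0] * 4 + [1] * 84 + [0] * 916

tpr, fpr = detection_rates(y_true, y_pred)
print(f"detected: {tpr:.1%}, false positive rate: {fpr:.1%}")
```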
