The idea came from the challenge. There were two ways to approach it, one with clustering and one with buckets, and I thought it would be cool to try it and learn along the way, since I got to play with real customer data.

What it does

It detects anomalies in the customer purchase data.

How I built it

  1. Group similar-sounding words by numeric ID, and add a record to the anomalies if its ID exists only once
  2. Group similar-sounding words by similarity, and add a record to the anomalies if it is unique
  3. Use the date-time columns to calculate the time since the last purchase for each item
  4. Run Isolation Forest outlier detection on each group, and add a record to the anomalies if it is an outlier
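
As a rough illustration of steps 1, 3 and 4, here is a minimal sketch using pandas and scikit-learn. It is not the project's actual code. The column names `item_id` and `purchased_at`, the function name `flag_anomalies`, and the `min_group_size` cutoff are all assumptions made for the example, and the similarity grouping from step 2 is left out because the writeup does not say how similarity was measured.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest


def flag_anomalies(df, min_group_size=5):
    """Return a copy of df with 'days_since_last' and a boolean 'anomaly' column.

    Assumes hypothetical columns: item_id (numeric) and purchased_at (datetime).
    """
    out = df.sort_values(["item_id", "purchased_at"]).reset_index(drop=True)
    out["anomaly"] = False

    # Step 1: an ID that occurs only once has no peers, so flag it directly.
    counts = out.groupby("item_id")["item_id"].transform("size")
    out.loc[counts == 1, "anomaly"] = True

    # Step 3: days since the previous purchase of the same item.
    gap = out.groupby("item_id")["purchased_at"].diff()
    out["days_since_last"] = gap.dt.total_seconds() / 86400.0

    # Step 4: fit one Isolation Forest per item on the gap feature.
    for _, group in out.groupby("item_id"):
        feats = group[["days_since_last"]].dropna()
        if len(feats) < min_group_size:
            continue  # too few rows to fit a meaningful model
        model = IsolationForest(n_estimators=100, random_state=0)
        labels = model.fit_predict(feats)  # -1 = outlier, 1 = inlier
        out.loc[feats.index[labels == -1], "anomaly"] = True
    return out


# Made-up data: item 1 is bought roughly weekly, then after a 200-day gap;
# item 2 appears only once.
offsets = [0, 7, 14, 20, 28, 35, 42, 49, 57, 63, 70, 270]
dates = pd.to_datetime("2023-01-01") + pd.to_timedelta(offsets, unit="D")
purchases = pd.DataFrame({
    "item_id": [1] * len(offsets) + [2],
    "purchased_at": list(dates) + [pd.Timestamp("2023-03-01")],
})
result = flag_anomalies(purchases)
print(result[result["anomaly"]])
```

With the default `contamination="auto"`, a few ordinary rows may be flagged along with the obvious outlier, so the threshold is something to tune against the real data.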

The full run took 22883.2 seconds (about 6.4 hours).
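
The writeup gives only the total runtime, so it is not clear which step dominated. A small per-step timer like the sketch below could help find out. The names `timed` and `to_hours` are made up for this example, and the timed workload is a stand-in.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, sink):
    """Record the wall-clock seconds spent inside the with-block in sink[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[label] = time.perf_counter() - start


def to_hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600.0


timings = {}
with timed("example_step", timings):
    sum(range(100_000))  # stand-in for one pipeline step

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")

# The reported total, converted to hours.
print(f"{to_hours(22883.20790910721):.2f} hours")
```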

Challenges I ran into

The dataset was really big, and it was hard to work with because of all the small, unique exceptions in it.

Accomplishments that I'm proud of

It ran without crashing.

What I learned

How to work with real-world datasets

What's next for Anomaly Detection

Make it better and faster, and maybe try other unsupervised learning methods
