Anomaly Detection

Inspiration

The idea came from the challenge. There were two ways to approach it, one with clustering and one with buckets, and I thought it would be cool to try it and learn along the way, since I'd get to play with real customer data.

What it does

It finds anomalies in the customer data.

How I built it

Use the numeric ID to filter similar-sounding words, and add an entry to the anomalies if it appears only once
Use similarity to filter similar-sounding words, and add an entry to the anomalies if it is unique
Use the date-time columns to calculate the time since the last purchase for each item
Run Isolation Forest on each group, and add an entry to the anomalies if it is an outlier
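
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the records, the `threshold` value, the tree count, and all function names are made up for the example. The write-up does not say which Isolation Forest implementation was used, so the sketch swaps in a tiny pure-Python version that only follows the branch containing the point being scored, which keeps it runnable without third-party packages.

```python
import math
import random
from collections import defaultdict
from datetime import datetime

# Hypothetical records (item ID, purchase date); not taken from the real dataset.
purchases = [
    ("A", "2019-01-01"), ("A", "2019-01-02"), ("A", "2019-01-04"),
    ("A", "2019-01-05"), ("A", "2019-01-07"), ("A", "2019-01-08"),
    ("A", "2019-01-10"), ("A", "2019-04-20"),  # long gap before this purchase
    ("B", "2019-01-03"),                       # item that appears only once
]

def group_by_item(rows):
    """Group purchase dates by item ID, sorted oldest first."""
    groups = defaultdict(list)
    for item_id, stamp in rows:
        groups[item_id].append(datetime.strptime(stamp, "%Y-%m-%d"))
    return {item_id: sorted(dates) for item_id, dates in groups.items()}

def days_since_last(dates):
    """Days between each purchase and the one before it."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]

def c(n):
    """Average path length used to normalise isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def path_length(x, sample, rng, depth=0, limit=8):
    """Depth at which random splits isolate x (x must be in sample)."""
    lo, hi = min(sample), max(sample)
    if depth >= limit or len(sample) <= 1 or lo == hi:
        return depth + c(len(sample))
    split = rng.uniform(lo, hi)
    # Keep only the values that fall on the same side of the split as x.
    same_side = [v for v in sample if (v < split) == (x < split)]
    return path_length(x, same_side, rng, depth + 1, limit)

def anomaly_scores(values, trees=200, seed=0):
    """Score each value in (0, 1); values isolated quickly score higher."""
    rng = random.Random(seed)
    scores = []
    for x in values:
        mean_depth = sum(path_length(x, values, rng) for _ in range(trees)) / trees
        scores.append(2 ** (-mean_depth / c(len(values))))
    return scores

def find_anomalies(rows, threshold=0.6):
    """Flag items seen only once and purchases with an unusual gap."""
    anomalies = []
    for item_id, dates in group_by_item(rows).items():
        if len(dates) == 1:
            anomalies.append((item_id, dates[0].date().isoformat(), "seen only once"))
            continue
        gaps = days_since_last(dates)
        if len(gaps) < 3:  # too few gaps to score meaningfully (assumed cutoff)
            continue
        for date, gap, score in zip(dates[1:], gaps, anomaly_scores(gaps)):
            if score > threshold:
                anomalies.append((item_id, date.date().isoformat(), f"gap of {gap} days"))
    return anomalies

print(find_anomalies(purchases))
```

On real data, a library implementation such as scikit-learn's `IsolationForest` would likely be the more practical choice, and the threshold would need tuning per group.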

It took about 22,883 seconds (~6.4 hours) to run.

Challenges I ran into

It was a really big dataset, and the small one-off exceptions made it really hard to work with.

Accomplishments that I'm proud of

It didn't crash.

What I learned

How to work with real-world datasets.

What's next for Anomaly Detection

Make it better and faster, and maybe try other unsupervised learning methods.

Built With

Updates

Kevin Fang started this project — Feb 03, 2019 08:38 AM EST
