Inspiration
The idea came from the challenge itself. There were two ways to approach it, one with clustering and one with buckets, and I thought it would be cool to try it as a way to learn, since I got to play with real customer data.
What it does
It finds anomalies in the customer purchase data.
How I built it
- Use the numeric ID to filter similar-sounding words, and add an entry to the anomalies if its ID appears only once
- Use similar to filter similar-sounding words, and add an entry to the anomalies if it is unique
- Use the date-time columns to calculate the time since the last purchase for each item
- Run an Isolation Forest on each group, and add an entry to the anomalies if it is flagged as an outlier
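As a rough illustration of the counting and time-gap steps above, here is a minimal sketch using only the Python standard library. The record layout, the sample rows, and the function names `ids_seen_once` and `days_since_last_purchase` are all made up for this example and are not taken from the actual project code or dataset.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical records: (numeric ID, item name, purchase timestamp).
purchases = [
    (101, "widget", "2023-01-01 09:00"),
    (101, "widget", "2023-01-08 09:00"),
    (101, "widget", "2023-01-15 09:00"),
    (202, "gadget", "2023-01-02 12:00"),
    (202, "gadget", "2023-03-02 12:00"),
    (303, "wodget", "2023-01-05 10:00"),  # this ID appears only once
]

def ids_seen_once(rows):
    """Return the set of IDs that occur exactly once in the data."""
    counts = Counter(row[0] for row in rows)
    return {id_ for id_, n in counts.items() if n == 1}

def days_since_last_purchase(rows):
    """Map each item to the gaps (in days) between consecutive purchases."""
    times = defaultdict(list)
    for _, item, stamp in rows:
        times[item].append(datetime.strptime(stamp, "%Y-%m-%d %H:%M"))
    gaps = {}
    for item, stamps in times.items():
        stamps.sort()
        # Pair each purchase with the next one and measure the gap.
        gaps[item] = [
            (later - earlier).total_seconds() / 86400
            for earlier, later in zip(stamps, stamps[1:])
        ]
    return gaps

rare_ids = ids_seen_once(purchases)          # candidates for the anomaly list
gaps = days_since_last_purchase(purchases)   # per-item features for later steps
```

In this toy data, ID 303 is the only one that shows up once, so it would be the only entry added to the anomalies by the counting step.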
It took about 22,883 seconds (~6.4 hours) to run.
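The per-group Isolation Forest step could look roughly like the sketch below, assuming pandas and scikit-learn. The column names (`item`, `days_since_last`), the synthetic data, the `contamination` value, and the helper `flag_outliers` are assumptions for illustration only; the real project's features and settings may differ.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical data: days since the last purchase, grouped by item.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "item": ["widget"] * 50 + ["gadget"] * 50,
    "days_since_last": np.concatenate([
        rng.normal(7, 1, 49), [90.0],    # one extreme gap among weekly purchases
        rng.normal(30, 3, 49), [400.0],  # one extreme gap among monthly purchases
    ]),
})

def flag_outliers(group, contamination=0.02):
    """Fit one Isolation Forest on a group; True marks a predicted outlier."""
    model = IsolationForest(contamination=contamination, random_state=0)
    labels = model.fit_predict(group[["days_since_last"]])  # -1 = outlier, 1 = inlier
    return labels == -1

# Fit a separate model per item group and write the flags back by index.
df["is_anomaly"] = False
for _, group in df.groupby("item"):
    df.loc[group.index, "is_anomaly"] = flag_outliers(group)

anomalies = df[df["is_anomaly"]]
```

Fitting a separate model per group keeps each item's normal purchase rhythm from masking outliers in another item's data, at the cost of training many small models.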
Challenges I ran into
It was a really big dataset, and the small, unique exceptions in it made it really hard to work with.
Accomplishments that I'm proud of
Getting it to run without crashing.
What I learned
How to work with real-world datasets.
What's next for Anomaly Detection
Make it better and faster, and maybe add more unsupervised learning methods.
