Inspiration

Outlier detection is an important problem in many areas of computer science, including machine learning. DBSCAN is a density-based clustering algorithm that can help detect outlying observations, but it requires two parameters to be chosen in advance: minpts and epsilon. When we know little about the data we have, inferring good values for these parameters is challenging.
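To make the role of the two parameters concrete, here is a minimal sketch of DBSCAN in plain Python. It is an illustration only, not the code used in this project (which relies on the R dbscan package), and the function names and toy data are made up for the example. A point is treated as a core point when at least minpts points (itself included) lie within distance epsilon of it. Points that are neither core points nor within epsilon of one are labeled as noise, and these are the outlier candidates.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, minpts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < minpts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= minpts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


# Toy data (hypothetical): one tight group plus a single distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
labels = dbscan(points, eps=1.5, minpts=3)
outliers = [p for p, label in zip(points, labels) if label == -1]
```

With a much smaller epsilon or a much larger minpts, every point in this toy set would be flagged as noise, which is why the choice of the two parameters matters so much.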

What it does

This project uses a genetic algorithm to search for the best combination of the two parameters for a given dataset.
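The search idea can be sketched in plain Python as follows. This is a hedged illustration, not the project's implementation (which uses the R GA package). The fitness function in particular is an assumption made for the example: it rewards parameter pairs whose share of noise points is close to an expected outlier rate called `target`, a hypothetical parameter. The operators used (tournament selection, blend crossover, random mutation, elitism) are likewise just one common choice.

```python
import math
import random


def noise_fraction(points, eps, minpts):
    """Fraction of points DBSCAN would label as noise for (eps, minpts)."""
    n = len(points)
    near = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = [len(nb) >= minpts for nb in near]
    # Noise: neither a core point nor within eps of any core point.
    noise = sum(1 for i in range(n) if not any(core[j] for j in near[i]))
    return noise / n


def fitness(individual, points, target):
    """ASSUMED objective: closeness of the noise share to `target`."""
    eps, minpts = individual
    return -abs(noise_fraction(points, eps, minpts) - target)


def genetic_search(points, eps_range, minpts_range, target,
                   pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)

    def clamp(eps, minpts):
        # Keep eps inside its range and minpts an integer inside its range.
        eps = min(max(eps, eps_range[0]), eps_range[1])
        minpts = min(max(int(round(minpts)), minpts_range[0]), minpts_range[1])
        return (eps, minpts)

    def tournament(scored):
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [(rng.uniform(*eps_range), rng.randint(*minpts_range))
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(ind, points, target), ind) for ind in population]
        children = [max(scored, key=lambda s: s[0])[1]]  # elitism
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            w = rng.random()  # blend crossover weight
            eps = w * p1[0] + (1 - w) * p2[0]
            minpts = w * p1[1] + (1 - w) * p2[1]
            if rng.random() < mutation_rate:
                eps += rng.gauss(0, 0.1 * (eps_range[1] - eps_range[0]))
                minpts += rng.choice([-1, 0, 1])
            children.append(clamp(eps, minpts))
        population = children
    scored = [(fitness(ind, points, target), ind) for ind in population]
    return max(scored, key=lambda s: s[0])


# Toy data (hypothetical): a 6 x 6 grid plus two distant points.
points = [(0.5 * i, 0.5 * j) for i in range(6) for j in range(6)]
points += [(20.0, 20.0), (-20.0, 20.0)]
best_score, (best_eps, best_minpts) = genetic_search(
    points, eps_range=(0.1, 3.0), minpts_range=(2, 10),
    target=2 / len(points))
```

Because many parameter pairs produce the same labeling on such a small dataset, the pair returned here is one of several equally good answers. A real fitness function would need to break such ties, for example with a cluster-quality measure.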

How I built it

Everything is built in RStudio. The genetic algorithm is implemented with the GA package, and the clustering is handled by the dbscan package.

Challenges I ran into

I was happy to get all the necessary components working together for this challenge. However, there were aspects of the libraries I used that I did not like, and in hindsight I think I should have written my own implementation of the optimization part.

Accomplishments that I'm proud of

Deploying the software on the web so that the scientific community can use it immediately! link

What I learned

I was able to delve deeper into the field of optimization, and I find it very interesting. I think I'd like to make a career out of this :)

What's next for ANOMA

It's an open source tool, so I hope to improve it as people use it and suggest ways to make it better.
