Anime recommendation assistant

Inspiration

We are all passionate about anime culture and are always looking for exciting new anime that closely fit our tastes. That led us to build machine learning software that recommends new anime a person is likely to enjoy as much as the ones they already love.

What it does

The software plots the data in a 3D space and groups it using k-means clustering, which places similar data points together and reveals underlying patterns. From this, we expect that a person who likes one anime in a cluster will also like other anime that lie close to it in that space. The program chooses the number of clusters by comparing the inertia value with the silhouette score for each candidate count and picking a "sweet spot" where inertia is low and the silhouette score is high, since lower inertia means tighter clusters and a higher silhouette score means better-separated ones.
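To make the idea concrete, here is a minimal from-scratch sketch of the approach described above. It is not the code from our project, which followed a tutorial. Everything in it is an assumption made for illustration: the toy 3D points, the helper names (kmeans, silhouette, recommend), the deterministic farthest-point seeding, and the rule of picking the cluster count with the highest silhouette score while printing inertia alongside it for comparison.

```python
import math

def farthest_point_init(points, k):
    # Deterministic seeding (an assumption for this sketch): start at the
    # first point, then keep adding the point farthest from chosen centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def assign(points, centers):
    # Label each point with the index of its nearest center.
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]

def kmeans(points, k, max_iter=100):
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Move the center to the mean of its members; keep it if empty.
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members))
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    labels = assign(points, centers)
    # Inertia: sum of squared distances from each point to its center.
    inertia = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia

def silhouette(points, labels):
    # Mean silhouette score; assumes at least two non-empty clusters.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for name, other in clusters.items() if name != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

def recommend(index, points, labels, n=2):
    # Indices of the n nearest points that share a cluster with points[index].
    same = [i for i, lab in enumerate(labels) if lab == labels[index] and i != index]
    return sorted(same, key=lambda i: math.dist(points[index], points[i]))[:n]

# Toy data (made up): three tight groups of 3D points standing in for anime.
offsets = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
group_centers = [(0, 0, 0), (10, 10, 10), (-10, 10, 0)]
points = [tuple(c + o for c, o in zip(center, off))
          for center in group_centers for off in offsets]

scores = {}
for k in range(2, 7):
    labels, inertia = kmeans(points, k)
    scores[k] = (inertia, silhouette(points, labels))
    print(f"k={k} inertia={inertia:.2f} silhouette={scores[k][1]:.3f}")

# Assumed selection rule: take the k with the highest silhouette score.
best_k = max(scores, key=lambda k: scores[k][1])
labels, _ = kmeans(points, best_k)
recs = recommend(0, points, labels)
print("best k:", best_k, "recommendations for item 0:", recs)
```

Inertia keeps shrinking as the number of clusters grows, so it cannot pick a count on its own. Reading it next to the silhouette score is what exposes the "sweet spot". On real data, a library such as scikit-learn provides KMeans and silhouette_score, which would replace the hand-written helpers here.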

How we built it

We built it using Python and a dataset found on Kaggle, which contains user data from 76,000 users on MyAnimeList. We did not write the code from scratch. Instead, we followed a tutorial that walks through how to do clustering, then tweaked some of its values and functionality. The link to the tutorial is: https://www.kaggle.com/tanetboss/user-clustering-for-anime-recommendation

Challenges we ran into

Finding a dataset was difficult because we needed a very specific kind of data. After finding one, we realized that machine learning is much harder than we initially thought and much less structured than conventional programming. Three of our four members had also never programmed in Python before, which was a big challenge to overcome. We had to learn Python and apply it to the program in a very short amount of time.

What we learned

We learned that machine learning is a very versatile tool that can adapt to many different datasets. We also found that real data can be imperfect and has to be cleaned before it is used. Clean data turned out to be one of the most important parts of the project, because without it, it is nearly impossible to produce an accurate result.