We were trying to come up with an idea for a Hacklytics project, and when we asked ourselves "What do you really enjoy?", one teammate said "anime". That answer was the springboard for an "Anime Recommendation System", but then another teammate suggested expanding our reach to movies. Thus, Cinemalytics was born. Because of the pandemic, we've had more time to watch Netflix and catch up on our favorite shows, and one issue we've run into is finding movie recommendations that work across streaming platforms.

What it does

Cinemalytics takes a movie as input and recommends similar movies based on genre, actors, movie descriptions, etc.

How we built it

We aggregated IMDb movie data from Kaggle into one large dataframe containing movie titles, genres, actor names, and descriptions. We one-hot encoded the genre column to convert the categorical data into numerical data, and we tokenized and vectorized the actors and description columns. We then used K-nearest neighbors and K-means clustering models to group similar movies.
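
The steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: the toy movie records, the function names (tokenize, build_features, cosine_similarity, recommend), and the choice of cosine similarity as the distance measure are all assumptions made for the example. It covers only the one-hot encoding, vectorization, and nearest-neighbor steps and leaves out K-means clustering.

```python
import math
from collections import Counter

# Hypothetical toy records standing in for the aggregated IMDb data.
MOVIES = [
    {"title": "Space Quest", "genres": ["Sci-Fi", "Adventure"],
     "actors": ["Ann Lee", "Bo Chen"],
     "description": "A crew explores a distant planet"},
    {"title": "Star Drift", "genres": ["Sci-Fi"],
     "actors": ["Ann Lee"],
     "description": "A lone pilot explores a distant galaxy"},
    {"title": "Love in Paris", "genres": ["Romance"],
     "actors": ["Cy Diaz"],
     "description": "Two strangers fall in love in Paris"},
    {"title": "Paris Heist", "genres": ["Crime", "Adventure"],
     "actors": ["Cy Diaz", "Bo Chen"],
     "description": "Thieves plan a museum robbery in Paris"},
]


def tokenize(text):
    """Lowercase a string and split it on whitespace."""
    return text.lower().split()


def build_features(movie):
    """Return a sparse feature vector (feature name -> value) for one movie."""
    features = Counter()
    # One-hot encode genres: each genre becomes a 0/1 indicator feature.
    for genre in set(movie["genres"]):
        features["genre=" + genre] = 1
    # Treat each full actor name as a single token.
    for actor in set(movie["actors"]):
        features["actor=" + actor] = 1
    # Bag-of-words counts for the description (no stop-word removal here).
    for word in tokenize(movie["description"]):
        features["word=" + word] += 1
    return features


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, movies, k=2):
    """Return the titles of the k movies most similar to the given title."""
    vectors = {m["title"]: build_features(m) for m in movies}
    query = vectors[title]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in vectors.items()
        if other != title
    ]
    # Highest similarity first; the k best are the nearest neighbors.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:k]]


print(recommend("Space Quest", MOVIES))  # ['Star Drift', 'Paris Heist']
```

On real data, a library vectorizer with stop-word removal or TF-IDF weighting would likely give better results than raw word counts, since common words such as "a" carry a lot of weight in this sketch.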

Challenges we ran into

  • Understanding important features/preprocessing
  • Tokenization
  • Vectorization
  • Adding matrices to dataframe

Accomplishments that we're proud of

We're proud that we completely pivoted from the idea we had when we first signed up for the hackathon. We're also proud of creating a functional model within the time constraints.

What we learned

We learned about NLP techniques like tokenization and vectorization of strings. We also learned to be agile in our development process.

What's next for Cinemalytics

If we had more time, we would like to turn Cinemalytics into a web app that shows which streaming platforms carry your top movie recommendations.
