Inspiration
We now live in what some call the “era of abundance”. For any given product, there are sometimes thousands of options to choose from. Think of streaming video, social networking, and online shopping; the list goes on. Recommender systems personalize a platform and help users find something they like. The more relevant products a user finds on the platform, the higher their engagement, which often results in increased revenue for the platform itself. Various sources claim that as much as 35–40% of tech giants’ revenue comes from recommendations alone. So I decided to build a basic content-based recommender system web app.
What it does
When you search for a movie in this web application, it shows the details of that movie and recommends a few other movies, along with their details, based on that search.
How I built it
First, I gathered data (movie names, ratings, cast, crew) from the IMDb website, which is updated daily. I then pre-processed the data in a Kaggle notebook: I extracted the relevant tags for each movie, such as the top actors' names, directors, writers, and genres, and merged them using the movies' IMDb IDs. For each movie ID, I transformed those tags into a term-frequency vector using CountVectorizer from scikit-learn, then computed the cosine similarity between every pair of movies to get a similarity matrix. I pickled the similarity matrix and the final processed movie list and saved them on Kaggle. The notebook is scheduled to run daily, so it produces updated data every day. The Streamlit app fetches the similarity matrix and the movie list through the Kaggle API and stores them in a cache. The cache has a TTL of one day, so the app's data is refreshed daily. When a user enters a movie, I look up that movie's row in the similarity matrix and take the 20 most similar movies from it. I then sort those 20 movies by popularity and select 10 of them. Finally, I get the details of the input movie and the recommended movies from the TMDB API.
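The vectorize, compare, and re-rank steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it uses only the Python standard library as a stand-in for scikit-learn's CountVectorizer and cosine similarity, and the toy catalogue, the popularity scores, and the names `MOVIES`, `tag_counts`, `cosine`, and `recommend` are all hypothetical. It also assumes that "select 10" means keeping the most popular entries of the shortlist.

```python
import math
from collections import Counter

# Hypothetical toy catalogue: movie id -> (merged tags, popularity score).
MOVIES = {
    "m1": ("scifi adventure nolan bale", 50.0),
    "m2": ("scifi adventure nolan caine", 70.0),
    "m3": ("scifi thriller nolan bale", 90.0),
    "m4": ("romance comedy curtis grant", 99.0),
    "m5": ("scifi drama villeneuve adams", 60.0),
}


def tag_counts(tags):
    """Bag-of-words term counts (the idea behind a count vectorizer)."""
    return Counter(tags.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


IDS = list(MOVIES)
VECTORS = {mid: tag_counts(MOVIES[mid][0]) for mid in IDS}

# Precomputed once (in the project this happens in the Kaggle notebook).
SIMILARITY = [[cosine(VECTORS[a], VECTORS[b]) for b in IDS] for a in IDS]


def recommend(movie_id, top_k=20, final_n=10):
    """Take the top_k most similar movies, then keep the final_n most popular."""
    row = SIMILARITY[IDS.index(movie_id)]
    candidates = [(mid, score) for mid, score in zip(IDS, row) if mid != movie_id]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    shortlist = [mid for mid, _ in candidates[:top_k]]
    # Assumption: re-rank the shortlist by descending popularity.
    shortlist.sort(key=lambda mid: MOVIES[mid][1], reverse=True)
    return shortlist[:final_n]


print(recommend("m1", top_k=3, final_n=2))  # ['m3', 'm2']
```

In this toy data the most popular movie overall ("m4") is never recommended for "m1", because it shares no tags with it and so never reaches the similarity shortlist. Popularity only reorders movies that are already similar.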
Challenges we ran into
The primary challenge was keeping the app updated daily. Cleaning the data and computing cosine similarity over such a large dataset is a heavy task, and the server where I deployed the app kept crashing with memory errors (most cloud providers give free-tier users very little memory). So I decided to offload that work to Kaggle. There was no good documentation on how to use the API for this, but I managed to fetch the Kaggle kernel output by reading through the API's classes.
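The daily refresh can be pictured as a time-to-live cache around the function that downloads the precomputed artifacts. The sketch below is a standard-library illustration of that idea, not the app's code: Streamlit's own caching decorators accept a `ttl` argument for the same purpose, and `ttl_cache` and `load_artifacts` are hypothetical names. The loader body is a placeholder, since the exact Kaggle API calls are not shown in this write-up.

```python
import time
from functools import wraps

ONE_DAY_SECONDS = 24 * 60 * 60


def ttl_cache(ttl_seconds, clock=time.monotonic):
    """Cache a zero-argument function's result for ttl_seconds."""
    def decorator(func):
        state = {"value": None, "expires_at": None}

        @wraps(func)
        def wrapper():
            now = clock()
            # Reload on first use, or once the cached value has expired.
            if state["expires_at"] is None or now >= state["expires_at"]:
                state["value"] = func()
                state["expires_at"] = now + ttl_seconds
            return state["value"]

        return wrapper
    return decorator


@ttl_cache(ONE_DAY_SECONDS)
def load_artifacts():
    # Placeholder: the real app would download the pickled similarity
    # matrix and movie list produced by the scheduled Kaggle notebook.
    return {"loaded_at": time.time()}


first = load_artifacts()
second = load_artifacts()
print(first is second)  # True: the second call is served from the cache
```

With this pattern the heavy computation never runs on the small app server. The server only pays for one download per day and serves every request in between from memory.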
Accomplishments that we're proud of
The app actually gives very accurate recommendations. On top of that, I managed to make the website update itself daily, which I feel was the most challenging part.
What I learned
- How to build a content-based recommendation system
- How to use the Kaggle API to get output data from a notebook
- Web scraping and using APIs
What's next for Movie Recommender Web App
I wanted to build a good frontend for the website using React, but time was limited, so I used Streamlit instead. After this hackathon, I am going to build a proper frontend for the website.
Built With
- imdb
- kaggle
- python
- scikit-learn
- streamlit
- tmdb