CinePick - Hybrid Movie Recommendation System
A personalized movie recommender that leverages Collaborative Filtering, Matrix Factorization (SVD & NMF), and Content-Based Filtering using movie genres. Built with Python, powered by Surprise, scikit-learn, and deployed using Streamlit.
Inspiration
The sheer volume of movies available today makes it hard for users to decide what to watch next. We wanted to combine the strengths of several recommendation approaches in one system, CinePick: a platform that not only understands user preferences but also adapts dynamically to different viewing patterns.
What it does
CinePick analyzes a user's past ratings, compares them with others, and recommends movies that align with their tastes.
It combines:
- Collaborative Filtering (User-Based & Item-Based)
- Matrix Factorization (SVD & NMF)
- Content-Based Filtering using movie genres and TF-IDF
- Hybrid Approach blending these models for optimal recommendations
It also provides:
- Predicted ratings for unseen movies
- Movie posters, genres, and key details
- Evaluation metrics like RMSE, MAE, and Precision@K
How we built it
- Dataset: MovieLens dataset with movie metadata and user ratings
- EDA: Explored trends in ratings, genres, and user behavior
- Model Development
  - Collaborative Filtering (User & Item-based) using Surprise
  - Matrix Factorization (SVD, NMF)
  - Content-Based Filtering with TF-IDF on genres
- Hybrid Model: Combined collaborative and content-based outputs
- UI: Interactive dashboard using Streamlit to browse recommendations
- Deployment: Packaged and made ready for local/online use
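The content-based step above (TF-IDF on genres) might look roughly like the sketch below. It assumes genres are stored as pipe-separated strings, as in MovieLens. The toy catalogue and the helper name similar_movies are illustrative and are not taken from the project code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalogue (illustrative); MovieLens stores genres as pipe-separated strings.
movies = {
    "Toy Story (1995)": "Adventure|Animation|Children|Comedy|Fantasy",
    "Jumanji (1995)": "Adventure|Children|Fantasy",
    "Heat (1995)": "Action|Crime|Thriller",
    "Casino (1995)": "Crime|Drama",
}
titles = list(movies)

# Split on "|" so each genre is one token, even if it contains a hyphen or space.
vectorizer = TfidfVectorizer(tokenizer=lambda s: s.split("|"), token_pattern=None)
genre_matrix = vectorizer.fit_transform(list(movies.values()))

# Pairwise cosine similarity between the TF-IDF genre vectors.
similarity = cosine_similarity(genre_matrix)


def similar_movies(title, top_n=2):
    """Return the top_n titles whose genre vectors are closest to `title`."""
    idx = titles.index(title)
    ranked = sorted(range(len(titles)), key=lambda j: similarity[idx, j], reverse=True)
    return [titles[j] for j in ranked if j != idx][:top_n]
```

Calling similar_movies("Toy Story (1995)", top_n=1) ranks the other titles by shared genres. On a full catalogue the dense similarity matrix grows quadratically, so computing one row at a time may be preferable.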
Accomplishments that we're proud of
- Successfully fetched and mapped poster images for more than 5,000 movies
- Implemented resume functionality to avoid starting over on interruption
- Handled errors gracefully using fallback poster URLs
- Progress saved after every movie to ensure data integrity
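The resume, fallback, and save-after-every-movie behavior listed above can be sketched as follows. This is a minimal illustration, not the project's actual script: the function names, the output columns, and the placeholder URL are assumptions, and fetch_poster stands in for whatever TMDB lookup the project uses.

```python
import csv
import os

FALLBACK_POSTER = "https://example.com/placeholder.png"  # hypothetical placeholder URL
FIELDS = ["movie_id", "movie_title", "poster_url"]


def load_done(path):
    """Return the set of movie_ids already present in the output CSV."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as f:
        return {row["movie_id"] for row in csv.DictReader(f)}


def fetch_all(movies, out_path, fetch_poster):
    """Fetch one poster per (movie_id, title), appending each row immediately.

    Because every row is flushed as soon as it is written, rerunning the
    function after an interruption skips the movies that are already saved.
    """
    done = load_done(out_path)
    new_file = not os.path.exists(out_path) or os.path.getsize(out_path) == 0
    with open(out_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        for movie_id, title in movies:
            if str(movie_id) in done:
                continue  # processed in an earlier run
            try:
                url = fetch_poster(title)
            except Exception:
                url = FALLBACK_POSTER  # degrade gracefully instead of stopping
            writer.writerow({"movie_id": movie_id, "movie_title": title, "poster_url": url})
            f.flush()  # persist progress after every movie
```

Passing the lookup in as a parameter keeps the loop testable without network access.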
Challenges we ran into
- API rate limits and occasional TMDB search mismatches
- Handling movie title variations and special characters
- Notebook freezing due to long network waits or failed API calls
- Maintaining consistent format while resuming progress
What we learned
- How to integrate and query external APIs (TMDB) using Python
- Using pandas efficiently for data handling and incremental CSV updates
- Error handling and retry logic for robust automation
- Importance of saving intermediate progress in large data tasks
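As an illustration of the retry logic mentioned above, here is one possible helper. The exponential backoff schedule and the name with_retries are assumptions. The source only states that retries were used, not how delays were chosen.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() up to `attempts` times, doubling the wait after each failure.

    The last exception is re-raised if every attempt fails. `sleep` is
    injectable so the helper can be exercised without real waiting.
    """
    for i in range(attempts):
        try:
            return func()
        except Exception:
            if i == attempts - 1:
                raise
            sleep(base_delay * 2 ** i)
```

Wrapping each API call this way, with a timeout on the request itself, is one way to keep a long-running notebook from stalling on a single failed lookup.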
What's next for CinePick
- Add filters for release year, genre, and minimum rating
- Integrate user authentication for personalized sessions
- Use deep learning-based recommendation models
- Deploy as a web app accessible globally
- Include real-time user feedback to improve recommendations
Dataset
MovieLens Dataset
Includes user ratings, movie titles, genres, and timestamps.
- movies.csv: movie_id, movie_title, movie_genres, poster_url
- ratings.csv: user_id, movie_id, user_rating, timestamp
Dataset Source: MovieLens
Tools & Technologies Used
Languages & Frameworks:
Python, Streamlit
Libraries & Models:
- Pandas, NumPy, Scikit-learn
- Surprise (SVD, NMF, KNNBasic)
- TfidfVectorizer (for genre similarity)
- Seaborn, Matplotlib
Version Control:
Git, GitHub
Environment:
Jupyter Notebook, VS Code
Installation
Clone the repository
git clone https://github.com/yourusername/movie-recommender.git
cd movie-recommender
Create and activate a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install required libraries
pip install -r requirements.txt
Running the App
streamlit run app.py
Then open the link provided by Streamlit in your browser.
Evaluation Metrics
Each model is evaluated using:
- RMSE (Root Mean Square Error)
- MAE (Mean Absolute Error)
- Precision@K (for top-N recommendations)
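For reference, the three metrics can be computed by hand as in the sketch below. The function names are illustrative. Precision@K is taken here as hits in the top K divided by K, which is one common convention. The project may define relevance or the denominator differently.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between true and predicted ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommended items that appear in `relevant`."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k
```

RMSE penalizes large errors more heavily than MAE, while Precision@K ignores rating values and looks only at whether the top of the list contains items the user actually liked.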
Recommendation Logic
- Collaborative Filtering: Based on user-user or item-item similarity from historical ratings.
- Matrix Factorization: Learns latent features using SVD/NMF.
- Content-Based: TF-IDF vectorization of genres and cosine similarity.
- Hybrid: SVD predictions + fallback to content similarity if needed.
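The hybrid rule could be sketched as below. The source does not say when the fallback is "needed". This sketch assumes it triggers for users the SVD model has not seen (a cold-start case). All names are hypothetical: svd_predict and content_similar stand in for the trained model and the genre-similarity lookup.

```python
def hybrid_recommend(user_id, candidates, known_users, svd_predict,
                     content_similar, seed_movie, top_n=5):
    """Rank candidates by predicted rating for known users.

    For users absent from the training data (assumed fallback condition),
    return the movies most similar in content to `seed_movie` instead.
    """
    if user_id in known_users:
        ranked = sorted(candidates, key=lambda m: svd_predict(user_id, m), reverse=True)
        return ranked[:top_n]
    return content_similar(seed_movie)[:top_n]


# Usage with stand-in callables (illustrative values only).
fake_ratings = {"A": 4.5, "B": 2.0, "C": 3.8}
known = hybrid_recommend(
    user_id=1, candidates=["A", "B", "C"], known_users={1},
    svd_predict=lambda u, m: fake_ratings[m],
    content_similar=lambda m: ["X", "Y"], seed_movie="A", top_n=2,
)
unknown = hybrid_recommend(
    user_id=99, candidates=["A", "B", "C"], known_users={1},
    svd_predict=lambda u, m: fake_ratings[m],
    content_similar=lambda m: ["X", "Y"], seed_movie="A", top_n=2,
)
```

A weighted blend of the two scores is another common hybrid design. The switch shown here simply mirrors the "fallback" wording.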
Contributors
Indu M Gopika Gokulanadh Ardra Pradeepkumar S. Rajalakshmi
Built With
- csv
- git
- github
- matplotlib
- movielens-dataset
- nmf
- numpy
- os
- pandas
- python
- random
- scikit-learn
- seaborn
- streamlit
- surprise
- svd
- time
- tmdb-api
- tqdm
- visual-studio
- vscode