🎬 CinePick - Hybrid Movie Recommendation System

A personalized movie recommender that leverages Collaborative Filtering, Matrix Factorization (SVD & NMF), and Content-Based Filtering using movie genres. Built with Python, powered by Surprise, scikit-learn, and deployed using Streamlit.


📌 Inspiration

The sheer volume of movies available today makes it hard for users to decide what to watch next. We wanted to combine the strengths of several recommendation approaches in one system, CinePick: a platform that understands user preferences and adapts to different viewing patterns.


💡 What it does

CinePick analyzes a user's past ratings, compares them with others, and recommends movies that align with their tastes.
It combines:

  • Collaborative Filtering (User-Based & Item-Based)
  • Matrix Factorization (SVD & NMF)
  • Content-Based Filtering using movie genres and TF-IDF
  • Hybrid Approach blending these models for optimal recommendations

It also provides:

  • Predicted ratings for unseen movies
  • Movie posters, genres, and key details
  • Evaluation metrics like RMSE, MAE, and Precision@K
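
As a rough illustration of the user-based collaborative filtering idea listed above, the following dependency-free sketch predicts a rating as a similarity-weighted average over the most similar users. It is not the project's code (CinePick uses Surprise for this); the function names, the toy `ratings` dictionary, and the choice of cosine similarity over co-rated movies are assumptions made for the example.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity computed only over movies both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(ratings, user_id, movie_id, k=2):
    """Similarity-weighted average of the k most similar users' ratings.

    Returns None when no similar user has rated the movie.
    """
    neighbours = []
    for other, other_ratings in ratings.items():
        if other == user_id or movie_id not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user_id], other_ratings)
        if sim > 0:
            neighbours.append((sim, other_ratings[movie_id]))
    neighbours.sort(reverse=True)  # most similar users first
    top = neighbours[:k]
    if not top:
        return None
    return sum(sim * r for sim, r in top) / sum(sim for sim, _ in top)


# Toy data (hypothetical): user -> {movie: rating}
ratings = {
    "u1": {"m1": 5.0, "m2": 3.0},
    "u2": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "u3": {"m1": 1.0, "m2": 5.0, "m3": 2.0},
}
predicted = predict_rating(ratings, "u1", "m3")
```

Because `u2` agrees with `u1` on every shared movie, its rating for `m3` carries more weight in the prediction than the rating from `u3`.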

🛠️ How we built it

  1. Dataset: MovieLens dataset with movie metadata and user ratings
  2. EDA: Explored trends in ratings, genres, and user behavior
  3. Model Development
    • Collaborative Filtering (User & Item-based) using Surprise
    • Matrix Factorization (SVD, NMF)
    • Content-Based Filtering with TF-IDF on genres
  4. Hybrid Model: Combined collaborative and content-based outputs
  5. UI: Interactive dashboard using Streamlit to browse recommendations
  6. Deployment: Packaged and made ready for local/online use
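
The content-based step (TF-IDF on genres followed by cosine similarity) can be sketched without any dependencies as below. The project itself uses scikit-learn's TfidfVectorizer; this hand-rolled version is a stand-in that follows the same smoothed-IDF and L2-normalization idea. It assumes genres are pipe-separated strings, as in the standard MovieLens files, and all function names are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def tfidf_vectors(genre_strings):
    """Turn pipe-separated genre strings into L2-normalized TF-IDF dicts."""
    docs = [g.split("|") for g in genre_strings]
    n = len(docs)
    # Document frequency: in how many movies each genre appears
    df = Counter(token for doc in docs for token in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        vec = {t: tf[t] * (log((1 + n) / (1 + df[t])) + 1) for t in tf}
        norm = sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def most_similar(index, vectors, top_n=2):
    """Indices of the top_n movies most similar to the movie at `index`."""
    scores = [
        (cosine(vectors[index], v), j)
        for j, v in enumerate(vectors)
        if j != index
    ]
    scores.sort(reverse=True)
    return [j for _, j in scores[:top_n]]


# Toy genre strings (hypothetical)
genres = ["Action|Adventure", "Action|Adventure|Sci-Fi", "Comedy|Romance", "Comedy"]
vectors = tfidf_vectors(genres)
similar_to_first = most_similar(0, vectors, top_n=1)
```

Rare genres receive a higher weight than common ones, so two movies that share an unusual genre score as more similar than two that share only a very common one.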

✅ Accomplishments that we're proud of

  • Fetched and mapped poster images for more than 5,000 movies
  • Implemented resume functionality so an interrupted run does not have to start over
  • Handled errors gracefully by falling back to a default poster URL
  • Saved progress after every movie to preserve data integrity
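
The resume, fallback, and save-after-every-movie behaviour described above might look roughly like the sketch below. It is an illustration under assumptions, not the project's actual script: `fetch_poster` is a hypothetical callable standing in for the TMDB lookup, `FALLBACK_POSTER` is a placeholder URL, and the two-column CSV layout is invented for the example.

```python
import csv
import os

# Placeholder URL, not the project's real fallback image
FALLBACK_POSTER = "https://example.com/no-poster.png"


def load_done(path):
    """Read already-saved results so a restarted run can skip them."""
    if not os.path.exists(path):
        return {}
    with open(path, newline="", encoding="utf-8") as f:
        return {row["movie_id"]: row["poster_url"] for row in csv.DictReader(f)}


def fetch_posters(movies, fetch_poster, out_path):
    """Fetch a poster URL per (movie_id, title) pair, resuming from out_path."""
    done = load_done(out_path)
    write_header = not os.path.exists(out_path) or os.path.getsize(out_path) == 0
    with open(out_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["movie_id", "poster_url"])
        if write_header:
            writer.writeheader()
        for movie_id, title in movies:
            if str(movie_id) in done:
                continue  # already fetched in an earlier run
            try:
                url = fetch_poster(title) or FALLBACK_POSTER
            except Exception:  # broad on purpose here; narrow it in real code
                url = FALLBACK_POSTER
            writer.writerow({"movie_id": movie_id, "poster_url": url})
            f.flush()  # push each row out right away so a crash loses little
            done[str(movie_id)] = url
    return done
```

Calling `fetch_posters` a second time with the same output path only requests posters for movies that are not yet in the file.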

⚠️ Challenges we ran into

  • API rate limits and occasional TMDB search mismatches
  • Handling movie title variations and special characters
  • Notebook freezing due to long network waits or failed API calls
  • Maintaining consistent format while resuming progress

📘 What we learned

  • How to integrate and query external APIs (TMDB) using Python
  • Using pandas efficiently for data handling and incremental CSV updates
  • Error handling and retry logic for robust automation
  • Importance of saving intermediate progress in large data tasks
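
One common way to implement the retry logic mentioned above is exponential backoff. The helper below is a minimal sketch; the name `with_retries`, its parameters, and the injectable `sleep` argument are assumptions for the example rather than anything taken from the project.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Re-raises the last exception once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:  # broad for the sketch; narrow it in real code
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Wrapping each API call in a helper like this, together with a request timeout, is one way to keep a long-running notebook from stalling on a single failed lookup.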

🔮 What's next for CinePick

  • Add filters for release year, genre, and minimum rating
  • Integrate user authentication for personalized sessions
  • Use deep learning-based recommendation models
  • Deploy as a web app accessible globally
  • Include real-time user feedback to improve recommendations

๐Ÿ“ Dataset

MovieLens Dataset
Includes user ratings, movie titles, genres, and timestamps.

  • movies.csv: movie_id, movie_title, movie_genres, poster_url
  • ratings.csv: user_id, movie_id, user_rating, timestamp

Dataset Source: MovieLens


🛠️ Tools & Technologies Used

Languages & Frameworks:
Python, Streamlit

Libraries & Models:

  • 📚 Pandas, NumPy, Scikit-learn
  • 🎯 Surprise (SVD, NMF, KNNBasic)
  • 🧾 TfidfVectorizer (for genre similarity)
  • 📈 Seaborn, Matplotlib

Version Control:
Git, GitHub

Environment:
Jupyter Notebook, VS Code

📦 Installation

Clone the repository

git clone https://github.com/yourusername/movie-recommender.git
cd movie-recommender

Create and activate virtual environment (optional but recommended)

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

📌 Install required libraries

pip install -r requirements.txt

▶️ Running the App

streamlit run app.py

Then open the link provided by Streamlit in your browser.

🧪 Evaluation Metrics

Each model is evaluated using:

  • RMSE (Root Mean Square Error)
  • MAE (Mean Absolute Error)
  • Precision@K (for top-N recommendations)
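
For reference, the three metrics can be written in a few lines of plain Python. This is a sketch of the usual textbook definitions, not the project's evaluation code. In particular, Precision@K has several variants; the one here divides the number of relevant movies among the top K recommendations by K, and treating a movie as relevant when the user rated it highly is an assumption.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between true and predicted ratings."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended movies that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    return sum(1 for movie in top_k if movie in relevant) / k


# Toy values (hypothetical)
actual = [4.0, 3.0, 5.0]
predicted = [3.5, 3.0, 4.0]
rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)

# "Relevant" here means movies the user actually rated highly (assumption)
p_at_3 = precision_at_k(["m1", "m2", "m3", "m4"], {"m1", "m3", "m9"}, k=3)
```

RMSE penalizes large errors more heavily than MAE, which is why the two are usually reported together.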

🔄 Recommendation Logic

  • Collaborative Filtering: Based on user-user or item-item similarity from historical ratings.
  • Matrix Factorization: Learns latent features using SVD/NMF.
  • Content-Based: TF-IDF vectorization of genres and cosine similarity.
  • Hybrid: SVD predictions + fallback to content similarity if needed.
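
The hybrid rule (SVD predictions with a fallback to content similarity) could be sketched as follows. The README does not say when the fallback applies, so this example assumes it is used whenever the user or the movie was not seen during SVD training. The callables `svd_predict` and `content_score` are hypothetical stand-ins for the trained models, and the content score is assumed to be on the same rating scale as the SVD estimate so the two can be ranked together.

```python
def hybrid_score(user_id, movie_id, known_users, known_movies,
                 svd_predict, content_score):
    """Return (score, source): SVD when both IDs are known, else content."""
    if user_id in known_users and movie_id in known_movies:
        return svd_predict(user_id, movie_id), "svd"
    return content_score(user_id, movie_id), "content"


def recommend(user_id, candidates, top_n, **models):
    """Rank candidate movies by hybrid score and return the best top_n."""
    scored = [(hybrid_score(user_id, m, **models)[0], m) for m in candidates]
    scored.sort(reverse=True)
    return [m for _, m in scored[:top_n]]


# Stub models (hypothetical) standing in for trained SVD and TF-IDF scorers
def svd_stub(user_id, movie_id):
    return {"m1": 4.5, "m2": 2.0}[movie_id]


def content_stub(user_id, movie_id):
    return 3.5


models = {
    "known_users": {"u1"},
    "known_movies": {"m1", "m2"},
    "svd_predict": svd_stub,
    "content_score": content_stub,
}
top = recommend("u1", ["m1", "m2", "m3"], top_n=2, **models)
```

Here `m3` is unknown to the SVD stub, so it is scored by the content stub and still competes for a place in the ranking.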

๐Ÿค Contributors

  • Indu M
  • Gopika Gokulanadh
  • Ardra Pradeepkumar
  • S. Rajalakshmi
