🧠 IMDb Movie Reviews NLP Sentiment Analysis
An End-to-End Sentiment Analysis System Using IMDb Data, Transformers, and Explainability Tools for Strategic Streaming Decisions.
📌 Inspiration
- This project stemmed from the growing need to understand and act on customer feedback at scale, especially in the entertainment and streaming industry.
- With platforms like Netflix, Amazon Prime, and others handling millions of reviews daily, gaining insights from these reviews can provide a significant business edge.
- Sentiment analysis offers a powerful tool for companies to detect customer satisfaction, enabling data-driven decisions regarding content recommendations, retention strategies, and marketing campaigns.
🛠️ What it does
- This project performs sentiment analysis on IMDb movie reviews using a robust pipeline combining classical Machine Learning (ML) models, deep learning (LSTM and BERT fine-tuning), and explainability techniques (SHAP, LIME).
- The system classifies reviews into two categories: Positive and Negative sentiment. It aims to provide a real-world solution to classify and interpret customer feedback, which can be directly applied by entertainment platforms to improve customer retention and business performance.
🧱 How I built it
1. Dataset:
- The dataset, IMDb 50K Reviews, was obtained from Kaggle, with 50,000 balanced reviews (25K positive, 25K negative).
2. Exploratory Data Analysis (EDA):
- Computed basic statistics to understand the distribution of reviews, the average review length, and the sentiment balance.
3. Text Preprocessing:
- Cleaned the text by removing stop words and special characters, then applying lemmatization.
4. Modeling:
- Classical ML: Implemented Logistic Regression and Random Forest using TF-IDF vectorization for baseline sentiment classification.
- Deep Learning: Developed an LSTM model and fine-tuned a BERT model for advanced sentiment analysis.
5. Explainability:
- Used SHAP to visualize influential words contributing to positive and negative sentiment.
- Used LIME to generate local model explanations for individual reviews.
6. Deployment:
- Deployed the final BERT model as a live app on Hugging Face Spaces, providing interactive sentiment predictions.
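To make the text preprocessing step (step 3) concrete, here is a minimal sketch of what such a cleaning function could look like. It is an illustration under stated assumptions, not the project's actual code: the function name `clean_review`, the tiny `STOP_WORDS` set, and the regular expressions are all hypothetical, and lemmatization is left out because it needs an external library such as NLTK or spaCy.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real pipeline would more likely use a library-provided list.
STOP_WORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to", "in"}

TAG_RE = re.compile(r"<[^>]+>")         # HTML remnants such as <br />
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # anything except lowercase letters and whitespace


def clean_review(text: str) -> str:
    """Lowercase, strip tags and special characters, and drop stop words.

    Lemmatization (part of the described pipeline) is intentionally omitted
    here to keep the sketch dependency-free.
    """
    text = TAG_RE.sub(" ", text.lower())
    text = NON_ALPHA_RE.sub(" ", text)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(clean_review("This movie was GREAT!<br />Loved it."))
```

The cleaned strings would then be passed to a vectorizer (TF-IDF for the classical models). Note that a cleaner this aggressive also strips apostrophes and negation cues, so the stop-word list is worth reviewing before using it for sentiment tasks.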
🧗♀️ Challenges I ran into
1. Overfitting in LSTM:
- The LSTM model initially overfit the training data, which hurt its performance. This was addressed by tuning hyperparameters and applying early stopping.
2. Fine-tuning BERT:
- Fine-tuning BERT for sentiment analysis was resource-intensive and required several adjustments to balance performance and runtime efficiency.
3. Interpretability:
- Explaining the decisions made by complex models like BERT was a challenge, but the integration of SHAP and LIME successfully provided valuable insights into the models' behavior.
4. Deployment:
- Deploying the final model on Hugging Face Spaces was tricky, especially integrating all components (model, preprocessing, and explainability features) into a smooth, user-friendly interface.
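The early stopping mentioned in the first challenge can be sketched in a framework-independent way. This is a generic illustration of the technique, not the project's training code: the `EarlyStopping` class, the `epochs_run` helper, the `patience` and `min_delta` parameters, and the validation losses are all made up for the example.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience      # epochs to tolerate without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember the new best and reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses, patience=2):
    """Return how many epochs would run before early stopping triggers."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.should_stop(loss):
            return epoch
    return len(val_losses)


if __name__ == "__main__":
    # Hypothetical per-epoch validation losses, not measured results.
    print(epochs_run([0.60, 0.50, 0.52, 0.55, 0.40], patience=2))
```

In a real training loop the model weights from the best epoch are usually saved and restored once the stop triggers; deep learning frameworks generally ship a ready-made callback for this, so a hand-written class like the one above is mainly useful for understanding the logic.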
🏆 Accomplishments that I'm proud of
- Successfully implemented and fine-tuned a BERT model that achieved 93.1% accuracy on IMDb reviews, the highest in the project.
- Integrated model explainability tools (SHAP and LIME) to offer transparency in the decision-making process, a crucial aspect for business adoption.
- Deployed a fully functioning sentiment analysis app on Hugging Face Spaces, making it publicly accessible and demonstrating real-world applicability.
- Highlighted the potential business impact, such as improving customer retention by detecting dissatisfaction early.
📚 What I learned
- Fine-tuning large models like BERT means balancing model performance against the available compute, both to avoid overfitting and to keep training time manageable.
- Explainability is key to business adoption of machine learning models. The addition of SHAP and LIME not only improved model transparency but also strengthened the project's business case.
- The practical challenges of deploying deep learning models in a live environment, including ensuring performance and maintaining scalability, were valuable learning experiences.
🚀 What's next for IMDb Movie Reviews NLP Sentiment Analysis
- Moving forward, I plan to expand this project by incorporating more sophisticated natural language understanding techniques, such as sentiment intensity analysis, to gauge not just positive or negative sentiment but also the degree of satisfaction.
- Additionally, I would like to extend the model to classify reviews into finer-grained categories, such as helpfulness and relevance, to further enhance content recommendation systems on platforms like Netflix.
👩💼 About the Author
Sweety Seelam | Business Analyst | Aspiring Data Scientist | Passionate about building end-to-end ML solutions for real-world problems
Email: sweetyseelam2@gmail.com
LinkedIn
GitHub
Medium
My Portfolio
🔐 Proprietary & All Rights Reserved
© 2025 Sweety Seelam. All rights reserved.
This project, including its source code, trained models, datasets (where applicable), visuals, and dashboard assets, is protected under copyright and made available for educational and demonstrative purposes only.
Unauthorized commercial use, redistribution, or duplication of any part of this project is strictly prohibited.
Built With
- advanced-transformer-model(bert)
- deeplearning
- explainabilityai-shap
- hugging-face-deployed-app
- imdb
- machine-learning
- natural-language-processing