Inspiration
With the rise in popularity of streaming services, users often struggle to choose between platforms like Netflix and Prime Video. Inspired by this everyday dilemma and a passion for data storytelling, I wanted to explain the 2025 streaming content landscape.
What I learned
1) Real-world datasets are messy and require thoughtful cleaning and standardisation.
2) PandaSQL is an effective tool for bridging SQL-style queries and Python DataFrames.
3) Visual storytelling makes insights more compelling than raw tables.
4) Different platforms follow different content strategies: some prioritise quantity, while others focus on niche, highly rated content.
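As a rough illustration of the PandaSQL point, here is a small example. PandaSQL's `sqldf` runs SQLite-dialect queries against DataFrames, roughly by loading them into an in-memory SQLite database. The sketch below reproduces that idea with the standard-library `sqlite3` module plus pandas' `to_sql` and `read_sql_query`, so it runs without pandasql installed. The helper `run_sql`, the table name, the column names and the rows are all hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical toy data; column names are assumptions, not the project's real schema.
netflix = pd.DataFrame(
    {
        "title": ["Title A", "Title B", "Title C", "Title D"],
        "type": ["Movie", "Movie", "TV Show", "Movie"],
        "main_genre": ["Drama", "Comedy", "Drama", "Drama"],
    }
)


def run_sql(query: str, **tables: pd.DataFrame) -> pd.DataFrame:
    """Run a SQLite query over DataFrames, in the spirit of pandasql's sqldf (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Copy each DataFrame into the in-memory database under its keyword name.
        for name, frame in tables.items():
            frame.to_sql(name, conn, index=False)
        return pd.read_sql_query(query, conn)
    finally:
        conn.close()


# Count movies per main genre, most common first.
counts = run_sql(
    """
    SELECT main_genre, COUNT(*) AS n_titles
    FROM netflix
    WHERE type = 'Movie'
    GROUP BY main_genre
    ORDER BY n_titles DESC
    """,
    netflix=netflix,
)
print(counts)
```

The payoff is that grouping and filtering can be written in familiar SQL, while the result comes back as a DataFrame that Seaborn or Matplotlib can plot directly.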
How I built it
1) Data collection: I sourced datasets for Netflix movies, Netflix TV shows, and a combined Amazon Prime Video dataset, all from sources updated in 2025.
2) Data cleaning: I standardised the genre, duration, and rating columns, extracted a main genre for each title, and handled missing values.
3) Exploratory analysis: I used PandaSQL to write SQL-style queries and built visualisations with Seaborn and Matplotlib.
4) Visualisation: I created a variety of visualisations, including pie charts, line graphs, heatmaps, and radar charts.
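The cleaning step could look something like the minimal pandas sketch below. It rests on several assumptions that are not taken from the project: the raw columns are named `genre`, `duration` and `rating`; genres arrive as comma-separated lists; durations look like "90 min" or "2 Seasons"; and ratings are numeric scores stored as text. The rows are made up.

```python
import pandas as pd

# Hypothetical raw rows; the real datasets' columns and formats may differ.
raw = pd.DataFrame(
    {
        "title": ["Title A", "Title B", "Title C", "Title D"],
        "genre": ["Dramas, International Movies", "comedies", None, "Action & Adventure, Dramas"],
        "duration": ["90 min", "2 Seasons", "1 Season", None],
        "rating": ["7.1", "N/A", "6.4", ""],
    }
)

clean = raw.copy()

# Main genre: first entry of a comma/pipe/slash-separated list, title-cased.
clean["main_genre"] = (
    clean["genre"]
    .str.split(r"\s*[,|/]\s*", regex=True)
    .str[0]
    .str.strip()
    .str.title()
    .fillna("Unknown")
)

# Duration: extract the number and unit from strings like "90 min" or "2 Seasons".
parts = clean["duration"].str.extract(r"(?i)(\d+)\s*(min|season)")
clean["duration_value"] = pd.to_numeric(parts[0])
clean["duration_unit"] = parts[1].str.lower().map({"min": "minutes", "season": "seasons"})

# Rating: coerce to numbers so placeholders such as "N/A" or "" become NaN.
clean["rating"] = pd.to_numeric(clean["rating"], errors="coerce")

# One possible exclusion rule: keep only rows with a usable rating.
rated = clean.dropna(subset=["rating"])
print(rated[["title", "main_genre", "duration_value", "duration_unit", "rating"]])
```

Movie minutes and show seasons are kept in separate unit labels here rather than forced onto one scale. That way, later comparisons can filter by unit instead of mixing the two.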
Challenges Faced
1) Merging datasets: the Netflix and Prime datasets had different schemas, so standardising column names and types was tricky.
2) Genre parsing: genres came in different formats and required regex-based cleaning.
3) Data imbalance: Netflix had far more data than Prime, which skewed the visuals until I normalised them.
4) Missing values: some entries were incomplete and required imputation or exclusion logic.
5) Radar chart setup: plotting radar charts from SQL query results required pivoting the data and adjusting the polar axis manually.
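The merging and imbalance challenges can be sketched together. The idea is to rename each source onto one shared schema, fix mismatched types, stack the frames with a `platform` label, and then compare genre shares within each platform instead of raw counts. All column names (`Title`, `Genre`, `name`, `year`) and rows below are invented to illustrate the mismatch. They are not the actual schemas of the project's datasets.

```python
import pandas as pd

# Hypothetical sources whose column names and types do not line up.
netflix = pd.DataFrame(
    {
        "Title": ["N1", "N2", "N3", "N4"],
        "Genre": ["Drama", "Drama", "Comedy", "Drama"],
        "release_year": [2019, 2021, 2020, 2022],
    }
)
prime = pd.DataFrame(
    {
        "name": ["P1", "P2"],
        "main_genre": ["Comedy", "Drama"],
        "year": ["2021", "2018"],  # stored as text in this made-up source
    }
)

# Map both onto one shared schema and harmonise the year column's type.
netflix_std = netflix.rename(columns={"Title": "title", "Genre": "main_genre"})
prime_std = prime.rename(columns={"name": "title", "year": "release_year"}).astype(
    {"release_year": "int64"}
)

combined = pd.concat(
    [netflix_std.assign(platform="Netflix"), prime_std.assign(platform="Prime Video")],
    ignore_index=True,
)

# Raw counts favour the larger catalogue; normalising within each platform
# turns them into shares that can be compared side by side.
shares = (
    combined.groupby("platform")["main_genre"]
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)
print(shares)
```

Each row of `shares` sums to 1. As a result, a platform with four titles and one with two can be plotted on the same axis without the larger catalogue dominating.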
Key Takeaways
1) Netflix has more content overall, particularly movies and English-language titles.
2) Prime Video is competitive on average ratings in select genres.
3) The two platforms play to different strengths: Netflix offers volume and a global mix, while Prime Video offers quality in specific niches.
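A per-genre rating comparison like the second takeaway could be computed roughly as follows. The sketch groups by genre and platform, drops groups with too few titles, and pivots to a genre-by-platform table. That wide shape is also what a radar chart needs, together with one angle per genre and the first angle repeated to close the loop.

Some caveats apply. The `rating` column is assumed to hold a numeric score. The `MIN_TITLES` threshold is an assumption, not something described above. The numbers are toy values and are not results from the project.

```python
import math

import pandas as pd

# Hypothetical combined table: one row per title with an assumed numeric score.
titles = pd.DataFrame(
    {
        "platform": ["Netflix"] * 5 + ["Prime Video"] * 4,
        "main_genre": ["Drama", "Drama", "Comedy", "Comedy", "Horror",
                       "Drama", "Drama", "Comedy", "Comedy"],
        "rating": [6.8, 7.0, 6.1, 6.3, 5.0, 7.4, 7.2, 6.0, 6.6],
    }
)

MIN_TITLES = 2  # assumed threshold so one-off titles do not dominate a genre's average

summary = (
    titles.groupby(["main_genre", "platform"])["rating"]
    .agg(["count", "mean"])
    .reset_index()
)
summary = summary[summary["count"] >= MIN_TITLES]

# Wide table: one row per genre, one column per platform.
avg = summary.pivot(index="main_genre", columns="platform", values="mean").dropna()
print(avg.round(2))

# Radar charts need one angle per genre, with the first repeated to close the shape.
angles = [2 * math.pi * i / len(avg) for i in range(len(avg))]
angles.append(angles[0])
```

To draw the chart, each platform's column would also need its first value appended so it lines up with the closed `angles` list on a polar axis.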