Inspiration
I was inspired to create WhatTheFilm! after constantly seeing clips from movies and TV shows in reels but never being able to figure out their titles! It’s a common frustration for many, so I decided to build a solution that could identify movies and shows from images instantly.
What it does
The result was WhatTheFilm!, a fully functional web application capable of detecting movie scenes, identifying the associated movie or show, and providing additional context such as cast and streaming platforms. Despite the challenges I faced, the project was a success and gave me invaluable experience in custom ML workflows, MLOps, object detection, and application deployment with Streamlit.
How we built it
Custom Object Detection Model: I built a custom object detection model from scratch using the YOLOv8 (You Only Look Once) architecture, with annotations in COCO format. I manually downloaded and labeled a dataset of 1,300 images (yes, you read that right) from various movie and TV show scenes. Labeling the data was a meticulous task that involved identifying key objects and features to help the model accurately detect and classify scenes.
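For context, YOLO-format annotation tools like Roboflow export one text file per image, where each line encodes one bounding box as normalized coordinates. A minimal sketch of parsing such a line (the class names here are hypothetical, not the project's actual labels):

```python
# Hypothetical class list for illustration; the real project's labels differ.
CLASS_NAMES = ["character", "logo", "scene_prop"]

def parse_yolo_label(line: str) -> dict:
    """Parse one 'class x_center y_center width height' annotation line.

    All coordinates are normalized to [0, 1] relative to image size,
    which is the convention YOLO training expects.
    """
    parts = line.split()
    cls_id = int(parts[0])
    x, y, w, h = map(float, parts[1:5])
    return {
        "class": CLASS_NAMES[cls_id],
        "x_center": x,
        "y_center": y,
        "width": w,
        "height": h,
    }

label = parse_yolo_label("1 0.500 0.400 0.300 0.200")
# label["class"] == "logo"
```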
First Exposure to MLOps: This project marked my first exposure to MLOps, learning how machine learning workflows and pipelines operate. It was a challenging yet rewarding experience as I encountered numerous new tools such as Roboflow for dataset management, YOLO for model training, and Streamlit for deploying a user-friendly interface. Managing the end-to-end lifecycle, from data preparation and training to deployment, gave me valuable insight into MLOps practices.
LLM Integration for Natural Language Processing: After running object detection, I used a large language model (via the OpenAI API) to transform the model’s JSON output into human-readable information. This involved querying details such as the name of the movie or show, key actors, and the streaming platforms where the content could be found. The LLM parsed and interpreted the results, making the app’s output accessible to users.
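The detection-to-LLM handoff can be sketched as building a prompt from the detector's JSON and sending it through the chat API. The detection payload, prompt wording, and model name below are illustrative assumptions, not the project's actual values:

```python
# Illustrative detector output; the real app's JSON schema may differ.
detections = {
    "objects": [
        {"label": "lightsaber", "confidence": 0.91},
        {"label": "desert_scene", "confidence": 0.84},
    ]
}

def build_prompt(det: dict) -> str:
    """Turn detected object labels into a natural-language query for the LLM."""
    labels = ", ".join(o["label"] for o in det["objects"])
    return (
        f"These objects were detected in a movie still: {labels}. "
        "Identify the most likely movie or TV show, list its key actors, "
        "and name streaming platforms where it is available."
    )

prompt = build_prompt(detections)

# The prompt would then go through OpenAI's chat completions API, e.g.:
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     model="gpt-4o-mini",  # assumed model name
#     messages=[{"role": "user", "content": prompt}],
# )
```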
Front-End Deployment with Streamlit: To make the application interactive, I implemented Streamlit as the front-end framework, allowing users to upload images or capture photos with their webcam. The interface was simple yet effective, designed for ease of use. I also integrated PropelAuth for handling user authentication.
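A Streamlit front end like this typically amounts to two input widgets and a call into the model pipeline. The sketch below is a hedged approximation; `identify_film` is a hypothetical placeholder for the detection-plus-LLM pipeline, not an actual function from the project:

```python
def pick_image(uploaded, camera):
    """Prefer an uploaded file; fall back to the webcam capture.

    Streamlit's file_uploader and camera_input both return None until
    the user provides an image, so this resolves which source to use.
    """
    return uploaded if uploaded is not None else camera

# --- In the actual app (requires `pip install streamlit`): ---
# import streamlit as st
# st.title("WhatTheFilm!")
# uploaded = st.file_uploader("Upload a movie still", type=["jpg", "png"])
# camera = st.camera_input("...or take a photo")
# image = pick_image(uploaded, camera)
# if image is not None:
#     result = identify_film(image.getvalue())  # hypothetical pipeline call
#     st.write(result)
```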
Challenges we ran into
Training with Limited Data: Building a high-performing model with just 1,300 labeled images was difficult. I had to make sure the dataset was diverse enough to generalize across different movies and scenes while fine-tuning the model to improve accuracy.
MLOps and Pipeline Management: As it was my first time working with MLOps, managing the entire machine learning pipeline from scratch was quite overwhelming. Handling the various stages (data processing, model training, model deployment, etc.) and ensuring each step worked cohesively was a steep learning curve.
Solo Effort: I worked on this project entirely by myself, which made the experience more challenging. With no teammates to collaborate with, I had to rely on online forums, tutorials, and hands-on videos to acquire the knowledge needed. I persevered through the difficulties, applying what I learned immediately to the project.
Accomplishments that we're proud of
Custom Trained Object Detection Model: I’m incredibly proud of building a custom object detection model from scratch, trained on 1,300 carefully labeled images. This was a major milestone because it ensured the core functionality of the app—identifying movie and TV show scenes—was accurate and efficient. The model's ability to recognize key elements from different scenes with a relatively small dataset was a breakthrough in my approach to image classification.
End-to-End MLOps Pipeline: Successfully building a machine learning pipeline that ran from data collection, training, and validation through to deploying a functional application on the web was a massive achievement. Since this was my first exposure to MLOps, I was proud of the smooth integration of various technologies: YOLO for object detection, Roboflow for dataset management, and OpenAI for natural language processing.
Real-Time Image Recognition and Results Compilation: The seamless experience for users—from uploading a photo to getting the name of a movie, cast information, and streaming platforms within seconds—is something I’m really proud of. Integrating the OpenAI language model to interpret the JSON output and return meaningful results transformed raw data into an intuitive user experience.
Solo Effort with Persistent Learning: Accomplishing all this as a solo developer was no small feat. I navigated a steep learning curve, often venturing into unfamiliar territories of machine learning, API integration, and web app deployment. Relying on online resources and communities, I overcame numerous challenges to deliver a fully functioning product.
What we learned
Working on this project gave me firsthand experience with the machine learning pipeline, from data preparation through to model deployment. I became familiar with new tools like Roboflow for managing datasets and monitoring model performance. Learning how to orchestrate the various stages of MLOps and manage dependencies between components was a critical part of the learning process.
What's next for WhatTheFilm!
- Expand the Training Dataset: collect and label a larger dataset with more diverse movie and TV show scenes; use data augmentation techniques to improve model generalization.
- Improve Model Accuracy: fine-tune the YOLOv8 model with transfer learning from larger object detection datasets; experiment with multimodal approaches that combine object detection with scene recognition.
- Enhance the User Experience: add a reverse search feature (allow users to search via keywords or descriptions); improve the UI/UX for a smoother, more intuitive experience.
- Optimize Performance & Deployment: deploy on a scalable backend with GPU acceleration for faster inference; explore model quantization or distillation for efficiency.
- Expand Features: introduce video clip support (users can upload short clips instead of just images); implement community tagging where users help refine predictions.
- Monetization & Growth: explore partnerships with streaming platforms for direct watch links; create a freemium model with premium features such as advanced searches or offline access.
Built With
- coco
- css
- docker
- html
- openai
- opencv-python
- propelauth
- python
- streamlit
- websockets
- yolov8