If Judge, notebooks in Databricks Workspace are found in Workspace -> Shared -> netflix-llm-rag

Inspiration

We often fight about what to watch, to the point where we don't watch anything at all. As more and more movies come out each year, it becomes harder to choose what movie or show to watch, especially when you're short on time and cannot watch everything. Netflix doesn't have a built in rating system, and even if it did, you would spend more time reading all the reviews instead of actually watching the movie. We hope NetChat can help with finding the right movie/show for you.

We also originally had ideas to also reccomend for all movies / tvs shows, or even YouTube channels and videos, but we decided to stick to a smaller media set (for reasons we'll get into later).

What it does

NetChat is basically a website where you can type in your questions about what you would like to watch. You can ask if there's media directed by a specific person, if the cast contains that person, what the movie is about, if you'll like it based on the genres you like, etc. NetChat will then return a response and reccommendations if it can.

How we built it

The front end of the website is built in React. It then sends a request to a Python backend (Flask), which then uses mlflow to retrieve the chatbot's response from Databricks.

Data

We found a dataset containing all the basic information of Netflix shows up until 2021. IMDB also has a dataset of all movies and shows with a little more information and with a bit of data wrangling, only selected the Netflix shows and tv shows. We also wanted more information our chatbot can use, so we also webscraped reviews of all the movies and shows from IMDB's website using Beautiful Soup. We also scraped data from tvtropes using selenium, which has more specific information that you may not find in reviews or brief descriptions. After cleaning up the data a bit, we imported it into the catalog.

Afterwards, we combined most of the data and put it in a vector search index for our chatbot to use.

Model

The model is an LLM RAG model. By providing some basic context and the vector search index, it's able to search the information it needs and return the answer to any Netflix movie/show reccomendations you ask.

Application

The React frontend was also built using a basic ui frameworklink. Using states we're able to update the information displayed on the page easily. Flask was used for its simplicity in setting up and language.

Challenges we ran into

The first challenge was of course cost and time. Unfortunately we couldn't include all the data we could. Despite choosing just one streaming service, Netflix still has over 8800 movies and tv shows. Web scraping takes lots of computation and time. As a result, much of the data wrangling and web scraping was actually done outside of Databricks (in vscode and rstudio). The files for those are provided in the github repository.

The second was hallucination. The chatbot would make up names of the movies. In addtion, I would ask if it knew a show "insert show name," and it would not recognize the show. However, when I ask for a reccomendation, it would recommend said "insert show name." It seemed to know the show but not know it at the same time! After a bit of tweaking, we resolved the issue.

The third challenge was making sure Netchat remembered past questions and conversations in the session. This mean storing a history and making sure it was part of consideration for the chatbot when developing an answer.

Overall, minor bugs and challenges were quickly solved by the ai assisstant provided on the Databricks platform.

Accomplishments that we're proud of

We're proud of creating a working chatbot that is able to recommend shows.

What we learned

We learned that AI can do anything, but since it can do anything, it can do anything you don't want it to do while also doing anything you want. There's a lot of considerations and limitations you must put to guide the chatbot. This is not only reflected in creating a strong context and including history, but also strong data preparation and indexes. We learned primarily through the provided llm-rag-chatbot documentation, and reading through the tutorial helped us learn what exactly was going behind the chatbot.

What's next for NetChat

Who knows... It would be cool if all streaming services had this!

Share this project:

Updates