We were interested in how Bayesian spam filters work and decided to apply a similar approach to a different problem: detecting and removing movie spoilers from webpages, rather than filtering spam emails.

What it does

The program takes two inputs, a website's URL and a movie title, and scrapes the website to obtain its page source. It then removes words or phrases that appear in a spoiler library, which is built by analyzing movie-related webpages such as IMDb or Wikipedia, and updates the source accordingly. The program returns the updated source, and the Chrome extension refreshes the webpage with it. To decide whether a webpage contains spoilers, the machine learning program uses "MultinomialNB", the multinomial Naive Bayes classifier in Scikit-learn, which works on the same principle as a Bayesian spam filter. It is trained on text files stored in the main directory.
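
The redaction step described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function name `redact_spoilers`, the mask text, and the tiny spoiler library are all assumptions made for the example, and it works on plain text rather than on a full HTML source file.

```python
import re


def redact_spoilers(text, spoiler_terms, mask="[spoiler removed]"):
    """Replace every whole-word occurrence of a spoiler term with a mask.

    `spoiler_terms` stands in for the spoiler library (hypothetical here).
    """
    if not spoiler_terms:
        return text
    # Try longer terms first so multi-word phrases win over their sub-words.
    ordered = sorted(spoiler_terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(mask, text)


# Made-up spoiler library and page text, for illustration only.
library = {"Darth Vader", "father"}
page_text = "Luke learns that Darth Vader is his father."
print(redact_spoilers(page_text, library))
```

Matching on word boundaries keeps the filter from masking unrelated words that merely contain a spoiler term (for example "grandfathers"). Applying this to real page source would also need care not to rewrite text inside HTML tags or attributes.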

How we built it

  • Frontend: we used Paint 3D to create the background and the icon for the extension; the buttons and input box were created using HTML forms; the Chrome extension itself was built with HTML and a manifest.json file.

  • Backend: we used Scikit-learn to create the machine learning program. The text files used to train it are scripts from Star Wars (spoilers) and scripts from other movies and articles (non-spoilers). We are currently using only Star Wars scripts for testing; we intend to let users enter the movie of their choice, giving them more control over which movies are blocked. The web-scraping program was built as a custom Python toolkit.
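
To show the Bayesian idea behind the backend classifier, here is a small from-scratch sketch of multinomial Naive Bayes with add-one smoothing, using only the Python standard library. It is not the project's code: the class `TinyMultinomialNB`, the `tokenize` helper, and the four training sentences are invented for illustration. In the project itself this role is played by Scikit-learn's "MultinomialNB", trained on the script files.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyMultinomialNB:
    """Illustrative multinomial Naive Bayes (hypothetical helper class)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing strength

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocabulary = set().union(*self.word_counts.values())
        doc_counts = Counter(labels)
        # Class prior: fraction of training documents carrying each label.
        self.log_prior = {
            c: math.log(doc_counts[c] / len(labels)) for c in self.classes
        }
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def log_score(self, text, label):
        """Log prior plus the summed log likelihood of each known word."""
        denom = self.total_words[label] + self.alpha * len(self.vocabulary)
        score = self.log_prior[label]
        for word in tokenize(text):
            if word in self.vocabulary:  # words never seen in training are skipped
                count = self.word_counts[label][word]
                score += math.log((count + self.alpha) / denom)
        return score

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_score(text, c))


# Made-up training sentences standing in for the script files.
train_docs = [
    "Darth Vader reveals he is Luke's father",
    "The Death Star is destroyed by Luke",
    "The movie opens in theaters this Friday",
    "Tickets for the new movie are on sale",
]
train_labels = ["spoiler", "spoiler", "not_spoiler", "not_spoiler"]

model = TinyMultinomialNB().fit(train_docs, train_labels)
print(model.predict("Vader is the father of Luke"))
print(model.predict("Tickets are on sale this Friday"))
```

With so little training data the predictions only reflect word overlap, which is one reason why more (and more varied) training files matter for accuracy.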

Challenges we ran into

  • linking all three components (the machine learning program, the Chrome extension and the web-scraping program) together
  • making the AI smarter, i.e., improving how accurately it identifies spoilers
  • using JavaScript in a Chrome extension

Accomplishments that we are proud of

  • able to create a program that is capable of web scraping
  • able to create a Chrome extension
  • able to train a program to identify a spoiler

What we learned

  • how to use Scikit-learn
  • that improving the accuracy of a machine learning model is difficult
  • that Chrome extensions do not allow inline JavaScript in their HTML pages, so scripts have to live in separate files
  • how to use Paint 3D
  • how to make a working Chrome extension
  • how to use Django and Flask for web development

What's next for Hapocrates

  • more training files, i.e., different movie categories, for the program to learn from
  • using a database to store spoiler content redacted from webpages, to improve memory management in the program

*We did not manage to link all three components of the program, but we have three separate working parts that we would like to show.
*The GitHub repo contains these components.
