Hapocrates

Inspiration

We were interested in the concept of the Bayesian spam filter and decided to apply a similar approach to detecting and removing movie spoilers from webpages, instead of spam from emails.

What it does

The program takes two inputs, a website's URL and a movie title, and scrapes the website to obtain the page content from its source file. It then removes words or phrases that appear in a spoiler library, which is built by analyzing movie-related webpages such as IMDb or Wikipedia, and updates the source file. The program returns the updated source file, and the Chrome extension refreshes the webpage with it. The machine learning program uses "MultinomialNB", a naive Bayes classifier in Scikit-learn that works much like a Bayesian spam filter, to decide whether a webpage contains spoilers. It is trained on text files stored in the main directory.
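The removal step can be sketched roughly as follows. This is only an illustration, not the project's actual code: the function name redact_spoilers, the sample spoiler library and the replacement marker are all assumptions, and the sketch works on plain text rather than on a full HTML source file.

```python
import re


def redact_spoilers(text, spoiler_phrases, marker="[spoiler removed]"):
    """Replace every spoiler phrase found in text with a marker.

    Hypothetical helper: matching is case-insensitive and on whole words,
    so it assumes each phrase starts and ends with a letter or digit.
    """
    # Drop empty entries; try longer phrases first so that a phrase is not
    # partially consumed by a shorter phrase it contains.
    phrases = sorted((p for p in spoiler_phrases if p), key=len, reverse=True)
    if not phrases:
        return text
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(marker, text)


if __name__ == "__main__":
    # Made-up sample library and page text, for illustration only.
    library = ["Vader is Luke's father", "Death Star"]
    page_text = "Everyone knows Vader is Luke's father, and the Death Star explodes."
    print(redact_spoilers(page_text, library))
```

A real page would first need its visible text separated from the HTML markup, so that tags and attributes are left untouched when phrases are replaced.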

How we built it

Frontend:
-> we used Paint 3D to create the background and the icon for the extension
-> buttons and the input box were created using HTML forms
-> the main Chrome extension was created using HTML and a manifest.json file
Backend:
-> we used Scikit-learn to create the machine learning program. The text files used to train it are scripts from Star Wars (spoilers) and scripts from other movies and articles (non-spoilers). For now we test only with Star Wars scripts; we intend to let users enter the movie of their choice, which would give them more control over which movies are blocked.
-> the web-scraping program was built as a custom Python toolkit
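To illustrate the idea behind the classifier, here is a small multinomial naive Bayes written with only the Python standard library. It is a stand-in for Scikit-learn's MultinomialNB, not the project's code: the class name TinyMultinomialNB, the tokenizer, the add-one smoothing and the toy training sentences are all assumptions made for this sketch.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyMultinomialNB:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(n_docs / total_docs)
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Toy stand-ins for the spoiler and non-spoiler text files.
    texts = [
        "Luke learns that Vader is his father",
        "the rebels destroy the Death Star",
        "tickets for the new movie go on sale friday",
        "the cinema serves popcorn and soda",
    ]
    labels = ["spoiler", "spoiler", "non-spoiler", "non-spoiler"]
    model = TinyMultinomialNB().fit(texts, labels)
    print(model.predict("Vader is the father of Luke"))
```

With so little training data the result depends entirely on word overlap, which is one reason a larger and more varied set of text files matters for accuracy.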

Challenges we ran into

linking all three components (machine learning program, Chrome extension and web-scraping program) together
making the classifier more accurate
implementing JavaScript in a Chrome extension

Accomplishments that we are proud of

creating a program that is capable of web scraping
creating a Chrome extension
training a program to identify spoilers

What we learned

how to use Scikit-learn
improving the accuracy of a machine learning model is difficult
Chrome extensions block inline JavaScript in their HTML pages (for example in form handlers), so scripts have to go in separate files
how to use Paint 3D
how to make a working Chrome extension
how to use Django and Flask for web dev

What's next for Hapocrates

more training files, e.g. from different movie categories, for the program to learn from
using a database to store spoiler content redacted from the webpages, to improve memory management in the program

*we did not manage to link all three components of the program, but we have three separately working parts that we would like to show
*the GitHub repo contains these components