Phishing is one of the most common cyberattacks in the world, and more than 12 billion dollars are lost to it each year. The problem is also personal to us: we have family members who have fallen victim to phishing attacks. Businesses spend tens of millions every year trying to protect themselves, and many people have lost their life savings. The attacks disproportionately affect those who are less familiar with technology. Most advice on avoiding phishing asks users to remember a list of precautions while browsing the web and checking email, and that advice is easy to forget. Our extension is designed to remove this reliance on memory and let the computer help the user stay safe.
What is phishing?
Phishing most commonly starts with an email from an impostor who tries to get the target to hand over sensitive information. Usually, the email links to a fake webpage that asks you to log in, and you unknowingly enter your real credentials or payment information. The site records this information, and the attacker uses it to break into your accounts and take control of your credit card.
What it does
We designed PhishingNet, a Chrome extension that detects phishing by applying deep natural language processing to the page text and checking whether the website has a valid HTTPS certificate. The extension adds an icon to the toolbar that the user can click to see whether the current website is a phishing site.
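The certificate half of this check can be illustrated with a short sketch. This writeup does not say how or where PhishingNet performs the check, so the Python below is only one possible approach, written with the standard library; the function name `has_valid_https_certificate` and its `timeout` parameter are hypothetical.

```python
import socket
import ssl
from urllib.parse import urlparse


def has_valid_https_certificate(url: str, timeout: float = 5.0) -> bool:
    """Return True only if `url` is HTTPS and its certificate passes default verification.

    Hypothetical helper: not taken from the PhishingNet source.
    """
    try:
        parsed = urlparse(url)
        if parsed.scheme != "https" or not parsed.hostname:
            # Plain HTTP (or a malformed URL) has no certificate to validate.
            return False

        # The default context verifies the certificate chain against the
        # system trust store and checks that it matches the hostname.
        context = ssl.create_default_context()
        address = (parsed.hostname, parsed.port or 443)
        with socket.create_connection(address, timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=parsed.hostname):
                return True
    except (OSError, ValueError):
        # OSError covers ssl.SSLError (invalid, expired, or mismatched
        # certificates) as well as DNS failures and timeouts.
        # ValueError covers a malformed port in the URL.
        return False
```

A valid certificate alone does not prove a site is legitimate, which is presumably why the extension pairs this signal with the text classifier.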
Frameworks Used/How we built it
- The user opens a website.
- The user clicks on our browser extension, and the popup is shown on the screen.
- The extension captures the URL and sends it in a POST request to our scraper API, which scrapes the HTML from the page and sends it back.
- The extension parses the HTML to find the main paragraphs, which are the most telling when detecting phishing.
- The extension then sends the main paragraphs to the machine learning API, which returns a phishing prediction and its confidence in that prediction.
- We check whether the site has a valid HTTPS certificate.
- The popup then shows the user a summary of these results and an overall decision on whether the site is phishing or not.
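Two of the steps above, finding the main paragraphs and combining the signals into an overall decision, can be sketched as follows. The extension itself runs in the browser, so this Python version only illustrates the logic. The `min_words` cutoff for what counts as a "main" paragraph, the `threshold` on the model's confidence, and the rule for combining the two signals are all assumptions, not details taken from PhishingNet.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List


class ParagraphExtractor(HTMLParser):
    """Collects the text content of every <p> element."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[str] = []
        self._buffer: List[str] = []
        self._in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.flush_paragraph()  # an unclosed <p> ends where the next begins
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush_paragraph()

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)

    def flush_paragraph(self) -> None:
        # Join the collected fragments and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []
        self._in_paragraph = False


def main_paragraphs(html: str, min_words: int = 8) -> List[str]:
    """Return <p> texts with at least `min_words` words (assumed heuristic)."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser.flush_paragraph()  # keep a trailing paragraph that was never closed
    return [p for p in parser.paragraphs if len(p.split()) >= min_words]


@dataclass
class Verdict:
    is_phishing: bool
    reasons: List[str]


def overall_verdict(predicted_phishing: bool, confidence: float,
                    has_valid_cert: bool, threshold: float = 0.8) -> Verdict:
    """Combine the model output and the certificate check (hypothetical rule)."""
    reasons = []
    if predicted_phishing and confidence >= threshold:
        reasons.append(f"text classifier flagged the page ({confidence:.0%} confident)")
    if not has_valid_cert:
        reasons.append("no valid HTTPS certificate")
    return Verdict(is_phishing=bool(reasons), reasons=reasons)
```

In this sketch either signal alone is enough to flag a site. A real popup might weigh the two differently or show each result separately.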
Challenges we ran into
- We initially planned on scraping the HTML locally (client side), but we realized that this would cause lag. We resolved this by moving the scraping to a server hosted on Heroku.
- We were unsure which platform to use to create the model. We initially planned on using IBM Watson's Text Classifier, but we had issues launching our model there. It was also hard to scale, because IBM charges a relatively expensive fee per request, along with a fee for an account or training. Because of this, we chose Microsoft's open source ML.NET framework to train the model. We then used DigitalOcean to host this API, which is open for use by other developers.
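The server-side scraping described above can be sketched as a small request handler. This is not the code deployed on Heroku: the JSON request and response shapes, the status codes, and the names `fetch_html` and `handle_scrape_request` are assumptions made for illustration. The fetch function is passed in as a parameter so the handler can be exercised without network access.

```python
import json
from typing import Callable, Tuple
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body as text."""
    request = Request(url, headers={"User-Agent": "scraper-sketch"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def handle_scrape_request(body: bytes,
                          fetch: Callable[[str], str] = fetch_html) -> Tuple[int, str]:
    """Turn a POST body such as {"url": "..."} into (status_code, json_response)."""
    try:
        url = json.loads(body)["url"]
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected JSON with a 'url' field"})

    # Only fetch ordinary web URLs; reject anything else up front.
    if not isinstance(url, str) or urlparse(url).scheme not in ("http", "https"):
        return 400, json.dumps({"error": "only http and https URLs are supported"})

    try:
        html = fetch(url)
    except Exception as exc:  # network errors, timeouts, HTTP error statuses
        return 502, json.dumps({"error": f"could not fetch page: {exc}"})
    return 200, json.dumps({"url": url, "html": html})
```

Any web framework could wrap a function like this in a POST route. A public deployment would also want rate limiting and restrictions on which hosts the server is allowed to fetch.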
Accomplishments that we're proud of
- We created a fully functional extension, which we're releasing to the Chrome Web Store (hopefully our submission to the web store is approved in time for judging).
- We made an application that can scale, and we used open source frameworks that are known for reliability.
- We created two APIs that we integrated into our extension and made open source. This means that we can get improvements from the community, and other developers can use our APIs to build similar solutions of their own, furthering the fight against phishing.
What we learned
What's next for PhishingNet
We are in the process of publishing version 1.0.0 to the Chrome Web Store. We have run UX tests with our family members and integrated their feedback into the extension. In the future, we hope to train our model on text in other languages so that this project can scale globally.