PhishNet: Catching Phishing Emails Before They Hook You!

Inspiration

As U of T students, we often receive phishing emails from senders claiming to represent the university. These emails often lead to fake websites and ask for personal information, posing security risks to students. These risks, however, can affect any individual or organization with an email account. In particular, reports have found that Black and Latino adults are more likely to be targeted by these scams (see references [1] and [2] below). To address this issue, we created PhishNet, a Google Chrome extension that classifies Gmail messages as phishing or legitimate.

What it does

If a user has PhishNet installed, they can activate it by clicking on the extension whenever they receive a Gmail message. The user then pastes the email's body text into a text box and submits it. PhishNet processes this text and determines whether the email is phishing or legitimate, then informs the user of the email's status ("Phishing detected" or "Phishing not detected").

How we built it

For the phishing detection algorithm, we trained a binary classification machine learning model in Python, using a phishing email dataset from Kaggle (see the Dataset reference below). The body text of each email in the dataset was preprocessed in a series of steps (tokenization, stop-word removal, lemmatization, and stemming) and then vectorized. We then created a training-validation-test split and fitted three models on the training data: a support vector machine (SVM), a logistic regression model, and a naive Bayes model. The logistic regression model had the highest validation accuracy and the fastest running time.
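To make the preprocessing and vectorization steps more concrete, here is a minimal sketch using only the Python standard library. It is not our actual training code: the stop-word list, the crude_stem helper (a rough stand-in for a real stemmer and lemmatizer), and all function names are assumptions made for illustration, and the vectorizer is a simple word-count scheme.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "are",
              "your", "you", "for", "on", "by", "this", "we", "it"}


def crude_stem(token):
    """Strip a few common English suffixes (stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "ly", "es", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(body):
    """Lowercase, tokenize, drop stop words, and stem the email body."""
    tokens = re.findall(r"[a-z]+", body.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Map every term seen in the training emails to a column index."""
    terms = sorted({t for tokens in token_lists for t in tokens})
    return {term: index for index, term in enumerate(terms)}


def vectorize(tokens, vocabulary):
    """Turn a token list into a word-count vector; unknown terms are ignored."""
    vector = [0] * len(vocabulary)
    for term, count in Counter(tokens).items():
        if term in vocabulary:
            vector[vocabulary[term]] = count
    return vector
```

For example, preprocess("Verify your account by clicking the link") yields the tokens verify, account, click, and link, which vectorize then maps onto the columns of a vocabulary built from the training emails.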

We then wrapped the fitted logistic regression model in a new function that takes email body text as input, processes and vectorizes the text as described above, and runs the model to classify the email as phishing or legitimate.
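The sketch below illustrates the general shape of such a function. It is a toy, standard-library-only logistic regression trained by stochastic gradient descent on four made-up emails, so the training data, hyperparameters, and function names are all assumptions and do not reflect our real model or the Kaggle dataset. Preprocessing is reduced to plain tokenization to keep the example short.

```python
import math
import re
from collections import defaultdict

# Made-up training examples (assumption): 1 = phishing, 0 = legitimate.
TRAIN_EMAILS = [
    "urgent verify your account password now",
    "click this link to claim your prize",
    "meeting notes attached for tomorrow lecture",
    "lunch on friday with the study group",
]
TRAIN_LABELS = [1, 1, 0, 0]


def tokenize(body):
    return re.findall(r"[a-z]+", body.lower())


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(emails, labels, epochs=200, learning_rate=0.5):
    """Fit one weight per word (binary presence features) plus a bias term."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for body, label in zip(emails, labels):
            features = set(tokenize(body))
            z = bias + sum(weights[t] for t in features)
            error = sigmoid(z) - label  # gradient of the log loss w.r.t. z
            bias -= learning_rate * error
            for t in features:
                weights[t] -= learning_rate * error
    return dict(weights), bias


def classify_email(body, weights, bias, threshold=0.5):
    """Return the status message PhishNet shows to the user."""
    z = bias + sum(weights.get(t, 0.0) for t in set(tokenize(body)))
    if sigmoid(z) >= threshold:
        return "Phishing detected"
    return "Phishing not detected"
```

After calling weights, bias = train_logistic_regression(TRAIN_EMAILS, TRAIN_LABELS), the function classify_email(body, weights, bias) can be applied to any new email body. Words the model has never seen contribute nothing to the score.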

We used JavaScript and HTML to create the user interface, including the text box. Once the user submits the body text of the email in the text box, the program will run the function described above to classify the email. It then returns an appropriate message to the user.
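One common way to let a JavaScript front end call a Python function is to expose that function over a small local HTTP endpoint. The sketch below shows this idea with the standard library only. It is an assumption for illustration and not necessarily how PhishNet is wired together: the /classify path, the port, the JSON field names, and the keyword-based classify_email placeholder (standing in for the fitted model) are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify_email(body):
    """Placeholder for the fitted model; a simple keyword check stands in here."""
    suspicious = ("verify your account", "password", "click here")
    text = body.lower()
    if any(phrase in text for phrase in suspicious):
        return "Phishing detected"
    return "Phishing not detected"


class ClassifyHandler(BaseHTTPRequestHandler):
    def _cors_headers(self):
        # Allow requests from the extension's pages (permissive for a local demo).
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self):
        # Answer the browser's CORS preflight request.
        self.send_response(204)
        self._cors_headers()
        self.end_headers()

    def do_POST(self):
        if self.path != "/classify":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            body = payload["body"]
            if not isinstance(body, str):
                raise TypeError("body must be a string")
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "Expected JSON with a string 'body' field")
            return
        response = json.dumps({"status": classify_email(body)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self._cors_headers()
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)

    def log_message(self, format, *args):
        pass  # keep the console quiet during the demo


def make_server(port=8000):
    """Create a server bound to localhost only; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), ClassifyHandler)
```

Running make_server().serve_forever() starts the endpoint. The extension's script could then POST a JSON object such as {"body": "..."} to http://127.0.0.1:8000/classify and display the returned status string.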

PLEASE NOTE: The attached video below demonstrates how the extension is expected to run.

Challenges we ran into

It was initially difficult to connect the JavaScript/HTML user interface with the Python script for the detection model.

Accomplishments that we're proud of

Despite the challenge mentioned above, we were able to complete our project with the help of online tutorials and guidance from NSBE mentors.

What we learned

We learned that having to build and pitch a product in a limited amount of time pushes us toward simple yet effective solutions, even if they are only small-scale.

What's next for PhishNet: Catching Phishing Emails Before They Hook You!

Although PhishNet is designed for individual use, we hope to extend it to businesses, banks, and other organizations to protect them from security breaches and financial risks.

References:

[1] AARP Louisiana. (2022, March 14). AARP report reveals 40% of black and Latino adults have been targeted by a Scam. AARP Louisiana. https://states.aarp.org/louisiana/aarp-report-reveals-40-of-black-and-latino-adults-have-been-targeted-by-a-scam

[2] Rodriguez, Y. (n.d.). Report: Black and Latino consumers twice as likely to be targets of online financial fraud. Texas Standard. https://www.texasstandard.org/stories/consumer-reports-online-financial-fraud-black-latino-communities-phishing-rates-protection/

Dataset: Al-Subaiey, A., Al-Thani, M., Alam, N. A., Antora, K. F., Khandakar, A., & Zaman, S. A. U. (2024, May 19). Novel Interpretable and Robust Web-based AI Platform for Phishing Email Detection. ArXiv.org. https://arxiv.org/abs/2405.11619

Built With

Submitted to

NSBEHacks 2025

Created by

For this project, I contributed to the backend by preparing the dataset for training the model. I also tested other models, such as the SVM (support vector machine), to find the model with the highest accuracy and the lowest running time.

Rodoshi Mondal
Seemal Sipra
AfraAd Azad
Anna Asmaryan

Updates

Seemal Sipra started this project — Feb 16, 2025 11:59 AM EST
