Inspiration

A few weeks ago, a family member of one of our team members fell victim to a phishing attack via email. The attacker managed to acquire personal information, such as the victim's date of birth and full name. Recognizing the prevalence of this issue, we decided to combine our interest in machine learning with a pressing need for better email/text message security. Thus, GoPhish was born.

What it does

GoPhish is a tool that uses machine learning and natural language processing to detect email and SMS scams. You can copy and paste any suspicious message into the text box and receive an instant analysis of whether it is likely to be legitimate.

How we built it

We began by obtaining a Spam Collection dataset from Kaggle and used the natural language processing library NLTK to process the data for analysis and training. Our pipeline applied tokenization, stemming, and stop-word removal before training a classifier. After evaluating and testing the model's capabilities, we built the website and integrated our machine learning model through a simple Python backend using Flask. To account for potential inaccuracies in the ML model, we added two further checks: a writing-quality check that uses the Flesch-Kincaid Grade Level formula to measure message readability, and a URL safety check based on standard valid URL configurations.
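To illustrate the two supplementary checks, here is a minimal sketch using only the Python standard library. It is not our exact code: the function names, the vowel-group syllable heuristic, and the specific URL rules (http/https scheme, dotted hostname) are assumptions made for this example. Only the Flesch-Kincaid Grade Level formula itself is standard.

```python
import re
from urllib.parse import urlparse


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumed heuristic): count vowel groups."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing "e" as silent ("make"), but keep "-le" endings ("table").
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0  # nothing to score
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def looks_like_valid_url(url: str) -> bool:
    """Structural check only (assumed rules): http(s) scheme and a dotted host."""
    try:
        parsed = urlparse(url)
        host = parsed.hostname or ""
    except ValueError:
        return False  # malformed input, e.g. broken IPv6 brackets
    return (parsed.scheme in ("http", "https")
            and "." in host
            and not host.startswith(".")
            and not host.endswith("."))
```

For example, flesch_kincaid_grade can be called on the pasted message and looks_like_valid_url on each link found in it. A well-formed URL can still be malicious, so a structural check like this only complements the classifier.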

Challenges we ran into

During the machine learning model training, we encountered issues with overfitting caused by the dataset's class imbalance: valid messages far outnumbered spam messages. With the guidance of our mentor, Kevin, we learned how to sample the data equally for each target class, which improved the model's performance significantly. Additionally, two team members unexpectedly left the hackathon on Saturday morning, leaving Jessica and Jin to manage the remaining tasks under a tight deadline. Despite the challenges, we successfully delivered our final product.
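Sampling the data equally for each class can be sketched as follows. This is a minimal example that assumes downsampling: every class is randomly reduced to the size of the smallest one. The function name, the (label, text) row format, and the fixed seed are hypothetical choices for illustration, not our exact code.

```python
import random
from collections import defaultdict


def undersample(rows, seed=0):
    """Return a shuffled list of (label, text) rows with equal counts per label.

    Each class is randomly downsampled to the size of the smallest class.
    """
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append((label, text))
    if not by_label:
        return []

    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible

    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)  # avoid all rows of one class appearing first
    return balanced
```

Downsampling discards some majority-class examples. Oversampling the minority class is a common alternative when the dataset is small.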

Accomplishments that we're proud of

We are proud of our ability to tackle a difficult project, learn new skills, and develop a solution that could potentially benefit users worldwide. Neither machine learning nor website development is typically taught in school, which meant we had to invest considerable time and effort in self-directed research and learning. Our experience demonstrates our competence and adaptability.

What we learned

Throughout the project, we learned how to approach unfamiliar topics and overcome the intimidation they may bring. We gained valuable experience in machine learning, natural language processing, and web development.

What's next for GoPhish

We plan to expand GoPhish's capabilities by bringing it to mobile platforms for increased ease of use and accessibility. Additionally, we aim to implement Augmented Reality (AR) functionality, allowing users to analyze emails or submit screenshots by simply holding up their camera or phone, further enhancing the user experience in detecting phishing emails.
