In our machine learning class, our professor was talking about Naïve Bayes classifiers and how good they are at classifying spam, so we decided to implement one to cut down on the phishing emails we get at WSU.

What it does

We calculate the probability that a given email is spam or not spam based on the words it contains, and classify the email as whichever has the higher probability. In addition, if the body of an email contains a DocuSign link, we verify whether that link is authentic.
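
As a rough illustration of the idea, here is a minimal Naïve Bayes sketch in Python. It is not our actual code: the function names, the "ham" label for non-spam, the whitespace tokenizer, and the add-one smoothing are all assumptions made for the example. It also sums log-probabilities instead of multiplying raw probabilities, which is a different way of coping with very small numbers than the one we used.

```python
import math
from collections import Counter

def train(labeled_emails):
    """Count words per class. labeled_emails is a list of (text, label)
    pairs, where label is "spam" or "ham" (hypothetical label names)."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in labeled_emails:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the higher (log) probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Start from the log prior: how common this class is overall.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing (an assumption) avoids log(0).
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy data, invented for illustration only.
toy_emails = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
toy_model = train(toy_emails)
```

With this toy data, classify("free money", toy_model) picks "spam" and classify("meeting tomorrow", toy_model) picks "ham".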

How we built it

Since we wanted to build our model specifically to filter WSU spam, we used WSU emails as data. We exported the emails from Outlook with a VBA script. From there, we used Python to convert the emails into a readable format, hand-label the data, split it into training and test sets, and finally build our model.
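
The training/test split step could look something like the sketch below. The function name, the 80/20 ratio, and the fixed seed are assumptions for the example, not a record of what we actually used.

```python
import random

def split_dataset(labeled_emails, test_fraction=0.2, seed=0):
    """Shuffle the labeled emails and split them into (train, test).

    test_fraction and seed are illustrative defaults, not project values.
    """
    shuffled = list(labeled_emails)  # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder data: ten fake (text, label) pairs.
sample_data = [("email %d" % i, "spam" if i % 2 else "ham") for i in range(10)]
train_set, test_set = split_dataset(sample_data)
```

Fixing the seed keeps the split reproducible, so accuracy numbers can be compared between runs.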

Challenges we ran into


Accomplishments that we're proud of

Our model has a 96% accuracy rate, with no non-spam emails marked as spam.
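
For reference, these two numbers can be computed as in the sketch below. The function name and the "spam"/"ham" label strings are assumptions for the example, and the sample labels are made up rather than taken from our data.

```python
def evaluate(true_labels, predicted_labels):
    """Return (accuracy, false_positives).

    A false positive here is a non-spam ("ham") email predicted as spam.
    """
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for truth, pred in pairs if truth == pred)
    false_positives = sum(
        1 for truth, pred in pairs if truth == "ham" and pred == "spam"
    )
    return correct / len(pairs), false_positives

# Made-up labels: one spam email slips through, no ham is flagged.
accuracy, false_positives = evaluate(
    ["spam", "ham", "ham", "spam"],
    ["spam", "ham", "ham", "ham"],
)
```

Tracking false positives separately matters for a spam filter, since hiding a legitimate email is usually worse than letting one spam message through.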

What we learned

Lots of Python (the decimal module is very helpful when you have numbers on the order of 1e-100), and that encoding is fun.
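
To show why decimal helps, here is a small sketch. Multiplying many tiny word probabilities as ordinary floats eventually underflows to zero, while Decimal keeps the exact value. The helper names and the probability values are invented for the example.

```python
from decimal import Decimal

def product_float(probs):
    """Multiply probabilities using ordinary floats."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def product_decimal(probs):
    """Multiply the same probabilities using Decimal."""
    result = Decimal(1)
    for p in probs:
        # Going through str() avoids carrying over float rounding noise.
        result *= Decimal(str(p))
    return result

# 100 hypothetical word probabilities of 1e-5 each: the true product is 1e-500.
tiny_probs = [1e-5] * 100
float_result = product_float(tiny_probs)      # underflows to 0.0
decimal_result = product_decimal(tiny_probs)  # stays at 1E-500
```

Once both class probabilities underflow to 0.0 they can no longer be compared, which is exactly the situation Decimal avoids.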

What's next for Project name

We hope to extend our model into an Outlook extension. From there, we could flag spam directly in the inbox and improve detection of phishing attempts made through services like DocuSign, since we would have access to more information (subject line, sender, etc.).
