ChainMail

Title Card
The homepage of the interface

Inspiration

We took it upon ourselves to try and improve data security using machine learning. Phishing attacks are one of the most common kinds of cyber-crime and fraud, so it is imperative that there are tools out there to help the average person determine if an email is to be trusted or not.

What it does

This program uses a web interface to allow users to either log in to an email account and select emails, or upload email files manually, and it simply determines if the given email is trustworthy or not.

How we built it

For organisation, we split the workload and brainstormed ideas using Trello: an online post board where you can add notes and ideas as a team. This helped us decompose the whole web application into manageable pieces and ensure we all knew what each other were doing and how our progress was getting on.

To make the email classifier we explored data sets on Google's Kaggle and loaded an appropriate one into Python. Using scikit-learn we processed the data and fitted a support vector machine to it so that it can learn which emails are spam and which are not. We chose the best parameters for the most optimal training session, and used the library pickle to finalise the model. This meant that although training took over 10 minutes, the pickled version ran in seconds on the actual website making it appropriate for use in a proper implementation. These functions were then ran in flask.

Challenges we ran into

Different data models for the ML aspect of the project had varying levels of effectiveness; we had to find and implement one that fit our purpose. Being inexperienced with using Git in a team project, we had to deal with many merge conflicts. Centering a box using CSS was probably the hardest part.

Accomplishments that we're proud of

We made a robust web-based interface; and actually managed to get a working final product after many hours of only having separate WIP parts.

What we learned

We are now a lot more familiar with the Python framework Flask, as well as email data and formats; not to mention learning about the ML model we used.

What's next for ChainMail

In terms of the web application, we would like to improve the appearance of the web application by considering devices with smaller widths and the general aesthetic of the template. Also, one of our initial ideas posted on our Trello board was to generate interesting statistics by parsing through the user's inbox. For example, most repeated topics and words.

In terms of the ML model, we would train the model using more data and improve using inputted user data. Also, we would like to experiment and compare different models. Another idea on the Trello board was also for the model to generate new spam / valid emails given a topic to observe what interesting emails the model would return.

Built With

cryptography
flask
html
imaplib
jinja
numpy
pandas
pickle
python
scikit-learn

Submitted to

hackSheffield 5

Created by

I worked on bridging the front-end and backend; including parsing inputted data and handling email requisition. I was also responsible for managing the movement of data through the user interface; making sure the pages had the correct functionality.

Marcy Brook
I worked on the backend. I was in charge of data processing and machine learning where I built a support vector machine in scikit learn to classify the emails. I finalised this in python and handed this off to the other developers

finn-williams Williams
I worked on website design and content using Flask in Python, styling the data passed towards the web pages. Mostly ate food too :)

jluong23

Updates

Marcy Brook started this project — Nov 03, 2019 05:11 AM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.