Inspiration/Project Description

We entered the Health Equity Track, and the Beginner Overlay applies to our project. Healthcare fraud causes tens of billions of dollars in losses each year, and phishing is one of the most common methods scammers use to commit it. We wanted to create something that would help mitigate healthcare fraud and keep patients and healthcare professionals safe online.

What it does

We developed a machine learning model using scikit-learn that rates healthcare emails on a scale of 1 to 5, where 1 is most fraudulent and 5 is least fraudulent.

How we built it

We used the TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer from the scikit-learn library. TF-IDF weighs each word by how often it appears in a text and by how distinctive it is, so words that show up in nearly every text count for less. We created a list of the most common phrases used in legitimate and fraudulent healthcare emails and assigned each phrase a value from 1 to 5, where 1 is most fraudulent and 5 is least fraudulent. This was the data we trained our model on.
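
A minimal sketch of this kind of setup might look like the following. The phrases and scores are made-up placeholders rather than the actual training data, and the logistic regression classifier and the helper names train_model and score_email are assumptions for illustration, since the write-up does not name the classifier that sits on top of the TF-IDF features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical phrases scored from 1 (most fraudulent) to 5 (least fraudulent).
TRAINING_PHRASES = [
    ("verify your account immediately or lose coverage", 1),
    ("click here to claim your insurance refund", 1),
    ("urgent action required update billing information", 2),
    ("confirm your social security number to continue", 2),
    ("limited time offer on prescription discounts", 3),
    ("you may qualify for a new health plan", 3),
    ("your lab results are available in the patient portal", 4),
    ("please complete the pre-visit questionnaire", 4),
    ("reminder of your appointment on monday at 10 am", 5),
    ("thank you for visiting our clinic", 5),
]


def train_model(examples):
    """Fit a TF-IDF vectorizer plus a classifier (assumed) on (phrase, score) pairs."""
    texts = [text for text, _ in examples]
    scores = [score for _, score in examples]
    # TfidfVectorizer turns each phrase into weighted word and word-pair counts;
    # the classifier then learns to map those weights to a 1-5 score.
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, scores)
    return model


def score_email(model, email_text):
    """Return the predicted 1-5 score for a single email body."""
    return int(model.predict([email_text])[0])


if __name__ == "__main__":
    model = train_model(TRAINING_PHRASES)
    print(score_email(model, "Urgent: verify your account to claim your refund"))
```

With a list this small the predictions are only illustrative; a real model would need many more labeled examples per score.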

Challenges we ran into

Both of us had limited machine learning experience, so we had to conduct a lot of research on which algorithm and what data to use.

Accomplishments that we're proud of

We're proud of participating in our first hackathon and creating a machine learning model.

What we learned

We learned the workflow of creating a machine learning model and the algorithms used by spam filters.

What's next for Healthcare Email Fraud Detection

We want to make our model more accurate by extracting more features from healthcare phishing emails and websites. These features include web address length, the number of dots in the URL, and the number of emotionally charged words.
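
As a rough sketch of how these features could be computed, the snippet below uses only the Python standard library. The function name extract_features and the EMOTIONAL_WORDS list are hypothetical; which words count as emotionally charged is still to be decided.

```python
import re

# Hypothetical list of emotionally charged words; a real list would be larger.
EMOTIONAL_WORDS = {"urgent", "immediately", "warning", "suspended", "final", "risk"}


def extract_features(email_text, url):
    """Return the three planned features for one email and one link found in it."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", email_text.lower())
    return {
        "url_length": len(url),
        "url_dot_count": url.count("."),
        "emotional_word_count": sum(1 for word in words if word in EMOTIONAL_WORDS),
    }


if __name__ == "__main__":
    features = extract_features(
        "URGENT: your account is suspended. Act immediately!",
        "http://secure.example-health.com.login.example/verify",
    )
    print(features)
```

These numeric features could then be combined with the TF-IDF text features before training.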

To use our model, click on the first link under the "Try it out" section.

To view our presentation, click on the second link under the "Try it out" section.
