Inspiration
Spam mails are one of the major reasons behind phishing attacks, and phishing has become one of the most pernicious dangers in cybersecurity today. Even though awareness of the problem has been rising, based on data from numerous sources including the Verizon Data Breach Report, there is a risk that people are becoming jaded by the daily news about the latest phishing attacks. These phishing statistics are up from 76% in 2017, and experts predict another six billion attacks throughout 2022. The impact of these attacks will be felt in the compromised accounts, malware infections, and data loss left in their wake. Approximately 15 billion spam emails are sent daily, and 45% of all email is spam. A way to detect these mails is therefore needed more than ever. We have tried to increase accuracy as much as we can, in the simplest way possible, to keep the model as efficient as possible.
What it does
- Uses the best machine learning models available for the classification problem.
- Improves accuracy using ensemble learning.
- Reduces time complexity.
- Keeps the model efficient by using an ensemble technique with a low computational load.
- Provides a user-friendly UI for the Flask app.
- Integrates the model into the app as efficiently as possible.
How we built it
By using the following algorithm:
Step 1: Load the saved models using joblib.load().
Step 2: Receive the message to be predicted from the front end.
Step 3: Use each model to predict the output.
Step 4: Initialize SpamScore (SpamScore = 0).
Step 5: If model 1 predicts spam, then SpamScore = SpamScore + 1.
Step 6: End If
Step 7: If model 2 predicts spam, then SpamScore = SpamScore + 1.
Step 8: End If
Step 9: If model 3 predicts spam, then SpamScore = SpamScore + 1.
Step 10: End If
Step 11: If SpamScore > 1, then set Result to “This Message is a SPAM Message.”
Step 12: Otherwise, set Result to “This Message is Not a SPAM Message.”
Step 13: End If
Step 14: Send the result to the front end.
Step 15: If the result is spam, then display the spam screen.
Step 16: If the result is ham, then display the ham screen.
Step 17: End Process
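The voting part of the algorithm above (Steps 3 to 13) can be sketched in Python roughly as follows. This is a minimal illustration, not the project's actual code: the function names `spam_score` and `classify`, the `KeywordModel` stand-in class, and the convention that a model's `predict` returns 1 for spam and 0 for ham are all assumptions. In the real app the three models would be trained classifiers loaded from disk with joblib rather than keyword matchers.

```python
from typing import Protocol, Sequence

SPAM_TEXT = "This Message is a SPAM Message."
HAM_TEXT = "This Message is Not a SPAM Message."


class Classifier(Protocol):
    """Anything with a scikit-learn style predict(); assumed to return 1 for spam."""

    def predict(self, messages: Sequence[str]) -> Sequence[int]: ...


def spam_score(message: str, models: Sequence[Classifier]) -> int:
    """Count how many models label the message as spam (Steps 4 to 10)."""
    return sum(1 for model in models if model.predict([message])[0] == 1)


def classify(message: str, models: Sequence[Classifier]) -> str:
    """Majority vote over three models: spam if SpamScore > 1 (Steps 11 to 13)."""
    return SPAM_TEXT if spam_score(message, models) > 1 else HAM_TEXT


class KeywordModel:
    """Hypothetical stand-in for a trained model, used only to make the sketch runnable."""

    def __init__(self, keyword: str) -> None:
        self.keyword = keyword

    def predict(self, messages: Sequence[str]) -> list:
        # Flag a message as spam (1) when it contains the keyword, else ham (0).
        return [1 if self.keyword in m.lower() else 0 for m in messages]


# Three toy "models"; the real ones would come from joblib.load(...) calls.
models = [KeywordModel("free"), KeywordModel("winner"), KeywordModel("prize")]

if __name__ == "__main__":
    print(classify("You are a WINNER, claim your free prize", models))
    print(classify("Free lunch tomorrow?", models))
```

With three voters, the threshold `SpamScore > 1` means at least two models must agree before a message is reported as spam, so a single model's false positive is outvoted.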
Challenges we ran into
1. We tried different ML models such as SVM, Random Forest, Decision Tree, and Multinomial Naive Bayes (Multi-NB), and each showed a comparatively lower accuracy score during training.
2. So we finally tried an ensemble learning method to increase the accuracy score.
3. Fortunately, our ensemble model finally reached an accuracy score of approximately 98.5%.
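To illustrate why combining models can raise accuracy, here is a small sketch using made-up toy predictions (not the project's data or results). Each of the three hypothetical models is wrong on two different messages out of eight, so a majority vote corrects every individual mistake. This is an idealized case: real models tend to make overlapping errors, so the gain is usually smaller.

```python
def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


def majority_vote(*prediction_lists):
    """Label each message spam (1) when more than half of the models say spam."""
    return [
        1 if sum(votes) > len(votes) / 2 else 0
        for votes in zip(*prediction_lists)
    ]


# Toy labels: 1 = spam, 0 = ham. All values below are illustrative only.
truth   = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [0, 1, 1, 1, 0, 0, 1, 0]  # wrong on messages 0 and 1
model_b = [1, 0, 0, 0, 0, 0, 1, 0]  # wrong on messages 2 and 3
model_c = [1, 0, 1, 1, 1, 1, 1, 0]  # wrong on messages 4 and 5

ensemble = majority_vote(model_a, model_b, model_c)

if __name__ == "__main__":
    for name, preds in [("A", model_a), ("B", model_b), ("C", model_c)]:
        print(f"model {name}: {accuracy(preds, truth):.2f}")
    print(f"ensemble: {accuracy(ensemble, truth):.2f}")
```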
Accomplishments that we're proud of
- It was our first hackathon as a team. It was a wonderful experience to participate as a team, work together, order Uber, and watch YouTube videos in the short breaks. Thank you.
What we learned
1. Time management.
2. Instead of sticking with one particular model, it pays to try different methods on the same data, because one of them may give a better accuracy score.
What's next for ENSEMBLE SPAM DETECTOR
We would like to use this project as a building block for our future work (improvements):
- Include other models that could give us better results.
- Add stream learning, which would help remove target-based spam and improve the model in real time.
- Include Artificial Neural Networks to emulate human-like behaviour when categorizing mail as spam or ham.