🔍 Inspiration
The inspiration for this project came from a personal experience: one of our team members was nearly the victim of a sophisticated job scam. What seemed like a legitimate offer from a reputable organization turned out to be fraudulent, exposing the emotional and financial risks that countless job seekers face every day. This incident motivated us to build a robust, intelligent system that helps protect others from falling into similar traps.
🛠️ What We Built
We developed JobSentinel, a machine learning-based system that classifies job offer emails as legitimate or fraudulent. We trained and evaluated multiple models, including Logistic Regression, SVM, and Random Forest, on a hybrid dataset built from real, synthetic, and crowdsourced job emails. The final model, a tuned Random Forest classifier, achieved 96.7% accuracy and excellent recall, making it highly effective at flagging scam emails.
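A model comparison like the one described above could be sketched with scikit-learn roughly as follows. This is a minimal illustration, not our actual training code: the data is a synthetic stand-in generated with `make_classification`, and the hyperparameters, the `class_weight="balanced"` setting, and the choice of 5-fold cross-validated recall as the comparison metric are all assumptions made for the example. No tuning is performed here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the email feature matrix (assumption: ~6% positive class).
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=8,
    weights=[0.94, 0.06],
    random_state=0,
)

# Candidate models; settings are illustrative defaults, not tuned values.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    ),
    "svm": make_pipeline(StandardScaler(), SVC(class_weight="balanced")),
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    ),
}

# Stratified folds keep the scam/legitimate ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Mean recall on the scam class across folds, per model.
recall_by_model = {
    name: cross_val_score(model, X, y, cv=cv, scoring="recall").mean()
    for name, model in candidates.items()
}

for name, score in recall_by_model.items():
    print(f"{name}: mean recall = {score:.3f}")
```

Comparing on recall rather than accuracy reflects the goal of missing as few scam emails as possible; the scores on this synthetic data say nothing about performance on real emails.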
📊 Key Features
- Synthesized scam indicators (e.g., non-HTTPS links, bait phrases, ID info requests)
- SHAP-based explainability for model transparency
- Hybrid features from email content, metadata, and sender behavior
- Real-time detection-ready architecture
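The scam indicators listed above could be turned into numeric features along these lines. This is a minimal sketch rather than our actual feature extractor: the phrase lists, the URL regular expression, and the `extract_indicators` function name are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical phrase lists, for illustration only.
BAIT_PHRASES = ("urgent", "act now", "guaranteed income", "no experience needed")
ID_REQUEST_TERMS = ("passport", "social security number", "bank account", "national id")

# Matches http:// and https:// URLs up to the next whitespace or quote character.
URL_PATTERN = re.compile(r"\bhttps?://[^\s<>\"']+", re.IGNORECASE)


def extract_indicators(email_text: str) -> dict:
    """Return simple count-based scam indicators for one email body."""
    text = email_text.lower()
    urls = URL_PATTERN.findall(text)
    return {
        "num_links": len(urls),
        # Links that use plain HTTP instead of HTTPS.
        "num_non_https_links": sum(1 for url in urls if url.startswith("http://")),
        # Total occurrences of any bait phrase.
        "num_bait_phrases": sum(text.count(phrase) for phrase in BAIT_PHRASES),
        # 1 if the email mentions any identity or banking detail, else 0.
        "asks_for_id_info": int(any(term in text for term in ID_REQUEST_TERMS)),
    }


sample = (
    "URGENT: guaranteed income! Send your passport scan via "
    "http://jobs.example.com/apply or see https://example.com/about"
)
features = extract_indicators(sample)
print(features)
```

Features like these can be concatenated with metadata and sender-behavior features before being passed to a classifier.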
💡 What We Learned
We learned that the quality of features often outweighs model complexity. We also gained hands-on experience with class imbalance challenges, hyperparameter tuning, feature engineering, SHAP interpretability, and agile team collaboration.
🚧 Challenges
- Balancing a highly imbalanced dataset (only ~6% scam emails)
- Creating realistic synthetic scam data without introducing noise
- Ensuring our model could generalize well to new, unseen scam tactics
- Making the system explainable and suitable for real-time integration
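The imbalance challenge is easiest to see with a little arithmetic: when only about 6% of emails are scams, a model that never flags anything is still right about 94% of the time. The sketch below uses a hypothetical batch of 1,000 emails (60 scams, matching the ~6% rate); the confusion-matrix counts are invented for illustration and are not our measured results.

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Baseline: label every email as legitimate (never flag a scam).
baseline = metrics(tp=0, fp=0, fn=60, tn=940)

# Hypothetical classifier that catches 54 of the 60 scams with 10 false alarms.
classifier = metrics(tp=54, fp=10, fn=6, tn=930)

print("baseline:  ", baseline)
print("classifier:", classifier)
```

The baseline reaches high accuracy with zero recall, which is why recall on the scam class, not accuracy alone, is the number to watch on data this skewed.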
🌱 Impact
JobSentinel goes beyond spam filtering. It supports social sustainability by protecting vulnerable groups and economic sustainability by reducing identity fraud, and it contributes to UN SDG 8 by promoting ethical recruitment practices.
Built With
- adaboost
- emscad-dataset
- enron-email-dataset
- git
- github
- google-colab
- google-docs
- google-meet
- jupyter-notebook
- matplotlib
- numpy
- pandas
- plotly
- python-3.10
- scikit-learn
- seaborn
- shap
- synthetic-email-data
- trello
- vs-code
- xgboost