About the Project

Inspiration

The increasing volume of spam messages inspired me to develop an AI-powered SMS Spam Detection system that accurately classifies ham (legitimate) and spam (unwanted) messages.

What I Did

I implemented Natural Language Processing (NLP) and Machine Learning (ML) techniques to build an efficient spam classifier. The key steps included:

  • Data Preprocessing: Used NLTK, Regex, and feature engineering to clean and prepare the dataset.
  • Exploratory Data Analysis (EDA) & Visualization: Analyzed data distribution and patterns.
  • Feature Extraction: Applied Bag of Words (BoW) and TF-IDF for text representation.
  • Model Training & Evaluation:

    • Gaussian Naïve Bayes with BoW achieved 88% accuracy.
    • Gaussian Naïve Bayes with TF-IDF also achieved 88% accuracy.
    • Random Forest with BoW achieved 97% accuracy.
    • Random Forest with TF-IDF delivered the highest 98% accuracy.

Challenges Faced

  1. Selecting the best feature extraction technique for optimal classification.
  2. Handling imbalanced data to improve model performance.
  3. Fine-tuning ML models for better spam detection accuracy.

Conclusion

This project successfully demonstrates the effectiveness of ML and NLP in spam detection. Random Forest with TF-IDF provided the best results with 98% accuracy, while Random Forest with BoW achieved 97%, and Gaussian Naïve Bayes performed consistently with 88% accuracy for both BoW and TF-IDF. This highlights the impact of feature extraction and model selection in text classification.

Share this project:

Updates