Inspiration

In the last few years, the number of spam messages has been increasing, and many of them are scams. This is a serious problem that Singapore faces on a daily basis. As such, my group wanted to use machine learning to detect spam messages.

What it does

The model is trained with machine learning methods to recognise spam: given the text of a message, it predicts whether the message is spam.

How we built it

We used TensorFlow, NLTK and other helpful libraries such as NumPy, pandas and spaCy to train the model to detect spam messages. We used an SMS spam collection dataset from Kaggle, which includes over 3,000 randomly selected SMS messages from the NUS SMS Corpus (NSC). After training the model, we exported it to a web application built with Streamlit to demonstrate it.
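As a rough illustration of the text preparation a pipeline like this needs before messages reach a neural network, here is a minimal, library-free sketch: lower-casing, tokenising, building a vocabulary and padding to a fixed length. The function names (`tokenize`, `build_vocab`, `encode`), the reserved ids and the `max_words` and `max_len` values are assumptions made for illustration. The project itself relied on NLTK and TensorFlow for these steps, and its exact settings are not stated here.

```python
import re
from collections import Counter

PAD, UNK = 0, 1  # assumed reserved ids: padding and out-of-vocabulary


def tokenize(text):
    """Lower-case a message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(messages, max_words=1000):
    """Map the most frequent tokens to integer ids starting at 2."""
    counts = Counter(tok for msg in messages for tok in tokenize(msg))
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(max_words))}


def encode(text, vocab, max_len=8):
    """Turn a message into a fixed-length list of token ids."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Two made-up messages, used only to exercise the functions.
messages = ["WIN a FREE prize now!", "Are we meeting for lunch?"]
vocab = build_vocab(messages)
encoded = [encode(m, vocab) for m in messages]
```

Fixed-length integer sequences like `encoded` are the usual input to an embedding layer. Words the vocabulary has never seen fall back to the `UNK` id, so new messages can still be scored.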

Challenges we ran into

A key challenge was the accuracy of the model on real-life test cases. We actually started with a fake news detector project, but the available datasets were limited, and the one we used had little local representation. As such, the model ended up being too inaccurate when tested with real-life articles. We faced similar problems with the spam message detector, but because higher-quality datasets were available to train with, we managed to get a much higher accuracy when testing with real-life messages.

Accomplishments that we're proud of

We are proud of being able to build the model, despite having very little experience. It was a lot of work, but we learnt a lot through the process.

What we learned

We learnt a lot about working with TensorFlow and NLTK. We also familiarised ourselves with importing datasets and cleaning them to make them suitable for machine learning. More importantly, we got to learn basic machine learning modelling with TensorFlow. Aside from the technical aspects, we learnt how to adapt quickly and work efficiently with multiple unfamiliar libraries to realise our project.
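To give a sense of what importing and cleaning a dataset can involve, here is a small sketch using only the Python standard library. The `label,text` column layout, the sample rows and the `load_messages` helper are hypothetical; the actual Kaggle file may use different column names, and the project may well have used pandas for this step.

```python
import csv
import io

# Hypothetical sample in a label,text layout; the real file's columns may differ.
RAW = """label,text
ham,"Ok lar, see you at 7"
spam,"FREE entry!! Text WIN to 12345"
spam,"FREE entry!! Text WIN to 12345"
ham,
"""


def load_messages(handle):
    """Read rows, drop empty and duplicate messages, map labels to 0/1."""
    seen = set()
    rows = []
    for row in csv.DictReader(handle):
        text = " ".join((row["text"] or "").split())  # collapse whitespace
        if not text or text in seen:
            continue  # skip blank messages and exact repeats
        seen.add(text)
        rows.append((text, 1 if row["label"].strip().lower() == "spam" else 0))
    return rows


dataset = load_messages(io.StringIO(RAW))
```

Dropping duplicates before splitting the data helps avoid the same message appearing in both the training and test sets, which would otherwise inflate the measured accuracy.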

What's next for Spam message detection using Tensorflow and NLTK

We feel that the project can be extended to detect spam calls. Currently, the recently launched app ScamShield uses a manually updated database of scam callers to decide whether to block a number. We could use machine learning to detect spam callers from features such as monotony of voice, frequency of calls, language analysis of the call and many other variables. We could also run VoIP phones on a server to collect these scam calls and grow the dataset.
