Inspiration
In the last few years, the number of spam messages has been increasing, and many of them are scams. This is a serious problem that Singapore faces on a daily basis. As such, my group wanted to use machine learning to detect spam messages.
What it does
The model is trained to recognise spam messages by using machine learning methods.
How we built it
We used TensorFlow, NLTK and other helpful libraries such as NumPy, pandas and spaCy to train the model to detect spam messages. We used an SMS spam collection dataset from Kaggle, which includes over 3000 random SMS messages from the NUS SMS Corpus (NSC). After training the model, we built a web application around it using Streamlit to demonstrate it.
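To give a feel for the train-then-predict flow described above, here is a minimal sketch. It is not our TensorFlow model: it swaps in a tiny naive Bayes classifier written with only the Python standard library, so it runs without any of the libraries mentioned. The function names (`tokenize`, `train`, `predict`) and the four toy messages are made up for illustration and are not taken from the dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would clean more carefully.
    return text.lower().split()


def train(messages):
    """Count words per class. `messages` is a list of (label, text) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for label, text in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return 'spam' or 'ham', whichever class has the higher log score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior plus Laplace-smoothed log likelihood of each known word.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:
                smoothed = (word_counts[label][word] + 1) / (total_words + len(vocab))
                score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, made up for illustration only.
TOY_DATA = [
    ("spam", "win a free prize now"),
    ("spam", "claim your free cash prize"),
    ("ham", "are we meeting for lunch today"),
    ("ham", "see you at the meeting later"),
]
model = train(TOY_DATA)
print(predict(model, "free prize"))
print(predict(model, "lunch meeting"))
```

A neural model trained in TensorFlow follows the same outline (fit on labelled messages, then score unseen ones), but learns its own word representations instead of relying on raw word counts.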
Challenges we ran into
Our key challenge was the model's accuracy on real-life test cases. We actually started with a fake news detector project, but the available datasets were limited, and the one we used had little local representation. As such, the model ended up being too inaccurate when tested with real-life articles. We faced similar problems with the spam message detector, but because higher-quality datasets were available for training, we managed to get much higher accuracy when testing with real-life messages.
Accomplishments that we're proud of
We are proud of being able to build the model, despite having very little experience. It was a lot of work, but we learnt a lot through the process.
What we learned
We learnt a lot about working with TensorFlow and NLTK. We also familiarised ourselves with importing datasets and cleaning them to make them suitable for machine learning. More importantly, we got to learn basic machine learning modelling with TensorFlow. Aside from the technical aspects, we learnt how to adapt and work efficiently with multiple unfamiliar libraries to realise our project.
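As an illustration of the kind of cleaning mentioned above, the sketch below lowercases a message, strips URLs and punctuation, and drops common stop words. It is a simplified stand-in written with the standard library only, not our exact pipeline: the `clean_message` helper and the short `STOP_WORDS` set are assumptions for this example, and NLTK's English stop-word list is far longer.

```python
import re
import string

# Small illustrative stop-word list (an assumption); NLTK's English list is much longer.
STOP_WORDS = {"a", "an", "the", "to", "is", "are", "you", "your", "and", "of", "for"}

# Matches http(s) links and bare www. links.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_message(text):
    """Normalise one SMS message into a space-separated string of tokens."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)      # remove links before stripping punctuation
    text = text.translate(PUNCT_TABLE)     # drop punctuation such as '!' and ','
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


print(clean_message("WINNER!! Claim your FREE prize at http://example.com now"))
```

Removing URLs before punctuation matters here: stripping punctuation first would turn a link into a single long pseudo-word that the URL pattern no longer matches.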
What's next for Spam message detection using Tensorflow and NLTK
We feel that the project can be extended to detect spam calls. Currently, the recently launched app ScamShield uses a manually updated database of scam callers to decide whether to block a number. We could use machine learning to detect spam callers from features such as monotony of voice, frequency of calls, language analysis of the call and many other variables. We could also use VoIP phones on a server to collect these scam calls and grow the dataset.