Inspiration
As people spend more time online, the risk of encountering malicious advertisements is higher than ever. A single click on a deceptive ad can lead to severe consequences, such as downloading viruses, spyware, or ransomware that can steal personal data, hijack devices, or drain bank accounts. Some ads redirect users to phishing sites that trick them into entering sensitive information, while others exploit browser vulnerabilities to install malware without any user action. Having a reliable tool that instantly detects whether an advertisement is safe or malicious can prevent these cyber threats, protect user privacy, and save individuals and businesses from costly damages.
What it does
S.O.S is a Google Chrome extension designed to protect users from malicious advertisements by detecting whether the links they contain are safe or dangerous. When a user hovers over an ad, the extension automatically sends the URL to the backend, where a machine learning model analyzes it based on patterns of known malicious and safe links. Within seconds, the system determines the risk potential of the URL. A popup notification then appears, clearly informing the user whether the link is safe to click or potentially harmful, helping them avoid phishing scams, malware downloads, and other online threats.
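In outline, the backend half of that flow can be sketched as a small Lambda-style handler (a sketch only: `classify_url` is a hypothetical placeholder for the trained model, and the request/response shape assumes API Gateway's JSON proxy format):

```python
import json

def classify_url(url: str) -> float:
    """Hypothetical placeholder: the real service runs the trained
    classifier; a few crude lexical heuristics stand in for it here."""
    suspicious = int("@" in url) + url.count("-") + int(not url.startswith("https"))
    return min(1.0, suspicious / 3)

def lambda_handler(event, context):
    """Receive the hovered URL, score it, and return a verdict
    the extension can display in its popup."""
    body = json.loads(event.get("body") or "{}")
    url = body.get("url", "")
    if not url:
        return {"statusCode": 400,
                "body": json.dumps({"error": "missing url"})}
    risk = classify_url(url)
    verdict = "malicious" if risk >= 0.5 else "safe"
    return {"statusCode": 200,
            "body": json.dumps({"url": url, "verdict": verdict, "risk": risk})}
```

The extension's content script only needs to POST the hovered URL and render the returned verdict, which keeps the model logic entirely server-side.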
How we built it
We obtained a sufficiently large dataset containing various types of URLs. Next, we performed feature engineering, examining different parts of each URL for signals of potential risk. We then encoded the URLs using transformer models from Hugging Face and combined these embeddings with our engineered features for more precise classification. After evaluating multiple classifiers, we selected the one with the highest accuracy. Finally, we developed a Chrome extension and deployed the model on AWS Lambda to deliver the service to users.
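The feature-engineering step can be illustrated with a minimal sketch (the specific features below are illustrative examples of lexical URL signals, not our exact feature set):

```python
from urllib.parse import urlparse

def extract_features(url: str) -> dict:
    """Derive simple lexical features from a raw URL string."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.netloc
    return {
        "url_length": len(url),            # very long URLs are a common phishing signal
        "num_dots": url.count("."),        # many subdomains can disguise the real host
        "num_hyphens": url.count("-"),
        "num_digits": sum(c.isdigit() for c in url),
        "has_at_symbol": int("@" in url),  # '@' can hide the true destination
        "has_ip_host": int(host.replace(".", "").isdigit()),  # raw IP instead of a domain
        "uses_https": int(parsed.scheme == "https"),
        "path_depth": parsed.path.count("/"),
    }
```

A vector of such features can then be concatenated with the transformer embedding of the URL before it is fed to the classifier.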
Challenges we ran into
We originally planned to integrate a suitable BERT model to encode URLs for better performance. However, the free-tier limitations of AWS Lambda and the Hugging Face API prevented us from fully testing this approach. Given our time constraints, we decided to proceed using only our engineered features with the best-performing classifier.
Additionally, we faced challenges deploying Light Gradient Boosting Machine (LightGBM) on AWS Lambda, even though it outperformed the other models we tested (SVM, Random Forest, XGBoost, a neural network, and a graph neural network). Developing a Chrome extension also proved more complex than expected, as integrating the frontend with the backend differs significantly from standard web development.
This project was our team's first experience working with Go, which added an extra layer of difficulty. It took considerable time to gather a robust dataset, and we needed a classifier that balanced accuracy against response time for users. Finally, setting up communication between AWS Lambda functions written in different languages (Go and Python) presented additional hurdles.
Accomplishments that we're proud of
- Successfully developed a real-time URL classification model that detects malicious phishing links with high accuracy.
- Implemented a scalable design, allowing for future improvements such as integrating threat intelligence databases and machine learning updates.
What we learned
- The importance of feature engineering in URL classification to enhance model performance while minimizing training time.
- How to optimize machine learning models for real-time applications, including trade-offs between accuracy and speed.
- How to develop and deploy a Chrome extension, connecting backend machine learning models with a browser-based security tool.
- The significance of cybersecurity in everyday browsing, reinforcing the need for proactive security solutions.
What's next for S.O.S (Safe or Sus)
- Enhancing detection methods by integrating external threat intelligence sources and continuously updating the model with new phishing tactics.
- Expanding browser support, making S.O.S available for Firefox, Edge, and Safari.
- Improving explainability, providing users with more insights into why a URL is flagged as suspicious.
- Building a reporting system, allowing users to submit false positives or newly discovered phishing URLs to improve model accuracy.