-
Precision is the share of predicted labels that are correct. Recall is the share of all true instances of a particular label that the model finds. Ideally, both would be 100%.
-
Displays the confidence of the predictive model by showing the relationship between the predicted probability and the actual probability.
-
The rows represent the true class; the columns the predicted class. Ideally, these would match so the darker colors are along the diagonal.
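To make the captions above concrete, here is a minimal sketch of how precision and recall can be read off a confusion matrix laid out as described (rows are the true class, columns the predicted class). The function name `precision_recall` and the example counts are illustrative assumptions, not figures from our model.

```python
def precision_recall(confusion, label):
    """Return (precision, recall) for one class.

    confusion[i][j] counts samples whose true class is i and
    whose predicted class is j (rows = true, columns = predicted).
    """
    true_positives = confusion[label][label]
    # Column sum: everything the model predicted as `label`.
    predicted_as_label = sum(row[label] for row in confusion)
    # Row sum: everything that truly is `label`.
    actually_label = sum(confusion[label])
    precision = true_positives / predicted_as_label if predicted_as_label else 0.0
    recall = true_positives / actually_label if actually_label else 0.0
    return precision, recall


# Hypothetical counts: class 0 = legitimate, class 1 = phishing.
example_matrix = [
    [90, 10],  # true legitimate: 90 predicted legitimate, 10 predicted phishing
    [5, 95],   # true phishing:   5 predicted legitimate, 95 predicted phishing
]
phishing_precision, phishing_recall = precision_recall(example_matrix, 1)
```

With these made-up counts, precision for the phishing class is 95 out of 105 predictions and recall is 95 out of 100 true phishing links.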
Inspiration
Phishing does not just siphon millions in wealth through company data breaches; it also exploits vulnerable individuals through fraud. Phishing attacks have been increasing dramatically in recent years (Avanan reports a 65% increase from 2016 to 2017), and the primary victims are us: normal people who happen, just once, not to check a link. That one time is all phishers need to steal personal and financial information. Our data needs to be protected, our right to privacy preserved, and our freedom on the Internet expanded, and that is the social good we hope Angler will help achieve.
What it does
Predicts whether a link is a phishing link solely from its URL, by querying a trained model exposed as a REST API on Azure. When it detects a potential phishing link, Angler redirects the user and asks whether they want to proceed to the website. This should improve data security and help protect people from fraud.
How we built it
Google Chrome extension frontend in HTML/CSS; back end in JavaScript. ML backend in Azure’s ML Studio.
Challenges we ran into
Using only the given categories of data (age of domain, time until expiry, and time since last update), we could not get past 88% accuracy. It was only when we parsed the URL further (e.g., domain extension, length of the longest string without slashes or periods, http vs https) that we saw major improvements.
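As a rough illustration of the URL parsing described above, the sketch below pulls the three example features out of a URL string. It is written in Python for brevity and is not our extension code: the function name `extract_url_features`, the dictionary keys, and the choice to ignore the scheme when measuring the longest string are all assumptions.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Derive simple features from the URL text alone (hypothetical helper)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # Domain extension: the text after the last period in the hostname.
    extension = host.rsplit(".", 1)[-1] if "." in host else ""

    # Longest run of characters containing no slash or period,
    # measured over host + path (the scheme is left out by assumption).
    pieces = re.split(r"[/.]", parsed.netloc + parsed.path)
    longest = max((len(piece) for piece in pieces), default=0)

    return {
        "uses_https": parsed.scheme == "https",
        "domain_extension": extension,
        "longest_segment_length": longest,
    }


# Example with a made-up suspicious-looking URL.
features = extract_url_features(
    "http://secure-login.paypal.com.verify-account-update.xyz/a"
)
```

Features like these would be sent to the model alongside the domain-age fields; how they are encoded for training (for example, one-hot encoding the extension) is left open here.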
Accomplishments that we're proud of
Around 95% accuracy over 4000 training points. Excellent teamwork and motivation. (FCA CTF 2nd out of 67 teams.)
What we learned
Many team members worked with languages and environments they had not used before. For instance, Eugene explored node.js (he didn't like it) and Wally worked on the frontend with HTML and CSS (he did like it). After using Azure's ML integration, we've become much more appreciative of the resource and look forward to using it in the future. At the same time, though we now recognize the power of this tool, we've discovered the importance of the non-automated part: choosing the right features and parameters to optimize our model.
What's next for Angler
Catch us on the Chrome extension store soon! We hope our little project can make a positive difference in the world.