Inspired by PageRank, we thought we could find other criteria for scoring the trustworthiness of a site based on network-traffic data as well as other metrics. We've since added functionality that raises user awareness of dangerous links and websites.
What it does
Its main function is to generate a safety score from the various metrics we could gather. It adds further security by offering an option to reject all traffic from websites not using safe protocols, which also protects against SSL stripping. It also includes a scanning feature that identifies HTML elements containing script and makes them more noticeable.
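As a rough illustration of the scanning idea: the shipped feature runs inside the extension, but the same check can be sketched in Python with the standard library's HTML parser. The class name and sample markup below are ours, not the extension's code.

```python
from html.parser import HTMLParser

class ScriptScanner(HTMLParser):
    """Collects tags that carry script: <script> elements and
    inline event handlers (onclick, onload, ...)."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.flagged.append((tag, dict(attrs)))
        else:
            handlers = [name for name, _ in attrs if name.startswith("on")]
            if handlers:
                self.flagged.append((tag, handlers))

scanner = ScriptScanner()
scanner.feed('<a href="#" onclick="steal()">click</a><script src="x.js"></script>')
```

In the extension this kind of result would drive the highlighting that makes scripted elements stand out to the user.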
How we built it
For the scoring function, we used logistic regression to learn weights for the data we gathered; plugging new data into the fitted model yields a probability that a site is safe. For the front end and the more direct user-facing security and privacy features, we built a Google Chrome extension that serves as a medium for our machine-learning functions, regulates access to certain links (helping mitigate the risk of SSL stripping), and shows users information about what is behind page links before they click them.
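A minimal sketch of that scoring pipeline with scikit-learn. The feature names and values here are hypothetical placeholders, not our actual dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-site features: [uses_https, domain_age_years, redirect_count]
X = np.array([
    [1, 12.0, 0],
    [1,  8.5, 1],
    [0,  0.2, 5],
    [0,  0.1, 7],
    [1,  3.0, 2],
    [0,  0.5, 6],
])
y = np.array([1, 1, 0, 0, 1, 0])  # 1 = safe, 0 = unsafe

# Fitting learns one weight per feature; those weights are the "scores".
model = LogisticRegression()
model.fit(X, y)

# New data plugs into the fitted model and comes out as a safety probability.
new_site = np.array([[1, 2.0, 1]])
p_safe = model.predict_proba(new_site)[0, 1]
```

The probability `p_safe` is what the extension surfaces to the user as a trustworthiness score.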
Challenges we ran into
The Chrome extension came with limitations we were not aware of beforehand, so we could not accomplish everything we intended. In addition, much of our data was categorical, which increased the number of variables: each unique value of a categorical field needs its own weight.
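For example, one-hot encoding a single categorical field (a hypothetical top-level-domain column here) fans out into one model input per unique value:

```python
import pandas as pd

# One categorical field with three unique values...
sites = pd.DataFrame({"tld": ["com", "org", "xyz", "com"]})

# ...becomes three separate indicator columns, each needing its own weight.
encoded = pd.get_dummies(sites, columns=["tld"])
```

With many categorical fields, this blow-up in variable count is what made the weight-fitting harder.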
Accomplishments that we're proud of
We managed to get most of the features we wanted working. We also deployed a boosted-tree model trained on the same data to Google Cloud for future use, which will enable more features if we decide to build eth0s on a different platform.
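A local sketch of the kind of boosted-tree model we deployed, using scikit-learn's gradient boosting; the features and values are hypothetical, and the cloud deployment itself is not shown:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical per-site features: [uses_https, domain_age_years, redirect_count]
X = np.array([
    [1, 12.0, 0],
    [1,  8.5, 1],
    [0,  0.2, 5],
    [0,  0.1, 7],
    [1,  3.0, 2],
    [0,  0.5, 6],
])
y = np.array([1, 1, 0, 0, 1, 0])  # 1 = safe, 0 = unsafe

# An ensemble of shallow trees fit on the same labeled data.
model = GradientBoostingClassifier(n_estimators=50, max_depth=2)
model.fit(X, y)
p_safe = model.predict_proba([[1, 2.0, 1]])[0, 1]
```

Tree ensembles handle categorical and non-linear features more gracefully than a linear model, which is why we kept this variant around for future platforms.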
What we learned
We learned about the limitations of Chrome extensions. We also learned about many different machine-learning frameworks, as well as different ways to deploy the resulting models.
What's next for eth0s
We want to find a way to use cloud machine learning to keep learning as new data becomes available, since the dataset we used will become less and less relevant over time.