Scout

Sample page for a data breached website
Sample page for what happens when the large button is pressed
Hyperlinks work! Sample website redirect.

Video Demos:

https://youtu.be/zLCT0TUKSng - 1
https://youtu.be/Yoz4mWRG6bc - 2

Inspiration

The inspiration for our project was finding out that one of our emails had been “pwned”. When we saw which websites had been breached, we noticed that one of them was a site one of our parents had made an account with. Our parents, not knowing the site’s history or its data breach, had signed up with incomplete information about the safety of their data. So we created Scout: to scout the history of the websites a user visits and inform users about the safety of their current web use.

Our fundamental thesis rests on ethics. Not only is our product designed to alleviate ethical concerns around data breach awareness, it was also built with ethical considerations as a guiding principle. Our product directly addresses the ethical issue of data breaches and personal privacy. Websites have an incentive not to advertise previous data breaches to new consumers, which creates an ethical dilemma: new users who would change their behavior on a platform with a history of unsafe data practices never learn about that history. That’s where Scout comes in. Scout maintains a database of company data breaches, including specific details on severity, scope, and other important qualifiers, to inform consumers of the safety of ALL websites they see.

What it does

Scout operates as a simple Chrome extension. Scout checks the website a user is currently on, searches our database of known data breaches (scraped from various databases and news sources), and runs that information through our safety classification algorithm to tell users how safe their data is. For simplicity we have three categories. Yellow: a site with past safety concerns that were mild in nature. Orange: a site with considerable safety concerns. Red: a site where a large number of users’ information was leaked and the leaked information was sensitive.

Because of our commitment to ethics, we do not recommend anything specific to the user. Instead, we simply provide the information and let users make their own decisions from the objective information we gather for them. Scout is identity-agnostic. We do not store our users’ browsing data, e-mails, or other private information. As a result, Scout has no user information to store, let alone sell.

Scout also promotes ethical decision-making and benefits for all stakeholders. Data breaches have three main stakeholders: consumers, websites, and malicious actors. Consumers benefit from knowing a website’s history of data breaches, which lets them make better-informed decisions. For example, consumers who tend to reuse the same password across all of their accounts may decide to create a separate password for a website they learn has a history of data breaches. Underrepresented groups are at greater risk than their counterparts, as they often lack the technical literacy to navigate cybersecurity (which Scout helps promote) and the economic resources to recover from hacked accounts (which Scout helps prevent).

Websites are either hurt or helped. Websites that have a history of data breaches will be hurt; however, Scout’s algorithm allows their “severity” rating to decay as the time since the last breach increases (i.e. a breach in 2000 won’t ruin a business). This feature encourages companies to stay vigilant and limit their exposure to data breaches. Although data breaches already draw negative press, that press is both temporary and limited to major companies. Scout surfaces public breach information for all websites, democratizing the process. Conversely, safe websites benefit, as they are signaled to consumers as safe. We are particularly optimistic about the way Scout can shift market dynamics and make it unprofitable to be unsafe: if unsafe sites receive less traffic than their safe competitors, websites gain an economic incentive to prevent data breaches.

Malicious actors unequivocally lose. Not only will users become more vigilant on the sites they use, but companies will invest more in privacy if it affects their bottom line more significantly.
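
As a rough illustration of the classification and decay ideas described above, the Python sketch below looks up a site by hostname, applies a time decay to the breach's severity, and maps the result to Yellow, Orange, or Red. The scoring weights, the three-year half-life, the thresholds, and the sample domains are all assumptions made for this example; it is not Scout's actual severity algorithm.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Optional
from urllib.parse import urlparse

HALF_LIFE_YEARS = 3.0    # assumed: severity halves every three years
RED_THRESHOLD = 0.75     # assumed cut-offs, chosen only for illustration
ORANGE_THRESHOLD = 0.30


@dataclass(frozen=True)
class Breach:
    breach_date: date
    records_exposed: int
    sensitive: bool  # True if passwords, financial data, etc. were leaked


# Hypothetical stand-in for the breach database, keyed by bare hostname.
BREACHES = {
    "example-shop.com": Breach(date(2021, 12, 1), 5_000_000, True),
    "example-forum.org": Breach(date(2012, 1, 15), 20_000, False),
}


def normalize_domain(url: str) -> str:
    """Reduce a full URL to a lowercase hostname without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def base_severity(breach: Breach) -> float:
    """Score in [0, 1]: half from breach size (log scale), half from sensitivity."""
    size = min(math.log10(max(breach.records_exposed, 1)) / 8.0, 1.0)
    return 0.5 * size + (0.5 if breach.sensitive else 0.0)


def decayed_severity(breach: Breach, today: date) -> float:
    """Apply exponential decay so that older breaches count for less."""
    years = max((today - breach.breach_date).days, 0) / 365.25
    return base_severity(breach) * 0.5 ** (years / HALF_LIFE_YEARS)


def classify(url: str, today: date) -> Optional[str]:
    """Return 'Red', 'Orange', or 'Yellow', or None if no breach is on record."""
    breach = BREACHES.get(normalize_domain(url))
    if breach is None:
        return None
    score = decayed_severity(breach, today)
    if score >= RED_THRESHOLD:
        return "Red"
    if score >= ORANGE_THRESHOLD:
        return "Orange"
    return "Yellow"
```

With these assumed weights, a non-sensitive breach tops out at 0.5 before decay, so only a breach that is both sensitive and reasonably large can reach Red, and any rating drifts toward Yellow as the breach ages.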

How we built it

We built our project on Bubble’s no-code platform. With a few plug-ins, we integrated our web app with Python, Pandas, scikit-learn, NumPy, Requests, JavaScript, TypeScript, and Google Chrome Extensions. We started by scraping and aggregating data breach information across multiple news sources and verified databases. Once we had collected our data, we implemented a recurring pipeline that aggregates the previous data and outputs it as a CSV, adjusted for time decay. Next, we imported the CSV into the bubble.io database, where it was split into multiple tables, as detailed in Scout’s architectural diagram. When building the tables, we used scikit-learn to identify the variables most influential on breach severity, which informed the severity values we assigned. These tables were then consumed through bubble.io’s extensive querying support to generate a personalized front-end. The front-end pages range from introductions about the breach to specific information about how users can protect themselves from information leaks; further details are included in the Scout architectural diagram. While Bubble was an excellent platform to build a website on, we wrapped our breach detection in a Google Chrome extension, which takes our Bubble front-end and presents it to users when a site they visit has a known data breach. The final product allows a very short turnaround between a new breach and the subsequent user notification.
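
The recurring aggregation step can be pictured roughly as follows. This is a standard-library Python sketch rather than the Pandas pipeline we actually used; the column names, the merge rule (latest breach date and largest exposure count per domain), the sample rows, and the three-year half-life are all assumptions for illustration.

```python
import csv
import io
from datetime import date

HALF_LIFE_YEARS = 3.0  # assumed decay rate
FIELDS = ["domain", "last_breach", "records_exposed", "source_count", "decay_factor"]


def aggregate(records, today):
    """Merge scraped rows into one row per domain and attach a time-decay factor."""
    merged = {}
    for row in records:
        domain = row["domain"].strip().lower()
        breach_date = date.fromisoformat(row["breach_date"])
        exposed = int(row["records_exposed"])
        entry = merged.setdefault(
            domain,
            {"last_breach": breach_date, "records_exposed": exposed, "sources": set()},
        )
        # Keep the most recent breach date and the largest reported exposure.
        entry["last_breach"] = max(entry["last_breach"], breach_date)
        entry["records_exposed"] = max(entry["records_exposed"], exposed)
        entry["sources"].add(row["source"])

    rows = []
    for domain in sorted(merged):
        entry = merged[domain]
        years = max((today - entry["last_breach"]).days, 0) / 365.25
        rows.append({
            "domain": domain,
            "last_breach": entry["last_breach"].isoformat(),
            "records_exposed": entry["records_exposed"],
            "source_count": len(entry["sources"]),
            "decay_factor": round(0.5 ** (years / HALF_LIFE_YEARS), 3),
        })
    return rows


def to_csv(rows) -> str:
    """Serialize the aggregated rows as CSV text, ready to import elsewhere."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical scraped rows: two sources report the same site.
SCRAPED = [
    {"domain": "Example-Shop.com", "breach_date": "2021-12-01",
     "records_exposed": "4800000", "source": "news"},
    {"domain": "example-shop.com", "breach_date": "2021-12-01",
     "records_exposed": "5000000", "source": "database"},
    {"domain": "example-forum.org", "breach_date": "2012-01-15",
     "records_exposed": "20000", "source": "database"},
]
```

Calling to_csv(aggregate(SCRAPED, date.today())) yields one CSV row per domain, with a decay factor close to 1 for recent breaches and close to 0 for old ones.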

Challenges we ran into

We faced multiple challenges. One was getting used to the Bubble system and learning how to integrate Bubble with some of the other tasks we are used to doing. Though there was a learning curve, we are grateful to Bubble for providing their fantastic no-code solution, and we appreciated the plethora of plug-ins we could use to build our web app. Another challenge was balancing functionality against ethics. Although some members wanted Scout to store browsing history in order to deliver weekly reports, we ultimately decided that storing it was an overreach into users’ privacy. Instead, we limited the scope of our product to democratizing information, not pushing recommendations.

Accomplishments that we're proud of

We are proud of our solution’s ethical standing. Although cutting corners might at times have made development more seamless, we are proud to say that Scout’s system poses little to no threat to user privacy. There is more testing to be done, which we will continue to do. We are also proud of our ability to create an actionable Chrome extension quickly. Before this project we had never created a Chrome extension and had very little experience building web apps. Thankfully, Bubble made it easy for us to apply our previous development experience and understanding of architecture to a new field.

What we learned

We learned the ins and outs of creating Chrome extensions, web scraping, and designing products that are not only important but also safe. We also learned that you don’t need to know how to code to make actionable products :).

What's next for Scout

We plan to improve the UI of Scout and revisit our back-end so that Scout can scale to handle more simultaneous operations. We are also looking into ways to optimize searching through our database to improve the run-time of our solution. Another avenue that was largely overlooked during development is monetization. We knew of Bubble’s amazing integration with Stripe, but we prioritized functionality over profitability. Whether we charge for the extension itself or create a process to continually monetize the use of Scout is an important decision we face.

Built With

bubble-backend
bubble-database
bubble-frontend
chrome
javascript
numpy
pandas
python
requests
scikit-learn
typescript

Submitted to

TreeHacks 2022

Created by

I worked on the front-end of the Chrome Extension, connecting the bubble.io frontend/backend framework to a chrome extension developed in HTML/CSS/JavaScript.

Andrew Li
I worked on the front-end of Bubble, as well as developing the proprietary severity algorithm that we use to assign each website a score.

Alex Zhang Shan
Aqil Naeem
Mario Ishac

Updates

Aqil Naeem started this project — Feb 20, 2022 05:15 AM EST
