Inspiration
Data privacy is becoming increasingly important in the age of artificial intelligence and with the rise of large-scale data scandals such as Cambridge Analytica. We want people to live better lives and enjoy the benefits of the internet without compromising their digital rights.
Our government has realized the dangers of surveillance by these technologies, and laws are finally in place that allow us to take back control. Our goal, as DataGuardian, is to empower individuals to learn more about their data and how much information companies collect on unsuspecting users.
What it does
DataGuardian is an automated legal agent that exercises your digital privacy rights. We empower users to use the internet safely and control how their digital footprint persists.
This opportunity arises from the legal rights granted by the GDPR and CCPA. The GDPR is a data privacy regulation in the EU that took effect in May 2018, while the CCPA is a data privacy law in California that took effect on January 1, 2020. Both give residents the right to know what personal information is being collected about them and the right to request that their personal information be deleted. Companies must comply in order to operate in these markets.
Therefore, we can automate this entire legal process for the user. As the user browses, we keep records so that they can see the current risk of leaks or sale of their personal information. The user can then request to view or delete data from any visited website, and our agent will take care of the process from start to finish.
How we built it
Our tooling consists of three interconnected applications: a React frontend, a Flask backend, and a Chrome extension.
- We chose React.js to build the frontend. OAuth was used for login, and Bootstrap let us quickly design the app and get it running. We used Axios HTTP requests to connect the frontend to the backend.
- For the Flask backend, we chose Python for its data handling capabilities and libraries. Our API endpoints handle tasks such as communicating with our Airtable database, which has three tables for users, companies, and historical records. We use OpenAI's Da-Vinci-02 completion engine to identify the appropriate email address for each company's data privacy team; determining the optimal prompts was a challenge. To find the correct email address, we employed creative workarounds such as partial use of regex. Once we had the address, we used Cohere to generate the email content for claiming CCPA rights from that specific company.
- The Chrome extension is vital for providing a seamless user experience. Once installed, it continuously monitors the user's browsing history and dynamically loads the user's current state from our database.
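The "partial use of regex" step in the backend could look something like the sketch below. It is only an illustration under assumptions: the function name `extract_privacy_email`, the hint keywords, and the scoring rule are hypothetical and not taken from our actual code. The idea is to pull every email-shaped string out of the model's free-text completion, then prefer addresses on the company's own domain whose local part looks privacy-related.

```python
import re

# Deliberately simple email pattern; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical keywords suggesting a privacy or legal mailbox.
PRIVACY_HINTS = ("privacy", "dpo", "dataprotection", "legal", "gdpr", "ccpa")


def extract_privacy_email(completion_text, company_domain):
    """Return the most plausible privacy contact found in an LLM completion.

    Returns None when the text contains no email-shaped string.
    """
    candidates = [m.lower() for m in EMAIL_RE.findall(completion_text)]
    if not candidates:
        return None

    def score(email):
        local, _, domain = email.partition("@")
        points = 0
        # Prefer addresses on the company's own domain (or a subdomain of it).
        if domain == company_domain or domain.endswith("." + company_domain):
            points += 2
        # Prefer mailboxes whose name hints at privacy or legal handling.
        if any(hint in local for hint in PRIVACY_HINTS):
            points += 1
        return points

    # max() keeps the first candidate when scores tie.
    return max(candidates, key=score)
```

For example, `extract_privacy_email("Write to support@example.com or privacy@example.com.", "example.com")` picks the `privacy@` address. A filter like this only checks that the address looks right; it cannot confirm that the mailbox exists.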
Challenges we ran into
- Chrome ext auth, chrome ext auth, chrome ext auth! We spent half a day trying to get MV3 extensions to successfully implement a “sign in with Google.” This is primarily due to recent updates that have led to more restrictive security controls. Ultimately, we created a workaround by using a webpacked React Chrome extension that leveraged Firebase for authentication.
- Prompt generation. While NLP models are fantastic for our open-ended use cases, the exact prompting and parameters require a fair degree of fine-tuning. We wanted to ensure the email addresses we found were valid and the email content was legally binding. OpenAI and Cohere Playgrounds became our best friends, plus a little bit of regex :)
- Mail server. Automating the sending of emails was quite a challenge, due mostly to Gmail’s recent changes to SMTP access. We tried several tutorials, but none of their recommendations and implementations worked for us. After some digging, we realized that Gmail had recently changed how it works with Python’s email client, and now requires a cryptic “App password” that only one tutorial video discussed!
- Delays in receiving data copies and deletion confirmations. Companies have no incentive to get rid of your data quickly, and the only company that got back to us in time for LA Hacks was Google. Requests can usually take up to a month to get a response, and knowing in advance what the response emails would look like would have sped up development considerably.
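For anyone hitting the same mail server wall, here is a minimal sketch of sending through Gmail with an App password, using only Python's standard library. The function names `build_request_email` and `send_via_gmail` are hypothetical, and the sketch assumes the sending account has 2-Step Verification turned on and an App password generated for it; it is not a copy of our backend code.

```python
import smtplib
from email.message import EmailMessage

GMAIL_SMTP_HOST = "smtp.gmail.com"
GMAIL_SMTP_SSL_PORT = 465  # implicit TLS port for Gmail SMTP


def build_request_email(sender, recipient, subject, body):
    """Assemble a plain-text message; the body would come from the LLM step."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_gmail(msg, username, app_password):
    """Send a message over SMTP with TLS.

    app_password is the App password generated in the Google account
    settings, not the normal account password.
    """
    with smtplib.SMTP_SSL(GMAIL_SMTP_HOST, GMAIL_SMTP_SSL_PORT) as server:
        server.login(username, app_password)
        server.send_message(msg)
```

Usage would be `send_via_gmail(build_request_email(...), username, app_password)`, with the credentials read from environment variables or a secrets store rather than hard-coded.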
Accomplishments that we're proud of
- Our beautiful tool
- Our twinkling posterboard
- Our curated memes
- Our amazing logo
Yes, in that order :)
And the friendships we made along the way!!
What we learned
All of us have been affected by cybersecurity problems. We believed our right to a safe and secure Internet was out of our control. However, while discussing possible solutions, we came across the GDPR and CCPA laws. Amazingly, there are hundreds of tools for companies to comply but almost none for actual users!! In 36 hours, we have come to realize the power that each user should and now CAN use. It has been a very eye-opening experience to understand the extent to which companies can access our information and how few resources there are out there to help us protect ourselves.
We also learned a lot about how LLMs work and their versatility beyond serving as an alternative to search. They helped us speed up development, fix bugs, and more! We are very interested in learning more about this space and will definitely integrate AI tools into our workflow.
What's next for Data Guardian
We had plans for many more features that we could not fully implement during the length of LA Hacks. Here are some examples:
- Smarter integrations with LLMs. We want to take in larger text inputs to improve our models and results. Each company can hold gigabytes of data on you and the GDPR laws are over 300 pages long, so better use of Cohere could help us better understand and solve the problem at hand.
- Analyze and target critical data. Not all data collection is bad; some of it makes life much more convenient. We want our LLM to analyze the data and determine what information needs to be wiped.
- Improved UX of the app. Because Data Guardian is a very customer-facing application, many features we did not have time to code would greatly improve the user experience, such as real-time analytics showing what information companies are tracking as you browse the internet. We also wish to support browsers other than Chrome.
- Block Collection of Personal Data. Beyond deleting data, we want to prevent companies from collecting certain data. This could potentially be solved by intercepting HTTP requests (and cookies).
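The blocking idea in the last item comes down to two decisions per outgoing request: whether it goes to a known tracker, and which cookies to strip before it is sent. The sketch below shows only that decision logic, written in Python to match our backend; a real browser extension would implement it in JavaScript against the browser's request-interception APIs. The domain list, cookie names, and function names are all hypothetical placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; a real one would be loaded from a maintained source.
BLOCKED_DOMAINS = {"tracker.example", "ads.example"}


def should_block(url, blocked=BLOCKED_DOMAINS):
    """Return True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def filter_cookie_header(cookie_header, blocked_names):
    """Drop named cookies from a Cookie header value, keeping the rest in order."""
    kept = []
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name = pair.split("=", 1)[0].strip()
        if name not in blocked_names:
            kept.append(pair)
    return "; ".join(kept)
```

Matching on the full host or a dot-prefixed suffix avoids false positives such as `nottracker.example` being treated as `tracker.example`.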
Built With
- airtable
- apis
- cohere
- email-automation
- flask
- javascript
- openai
- python
- react
- regex



