## Inspiration
The boundaries between technology, data, and privacy grow hazier as technology develops. At the same time, very few people have the time to read the privacy agreements that social media platforms and websites claim will protect them. We wanted to zero in on what people skip over when they click "accept" on a user agreement without reading it, so that they can make informed choices about who they share their data with and what happens to it.
## What it does
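A gated recurrent neural network reads a privacy policy and reports, for each of our six data-use categories, the probability that the policy uses your data in that way.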
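
To make the idea concrete, here is a minimal sketch of a gated recurrent unit (GRU) forward pass that turns policy text into six probabilities, one per category. It is not the model we trained: the weights are random and untrained, bias terms are omitted for brevity, and the dimensions, the `embed` stand-in, and every function name are assumptions made for illustration.

```python
import math
import random

NUM_CATEGORIES = 6  # from the write-up: six binary data-use categories
EMBED_DIM = 4       # hypothetical, kept tiny for illustration
HIDDEN_DIM = 5      # hypothetical


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_params(seed=0):
    """Random (untrained) weights for the update, reset, and candidate gates."""
    rng = random.Random(seed)
    params = {}
    for gate in ("z", "r", "h"):
        params["W_" + gate] = rand_matrix(rng, HIDDEN_DIM, EMBED_DIM)
        params["U_" + gate] = rand_matrix(rng, HIDDEN_DIM, HIDDEN_DIM)
    params["W_out"] = rand_matrix(rng, NUM_CATEGORIES, HIDDEN_DIM)
    return params


def gru_step(p, x, h):
    """One GRU step: gates decide how much of the old state to keep."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["W_z"], x), matvec(p["U_z"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], x), matvec(p["U_r"], h))]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["W_h"], x), matvec(p["U_h"], reset_h))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def embed(token):
    """Hypothetical stand-in for a learned embedding: a fixed vector per token."""
    rng = random.Random(token)
    return [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]


def predict(p, text):
    """Run the GRU over the tokens, then map the final state to 6 probabilities."""
    h = [0.0] * HIDDEN_DIM
    for token in text.lower().split():
        h = gru_step(p, embed(token), h)
    return [sigmoid(score) for score in matvec(p["W_out"], h)]


probs = predict(init_params(), "we may share your data with third parties")
print([round(pr, 3) for pr in probs])  # six values, each between 0 and 1
```

Each output passes through its own sigmoid rather than a shared softmax because the categories are independent yes/no questions: a single policy can fall into several of them at once.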
## How we built it
1.) Scraped the privacy policy XML files in the open "ACL/COLING 2014 Dataset" provided by https://usableprivacy.org/data
2.) Came up with 6 categories that we thought would be most pertinent to users' data safety and privacy
3.) Carefully read a small sample of privacy policies and tagged each policy with a binary code describing whether or not it used data in each of those ways
4.) Chose a gated recurrent neural network as our model and split the tagged policies into training and test sets
5.) Trained the network on 80% of our dataset and reported an array of probabilities for our 6 binary variables
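
The tagging and 80/20 split described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual pipeline: the file names, the label values, and the `split_train_test` helper are all hypothetical, and only the six binary labels per policy and the 80% training share come from the steps above.

```python
import random

NUM_CATEGORIES = 6  # from the write-up: one binary flag per category


def split_train_test(examples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the examples and cut it into train and test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical hand-tagged policies: (file name, six 0/1 flags).
# The flag values here are made up; real ones come from reading each policy.
labeled = [
    ("policy_%02d.xml" % i, [(i >> bit) & 1 for bit in range(NUM_CATEGORIES)])
    for i in range(10)
]

train, test = split_train_test(labeled)
print(len(train), len(test))  # 8 2
```

Fixing the shuffle seed keeps the split reproducible, which matters with a hand-tagged dataset this small because a different split can change the reported results noticeably.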
## Challenges we ran into
1.) Our small team and limited time restricted how many privacy policies we could read and tag for our training dataset.
## Accomplishments that we're proud of
1.) We're really proud of the way we formed our idea. Instead of just looking for an interesting dataset, we chose to examine a pain point.
2.) We created our own dataset.
3.) We created a working machine-learning model! (It was the first time for each of us.)
## What we learned
1.) Specific machine-learning methods, such as neural networks, n-gram models, and natural language processing
2.) Caroline learned Python!
## What's next for Predicting Data Violation with Privacy Policies
1.) Expanding our dataset
2.) Building a Chrome extension that briefs you on how a given website uses your data
