The term "Privacy Paradox" describes the inconsistency between our concerns about privacy and our seemingly apathetic behavior towards giving information away. Oftentimes, corporations encourage apathy with convoluted Privacy Policies that may hide content people are likely to find concerning. The Privacy Policy Index (PPI) aims to provide a convenient and accessible starting point for looking into privacy policies.

What it does

PPI first fetches the privacy policy from the URL that the user submits, then runs it through a series of Flesch–Kincaid readability tests to measure how readable the policy is. It then passes the text through a regression model, trained specifically to evaluate the content of privacy policies, which predicts the quality of the policy on a scale from 0 to 5.
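
As a rough illustration of the readability step, the sketch below computes the two standard Flesch–Kincaid scores (Reading Ease and Grade Level) from their published formulas. The function names and the syllable counter are assumptions made for this example: the syllable count is a simple vowel-group heuristic, not the method PPI necessarily uses.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, minus a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level for the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    return {
        # Higher reading ease means easier text.
        "reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # Grade level approximates the US school grade needed to follow the text.
        "grade_level": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }
```

Calling `readability(policy_text)` on the fetched policy returns both scores in one dictionary. Long sentences and polysyllabic legal vocabulary push the grade level up and the reading ease down.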

PPI uses machine learning (a linear regression model) together with several readability indexes to evaluate the strength of a company's privacy policy and rank it against those of other companies.
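
To make the scoring and ranking idea concrete, here is a minimal sketch of a one-feature least-squares regression whose prediction is clamped to the 0 to 5 scale, followed by a simple ranking. Everything specific here is an assumption for illustration: the single numeric feature, the function names, the clamping step, and the toy numbers are hypothetical, and the real model may use many features and a library implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_score(slope, intercept, x):
    """Predict a quality score and clamp it to the 0-5 scale (clamping is an assumption)."""
    return min(5.0, max(0.0, slope * x + intercept))


def rank_policies(scores):
    """Sort a {company: score} mapping from strongest to weakest policy."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical training data: one feature value and one human-assigned score per policy.
slope, intercept = fit_line([0, 1, 2, 3], [1, 2, 3, 4])
ranking = rank_policies({
    "ExampleCo": predict_score(slope, intercept, 2.5),
    "SampleCorp": predict_score(slope, intercept, 0.5),
})
```

A linear model without clamping can predict values outside the labelled range, which is why the sketch bounds the output before ranking.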

How we built it

We used a variety of web scraping tools to retrieve the policy text, and a linear regression model to evaluate the policy content.
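
The scraping step can be sketched with the Python standard library alone. This is not the set of tools the project used (those are not named here); it is an assumed minimal approach that downloads a page and keeps only the visible text, skipping scripts and styles. The names `TextExtractor`, `html_to_text`, and `fetch_policy_text` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of script/style/noscript tags."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Strip markup from an HTML string and return its text content."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch_policy_text(url: str, timeout: float = 10.0) -> str:
    """Download the page at `url` and return its visible text."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

Pages that render their policy with client-side JavaScript would need a different tool, since this approach only sees the HTML returned by the server.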

In terms of programming languages, we used Python for the entire privacy policy evaluation program on the backend, and React, complemented by a JavaScript library called Ant Design, for the front end of the website.

Challenges we ran into

The most difficult challenges we ran into were figuring out a proper web scraping approach and choosing a machine learning algorithm appropriate for this project.
