AutoTOS

Inspiration

No one wants to read a Terms of Service (TOS) document, particularly a long one. However, a TOS can hold information that is important to know, such as how a site tracks you or manages your data. TOS;DR already exists, but its categorization is manual and tends to have a slow turnaround time, so if you encounter an uncategorized site, you're out of luck, at least for a while. AutoTOS, on the other hand, automatically picks out the most important parts of any TOS and presents them in an easy-to-read format.

What it does

After you paste the text of a TOS into the website, an NLP model identifies the important parts of the TOS, lists them, and weights them to produce a score of how "fairly" the TOS treats the user. It also provides a more granular view of what the TOS says, still focused on the parts that matter most to users.
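As a rough illustration of the weighting step, here is a minimal sketch of how labeled excerpts could be combined into a single score. The `Annotation` class, the weight values, and the 0 to 100 scale are all assumptions made for this example; they are not the weights or the formula AutoTOS actually uses.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    label: str     # annotation type predicted for the excerpt (hypothetical names)
    excerpt: str   # the TOS text the label applies to
    weight: float  # assumed convention: negative hurts the user, positive helps


def fairness_score(annotations, lo=-10.0, hi=10.0):
    """Sum the weights, clamp to [lo, hi], and rescale to an integer in 0..100."""
    total = sum(a.weight for a in annotations)
    clamped = max(lo, min(hi, total))
    return round(100 * (clamped - lo) / (hi - lo))


found = [
    Annotation("tracking", "We track your activity across sites.", -3.0),
    Annotation("data_export", "You can export your data at any time.", 1.0),
]
score = fairness_score(found)
```

With this convention a TOS with no flagged excerpts lands in the middle of the scale, and each harmful or helpful clause moves the score down or up.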

How we built it

AutoTOS is built upon a custom natural language processing model trained with data taken from TOS;DR, a database of labeled excerpts from TOS documents. We trained the model on Google Cloud’s AI Platform using TensorFlow and the RoBERTa language model, and achieved 90% average precision across more than 15 annotation types. Google Cloud’s AI Platform’s beta Custom Prediction service acts as the connection between our model and the frontend: we’re able to connect custom tokenization, prediction, and sentiment analysis functions to user input with a simple REST API call.
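To give a sense of what the Custom Prediction layer looks like, here is a minimal sketch of a predictor class in the shape AI Platform custom prediction routines expect: a constructor taking the model, a `predict(instances, **kwargs)` method, and a `from_path(model_dir)` class method. Everything inside it is a stand-in. The keyword lookup replaces our fine-tuned RoBERTa classifier, and the label names, the `keywords.json` file, and the sentence splitter are hypothetical.

```python
import json
import os
import re


class TosPredictor:
    """Sketch of a custom prediction routine; the 'model' is a keyword table."""

    def __init__(self, model):
        # Stand-in model: maps a hypothetical label to trigger phrases.
        self._model = model

    def _split(self, text):
        # Naive sentence splitter standing in for the custom tokenization step.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def predict(self, instances, **kwargs):
        # One result list per input document, one entry per matched sentence.
        results = []
        for text in instances:
            found = []
            for sentence in self._split(text):
                lowered = sentence.lower()
                for label, phrases in self._model.items():
                    if any(p in lowered for p in phrases):
                        found.append({"label": label, "excerpt": sentence})
            results.append(found)
        return results

    @classmethod
    def from_path(cls, model_dir):
        # Hypothetical artifact name; a real deployment would load model weights.
        with open(os.path.join(model_dir, "keywords.json")) as f:
            return cls(json.load(f))


predictor = TosPredictor({"tracking": ["track"], "data_sale": ["sell your data"]})
output = predictor.predict(
    ["We track your activity. You may cancel anytime. We may sell your data!"]
)
```

The frontend then only needs to send a JSON body with an `instances` list of TOS texts to the prediction endpoint and render the labeled excerpts that come back.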

Challenges we ran into

Properly tokenizing data from TOS;DR for training was initially difficult; we had to try several methods before NLP model training gave good results. At the very end, after we'd trained the model, we also had some difficulty hosting the API.
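To illustrate the kind of preprocessing involved, here is one simple way to turn a document and its labeled excerpts into sentence-level training examples. This is only a sketch under our own assumptions (naive sentence splitting, substring matching, a `"none"` label for unmatched sentences); it is not necessarily the method we ended up with.

```python
import re


def sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_examples(document, labeled_excerpts):
    """Pair each sentence with the label of the first excerpt it overlaps.

    labeled_excerpts is a list of (excerpt_text, label) tuples; sentences that
    match no excerpt get the placeholder label "none".
    """
    examples = []
    for sent in sentences(document):
        label = "none"
        for excerpt, excerpt_label in labeled_excerpts:
            if sent in excerpt or excerpt in sent:
                label = excerpt_label
                break
        examples.append((sent, label))
    return examples


doc = "We collect cookies. You can delete your account."
examples = build_examples(doc, [("We collect cookies.", "tracking")])
```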

What's next for AutoTOS

An extension! Right now we only have a website, which is less convenient than an extension would be. We'd also like to cache TOSes and to retrieve a TOS automatically from a site's root address, for greater efficiency and user convenience.
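Since caching is still on the to-do list, the following is only a sketch of one way it could work: key each analysis by a hash of the normalized TOS text so that the same document is never sent to the model twice. The `TosCache` class and the `analyze` callable are hypothetical names for this example.

```python
import hashlib


class TosCache:
    """In-memory cache of analysis results, keyed by a hash of the TOS text."""

    def __init__(self, analyze):
        self._analyze = analyze  # callable that runs the (expensive) model
        self._store = {}

    @staticmethod
    def key(text):
        # Collapse whitespace and case so trivial formatting changes still hit.
        normalized = " ".join(text.split()).lower()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, text):
        k = self.key(text)
        if k not in self._store:
            self._store[k] = self._analyze(text)
        return self._store[k]


calls = []


def fake_analyze(text):
    # Stand-in for a model call; records that it was invoked.
    calls.append(text)
    return {"length": len(text)}


cache = TosCache(fake_analyze)
first = cache.get("We track   your activity.")
second = cache.get("we track your activity.")
```

A real deployment would likely want a persistent store and an expiry policy, since sites update their terms over time.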

As AutoTOS collects more data from its users, we'll also be able to further improve the accuracy of our detections and even add new types of annotations through a form of active learning.

Built With

docker
finetune
flask
google-cloud-ai-platform
google-compute-engine
javascript
python
tensorflow

Submitted to

PennApps XXI
- Winner Best Use of Google Cloud

Created by

I used web scraping to download and preprocess training data, and I set up the Google Cloud services for training and hosting the ML model online.

Andrew Mascillaro
ECE Major at Olin College of Engineering
I trained and tested the custom NLP model and helped implement data cleaning and prediction methods in the backend API.

Spencer Ng
CS @ UChicago
William Qin
Eric Zheng
CMU '23

Updates

Andrew Mascillaro posted an update — Sep 29, 2020 11:53 AM EDT

For greater flexibility in using our product, we will be migrating from Indico's finetune library to Hugging Face for our natural language processing. Stay tuned to learn more about the development and deployment of AutoTOS!

William Qin started this project — Sep 12, 2020 02:20 PM EDT

Leave feedback in the comments!
