No one wants to read a Terms of Service (TOS), particularly a long one. However, a TOS can contain important information, such as how a site tracks you or manages your data. TOS;DR already exists, but its categorization is done manually and tends to have a slow turnaround time, so if you encounter an uncategorized site, you're out of luck, at least for a while. AutoTOS, on the other hand, automatically picks out the most important parts of any TOS and presents them in an easy-to-read format.
What it does
When you submit the text of a TOS on the website, we use an NLP model to identify its important parts, list them, and weight them to produce a score for how fairly the TOS treats the user. We also provide a more granular view of what the TOS says that still focuses on the parts that matter most to users.
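The writeup does not specify how the detected annotations are weighted into the fairness score. As a minimal sketch, assuming each annotation type carries a fixed signed weight (the labels and weights below are hypothetical, not AutoTOS's real ones), the score could be a normalized sum mapped onto a 0 to 100 scale:

```python
from typing import Dict, Iterable

# Hypothetical annotation labels and weights, for illustration only.
# Negative weights penalize user-unfriendly clauses; positive weights reward friendly ones.
WEIGHTS: Dict[str, float] = {
    "tracks_you_across_sites": -3.0,
    "sells_your_data": -5.0,
    "can_delete_your_account": 2.0,
    "no_third_party_sharing": 4.0,
}


def fairness_score(annotations: Iterable[str], weights: Dict[str, float] = WEIGHTS) -> float:
    """Map the set of detected annotation labels to a 0-100 score (50 = neutral)."""
    # Count each label once, and ignore labels that have no weight.
    raw = sum(weights.get(label, 0.0) for label in set(annotations))
    # Divide by the total weight magnitude so the result stays within [0, 100].
    scale = sum(abs(w) for w in weights.values()) or 1.0
    return round(50 + 50 * raw / scale, 1)


print(fairness_score(["sells_your_data"]))  # 32.1
```

With these made-up weights, a TOS in which only `sells_your_data` is detected scores 32.1, and one with no detected annotations stays at the neutral 50.0. Deduplicating labels keeps a clause that appears several times from being counted more than once; whether that is desirable is a design choice.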
How we built it
AutoTOS is built on a custom natural language processing model trained on data from TOS;DR, a database of labeled excerpts from TOS documents. We trained the model on Google Cloud's AI Platform using TensorFlow and the RoBERTa transformer model, and achieved 90% average precision across more than 15 annotation types. AI Platform's beta Custom Prediction service connects our model to the frontend: a single REST API call runs user input through our custom tokenization, prediction, and sentiment analysis functions.
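AI Platform's custom prediction routines expect a predictor class that exposes a `predict(instances, **kwargs)` method and a `from_path(model_dir)` classmethod. The sketch below shows how tokenization, classification, and sentiment analysis could be chained inside such a class; `TosPredictor`, the `components.pkl` artifact, and the three injected callables are hypothetical stand-ins, not AutoTOS's actual code.

```python
import os
import pickle
from typing import Any, Callable, Dict, List, Sequence


class TosPredictor:
    """Sketch of an AI Platform custom prediction routine for TOS analysis.

    The three callables are hypothetical stand-ins for the real tokenizer,
    fine-tuned classifier, and sentiment function.
    """

    def __init__(
        self,
        tokenize: Callable[[str], List[str]],
        classify: Callable[[List[str]], List[str]],
        sentiment: Callable[[str], float],
    ) -> None:
        self._tokenize = tokenize
        self._classify = classify
        self._sentiment = sentiment

    def predict(self, instances: Sequence[str], **kwargs: Any) -> List[Dict[str, Any]]:
        """Called once per request with the list sent under "instances"."""
        results = []
        for text in instances:
            tokens = self._tokenize(text)
            results.append(
                {"labels": self._classify(tokens), "sentiment": self._sentiment(text)}
            )
        return results

    @classmethod
    def from_path(cls, model_dir: str) -> "TosPredictor":
        """Called once at startup with the directory holding the model artifacts."""
        # Hypothetical artifact: a pickled dict with "tokenize", "classify", and
        # "sentiment" entries. A real routine would load the RoBERTa weights here.
        with open(os.path.join(model_dir, "components.pkl"), "rb") as f:
            parts = pickle.load(f)
        return cls(parts["tokenize"], parts["classify"], parts["sentiment"])
```

A frontend would then send `{"instances": ["<TOS text>"]}` in a POST request to the deployed model's `:predict` endpoint and read the per-document results from the `predictions` field of the response.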
Challenges we ran into
Properly tokenizing the TOS;DR data for training was initially difficult; we had to try several methods before model training gave good results. At the very end, after training the model, we also ran into some difficulty hosting the API.
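The writeup doesn't say which tokenization methods were tried. One common issue when fine-tuning RoBERTa on long legal text is its 512-token input limit (including the special start and end tokens), so documents are often split into overlapping windows. A minimal sketch of that approach, assuming the text has already been tokenized (the plain string tokens here stand in for RoBERTa's subword tokens):

```python
from typing import List, Sequence


def sliding_windows(tokens: Sequence[str], max_len: int = 510, stride: int = 256) -> List[List[str]]:
    """Split a token sequence into overlapping windows of at most max_len tokens.

    max_len defaults to 510 so each window still fits RoBERTa's 512-token limit
    once the start and end special tokens are added.
    """
    if not 0 < stride <= max_len:
        raise ValueError("need 0 < stride <= max_len")
    if not tokens:
        return []
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += stride
    return windows
```

Consecutive windows overlap by `max_len - stride` tokens, so any clause no longer than that overlap appears whole in at least one window even if it straddles a boundary.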
What's next for AutoTOS
A browser extension! Right now AutoTOS is only available as a website, which is less convenient than an extension. We'd also like to add caching of TOSes and the ability to retrieve a site's TOS from its root domain, for greater efficiency and user convenience.
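Caching isn't implemented yet. One possible design, sketched here with a hypothetical `TosCache` class and an injected `analyze` function standing in for the model call, keys results by a SHA-256 hash of the whitespace-normalized TOS text:

```python
import hashlib
from typing import Any, Callable, Dict


class TosCache:
    """Memoize TOS analysis results by a hash of the normalized document text."""

    def __init__(self, analyze: Callable[[str], Any]) -> None:
        self._analyze = analyze  # hypothetical stand-in for the model call
        self._store: Dict[str, Any] = {}
        self.misses = 0

    @staticmethod
    def key(text: str) -> str:
        # Collapse runs of whitespace so copies that differ only in layout share an entry.
        normalized = " ".join(text.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, text: str) -> Any:
        k = self.key(text)
        if k not in self._store:
            self.misses += 1
            self._store[k] = self._analyze(text)
        return self._store[k]
```

Keying on the text rather than the URL means that when a site updates its TOS, the new text hashes differently and is analyzed afresh, while a TOS that is merely reformatted reuses the cached result.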
As AutoTOS collects more data from its users, we'll also be able to further improve the accuracy of our detections and even add new annotation types, as a form of active learning.
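The writeup doesn't specify an active-learning strategy. One common choice is uncertainty sampling: ask for human labels on the excerpts the model is least sure about. A minimal sketch, assuming the model outputs a probability per excerpt for a given annotation type:

```python
import math
from typing import List, Sequence, Tuple


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a yes/no prediction; highest at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_for_labeling(predictions: Sequence[Tuple[str, float]], k: int) -> List[str]:
    """Return the k excerpts whose predicted probabilities are most uncertain."""
    ranked = sorted(predictions, key=lambda item: binary_entropy(item[1]), reverse=True)
    return [excerpt for excerpt, _ in ranked[:k]]
```

For a multi-label model, this could be applied per annotation type, or the entropies could be summed across types before ranking.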