When people sign up for a website, they are usually prompted to read the Terms of Service, a privacy policy, and other legal documents. Most users skip reading them entirely, so companies can put in almost anything they want without users noticing. We wanted to fix that.

What it does

Given the URL of a Terms of Service, privacy policy, or other legal page, the program fetches the page and scrapes its text. We split the text into paragraphs, one per section, and feed each one to the language processors. Using a combination of IBM Watson and Python's Natural Language Toolkit (NLTK), we extract the main concept of each paragraph and then summarize it. The results for the whole page are then shown in the extension, so users can quickly see the main points of a Terms of Service or privacy policy.
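As a rough illustration of the paragraph-splitting step, here is a minimal sketch. It is not our actual code: it uses Python's standard-library html.parser in place of BeautifulSoup so that it runs without extra installs, it assumes each section of the page lives in its own <p> element, and the names ParagraphExtractor and split_into_paragraphs are made up for this example.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of each <p> element as one paragraph (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraphs, in page order
        self._chunks = []      # text pieces of the paragraph being read
        self._in_p = False     # True while inside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implicitly closes any paragraph that is still open.
            self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        # Text inside inline tags such as <b> or <a> still belongs to the paragraph.
        if self._in_p:
            self._chunks.append(data)

    def _flush(self):
        # Join the pieces and collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []
        self._in_p = False


def split_into_paragraphs(html):
    """Return the non-empty <p> paragraphs of an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a trailing paragraph whose </p> is missing
    return parser.paragraphs
```

The HTML string would come from the page fetch, for example requests.get(url).text, which is left out here so the sketch runs offline. Paragraphs wrapped in extra <div> layers are still found, but pages that do not use <p> elements at all would need a different rule.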

How we built it

We built the program in Python, and wrote the extension in HTML and JavaScript. We used IBM Watson's natural language processing to extract the main concepts, and used the Python libraries Requests, BeautifulSoup, and NLTK to write our own algorithm that summarizes a piece of text as best it can.
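To give a feel for what a very basic summarizer can look like, here is a minimal sketch of one common approach: score each sentence by how frequent its words are in the paragraph, then keep the top-scoring sentences. This is an assumption for illustration, not necessarily the algorithm we wrote. It uses plain regular expressions in place of NLTK's tokenizers so it runs without downloads, and the STOPWORDS list, the summarize function, and its max_sentences parameter are all made up for this example.

```python
import re
from collections import Counter

# Small illustrative stopword list; NLTK ships a much fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "may", "of", "on", "or", "our", "that", "the", "this", "to",
    "we", "who", "will", "with", "you", "your",
}


def content_words(text):
    """Lowercase words in the text, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(paragraph, max_sentences=1):
    """Return the highest-scoring sentences of a paragraph, in original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    if len(sentences) <= max_sentences:
        return " ".join(sentences)

    # Word frequencies over the whole paragraph.
    freq = Counter(content_words(paragraph))

    def score(sentence):
        # Average frequency of the sentence's content words, so long
        # sentences are not favored just for being long.
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        return sum(freq[w] for w in tokens) / len(tokens)

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Calling summarize on each paragraph of a legal page would give one short line per section. Averaging the score is a design choice: summing instead would tend to pick the longest sentence every time.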

Challenges we ran into

Originally, we planned to have another API do the summarization, but after struggling to feed it text, we decided to write our own summarizer. We also found that some web pages use unusual nested markup for their Terms of Service and privacy policies, so the web crawler does not work on those sites.

Accomplishments that we're proud of

We were able to integrate IBM Watson's API easily and to write a very basic text summarizer of our own, which together cover everything the project needed.

What we learned

Since we needed natural language processing to summarize the text, we first had to learn how it works. We also learned how to use IBM Watson's API to get the main concepts of a piece of text, along with a lot of other information about it.

What's next for SimpleTOS

In the future, we would like the whole thing to run as a web extension, to be dynamic (summarizing the TOS for any website the user chooses), and to write our own neural network for a more accurate and more powerful summarizer.
