Inspiration

  • Most students are unaware of their rights and don’t realize when those rights are violated. Recently, many companies have gone bankrupt and chosen not to return user assets, which has raised a lot of awareness around privacy policies. As students, most of us have also had difficulties with our landlords, and we can be taken advantage of very easily, especially when looking for housing for the first time. We wanted to build an application that would not only protect us, but also educate us about our rights and guide us. This helps prevent landlords as well as large corporations from taking advantage of students and people who tend to sign without reading.

  • Soteria aims to solve this issue by providing tools for checking rental agreements and analyzing the privacy policies of the services we use most.

What it does

  • This project has two parts: a rental contract checker and a privacy policy analyzer.

  • The rental contract checker accepts a PDF document and tells users whether the contract is missing any fields or contains any breaches of the law, which helps protect tenants’ rights. For this hackathon, we based our checks on British Columbia law: our algorithm analyzes the contract against the BC Residential Tenancy Act to determine whether any clauses violate it. (https://www.bclaws.gov.bc.ca/civix/document/id/complete/statreg/02078_01)

  • The privacy policy analyzer accepts plain text as well as URLs as input. It runs the text through our machine learning model and uses sentiment analysis to tell the user whether each sentence is positive or negative for them. This helps the user understand a website’s privacy policy better. We built the training data set by reading through the privacy policies of 30 companies and labeling each claim as "positive", "negative", or "neutral".

  • The user can also sign in using the Deso blockchain and add any services they use to a wishlist. The user will be notified whenever the privacy policy of a company they subscribed to changes, and will be able to view the service’s privacy policy score and a log of the changes since the last update.
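
As an illustration of the change log described above, here is a minimal sketch of how two snapshots of a policy could be compared. It assumes each snapshot is stored as plain text; the function names `policy_fingerprint` and `policy_changes` are hypothetical and are not taken from the Soteria codebase.

```python
import difflib
import hashlib


def policy_fingerprint(text: str) -> str:
    """Hash the policy text after collapsing whitespace (hypothetical helper)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def policy_changes(old: str, new: str) -> list[str]:
    """Return removed ("- ") and added ("+ ") lines between two snapshots."""
    # Cheap check first: identical fingerprints mean nothing worth logging,
    # even if only whitespace differs.
    if policy_fingerprint(old) == policy_fingerprint(new):
        return []
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; drop those.
    return [line for line in diff if line.startswith(("+ ", "- "))]


if __name__ == "__main__":
    before = "We collect your email.\nWe never sell data."
    after = "We collect your email.\nWe may share data with partners."
    for change in policy_changes(before, after):
        print(change)
```

A non-empty result could be what triggers a notification, with the list itself stored as the log entry for that update.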

How we built it

GitHub

  • Throughout development, we made heavy use of GitHub features, including:
  • Commits, Pull Requests, Issues, Labels, Project Boards, Organizations, and GitHub Secrets.
  • GitHub Actions for Continuous Integration and Vercel for Continuous Deployment.

Front-End

  • The front end is built with the Next.js framework, React, and TypeScript.
  • We used Deso authentication to sign in to the application and access the personalized wishlist.
  • To keep the design process fast, we used Tailwind and Figma for basic prototyping.

Back-End

  • We used TensorFlow to detect whether a sentence in the privacy policy is positive, negative, or neutral for the user.
  • We analyze the PDF of the rental agreement using Python and detect keywords that are missing from the rental contract.
  • All of this is wrapped in FastAPI, which exposes endpoints for the Next.js front end.
  • We made heavy use of GitHub Codespaces, both for coding and as a temporary API server.
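
The missing-keyword check can be sketched roughly as follows. This is only an illustration: it assumes the contract text has already been extracted from the PDF by a separate library, and both the `REQUIRED_FIELDS` checklist and its patterns are hypothetical rather than the list Soteria actually uses.

```python
import re

# Hypothetical checklist: field name -> regex suggesting the field is present.
REQUIRED_FIELDS = {
    "landlord name": r"\blandlord\b",
    "tenant name": r"\btenant\b",
    "rent amount": r"\brent\b.*?\$\s?\d",
    "security deposit": r"\b(security|damage)\s+deposit\b",
    "start date": r"\b(starts?|starting|begins?|beginning|commenc\w+)\b",
}


def find_missing_fields(contract_text: str) -> list[str]:
    """Return the checklist fields that no pattern matched in the text."""
    # Lowercase and collapse whitespace so line breaks from PDF extraction
    # do not hide a match.
    text = " ".join(contract_text.lower().split())
    return [
        name
        for name, pattern in REQUIRED_FIELDS.items()
        if not re.search(pattern, text)
    ]


if __name__ == "__main__":
    sample = (
        "The Landlord, Jane Doe, rents to the Tenant, John Roe. "
        "Rent is $1,500 per month. The tenancy starts on September 1."
    )
    print(find_missing_fields(sample))
```

Keyword matching like this only flags fields that appear to be absent. Deciding whether a clause actually breaches the Residential Tenancy Act needs rules specific to each section of the Act.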

Challenges we ran into

  1. Our machine learning model’s accuracy stopped improving after reaching a plateau. We switched to smaller batch sizes and then changed how we split the sentences, which improved accuracy a lot.
  2. Having four people work vigorously on one project meant that we had to be very careful about version control. There were some merge conflicts that were challenging to fix.
  3. Deso was hard to set up, and there wasn’t enough documentation online for the user profile section.
  4. Building the code for production was another hassle. Because of the hackathon environment, everyone on the team was in a rush and was pushing code that was unlinted or had small to medium bugs. Fortunately, our CI/CD pipeline caught these problems and let us know. Looking back, it helped us keep our codebase clean and prevent future disasters.
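
Since sentence splitting mattered for the first challenge, here is a minimal sketch of splitting a policy into sentences and labeling each one. The regex splitter is an assumption, not the approach the team settled on, and `toy_classify` is a keyword-based placeholder standing in for the trained TensorFlow model.

```python
import re
from typing import Callable

# Split after ., ! or ? when followed by whitespace and a capital letter,
# quote, or opening parenthesis. Abbreviations such as "e.g. Google" will
# be split incorrectly; a real pipeline would need to handle those.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'(])")


def split_sentences(text: str) -> list[str]:
    """Split policy text into sentences after collapsing whitespace."""
    flat = " ".join(text.split())
    return [s.strip() for s in _SENTENCE_END.split(flat) if s.strip()]


def toy_classify(sentence: str) -> str:
    """Placeholder classifier (hypothetical), not the trained model."""
    lowered = sentence.lower()
    if "never" in lowered or "consent" in lowered:
        return "positive"
    if "sell" in lowered or "share" in lowered:
        return "negative"
    return "neutral"


def label_policy(
    text: str, classify: Callable[[str], str]
) -> list[tuple[str, str]]:
    """Pair every sentence with the label returned by `classify`."""
    return [(s, classify(s)) for s in split_sentences(text)]


if __name__ == "__main__":
    policy = (
        "We collect your email. We never sell your data! "
        "Do we share it? Only with your consent."
    )
    for sentence, label in label_policy(policy, toy_classify):
        print(f"{label:8} {sentence}")
```

Passing the classifier in as a function keeps the splitting logic testable without loading a model.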

Accomplishments that we're proud of

  1. We built a Natural Language Processing ML model to analyze the semantics of sentences, which is challenging.
  2. We kept the GitHub repo well organized, including branch management and issue boards.
  3. The UI we built with Next.js is clean and easy to use.

What we learned

  1. This project gave us a better understanding of NLP models, including how to build and debug one and the different layers they use.
  2. We learned how to work with PDF files in Python, mainly how to read the text from a PDF file.
  3. Through our heavy use of Git, we learned how to do source control.
  4. Sleep is important.

What's next for SOTERIA

We will improve the accuracy of the NLP models with more data and a better network structure. We plan to add a timeline that tracks the changes in a company’s privacy policy and notifies the user when it changes, because many companies won’t tell users about changes to their privacy policy. We will implement Twilio integration in the future. We are also planning to expand our rental agreement checker beyond Canada and to analyze any type of document.
