About us
We are a team of 2 data scientists and a practicing attorney. Link to presentation: https://docs.google.com/presentation/d/1onxPiW9TT-d9QHdwdHdytIFVjiCk2V-s/edit?usp=sharing&ouid=101174381768000730185&rtpof=true&sd=true
Inspiration
Consumers are often forced to accept legal contracts, such as privacy policies and terms of use, that are unreasonably time-consuming to read and difficult for the average person to understand. Think about what happens when you sign up for a service or app like Facebook, Gmail, or Instagram. Consumers can also be exposed to deceptive practices such as dark patterns (https://www.ftc.gov/news-events/news/press-releases/2022/09/ftc-report-shows-rise-sophisticated-dark-patterns-designed-trick-trap-consumers). Most of the time, they have no real choice: either they accept the terms, or they cannot use the service. This is especially problematic in markets where a few large companies hold large market shares, because consumers have few alternatives and companies can leverage this to extract intrusive information from them. The root of the problem is the information asymmetry between consumers, corporate app providers, and regulators. Companies have been accused of unfair practices, and there have been class action lawsuits arising from unfair data collection practices. If we could bridge this information gap by using LLM technology to promote transparency, consumers would be better informed about how their data may be used, and companies could mitigate litigation and reputational risks.
What it does
Our product automatically parses a legal contract, compares it with similar contracts or agreements, and analyzes whether the contract falls outside industry norms. Consumers receive prompts with color-coded threat/risk levels (for instance, a red or yellow flag) and key-term suggestions if the product detects: unreasonable or overly broad disclaimers; provisions that impose unfair liability or responsibility on the consumer; predatory clauses regarding IP ownership; or privacy/data risks, i.e., language on data collection, storage, transfer, or third-party usage that is overly broad or deviates from good industry practice. In addition, it gives users the option to record a statement of objection in a public database, which could serve as evidence in class action, antitrust, or regulatory actions if people want to file lawsuits against companies in the future. Conversely, if a consumer chooses not to record a statement of objection, the company will have a stronger position when defending related class action lawsuits. Our product will also provide an automated renegotiation and contract amendment process, built on this public platform, so that consumers can actually negotiate better contractual terms with the app provider. This also helps companies gauge public opinion on their policies so they can improve them in the future.
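A minimal sketch of how a clause-level outlier score could be turned into the color-coded flags described above. The `FlaggedClause` class, the `report` function, the assumption that scores fall in [0, 1], and both threshold values are hypothetical illustrations, not the product's actual scoring.

```python
from dataclasses import dataclass

# Hypothetical cutoffs on an outlier score in [0, 1]; not the product's real values.
RED_THRESHOLD = 0.8
YELLOW_THRESHOLD = 0.5


@dataclass
class FlaggedClause:
    category: str         # e.g. "data retention", "third-party sharing"
    text: str             # the clause as it appears in the contract
    outlier_score: float  # higher means further from industry norms

    @property
    def flag(self) -> str:
        """Map the outlier score to a color-coded risk level."""
        if self.outlier_score >= RED_THRESHOLD:
            return "red"
        if self.outlier_score >= YELLOW_THRESHOLD:
            return "yellow"
        return "green"


def report(clauses: list[FlaggedClause]) -> list[str]:
    """Return one line per red or yellow clause, most severe first."""
    flagged = [c for c in clauses if c.flag != "green"]
    flagged.sort(key=lambda c: c.outlier_score, reverse=True)
    return [f"[{c.flag.upper()}] {c.category}: {c.text}" for c in flagged]


if __name__ == "__main__":
    clauses = [
        FlaggedClause("data retention", "We retain your data indefinitely.", 0.91),
        FlaggedClause("third-party sharing", "We may share data with partners.", 0.62),
        FlaggedClause("cookies", "We use cookies to remember preferences.", 0.10),
    ]
    for line in report(clauses):
        print(line)
```

Keeping the thresholds as named constants makes them easy to recalibrate later, for example against annotations from a legal expert.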
How we built it
Key technologies: LangChain, Streamlit, vector embeddings, and cluster analysis.
Privacy agreements are the most readily available contracts on the internet, so we started with them.
- We obtained around 100 privacy agreements (credits: https://www.usableprivacy.org/data)
- We use GPT-3.5 to create document embeddings and persist them to disk
- We engineer prompts to query the vector database about specific aspects of each agreement, e.g., the types of data collected, data retention policies, etc.
- We parse the query results with GPT into individual data collection practices and obtain an embedding for each item
- We perform dimensionality reduction (UMAP) and cluster analysis (HDBSCAN) to identify which data collection practices are industry outliers
- When a user provides their own contract to the app, we execute the same steps and evaluate whether any aspects of the given contract are outliers
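The embedding, persistence, and prompt-driven query steps above can be sketched with the standard library alone. This is not the project's LangChain/OpenAI code: `embed` is a hypothetical toy stand-in (a hashed bag of words) for a real embedding model, and the JSON file stands in for an on-disk vector store. The part the sketch is meant to show is the retrieval logic: ranking stored chunks by cosine similarity to a query about one aspect of the agreement.

```python
import json
import math
import re
import tempfile
import zlib
from pathlib import Path

DIM = 256  # toy embedding size; real embedding models use far more dimensions


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, unit length.

    zlib.crc32 is used instead of hash() because str hashing is randomized per
    process, which would make vectors reloaded from disk inconsistent.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(chunks: list[str], path: Path) -> None:
    """Embed each chunk and persist its text and vector to disk as JSON."""
    records = [{"text": c, "vector": embed(c)} for c in chunks]
    path.write_text(json.dumps(records))


def query(path: Path, question: str, k: int = 2) -> list[str]:
    """Load the index and return the k chunks most similar to the question."""
    records = json.loads(path.read_text())
    q = embed(question)

    def similarity(record: dict) -> float:
        # Both vectors are unit length, so the dot product is the cosine similarity.
        return sum(a * b for a, b in zip(q, record["vector"]))

    ranked = sorted(records, key=similarity, reverse=True)
    return [r["text"] for r in ranked[:k]]


if __name__ == "__main__":
    chunks = [
        "We collect your name, email address and location.",
        "We retain personal data for as long as your account is active.",
        "We use cookies to remember your preferences.",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        index = Path(tmp) / "privacy_index.json"
        build_index(chunks, index)
        # A prompt targeting one aspect of the agreement: data retention.
        print(query(index, "How long is personal data retained?", k=1))
```

Swapping `embed` for a real embedding model leaves `build_index` and `query` unchanged, which is the point of keeping the vector store behind two small functions.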
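The outlier step can also be sketched, with one substitution stated plainly: instead of UMAP plus HDBSCAN, this uses a simpler k-nearest-neighbour distance score so that it runs with only the standard library. A practice embedding from the user's contract is flagged if its mean cosine distance to its k nearest corpus neighbours exceeds a chosen quantile of the corpus's own leave-one-out scores. The function names, the 2-D example vectors, `k=3`, and the 0.95 quantile are hypothetical choices.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_score(point: list[float], corpus: list[list[float]], k: int) -> float:
    """Mean cosine distance from point to its k nearest neighbours in corpus."""
    distances = sorted(cosine_distance(point, other) for other in corpus)
    return sum(distances[:k]) / k


def is_outlier(point: list[float], corpus: list[list[float]],
               k: int = 3, quantile: float = 0.95) -> bool:
    """Flag point if its kNN score exceeds the given quantile of the corpus's
    own leave-one-out kNN scores (a simple stand-in for UMAP + HDBSCAN)."""
    baseline = sorted(
        knn_score(p, corpus[:i] + corpus[i + 1:], k) for i, p in enumerate(corpus)
    )
    cutoff = baseline[min(len(baseline) - 1, int(quantile * len(baseline)))]
    return knn_score(point, corpus, k) > cutoff


if __name__ == "__main__":
    # Hypothetical 2-D "embeddings" of one practice (e.g. data retention) across agreements.
    corpus = [[1.0, 0.0], [1.0, 0.1], [1.0, 0.05], [0.95, 0.1], [1.0, -0.05], [0.98, 0.0]]
    print(is_outlier([1.0, 0.02], corpus))  # False: close to the cluster
    print(is_outlier([0.0, 1.0], corpus))   # True: far from every agreement
```

With the actual libraries, the analogous rule would be to fit HDBSCAN on the UMAP-reduced corpus embeddings and treat points labeled as noise (label `-1`) as outliers.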
Challenges we ran into
Our team lacked front-end development experience, but fortunately we were able to use Python-only GUI frameworks like Streamlit. Our clustering model could still be fine-tuned to better identify the most irregular clauses. We may be able to improve performance by adjusting the clustering algorithm and by incorporating expert annotations into the training process.
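One hypothetical way to fold expert annotations into tuning, assuming an attorney has labeled a sample of clauses as irregular or not: sweep the flagging threshold over the model's outlier scores and keep the one with the best F1 score against those labels. The scores, labels, and function names here are illustrative only.

```python
def f1_at(threshold: float, scores: list[float], labels: list[bool]) -> float:
    """F1 score when every clause scoring >= threshold is flagged."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum(y and not p for p, y in zip(predicted, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores: list[float], labels: list[bool]) -> tuple[float, float]:
    """Try each observed score as the cutoff; return (threshold, f1) with the best F1."""
    candidates = sorted(set(scores))
    return max(((t, f1_at(t, scores, labels)) for t in candidates), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical model outlier scores and attorney labels (True = irregular clause).
    scores = [0.10, 0.20, 0.70, 0.90]
    labels = [False, False, True, True]
    print(best_threshold(scores, labels))  # (0.7, 1.0)
```

Because `max` keeps the first best candidate and candidates are sorted ascending, ties resolve to the lowest threshold, which favors recall when two cutoffs score equally.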
Accomplishments that we're proud of
We got a simple UI running that enables users to analyze legal contracts. It highlights important points that consumers should pay attention to in a privacy agreement.
What we learned
LangChain and OpenAI LLMs
What's next for Contract Busters
- Support other types of legal documents
- Improve model performance
- Implement contract renegotiation process
Built With
- langchain
- openai
- python
- streamlit