Inspiration

We’ve all experienced it: facing a wall of legal text in Terms & Conditions and instinctively clicking “I Agree” without reading. These documents are often long, dense, and filled with legal jargon making it nearly impossible for everyday users to understand what they’re agreeing to.

In an age where online safety and data privacy are increasingly under threat, this poses a serious risk. We were inspired to build a solution that empowers users to make informed decisions by summarizing complex T&Cs into something simple, readable, and actionable.

What it does

Legal Aid is a two-step pipeline that transforms lengthy, jargon-heavy legal agreements into short, understandable summaries. Given any Terms & Conditions input, it:

  1. Extracts the text under Terms of Services and Terms and Conditions.
  2. Feeds them into a fine-tuned Llama 3.1-8b-instruct model with a propmt to extract negative implications.

The final output is a concise, readable overview of the key points helping users know what they’re agreeing to before clicking "Accept."

How we built it

We used a two-stage NLP pipeline:

  1. Extractive Summarization
    Using Selenium and webscraping, we extract sentences from the document.

  2. Abstractive Summarization
    These extracted sentences are then passed through the LLama model from Hugging Face to summarize T&Cs in a clear, user-friendly format.

Tech Stack:

  • JavaScript
  • Manifest V3 Extension
  • Hugging Face Llama 3.1-8b-instruct model
  • HTML
  • CSS
  • Flask

Challenges we ran into

  • Token Limitations: Many T&Cs exceed the input token limit for transformer models. We had to preprocess, chunk, or truncate input text while preserving meaning.
  • Legal Language Complexity: Accurately simplifying legal text without losing critical meaning was a fine balance.
  • Evaluation: Since T&Cs are often vague or open to interpretation, evaluating the quality of summaries was difficult.
  • Latency: The summarization process can take time especially with large inputs or API latency from hosted models.

Accomplishments that we're proud of

  • Successfully combined extractive and abstractive summarization into a working pipeline.
  • Leveraged a real-world model model to ensure practical, ethical output.
  • Created summaries that retained meaning, clarity, and privacy-relevant content from real T&Cs.
  • Helped make legal documents accessible to non-experts promoting digital literacy and safety.

What we learned

  • Hands-on experience with natural language processing pipelines combining multiple summarization strategies.
  • The real-world limitations of deploying transformer models on large documents.
  • The importance of ethical AI when dealing with privacy, legal, and user-facing applications.
  • Strategies for handling legal text, which is often intentionally vague or verbose.

What's next for T&C Summarizer

  • Build a browser extension or web app for real-time summarization on websites.
  • Support multi-language T&Cs for broader accessibility.
  • Add voice summaries for audio accessibility.
  • Integrate with cybersecurity tools to flag risky or unusual clauses.
  • Experiment with newer transformer models like Longformer or GPT-4-turbo for longer document support.

Built With

Share this project:

Updates