LexiGuard

Inspiration

We’ve all clicked "I Agree" without reading the fine print. Companies know this and hide aggressive data policies behind dense "legalese" that acts as a barrier to understanding. We realized that simply summarizing a document isn't enough, because summaries often miss the nuance. We built LexiGuard to act as a decoder ring: a tool specifically designed to hunt down the "Hidden Keywords" (sneaky legal terms like indemnification, binding arbitration, and perpetual license) that companies use to trick users, and to translate them into human-readable language.

What it does

LexiGuard is an AI Legal Agent that translates "Legalese" into "Human":

Finds the Hidden Keywords: It scans documents for specific trigger words—legal traps that are often buried in walls of text.

Contextual Analysis: It doesn't just spot the keyword; it checks how it's used. For example, is "data sharing" there to ship a package, or to sell your data to advertisers?

Human-Readable Translation: It rewrites these complex clauses into 8th-grade English. Instead of saying "User indemnifies platform," it says "You have to pay their legal bills if they get sued."

Compliance Scoring: It cross-references these keywords against a DigitalOcean-hosted Knowledge Base of GDPR and CCPA laws to assign a simple 0-10 safety score.
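The first two steps above can be illustrated with a minimal sketch. It is not LexiGuard's actual implementation (the agent does this through its prompt); it only shows the idea of pairing each keyword with the sentence around it so the usage can be judged in context. The function name `find_hidden_keywords`, the `Hit` type, and the three-keyword list are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical subset; the three terms come from the examples in this write-up.
HIDDEN_KEYWORDS = ["indemnification", "binding arbitration", "perpetual license"]


@dataclass
class Hit:
    keyword: str   # the trigger term that matched
    sentence: str  # the full sentence, kept as context for later analysis


def find_hidden_keywords(text: str, keywords=HIDDEN_KEYWORDS) -> list[Hit]:
    """Return every keyword match together with the sentence it appears in."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        lowered = sentence.lower()
        for kw in keywords:
            # Whole-word match so "license" does not fire inside another word.
            if re.search(r"\b" + re.escape(kw) + r"\b", lowered):
                hits.append(Hit(kw, sentence))
    return hits


sample = (
    "You grant us a perpetual license to your content. "
    "Disputes are resolved by binding arbitration. "
    "We ship orders within two days."
)
hits = find_hidden_keywords(sample)
for hit in hits:
    print(f"{hit.keyword!r} found in: {hit.sentence}")
```

Keeping the whole sentence with each match is what makes the contextual step possible: the same keyword can then be passed to the model along with its surrounding text instead of being flagged on sight.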

How we built it

We utilized the DigitalOcean GenAI Platform to create a specialized RAG (Retrieval-Augmented Generation) pipeline:

The Keyword Detector: We engineered a System Prompt that prioritizes a list of 50+ "high-risk" legal keywords (e.g., waiver, third-party, affiliates).

The Knowledge Base: We uploaded a custom legal_definitions.md file to the DigitalOcean agent. This acts as a dictionary, teaching the AI exactly how to translate specific hidden keywords into plain English without hallucinating.

The Model: We used gpt-oss-120b, instructing it to act as a "Translator" rather than a lawyer, so that the output is always simple and conversational.
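As a rough illustration of the Keyword Detector idea, a system prompt of this kind could be assembled from a keyword list as sketched below. The wording of the prompt and the `build_system_prompt` helper are assumptions, not the prompt we actually deployed, and the list shows only the three example keywords named above rather than the full set of 50+.

```python
# Example keywords taken from this write-up; the real list is longer.
HIGH_RISK_KEYWORDS = ["waiver", "third-party", "affiliates"]


def build_system_prompt(keywords: list[str]) -> str:
    """Assemble a system prompt that puts the high-risk keywords first."""
    keyword_list = ", ".join(sorted(keywords))
    return (
        "You are a Translator, not a lawyer.\n"
        "Scan the document for these high-risk keywords first: "
        f"{keyword_list}.\n"
        "For each one you find, explain in simple, conversational English "
        "what the clause means for the user.\n"
        "Use only the definitions in the provided knowledge base. "
        "If a term is not defined there, say so instead of guessing."
    )


prompt = build_system_prompt(HIGH_RISK_KEYWORDS)
print(prompt)
```

Generating the prompt from a list, rather than hand-editing one long string, makes it easier to grow the keyword set without touching the instructions around it.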

Challenges we ran into

False Positives: Initially, the AI flagged every mention of "data" as bad. We had to refine our "Hidden Keyword" logic to differentiate between functional data usage (good) and commercial data selling (bad).

Simplification vs. Accuracy: It was hard to make the AI sound "human" without losing legal accuracy. We solved this by implementing a "Two-Step" prompt: first, extract the legal fact; second, rewrite it for a 12-year-old.

Hallucinations: To stop the AI from inventing fake laws, we restricted its answers strictly to the provided Knowledge Base documents, which can be found in the attached documents.
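The "Two-Step" prompt can be sketched as two chained model calls. This is a simplified illustration rather than our production code: `two_step_translate` is a hypothetical helper, the prompt wording is assumed, and `complete` stands in for whatever function sends a prompt to the agent and returns its reply. A stub is used here so the example runs without network access.

```python
from typing import Callable


def two_step_translate(clause: str, complete: Callable[[str], str]) -> dict:
    """Extract the legal fact first, then simplify only that fact."""
    # Step 1: pin down what the clause actually says, in precise terms.
    fact = complete(
        "Step 1. State the legal fact in this clause in one precise "
        "sentence. Do not simplify yet.\n\nClause: " + clause
    )
    # Step 2: simplify the extracted fact, not the original legalese.
    plain = complete(
        "Step 2. Rewrite this fact so a 12-year-old can understand it. "
        "Do not add or remove obligations.\n\nFact: " + fact
    )
    return {"fact": fact, "plain": plain}


def fake_complete(prompt: str) -> str:
    """Stub model used only for this example; returns canned replies."""
    if prompt.startswith("Step 1"):
        return "The user must cover the platform's legal costs."
    return "You have to pay their legal bills if they get sued."


result = two_step_translate("User indemnifies platform.", fake_complete)
print(result["plain"])
```

Splitting the work this way means the second call never sees the original legalese, so the rewrite can only simplify the fact that was already extracted.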

Accomplishments that we're proud of

The "Plain English" Engine: We successfully tuned the agent to take a 500-word liability clause and turn it into a single, understandable sentence: "If you break it, you buy it."

Keyword Extraction: The agent accurately identifies 95% of hidden predatory clauses in our test set of standard EULAs.

Speed: The entire analysis happens in under 5 seconds, making it faster than skimming the first paragraph yourself.
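For readers who want to reproduce a figure like the keyword-extraction percentage above, the calculation is a recall measurement: the share of hand-labeled clauses that the agent also flagged. The sketch below uses made-up clause IDs purely to show the arithmetic; it is not our test set, and the `recall` helper is a hypothetical name.

```python
def recall(labeled: set[str], flagged: set[str]) -> float:
    """Fraction of hand-labeled clauses that the agent also flagged."""
    if not labeled:
        raise ValueError("need at least one labeled clause")
    return len(labeled & flagged) / len(labeled)


# Toy data: four clauses labeled as predatory, three of them caught.
labeled = {"clause-1", "clause-2", "clause-3", "clause-4"}
flagged = {"clause-1", "clause-2", "clause-3", "clause-9"}

score = recall(labeled, flagged)
print(f"recall = {score:.0%}")
```

Note that recall ignores false alarms such as `clause-9` here, so it is best read alongside a precision figure.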

What we learned

Language is a Barrier: The biggest issue with modern tech isn't the technology; it's the language used in contracts. AI is the perfect tool to bridge that gap.

RAG is Essential: You cannot rely on a model's general knowledge for law. Injecting specific definitions for "Hidden Keywords" was crucial for consistent results.

What's next for LexiGuard

Browser Extension: A popup that automatically highlights "Hidden Keywords" in red as you scroll through a webpage.

Multi-Language Support: Translating English legalese into plain Spanish, French, and German to help international users.

"Fix It" Button: An AI agent that not only finds the bad keywords but also automatically drafts an email to the company asking to opt out of those specific terms.

Updates

venkata sai Dasari started this project — Dec 13, 2025 02:30 PM EST
