LGTM: Legal Governance & Trust Monitor

Inspiration

In the fast-paced world of modern software development, legal compliance is often an afterthought—until it's too late. We've seen brilliant projects derailed by accidental "License Bombs" (like a GPL-3.0 header in a proprietary repo) or unintended PII leaks. We were inspired to build a tool that makes legal safety as automatic as a unit test. We wanted to transform "LGTM" from a casual "Looks Good To Me" into a verified, data-driven "Legal Governance & Trust Monitor."

What it does

LGTM is an autonomous AI Auditor that lives in your GitLab environment. Every time a Merge Request opens, the LGTM agent springs into action:

  • Scans Diffs: Intelligently filters out binary noise to find source code changes.
  • Identifies Risks: Detects copyleft licenses, hardcoded secrets, and PII using Context-Injected RAG.
  • Rule-Cite Library: Cites a versioned library of 9 professional legal briefs (RC-01 to RC-09) to ground its advice in actual law, not hallucinations.
  • Calculates Risk Scores: Combines the severity and detection confidence of each finding in a weighted scoring model to produce a final compliance score.
  • Autonomous Remediation: When high-risk violations are found, LGTM doesn't just flag them—it opens a "Remediation Proposal" MR suggesting a compliant code swap.
  • Legal Memory (Audit Trail): Persists every decision as a signed, queryable record in .lgtm/records/, ensuring a legally defensible audit trail for every release.
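
The writeup does not say how records in .lgtm/records/ are signed. Here is a minimal sketch that assumes an HMAC-SHA256 over a canonical JSON encoding with a shared secret key. The record fields, file naming, and helper names (`sign_record`, `write_record`, `verify_record`) are hypothetical.

```python
import hashlib
import hmac
import json
import time
from pathlib import Path

# Hypothetical layout: one JSON file per merge request decision.
RECORDS_DIR = Path(".lgtm/records")


def sign_record(record: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 over a canonical (sorted, compact) JSON encoding."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def write_record(record: dict, key: bytes, records_dir: Path = RECORDS_DIR) -> Path:
    """Sign the record (which must contain 'mr_iid') and persist it as JSON."""
    signed = {"record": record, "signature": sign_record(record, key)}
    records_dir.mkdir(parents=True, exist_ok=True)
    path = records_dir / f"mr-{record['mr_iid']}-{time.time_ns()}.json"
    path.write_text(json.dumps(signed, indent=2, sort_keys=True), encoding="utf-8")
    return path


def verify_record(path: Path, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    signed = json.loads(path.read_text(encoding="utf-8"))
    expected = sign_record(signed["record"], key)
    return hmac.compare_digest(expected, signed["signature"])
```

Calling `write_record({"mr_iid": 42, "verdict": "block", "rules": ["RC-01"]}, key)` returns the path of the new file. `verify_record(path, key)` returns False if the record or its signature is edited afterward. An HMAC only shows that a holder of the key wrote the record. If auditors outside the team need to verify records without that key, an asymmetric signature such as Ed25519 would be the better fit.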

How we built it

We built LGTM using a modern, agentic stack:

  • AI Brain: Gemini 2.5 Flash provided the high-speed reasoning required for complex legal interpretation.
  • Dual-Retrieval RAG: We implemented a custom RAG architecture that retrieves both Public Briefs (Rule-Cite) and Project Precedents (Legal Memory) before delivering counsel.
  • GitLab Integration: Built a robust REST API toolset for direct interaction with GitLab Merge Requests, Diffs, and Issues.
  • Smart Filtering: Implemented a keyword-based priority scan to handle merge requests with 600+ changed files.
  • Math Modeling: Our Risk Score ($R$) is a capped, weighted sum over the $n$ detected findings: $$R = \min\left(100, \sum_{i=1}^{n} w_i \cdot c_i\right)$$ where $w_i$ is the severity weight of finding $i$ and $c_i$ is the confidence of its detection.
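
The risk formula maps directly onto code. This sketch assumes each finding carries a category and a detection confidence $c_i \in [0, 1]$. The category names and severity weights are illustrative placeholders, because the actual $w_i$ values are not published here.

```python
from dataclasses import dataclass

# Hypothetical severity weights w_i per finding category (placeholders, not LGTM's values).
SEVERITY_WEIGHTS = {
    "copyleft_license": 60.0,
    "hardcoded_secret": 50.0,
    "pii": 40.0,
}


@dataclass(frozen=True)
class Finding:
    category: str      # key into SEVERITY_WEIGHTS
    confidence: float  # c_i, detection confidence in [0, 1]


def risk_score(findings: list[Finding], weights: dict[str, float] = SEVERITY_WEIGHTS) -> float:
    """Compute R = min(100, sum_i w_i * c_i) over all findings."""
    total = 0.0
    for f in findings:
        if not 0.0 <= f.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {f.confidence}")
        total += weights[f.category] * f.confidence
    return min(100.0, total)
```

With these placeholder weights, a copyleft header detected at 0.9 confidence plus a hardcoded secret at 0.5 scores $60 \cdot 0.9 + 50 \cdot 0.5 = 79$. Two findings at full confidence ($60 + 50 = 110$) are capped at 100. Because the sum is capped rather than normalized, a couple of high-confidence critical findings are enough to saturate the score.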

Challenges we ran into

The biggest technical hurdle was the "GitLab Access Barrier." We initially planned to use a beta MCP endpoint but ran into persistent 403 Forbidden errors. Instead of giving up, we pivoted in real time and built a direct REST API fallback, which ended up being faster and more flexible than the original plan.

We also faced the "Noise Problem": merge requests in real-world repositories are often cluttered with IDE metadata and binary artifacts. We solved this with a multi-stage filtering pipeline that lets the AI ignore the "garbage" and focus on the code that matters.
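
A multi-stage filter of this kind might look like the following sketch. It assumes each diff entry is a dict with `new_path` and `diff` keys, as returned by GitLab's merge request diffs endpoint. The ignore lists are hypothetical. Treating an empty diff body or a "Binary files ... differ" line as a binary change is also an assumption about how such files show up in the API response.

```python
from pathlib import PurePosixPath

# Hypothetical noise rules; tune per repository.
IGNORED_DIRS = {".idea", ".vscode", "node_modules", "__pycache__", ".git"}
IGNORED_FILES = {".DS_Store", "Thumbs.db"}
BINARY_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".ico", ".pdf", ".zip", ".jar",
                   ".class", ".so", ".dll", ".exe", ".pyc", ".woff", ".woff2"}


def is_noise_path(path: str) -> bool:
    """Stage 1: drop IDE metadata, build output, and known binary types by path alone."""
    p = PurePosixPath(path)
    if p.name in IGNORED_FILES:
        return True
    if any(part in IGNORED_DIRS for part in p.parts[:-1]):
        return True
    return p.suffix.lower() in BINARY_SUFFIXES


def has_reviewable_text(diff_text: str) -> bool:
    """Stage 2: drop entries whose diff body is empty or only reports a binary change."""
    stripped = diff_text.strip()
    return bool(stripped) and not stripped.startswith("Binary files")


def filter_diffs(entries: list[dict]) -> list[dict]:
    """Keep only the diff entries worth sending to the model."""
    return [
        e for e in entries
        if not is_noise_path(e["new_path"]) and has_reviewable_text(e.get("diff") or "")
    ]
```

Path checks run first because they are cheap and need no diff content. The content check then catches anything the path rules miss.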

Accomplishments that we're proud of

  • Real-time Pivot: Turning a blocked integration into a working, custom REST toolset in just a few hours.
  • Accuracy: The agent doesn't just find keywords; it understands the implications of a license, such as the copyleft risks of GPL-3.0.
  • Defensibility: Every report includes a "signed" reasoning record, turning a technical lint check into a legally defensible audit record.
  • Seamless Integration: The reports look like they were written by a human legal expert, yet they appear seconds after a push.

What we learned

We learned that context is king. An AI that sees every file in a repository is overwhelmed; an AI that sees only the right files is a genius. Refining our filtering logic taught us how to optimize LLM context windows for the best possible signal-to-noise ratio. We also learned that the most valuable AI tools work within existing developer workflows (like the MR comment section) rather than forcing developers into a new dashboard.
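
One way to apply this lesson is to combine it with the keyword-based priority scan described earlier. Rank diffs by how many risk patterns they match, then greedily pack them into a fixed context budget. The regular expressions in this sketch are hypothetical examples. The budget is measured in characters as a rough stand-in for model tokens.

```python
import re

# Hypothetical risk keywords; each distinct match raises a diff's priority.
RISK_PATTERNS = [
    re.compile(r"GNU (Affero )?General Public License|GPL-[23]\.0|AGPL", re.IGNORECASE),
    re.compile(r"SPDX-License-Identifier", re.IGNORECASE),
    re.compile(r"(api[_-]?key|secret|password|token)\s*[:=]", re.IGNORECASE),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-shaped strings (possible PII)
]


def priority(diff_text: str) -> int:
    """Count how many distinct risk patterns appear in the diff."""
    return sum(1 for pat in RISK_PATTERNS if pat.search(diff_text))


def pack_context(entries: list[dict], budget_chars: int) -> list[dict]:
    """Greedily fill a character budget with the highest-priority diffs first.

    Ties keep their original order (sorted() is stable, even with reverse=True).
    A diff that would overflow the remaining budget is skipped, so a smaller
    lower-priority diff can still fit afterward.
    """
    ranked = sorted(entries, key=lambda e: priority(e["diff"]), reverse=True)
    chosen, used = [], 0
    for e in ranked:
        size = len(e["diff"])
        if used + size <= budget_chars:
            chosen.append(e)
            used += size
    return chosen
```

A production version would count tokens with the model's own tokenizer instead of characters.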

What's next for LGTM (Legal Governance & Trust Monitor)

  • Cloud Deployment: Moving from local execution to a fully managed webhook service on Google Cloud Run.
  • Semantic License Search: Moving beyond keywords to identify "copied-from-Stack-Overflow" code that might carry hidden license obligations.
  • Interactive Remediation: Allowing developers to chat with the LGTM agent directly in the MR comments to ask for advice on how to fix a violation.
  • Enterprise Dashboard: A centralized view for Legal teams to monitor compliance across thousands of repositories in real-time.

LGTM: Making legal compliance as simple as a git push.

Built With

  • docker
  • dotenv
  • fastapi
  • firestore-scaffolded
  • gemini-2.5-flash
  • gitlab-rest-api
  • google-cloud-run
  • google-cloud
  • google-genai-sdk
  • httpx
  • javascript
  • python
  • rag
  • react+vite-dashboard
  • vertex-ai
  • yaml