RegDelta

Inspiration

Compliance folks spend days turning dense regulatory circulars and standards (RBI/SEBI/PCI/ISO/SOC2/GDPR) into impact notes, owners, and evidence lists. Existing tools are either SaaS (your data leaves your laptop) or opaque LLMs (prone to hallucination). We wanted local-first, deterministic, explainable analysis that teams can trust.

What it does

Ingests new circulars/standards (PDF/links) and optional baselines to diff against
Extracts obligations with a rule-first pipeline (no external APIs)
Maps obligations to your control catalogs using local embeddings + FAISS
Plans actions (owner, effort, due date) and evidence to collect
Explains every match with source snippets and similarity scores
Audits everything with a SHA-256 hash-chained log
Exports JSON/CSV + a human-reviewable plan
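A minimal sketch of rule-first obligation extraction, assuming a tiny cue lexicon built from the “shall/must/prohibit” modals listed under How we built it and a naive sentence splitter on periods and semicolons. The regex, the `Obligation` dataclass, and `extract_obligations` are illustrative names, not RegDelta's actual code:

```python
import re
from dataclasses import dataclass

# Illustrative cue lexicon (assumption): modal phrases that usually mark a binding
# obligation. Longer alternatives come first so "shall not" wins over "shall".
OBLIGATION_CUES = re.compile(
    r"\b(shall not|shall|must not|must|(?:is|are) required to|prohibit(?:s|ed)?)\b",
    re.IGNORECASE,
)


@dataclass
class Obligation:
    sentence: str   # source text, kept verbatim for citation
    cue: str        # matched modal phrase, shown to reviewers as the "why"
    negative: bool  # True for prohibitions ("shall not", "must not", "prohibited")


def extract_obligations(text: str) -> list[Obligation]:
    """Return one Obligation per sentence that contains a cue phrase."""
    obligations = []
    # Naive splitter (assumption): break after a period or semicolon plus whitespace.
    for sentence in re.split(r"(?<=[.;])\s+", text.strip()):
        match = OBLIGATION_CUES.search(sentence)
        if match:
            cue = match.group(1).lower()
            negative = cue.endswith("not") or cue.startswith("prohibit")
            obligations.append(Obligation(sentence, cue, negative))
    return obligations


circular = (
    "Regulated entities shall encrypt cardholder data at rest. "
    "This circular is issued for information. "
    "Staff must not share credentials; vendors are prohibited from storing PAN."
)
for ob in extract_obligations(circular):
    print(f"{ob.cue!r:14} negative={ob.negative} | {ob.sentence}")
```

Because every hit carries the exact cue and sentence, the output is deterministic and citable; a production lexicon would need many more cues plus the section heuristics mentioned below.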

How we built it

UI: Streamlit (dark theme, multipage views)
Parsing: PyMuPDF/pdfplumber + layout normalization + clause detectors
Extraction: lexicon + grammar/regex cues (“shall/must/prohibit”), section heuristics
Mapping: sentence-transformers (local) → FAISS index of PCI/RBI/etc. controls; hybrid score = BM25-style keywords + cosine sim
Explainability: per-match snippets, scores, and thresholds (auto-accept / review)
Auditability: append-only JSONL with SHA-256 chaining
Storage: local JSON (SQLite on the roadmap). No external LLM/API calls at runtime
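One way the hybrid score and the auto-accept / review thresholds could fit together, sketched with a hand-rolled Okapi BM25 (k1=1.5, b=0.75) and plain NumPy cosine similarity. Assumptions: the 0.4 keyword weight, the 0.75/0.5 thresholds, the function names, and the toy 3-dimensional vectors are made up for illustration; in the real pipeline the vectors would come from a local sentence-transformers model and a FAISS index. With L2-normalized vectors, an inner-product FAISS index (`IndexFlatIP`) returns the same cosine scores computed here.

```python
import math
import re
from collections import Counter

import numpy as np

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> np.ndarray:
    """Okapi BM25 score of `query` against each doc (keyword side of the hybrid)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term in tf:
                idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
                norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
                score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return np.array(scores)


def cosine(query_vec: np.ndarray, doc_vecs: np.ndarray) -> np.ndarray:
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return d @ q


def hybrid_rank(query, query_vec, controls, control_vecs,
                alpha=0.4, accept=0.75, review=0.5):
    """Blend max-normalized BM25 with cosine similarity; weights/thresholds are assumptions."""
    kw = bm25_scores(query, controls)
    if kw.max() > 0:
        kw = kw / kw.max()
    sim = cosine(query_vec, control_vecs)
    combined = alpha * kw + (1 - alpha) * sim
    ranked = []
    for i in np.argsort(-combined):
        score = float(combined[i])
        decision = "auto-accept" if score >= accept else "review" if score >= review else "reject"
        # Keep both components so the UI can explain the match, not just rank it.
        ranked.append({"control": controls[i], "score": round(score, 3),
                       "keyword": round(float(kw[i]), 3), "cosine": round(float(sim[i]), 3),
                       "decision": decision})
    return ranked


controls = [
    "Encrypt stored cardholder data using strong cryptography",
    "Review firewall rules every six months",
    "Restrict physical access to data centers",
]
# Toy 3-d "embeddings" (assumption); real ones would come from sentence-transformers.
control_vecs = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]])
query = "Entities shall encrypt cardholder data at rest"
query_vec = np.array([0.85, 0.15, 0.05])
for row in hybrid_rank(query, query_vec, controls, control_vecs):
    print(row)
```

Returning the keyword and cosine components alongside the blended score is what makes per-match explanations possible, since a reviewer can see whether a match rests on shared terms, semantic similarity, or both.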

Challenges we ran into

Wild PDF layouts (tables/headers/2-column) → layout-aware preprocessor
Over-matching on generic terms (“encryption”) → negative cues + section weighting
Speed vs accuracy on large docs → chunking + cached embeddings
“Zero-hallucination” UX → rules are authoritative; embeddings only suggest
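A sketch of the chunking + cached-embeddings fix, assuming fixed-size word windows with overlap and an in-memory cache keyed by the SHA-256 of each chunk's text. `chunk_words`, `EmbeddingCache`, the 200/40 window sizes, and the toy embedder are hypothetical; a real `embed_fn` could be something like the `encode` method of a sentence-transformers model, which takes a list of strings and returns a NumPy array.

```python
import hashlib
from typing import Callable

import numpy as np


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[start:start + size])
            for start in range(0, max(len(words) - overlap, 1), step)]


class EmbeddingCache:
    """In-memory embedding cache keyed by SHA-256 of the chunk text (hypothetical helper)."""

    def __init__(self, embed_fn: Callable[[list[str]], np.ndarray]):
        self.embed_fn = embed_fn
        self.store: dict[str, np.ndarray] = {}
        self.misses = 0

    def embed(self, chunks: list[str]) -> np.ndarray:
        if not chunks:
            return np.empty((0, 0))
        keys = [hashlib.sha256(c.encode("utf-8")).hexdigest() for c in chunks]
        # Collect uncached chunks once each (the dict also dedupes repeats within a batch).
        missing = {k: c for k, c in zip(keys, chunks) if k not in self.store}
        if missing:
            vectors = self.embed_fn(list(missing.values()))
            for key, vec in zip(missing, vectors):
                self.store[key] = vec
            self.misses += len(missing)
        return np.stack([self.store[k] for k in keys])


calls = []


def toy_embed(texts: list[str]) -> np.ndarray:
    # Stand-in for a real local model; records batch sizes so cache hits are visible.
    calls.append(len(texts))
    return np.array([[len(t), t.count("w")] for t in texts], dtype=float)


cache = EmbeddingCache(toy_embed)
doc = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(doc, size=4, overlap=1)
first = cache.embed(chunks)
second = cache.embed(chunks)  # same text again: served entirely from the cache
print(len(chunks), cache.misses, calls)  # 3 3 [3]
```

Keying on content hashes means re-analyzing the same document, or a chunk that recurs across documents, costs no new embedding calls. With fixed word windows an early insertion shifts every later chunk, so clause-aligned chunks would likely cache better across document revisions.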

Accomplishments that we’re proud of

100% local pipeline that works offline
Deterministic extraction with a fully traceable “why” for every mapping
One-click export that reads like an auditor’s action plan
Tamper-evident audit trail from day one
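A minimal sketch of an append-only JSONL log with SHA-256 chaining, assuming each record stores the previous record's hash plus its own hash over `prev_hash + canonical JSON of the event`. The function names, the all-zeros genesis value, and the record fields (including the `PCI-3.5` control ID) are illustrative assumptions, not RegDelta's schema:

```python
import hashlib
import json
import tempfile
from pathlib import Path

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first record


def entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so the hash is reproducible.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(path: Path, event: dict) -> str:
    """Append one event, chained to the hash of the last record in the file."""
    prev = GENESIS
    if path.exists():
        lines = path.read_text(encoding="utf-8").splitlines()
        if lines:
            prev = json.loads(lines[-1])["hash"]
    record = {"event": event, "prev_hash": prev, "hash": entry_hash(prev, event)}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record["hash"]


def verify_chain(path: Path) -> bool:
    """Recompute every hash; editing, reordering, or deleting a non-final line breaks the chain."""
    prev = GENESIS
    for line in path.read_text(encoding="utf-8").splitlines():
        record = json.loads(line)
        if record["prev_hash"] != prev or record["hash"] != entry_hash(prev, record["event"]):
            return False
        prev = record["hash"]
    return True


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "audit.jsonl"
    append_event(log, {"action": "ingest", "doc": "circular.pdf"})
    append_event(log, {"action": "map", "obligation": 3, "control": "PCI-3.5"})
    print(verify_chain(log))  # True

    # Tamper with the first event and check again.
    lines = log.read_text(encoding="utf-8").splitlines()
    first = json.loads(lines[0])
    first["event"]["doc"] = "other.pdf"
    lines[0] = json.dumps(first)
    log.write_text("\n".join(lines) + "\n", encoding="utf-8")
    print(verify_chain(log))  # False
```

This makes the trail tamper-evident rather than tamper-proof: anyone able to rewrite the whole file can recompute the chain, and truncating the last records goes unnoticed, so exporting or pinning the latest hash somewhere else closes that gap.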

What we learned

GRC users trust citations & snippets over scores alone
Small local models + rules beat naive prompts for regulated text
Good defaults (prebuilt PCI/RBI catalogs) massively cut setup friction

What’s next for RegDelta

Interactive review (accept/override/reject) & SQLite backend
Multi-catalog management and scenario comparison reports
OCR fallback, golden-set metrics dashboard
Jira/GitHub integrations, lightweight local RBAC, webhooks

Built with

Languages/Frameworks: Python 3.10+, Streamlit
Parsing & NLP: PyMuPDF, pdfplumber, pdfminer.six, pypdfium2, sentence-transformers, transformers, torch (CPU), FAISS, rapidfuzz, scikit-learn
Data/Utils: pandas, numpy, pyyaml, regex, requests, hashlib (SHA-256)
Platforms: Local (Windows/macOS/Linux), Streamlit Community Cloud
AI assistance (development): the entire codebase was authored and refactored with help from Claude Sonnet 4.5; the runtime is 100% local (no external LLM/API calls)

Built With

claude
faiss
hashlib
numpy
pandas
pdfminer
pymupdf
python
rapidfuzz
sentence-transformers
streamlit
transformers

Submitted to

Accel + Anthropic | Dev Day Community Showcase

Created by

I conceived, architected, and implemented RegDelta end-to-end: PDF ingestion/diffing, rule-based obligation extraction, local embeddings + FAISS control mapping, explainability (snippets/scores), and a SHA-256 hash-chained audit log. I built the Streamlit UI, prepared PCI/RBI catalogs, wrote tests/docs, set up Streamlit Cloud deployment, and handled licensing. Claude Sonnet 4.5 assisted coding; the app itself runs 100% locally with no external LLM/API calls.

preetham shyam

Updates

preetham shyam started this project — Oct 07, 2025 06:58 AM EDT
