Inspiration
Compliance teams spend days turning dense circulars and standards (RBI, SEBI, PCI, ISO, SOC 2, GDPR) into impact notes, owner assignments, and evidence lists. Existing tools are either SaaS (your data leaves your laptop) or opaque LLM wrappers (prone to hallucination). We wanted local-first, deterministic, explainable analysis that teams can trust.
What it does
- Ingests new circulars/standards (PDF/links) and optional baselines
- Extracts obligations with a rule-first pipeline (no external APIs)
- Maps obligations to your control catalogs using local embeddings + FAISS
- Plans actions (owner, effort, due date) and evidence to collect
- Explains every match with source snippets and similarity scores
- Audits everything with a SHA-256 hash-chained log
- Exports JSON/CSV + a human-reviewable plan
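The rule-first extraction step above can be sketched as follows: split a clause into sentences, keep only those carrying an obligation cue, and record which cue fired so every hit stays explainable. This is a minimal illustration, not RegDelta's actual code; the cue lexicon, the `Obligation` fields, and `extract_obligations` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical cue lexicon for illustration; the project's real lexicon is not shown here.
OBLIGATION_CUE = re.compile(r"\b(shall|must|is required to|are required to|prohibit\w*)\b", re.IGNORECASE)
NEGATION_CUE = re.compile(r"\b(shall not|must not|prohibit\w*)\b", re.IGNORECASE)
# Naive sentence splitter: break after "." or ";" followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.;])\s+")


@dataclass
class Obligation:
    clause_id: str     # clause number from a (hypothetical) clause detector
    text: str          # source sentence, kept verbatim for explainability
    cue: str           # the cue that fired, so reviewers can see why it was flagged
    prohibition: bool  # True for "shall not" / "must not" / "prohibit..." phrasing


def extract_obligations(clause_id: str, clause_text: str) -> list[Obligation]:
    """Return one Obligation per sentence that contains an obligation cue."""
    found = []
    for sentence in SENTENCE_SPLIT.split(clause_text.strip()):
        match = OBLIGATION_CUE.search(sentence)
        if match:
            found.append(Obligation(
                clause_id=clause_id,
                text=sentence,
                cue=match.group(1).lower(),
                prohibition=bool(NEGATION_CUE.search(sentence)),
            ))
    return found
```

Because the rules are plain regular expressions, the same input always produces the same output, which is what keeps extraction deterministic and reviewable.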
How we built it
- UI: Streamlit (dark theme, multipage views)
- Parsing: PyMuPDF/pdfplumber + layout normalization + clause detectors
- Extraction: lexicon + grammar/regex cues (“shall/must/prohibit”), section heuristics
- Mapping: local sentence-transformers embeddings → FAISS index of PCI/RBI/etc. controls; hybrid score = BM25-style keyword score + cosine similarity
- Explainability: per-match snippets, scores, and thresholds (auto-accept / review)
- Auditability: append-only JSONL with SHA-256 chaining
- Storage: local JSON (SQLite on the roadmap); no external LLM/API calls at runtime
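The hybrid scoring and threshold routing described above can be sketched in a few lines. Several assumptions are baked in: BM25 runs over a tiny in-memory catalog, brute-force cosine similarity stands in for the FAISS index, toy vectors stand in for sentence-transformers embeddings, and the blend weight `alpha` plus the accept/review cutoffs are hypothetical values, not the project's tuned settings.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of `query` against each document in `docs`."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n or 1.0  # guard against all-empty docs
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms & set(tf):
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_rank(query: str, query_vec: list[float], controls: list[str],
                control_vecs: list[list[float]], alpha: float = 0.4,
                accept: float = 0.75, review: float = 0.5) -> list[tuple[int, float, str]]:
    """Blend normalized BM25 with cosine similarity and route each control by threshold."""
    keyword = bm25_scores(query, controls)
    top = max(keyword) or 1.0  # avoid dividing by zero when nothing overlaps
    ranked = []
    for idx, vec in enumerate(control_vecs):
        score = alpha * keyword[idx] / top + (1 - alpha) * cosine(query_vec, vec)
        if score >= accept:
            decision = "auto-accept"
        elif score >= review:
            decision = "review"
        else:
            decision = "no-match"
        ranked.append((idx, round(score, 3), decision))
    return sorted(ranked, key=lambda r: r[1], reverse=True)
```

Dividing BM25 by the top score puts both signals on a comparable 0 to 1 scale before blending. In a real deployment the cosine term would come from an index such as FAISS `IndexFlatIP` built over L2-normalized embeddings, where inner product equals cosine similarity.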
Challenges we ran into
- Wild PDF layouts (tables/headers/2-column) → layout-aware preprocessor
- Over-matching on generic terms (“encryption”) → negative cues + section weighting
- Speed vs accuracy on large docs → chunking + cached embeddings
- “Zero-hallucination” UX → rules are authoritative; embeddings only suggest
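The chunking and cached-embedding fix can be illustrated with overlapping word windows plus a content-addressed cache keyed by each chunk's SHA-256. This is a sketch under assumptions: `chunk_words`, `EmbeddingCache`, and the window sizes are hypothetical, the cache lives in memory rather than on disk, and `embed_fn` would in practice wrap a local encoder such as sentence-transformers' `SentenceTransformer.encode`.

```python
import hashlib
from typing import Callable


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the previous window already reached the end of the text.
    return [
        " ".join(words[start:start + size])
        for start in range(0, max(len(words) - overlap, 1), step)
    ]


class EmbeddingCache:
    """Memoize embeddings by SHA-256 of the chunk text so identical chunks are encoded once."""

    def __init__(self, embed_fn: Callable[[str], list[float]]):
        self._embed_fn = embed_fn
        self._store: dict[str, list[float]] = {}
        self.misses = 0  # number of times the (expensive) encoder was actually called

    def get(self, chunk: str) -> list[float]:
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(chunk)
        return self._store[key]
```

The overlap keeps any span no longer than `overlap` words intact in at least one chunk, and when a revised circular is re-analyzed, chunks whose text is unchanged are served from the cache instead of being re-encoded.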
Accomplishments that we’re proud of
- 100% local pipeline that works offline
- Deterministic extraction, with a fully traceable rationale for every mapping
- One-click export that reads like an auditor’s action plan
- Tamper-evident audit trail from day one
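A tamper-evident trail of this kind can be sketched as an append-only JSONL file in which each record stores the previous record's hash and a SHA-256 digest over that hash plus the canonical JSON of its own event. The field names (`prev_hash`, `event`, `hash`), the all-zeros genesis value, and the helper functions are assumptions for illustration, not the project's schema.

```python
import hashlib
import json
from pathlib import Path

GENESIS = "0" * 64  # hypothetical sentinel used as prev_hash for the first record


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) makes the hash reproducible.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def _last_hash(path: Path) -> str:
    """Hash of the newest record, or GENESIS for a new or empty log."""
    last = GENESIS
    if path.exists():
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    last = json.loads(line)["hash"]
    return last


def append_event(path: Path, event: dict) -> str:
    """Append one chained record and return its hash."""
    prev = _last_hash(path)
    record = {"prev_hash": prev, "event": event, "hash": _digest(prev, event)}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record["hash"]


def verify_chain(path: Path) -> bool:
    """Recompute every link; any edited record breaks the chain from that point on."""
    prev = GENESIS
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            record = json.loads(line)
            if record["prev_hash"] != prev or record["hash"] != _digest(prev, record["event"]):
                return False
            prev = record["hash"]
    return True
```

Editing any record, or deleting one that has later records after it, makes `verify_chain` return `False`. A chain alone cannot detect truncation of the newest records or a full rewrite with recomputed hashes; periodically recording the latest hash somewhere outside the log closes that gap.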
What we learned
- GRC users trust citations & snippets over scores alone
- Small local models + rules beat naive prompts for regulated text
- Good defaults (prebuilt PCI/RBI catalogs) massively cut setup friction
What’s next for RegDelta
- Interactive review (accept/override/reject) & SQLite backend
- Multi-catalog management and scenario comparison reports
- OCR fallback, golden-set metrics dashboard
- Jira/GitHub integrations, lightweight local RBAC, webhooks
Built with
- Languages/Frameworks: Python 3.10+, Streamlit
- Parsing & NLP: PyMuPDF, pdfplumber, pdfminer.six, pypdfium2, sentence-transformers, transformers, torch (CPU), FAISS, rapidfuzz, scikit-learn
- Data/Utils: pandas, numpy, pyyaml, regex, requests, hashlib (SHA-256)
- Platforms: Local (Windows/macOS/Linux), Streamlit Community Cloud
- AI assistance (development): the entire codebase was authored and refactored with help from Claude Sonnet 4.5; the runtime is 100% local (no external LLM/API calls)