RegRadar

Video Demo Link: https://www.loom.com/share/3264a597a9d84aefaa50b3c9adf39324

Inspiration

Nine million hogs in two North Carolina counties. Together they produce more waste than the state's entire human population. The families living next to these farms, predominantly low-income and minority, are breathing contaminated air and watching their waterways turn toxic. The regulations meant to protect them exist, but they're buried in thousands of pages of federal legal text that nobody has time to read.

NC DEQ has roughly 50 inspectors for thousands of permitted facilities. Farm operators risk six-figure EPA fines for violations they didn't even know about. Community advocates have no way to check if the farm next door is actually following the rules. The data is public. The regulations are public. But the gap between "publicly available" and "actually usable" is massive. We built RegRadar to close it.

What it does

RegRadar takes thousand-page federal regulations and turns them into instant, facility-specific compliance checks. Pick a farm. Pick a regulation. Get a plain-language report in under one second telling you exactly what's wrong, what section of the law requires it, and what to do about it.

We monitor 15 real NC hog farms against 9 EPA regulations. Our AI extracted 76 specific legal requirements from the core CAFO rule alone, each with actual CFR section citations, not generic advice. One farm came back non-compliant with 5 gaps. Fourteen came back with warnings. All actionable.

The same approach works for any regulated industry: mining, oil and gas, food processing, construction. Anywhere dense federal rules meet understaffed regulators and at-risk communities, RegRadar can help.

How we built it

We built a retrieval-augmented generation (RAG) pipeline grounded in real regulatory text. The backend is Python with FastAPI, SQLite for structured data, and ChromaDB as our vector store. We chunk each regulation into 800-token segments with 150-token overlap, then embed them using a sentence-transformer model (all-MiniLM-L6-v2, 384 dimensions) for cosine similarity search across 4,700+ indexed chunks. When a compliance check runs, we retrieve the most relevant chunks using metadata-filtered vector search before passing them to the LLM.
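The chunking step can be sketched as a sliding window. This is a minimal illustration, not the project's actual code: the function name `chunk_tokens` is hypothetical, and synthetic tokens stand in for the output of a real tokenizer.

```python
def chunk_tokens(tokens, chunk_size=800, overlap=150):
    """Split a token list into windows that share `overlap` tokens with their neighbor."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # 650 new tokens per window with the defaults
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Synthetic tokens stand in for a tokenized regulation (an assumption for the demo).
tokens = [f"tok{i}" for i in range(2000)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

The overlap means a requirement that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some text twice.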

Google Gemini 2.5 Flash powers a two-pass analysis pipeline with structured JSON output via Pydantic response schemas. Pass one runs requirement extraction over the full regulation text, producing typed ComplianceRequirement objects, each classified by obligation (mandatory, recommended, or conditional). Pass two takes those extracted requirements plus the facility profile and runs facility-specific gap analysis at temperature 0.2 to reduce hallucination. Both passes use grounded prompting: the source text comes first in the prompt and the instructions follow it, not the other way around.
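To show what a typed requirement object might look like, here is a sketch that uses standard-library dataclasses as a stand-in for the Pydantic schema. The field names `citation` and `summary` are assumptions, and the sample payload is placeholder data, not real model output.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Obligation(str, Enum):
    MANDATORY = "mandatory"
    RECOMMENDED = "recommended"
    CONDITIONAL = "conditional"


@dataclass(frozen=True)
class ComplianceRequirement:
    citation: str           # hypothetical field: the CFR section the requirement comes from
    summary: str            # hypothetical field: plain-language restatement
    obligation: Obligation  # one of the three classes named in the writeup


def parse_requirements(raw_json: str) -> list[ComplianceRequirement]:
    """Turn the model's JSON array into typed objects, rejecting unknown obligation labels."""
    return [
        ComplianceRequirement(
            citation=item["citation"],
            summary=item["summary"],
            obligation=Obligation(item["obligation"]),  # raises ValueError on an unknown label
        )
        for item in json.loads(raw_json)
    ]


# Placeholder payload standing in for a pass-one response.
sample = json.dumps([
    {"citation": "EXAMPLE-1", "summary": "placeholder requirement", "obligation": "mandatory"},
    {"citation": "EXAMPLE-2", "summary": "placeholder requirement", "obligation": "conditional"},
])
requirements = parse_requirements(sample)
```

Validating the obligation label at parse time means a malformed response fails loudly before it reaches pass two.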

The frontend is Next.js 14 with an interactive Leaflet map, Recharts visualizations, severity-ranked alerts, and a global search palette. Every response is pre-generated and served through a three-tier lookup: a file cache first, then a SQLite cache, then live Gemini as the fallback. Cached responses come back with sub-10ms latency, which keeps the demo instant.
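The tiered lookup could be sketched roughly as follows. The table name `report_cache`, the key format, and the `live_call` callable are all assumptions made for illustration; a stub stands in for the live Gemini call.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def get_report(key, cache_dir, db, live_call):
    """Return (report, tier), trying the file cache, then SQLite, then the live call."""
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():  # tier 1: JSON file on disk
        return json.loads(path.read_text()), "file"

    row = db.execute(
        "SELECT payload FROM report_cache WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:  # tier 2: SQLite row
        return json.loads(row[0]), "sqlite"

    report = live_call(key)  # tier 3: slow live generation
    payload = json.dumps(report)
    db.execute(
        "INSERT OR REPLACE INTO report_cache (key, payload) VALUES (?, ?)",
        (key, payload),
    )
    db.commit()
    path.write_text(payload)  # populate both caches for next time
    return report, "live"


# Demo with an in-memory database and a stubbed live call.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE report_cache (key TEXT PRIMARY KEY, payload TEXT)")
cache_dir = tempfile.mkdtemp()
calls = []


def fake_live_call(key):
    calls.append(key)
    return {"key": key, "status": "placeholder"}


first = get_report("facility-1__reg-1", cache_dir, db, fake_live_call)
second = get_report("facility-1__reg-1", cache_dir, db, fake_live_call)
```

In this sketch only the first request pays for the live call; every later request for the same key is answered from disk.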

Real data throughout. Real farms from NC DEQ permits. Real regulations from the Federal Register API. Zero mock data.

Challenges we ran into

Our first approach sent everything to Gemini in one prompt. The output was vague and generic. Splitting into two passes (extract requirements, then analyze the facility) took us from 5 vague bullets to 76 specific, citation-backed requirements. Architecture beat prompt engineering.
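The two-pass split can be sketched with the model injected as a plain callable. Everything here is illustrative: `build_prompt`, `run_two_pass`, the instruction wording, and the JSON shapes are assumptions, and `fake_llm` returns canned placeholder data instead of calling Gemini.

```python
import json


def build_prompt(context: str, instructions: str) -> str:
    # Context goes first and instructions last, so the model reads the source text
    # before it is told what to do with it.
    return f"CONTEXT:\n{context}\n\nINSTRUCTIONS:\n{instructions}"


def run_two_pass(regulation_text, facility_profile, llm):
    # Pass one: extract requirements from the regulation alone.
    pass_one = build_prompt(
        regulation_text,
        "List every requirement as a JSON array of objects with 'citation' and 'text'.",
    )
    requirements = json.loads(llm(pass_one))

    # Pass two: compare the extracted requirements against one facility.
    context = json.dumps({"requirements": requirements, "facility": facility_profile})
    pass_two = build_prompt(
        context,
        "For each requirement, report a JSON array of objects with 'citation' and 'status'.",
    )
    return json.loads(llm(pass_two))


prompts = []


def fake_llm(prompt):
    """Stub model that records prompts and returns canned placeholder JSON."""
    prompts.append(prompt)
    if "List every requirement" in prompt:
        return json.dumps([{"citation": "EXAMPLE-1", "text": "placeholder requirement"}])
    return json.dumps([{"citation": "EXAMPLE-1", "status": "gap"}])


gaps = run_two_pass("placeholder regulation text", {"name": "Example Farm"}, fake_llm)
```

Because pass two only sees the structured output of pass one, each call has a single narrow job, which is the property the split relies on.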

Getting the transformer embeddings right mattered more than we expected. Early keyword searches missed semantically relevant regulation sections. Switching to dense vector retrieval with metadata filtering (scoping to the target regulation's doc_id before running cosine similarity) eliminated false matches from unrelated rules and dramatically improved the context quality feeding into Gemini.
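The effect of scoping to a doc_id before ranking can be shown with a toy pure-Python version. In the real pipeline the vector store does this work; the three-dimensional vectors, chunk ids, and doc_id values below are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec, chunks, doc_id, top_k=3):
    """Filter chunks to one regulation first, then rank the survivors by cosine similarity."""
    candidates = [c for c in chunks if c["doc_id"] == doc_id]
    ranked = sorted(
        candidates, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True
    )
    return ranked[:top_k]


# Toy index: "b1" is the closest vector overall but belongs to a different regulation.
chunks = [
    {"id": "a1", "doc_id": "reg-A", "embedding": [1.0, 0.0, 0.0]},
    {"id": "a2", "doc_id": "reg-A", "embedding": [0.6, 0.8, 0.0]},
    {"id": "b1", "doc_id": "reg-B", "embedding": [0.9, 0.1, 0.0]},
]
query = [0.9, 0.1, 0.0]
hits = [c["id"] for c in search(query, chunks, "reg-A")]
```

Without the filter, the chunk from reg-B would rank first because it matches the query exactly; with it, only chunks from the target regulation can reach the model.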

Each Gemini call takes 3 to 5 minutes, so fifteen facilities meant roughly an hour of API calls (45 to 75 minutes at that rate). We built a batch caching pipeline that pre-generates every result, so the demo responds instantly while still being reproducible with live Gemini.

Accomplishments that we're proud of

76 legal requirements extracted from one regulation with zero hallucinated section numbers. Sub-second compliance checks for all 15 facilities. A non-compliant finding for Young Nursery that a real regulator could act on today. A RAG pipeline that retrieves genuinely relevant regulatory context instead of keyword noise. And the entire system is built to be replicated: swap the seed data and prompts, and RegRadar becomes a compliance engine for any federally regulated industry.

What we learned

Prompt architecture matters more than prompt wording. Two passes beat one, every time. Caching isn't optimization, it's a product feature. Vector retrieval with metadata pre-filtering outperforms naive semantic search when your corpus contains multiple documents on similar topics. And domain specificity is everything in legal AI: telling the model about a specific farm's herd size and permit type produces dramatically better results than asking it to "check compliance" generically.

What's next for RegRadar

Scale from 15 farms to all 2,307 NC CAFOs. Ingest the state-level permit (AWG100000) and NC administrative code alongside federal rules. Add real-time Federal Register monitoring so new regulations trigger alerts automatically. Fine-tune a domain-specific embedding model on environmental regulatory text to improve retrieval precision beyond what general-purpose sentence-transformers provide. Build a public community dashboard so residents in affected counties can see compliance data for farms near their homes.

And then take it beyond agriculture. Mining operations in Appalachia, refineries along the Gulf Coast, food processing plants in the Midwest. Anywhere communities need transparency and regulators need scale, RegRadar can be there.