Inspiration

A friend called me at 2 AM because her neighbor — a single mom, no English, no lawyer — had an eviction notice taped to her door with a three-day deadline. We spent hours Googling statutes, trying to figure out if the notice was even legal. It wasn't. The landlord had skipped the 30-day cure period required by state law. But by the time we figured that out, it was dawn.

I kept thinking: an LLM could have extracted every entity from that notice in seconds. But a chatbot wouldn't know the statute was violated — it would just summarize the document and say "consult a lawyer." What was missing wasn't intelligence. It was reasoning infrastructure: a system that could build a formal model of the situation, validate it against hard constraints, and tell you what's actually wrong and what to do about it.

That's what I built.

What Unsprawl does

Unsprawl is a platform where a single JSON manifest turns Gemini into an autonomous crisis response system. You define entity schemas, inference rules, safety vetoes, and intervention strategies in a manifest file — no code. The platform handles the rest.
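To make the manifest idea concrete, here is a minimal sketch of what such a file could contain and how a loader might check it. The four top-level sections mirror the ones named above, but every key, value, and function name is a hypothetical illustration, not Unsprawl's actual schema.

```python
import json

# Hypothetical manifest: section names follow the prose above,
# all keys and values are illustrative assumptions.
MANIFEST_JSON = """
{
  "mission": "aegis",
  "entities": {
    "EvictionNotice": {"fields": {"served_on": "date", "cure_days": "integer"}}
  },
  "rules": [
    {"id": "cure-period", "if": "cure_days < 30", "then": "notice_defective"}
  ],
  "vetoes": [
    {"id": "no-skipping-court", "pattern": "ignore the court date"}
  ],
  "strategies": [
    {"id": "defense-letter", "requires": ["notice_defective"]}
  ]
}
"""

REQUIRED_SECTIONS = ("entities", "rules", "vetoes", "strategies")


def load_manifest(text: str) -> dict:
    """Parse a manifest and fail fast if a required section is missing."""
    manifest = json.loads(text)
    missing = [name for name in REQUIRED_SECTIONS if name not in manifest]
    if missing:
        raise ValueError(f"manifest is missing sections: {missing}")
    return manifest


manifest = load_manifest(MANIFEST_JSON)
```

Failing at load time keeps a malformed manifest from reaching the engines, which matters when the manifest author is not a programmer.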

When media hits the system (a scanned eviction notice, a voicemail, a cell phone video of an ICE warrant), an intake router classifies it by MIME type, deduplicates by content hash, and dispatches it to the right extraction path. Five engines coordinate over a NATS JetStream bus: Gemini extracts structured entities, an RDF knowledge graph runs OWL-RL inference and W3C SHACL shape validation to detect missing fields, Z3 validates constraints, a sentinel engine vetoes unsafe outputs before they reach the user, and a solver generates intervention strategies grounded in the knowledge graph. Twelve content analyzers run in parallel on every input, including deepfake detection, SynthID watermark scanning, C2PA provenance verification, and AI-generated text detection, because when someone hands you an eviction notice at 2 AM, you need to know if it's real. If the system needs a tool it doesn't have, it forges a new MCP server, sandboxes it, feeds errors back to Gemini for self-repair, and promotes it to production.
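The intake step described above (classify by MIME type, deduplicate by content hash, dispatch) can be sketched as follows. SHA-256 as the hash, the route names, and the class name are assumptions made for illustration.

```python
import hashlib
from typing import Optional

# Hypothetical route names keyed by the major MIME type.
ROUTES = {
    "image": "ocr",
    "audio": "transcription",
    "video": "keyframes",
    "application": "document",
}


class IntakeRouter:
    def __init__(self) -> None:
        self._seen: set[str] = set()

    def dispatch(self, payload: bytes, mime_type: str) -> Optional[str]:
        """Return the extraction route, or None for a duplicate payload."""
        digest = hashlib.sha256(payload).hexdigest()  # assumed hash function
        if digest in self._seen:
            return None
        self._seen.add(digest)
        major = mime_type.split("/", 1)[0]
        return ROUTES.get(major, "unsupported")


router = IntakeRouter()
first = router.dispatch(b"scanned notice bytes", "image/jpeg")
second = router.dispatch(b"scanned notice bytes", "image/jpeg")
```

Here `first` is the OCR route and `second` is `None`, because the second upload carries the same bytes.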

When the knowledge graph has gaps — a missing case number, an unconfirmed hearing date — a slot analyzer prioritizes what's needed and the voice agent asks for it directly, using confidence-gated confirmation so uncertain fields get explicit verification while high-confidence ones flow through. The entire provenance chain is tracked in W3C PROV-O: which source produced which fact, at what confidence, through which extraction activity.

The same engine runs nine missions. Each is a JSON manifest. Each exists because someone, somewhere, is in crisis and the systems that should help them are too slow, too siloed, or don't exist.

Aegis — Eviction defense. The one that started it all — that 2 AM phone call from the Inspiration section. I kept thinking about how many people get those notices every night and just comply because they don't know the notice itself is illegal. It was the first manifest I wrote.

Amber — Missing child response. I read about Asha Degree — missing since 2000, walked out of her house at 3 AM in a storm, still gone twenty-five years later. The first 48 hours determine almost everything, and most of that window burns on coordination overhead between agencies that don't talk to each other.

Chronicle — Endangered language preservation. I watched a segment about Marie Smith Jones, the last speaker of Eyak. When she died in 2008, a language died with her. There are dozens of languages right now down to a single living speaker. Chronicle is the only mission where the clock isn't legal. It's biological.

Compass — Foster youth transition support. Texas Standard ran a series called "Unidentified" about foster youth who age out of care without documents. One woman, going by "Gypsy" because she isn't sure what her legal name is, has been trying to get an ID for a decade. No birth certificate, no social security card. You can't get an ID without a birth certificate. You can't get a birth certificate without an ID. Twenty thousand kids hit this wall every year. Compass breaks the loop.

Nightshift — Education and family law defense. In New Jersey, a 7-year-old was suspended for drawing a picture of himself holding a gun. In Arizona, a 13-year-old was suspended for doodling a stick figure with a gun — covered in smiley faces, no blood, no target. Zero-tolerance policies turn crayon drawings into disciplinary records that follow kids into CPS referrals, custody hearings, and their parents' professional licensing reviews. One institutional overreaction cascades into family destruction. Nightshift maps the cascade.

Sanctuary — Immigration defense. A DACA recipient named Lisbeth followed USCIS's own video instructions for her renewal fee. The video said $465. The actual fee was $495. USCIS rejected her application. She resubmitted with the correct amount — and got a second rejection saying they were no longer accepting renewals. USCIS's own mistake cost her legal status. Sanctuary detects administrative errors before they become deportation orders.

Shield — ICE raid response. In Nashville in 2019, ICE agents blocked a father and his 12-year-old son in their van outside their home. The agents had an administrative warrant, not a judicial one, which meant they had no legal authority to force entry into the vehicle. The family didn't know that. They sat in the van for four hours until neighbors formed a human chain so they could run inside their house. Shield makes sure people know that a closed door is everything before agents are standing at it.

Unslop — Bug bounty report verification. In January 2026, Daniel Stenberg shut down curl's bug bounty program. Not because of budget — because the flood of AI-generated garbage reports made it impossible for his team to find real threats buried underneath. Hallucinated CVEs, fabricated reproduction steps, code paths that don't exist. The irony of using AI to clean up AI's mess isn't lost on me.

Watchlist — Gang database civil rights defense. In 2025, a Venezuelan barber named Franco Caraballo was detained by ICE because of a tattoo on his arm. It was a clock marking his daughter's birth time. No criminal record in the US or Venezuela. He was processed for deportation to El Salvador — a country he isn't from. A 2016 audit of California's CalGang database found entries for 42 people who were less than one year old. Watchlist files FOIA requests to expose the evidence and reframes false entries for asylum defense.

How we built Unsprawl

The platform is ~108,000 lines of Python with a React frontend and a Next.js voice frontend, all running in a Docker stack with 18 services. Gemini 3 is central to everything — Flash Preview for entity extraction and narration, Pro Preview for strategy generation, deep-research-pro-preview for async long-running investigation, and gemini-embedding-001 for cross-session memory similarity.

The knowledge graph is RDFlib with OWL-RL forward-chaining inference (the real owlrl library, not a simplified version), W3C SHACL shape validation for schema-driven gap detection, and Z3 constraint solving for fact validation. When the graph is incomplete, a slot analyzer ranks missing fields by legal urgency and feeds them to the voice agent or a dynamically generated intake form. Engines communicate exclusively through typed Pydantic messages over NATS JetStream — no engine imports another. A choreography-based saga tracker with W3C traceparent propagation follows requests across engine boundaries, with stuck-pipeline detection, retry storm detection, and dispute loop detection running as a background watchdog.
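A minimal sketch of the slot-ranking step, assuming each missing field carries a numeric urgency score. The `Slot` type and the urgency scale are hypothetical, and the source does not say how urgency is actually computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    name: str
    urgency: int   # hypothetical scale: higher means more legally urgent
    filled: bool


def rank_missing(slots: list[Slot]) -> list[str]:
    """Return the names of unfilled slots, most urgent first."""
    missing = [s for s in slots if not s.filled]
    # Sort by descending urgency, then by name so ties are deterministic.
    return [s.name for s in sorted(missing, key=lambda s: (-s.urgency, s.name))]


slots = [
    Slot("tenant_name", urgency=1, filled=True),
    Slot("case_number", urgency=5, filled=False),
    Slot("hearing_date", urgency=9, filled=False),
]
ranked = rank_missing(slots)
```

The ranked list is what a voice agent or generated form would work through from the top.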

The intake pipeline handles more than documents. Video goes through histogram-based shot boundary detection with entropy-adaptive keyframe sampling. Audio goes through faster-whisper transcription. Images and scanned pages go through OpenCV document detection, perspective dewarping, and Gemini 2.5 Flash structured OCR. Every extraction activity is recorded as W3C PROV-O provenance — which source produced which fact, at what confidence, through which pipeline stage.
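Histogram-based shot boundary detection can be illustrated in a few lines of plain Python. This sketch assumes grayscale frames given as flat lists of 0-255 values, and the bin count and threshold are arbitrary choices. The entropy-adaptive keyframe sampling is not shown.

```python
def histogram(frame: list[int], bins: int = 16) -> list[float]:
    """Normalized intensity histogram of a non-empty grayscale frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // 256] += 1
    return [c / len(frame) for c in counts]


def shot_boundaries(frames: list[list[int]], threshold: float = 0.5) -> list[int]:
    """Return indices of frames whose histogram differs sharply from the previous one."""
    if not frames:
        return []
    boundaries = []
    previous = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        # Half the L1 distance lies in [0, 1]; 1 means no overlap at all.
        distance = sum(abs(a - b) for a, b in zip(previous, current)) / 2
        if distance > threshold:
            boundaries.append(index)
        previous = current
    return boundaries


dark = [10] * 64
bright = [240] * 64
cuts = shot_boundaries([dark, dark, bright, bright])
```

In this toy input the only cut is at index 2, where the frames switch from dark to bright.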

The voice pipeline uses LiveKit for WebRTC, faster-whisper for STT, and five TTS backends (edge-tts, Dia, Chatterbox, Qwen3, and Orpheus), selected per mission based on voice profile requirements. Eight conversation optimization modules run in parallel, among them backchannel generation, emotion tracking, turn-taking prediction, disfluency injection for natural speech rhythm, and adaptive timing that adjusts to network latency. A live document scanner at /live lets users hold up paperwork to their phone camera and get structured OCR results in real time through the same LiveKit infrastructure.

The self-training loop runs continuously. When the sentinel vetoes a strategy, the veto becomes a DPO training pair. A Gemini-as-Judge scorer filters pairs before they reach the trainer. QLoRA fine-tuning runs via Unsloth, and the resulting adapter is hot-swapped into the refinery's local model without restarting the engine.
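As a rough sketch of the first two steps, a veto can be turned into a preference pair and then filtered by a judge score. This assumes the vetoed strategy is the rejected side and the accepted revision is the chosen side, which the source does not spell out. The judge is a stand-in callable, and the 0.7 cutoff is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str     # assumed: the revision that passed the sentinel
    rejected: str   # assumed: the strategy the sentinel vetoed


def pair_from_veto(prompt: str, vetoed: str, revised: str) -> PreferencePair:
    return PreferencePair(prompt=prompt, chosen=revised, rejected=vetoed)


def filter_pairs(
    pairs: list[PreferencePair],
    judge: Callable[[PreferencePair], float],
    min_score: float = 0.7,  # hypothetical cutoff
) -> list[PreferencePair]:
    """Keep only pairs the judge scores at or above the cutoff."""
    return [p for p in pairs if judge(p) >= min_score]


pair = pair_from_veto(
    prompt="Draft next steps for the tenant.",
    vetoed="Ignore the notice.",
    revised="Respond in writing and cite the missing cure period.",
)
kept = filter_pairs([pair], judge=lambda p: 0.9)
```

In the real loop the judge would be a model call. Here it is a lambda so the example runs on its own.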

Six capability modules extend what the solver can do: alert orchestration with manifest-driven triggers and debounce, dynamic form generation from a Unified Form Language DSL, location services with geocoding and zone generation, GTFS transit planning, resource directory matching, and a tip line with intake scoring and fusion.

Federation uses Ed25519 cell identities, gossip-based peer discovery, NATS leafnode peering, and Byzantine-robust trimmed mean aggregation for DPO training pairs.
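A trimmed mean drops the most extreme contributions on both ends before averaging, so a small number of adversarial cells cannot drag the result arbitrarily far. The sketch below treats each cell's contribution as a single number. The source does not say which quantity is aggregated, and the 20% trim fraction is an assumption.

```python
def trimmed_mean(values: list[float], trim_fraction: float = 0.2) -> float:
    """Average the values after discarding the lowest and highest fractions."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number dropped from each end
    kept = ordered[k: len(ordered) - k]
    return sum(kept) / len(kept)


# Four honest cells and one cell reporting an outlier.
honest_and_outlier = [1.0, 1.1, 0.9, 1.0, 100.0]
robust = trimmed_mean(honest_and_outlier, trim_fraction=0.2)
naive = sum(honest_and_outlier) / len(honest_and_outlier)
```

With five values and a 20% trim, one value is dropped from each end, so the outlier never enters the average. For vector-valued updates the same function would be applied per coordinate.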

Challenges we ran into

Making Gemini's structured output reliable for legal entity extraction. Early on, Gemini would hallucinate statute numbers or merge two entities into one. Solving this required careful schema design in the manifest — explicit field constraints, enum types for known values, and a refinery engine that cross-validates extracted entities against the knowledge graph and sends them back for re-extraction when they don't pass.

OWL-RL producing invalid triples. The inference engine would occasionally generate triples with Literal subjects (e.g., Literal("fp") owl:sameAs Literal("fp")), which RDF does not allow: a subject must be an IRI or a blank node. We had to add a post-closure scrub pass to remove these.
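The scrub pass amounts to a filter over the closed graph. The sketch below uses plain tuples and a stand-in `Literal` class so it runs without dependencies. It is not the rdflib type, and the function name is hypothetical. Against an rdflib graph, the equivalent check would be an `isinstance` test on each triple's subject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Stand-in for an RDF literal term (not the rdflib class)."""
    value: str


def scrub_literal_subjects(triples: set[tuple]) -> set[tuple]:
    """Drop every triple whose subject is a literal."""
    return {t for t in triples if not isinstance(t[0], Literal)}


closure = {
    (Literal("fp"), "owl:sameAs", Literal("fp")),      # invalid: literal subject
    ("ex:notice", "ex:fingerprint", Literal("fp")),    # valid: literal object
}
cleaned = scrub_literal_subjects(closure)
```

Literals in the object position are untouched. Only the subject position is checked.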

Keeping the architecture mission-agnostic. The constant temptation was to add if mission == "aegis" checks when something didn't work for a specific domain. Every time that happened, it meant the manifest schema wasn't expressive enough. Fixing the manifest instead of the code was always harder and always the right call.

NATS header serialization. FastStream's header handling didn't play well with traceparent propagation. We had to drop down to raw nc.publish() with manual header injection instead of using the framework's publish abstraction.
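Manual header injection mostly comes down to building a well-formed W3C traceparent value and merging it into the outgoing headers. The helper names below are hypothetical, and the sketch stops short of the publish call itself.

```python
import re
import secrets
from typing import Optional

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")


def make_traceparent(trace_id: Optional[str] = None, sampled: bool = True) -> str:
    """Build a traceparent value, reusing trace_id if given and minting a new span id."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)                # 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def with_trace_headers(headers: dict[str, str], trace_id: Optional[str] = None) -> dict[str, str]:
    """Return a copy of headers with a traceparent entry added."""
    merged = dict(headers)
    merged["traceparent"] = make_traceparent(trace_id)
    return merged


headers = with_trace_headers({"mission": "aegis"}, trace_id="ab" * 16)
```

Assuming the nats-py client, a dict like `headers` would then be passed as the `headers` argument of `nc.publish()`.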

The sentinel-refinery dispute loop. The sentinel would veto a strategy, the refinery would revise it, the sentinel would veto the revision, and so on forever. Solving this required a handoff tracker that detects dispute loops (>4 cycles) and escalates to a different resolution path.
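A minimal sketch of the loop detection, using the more-than-four-cycles threshold from the description above. The class name, method name, and return values are hypothetical.

```python
from collections import defaultdict

MAX_CYCLES = 4  # escalate once a strategy exceeds this many veto cycles


class HandoffTracker:
    def __init__(self) -> None:
        self._cycles: dict[str, int] = defaultdict(int)

    def record_veto(self, strategy_id: str) -> str:
        """Count one veto-and-revise cycle and say what should happen next."""
        self._cycles[strategy_id] += 1
        if self._cycles[strategy_id] > MAX_CYCLES:
            return "escalate"
        return "revise"


tracker = HandoffTracker()
outcomes = [tracker.record_veto("strategy-1") for _ in range(5)]
```

The first four vetoes send the strategy back for revision, and the fifth triggers escalation. Counts are kept per strategy, so one stuck dispute does not affect others.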

Multimodal intake reliability. A phone photo of an eviction notice isn't a PDF. It's a skewed, shadowed, partially occluded image taken under fluorescent lighting at 2 AM. Getting reliable OCR required a multi-stage pipeline: OpenCV document edge detection, perspective dewarping, quality scoring to reject blurry frames, and high-resolution still-frame extraction before sending to Gemini. The streaming video feed processes at ~1 FPS but the actual OCR runs on carefully selected keyframes.
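The source does not say how frame quality is scored. One common sharpness measure is the variance of the Laplacian, sketched here in plain Python on a grayscale image given as a list of rows. The threshold is an arbitrary assumption, and the image must be at least 3x3.

```python
from statistics import pvariance


def laplacian_variance(image: list[list[int]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels."""
    responses = []
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            responses.append(
                image[y - 1][x] + image[y + 1][x]
                + image[y][x - 1] + image[y][x + 1]
                - 4 * image[y][x]
            )
    return pvariance(responses)


def is_sharp(image: list[list[int]], threshold: float = 100.0) -> bool:
    """Treat low edge-response variance as a sign of blur."""
    return laplacian_variance(image) >= threshold


flat = [[128] * 5 for _ in range(5)]  # no edges at all
checker = [[255 if (x + y) % 2 == 0 else 0 for x in range(5)] for y in range(5)]
```

A uniform frame like `flat` has zero variance and is rejected, while a high-contrast pattern like `checker` passes.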

Content authenticity at intake. When the system ingests media from unknown sources, it needs to know what's real. Building twelve parallel content analyzers — audio deepfake detection, SynthID watermark scanning, C2PA provenance verification, perplexity-based AI text detection, structural fingerprinting — and making them fast enough to run on every input without blocking the pipeline was a sustained engineering challenge.

Accomplishments that we're proud of

The core pipeline actually works. Not "works in a demo with hardcoded responses" — Gemini actually extracts entities from a real eviction notice, the knowledge graph actually infers that a statute is violated, Z3 actually validates the constraint, the sentinel actually vetoes unsafe strategies, and the solver generates a defense letter citing the correct law. Every step is traceable through OpenTelemetry spans in Logfire.

The manifest-driven architecture held up. Nine missions across wildly different domains (eviction defense, missing child response, immigration renewal, ICE raid response, bug bounty report verification) and zero lines of mission-specific code in the engine. The abstraction worked.

The system can overrule itself. The sentinel doesn't just filter outputs — it engages in structured dispute with the refinery, and the handoff tracker detects when they're stuck in a loop. That's not a feature you see in chat wrappers.

The voice agent isn't a chatbot with a microphone. It tracks emotion across utterances, generates natural backchannels, predicts turn-taking boundaries, and injects disfluencies so it doesn't sound like a text-to-speech demo. When the knowledge graph is missing critical fields, the agent asks for them — with confidence-gated confirmation that only asks "did you say case number 24-12345?" when ASR confidence is below 0.85. High-confidence extractions flow through silently.
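The confirmation gate reduces to a single threshold check. This sketch uses the 0.85 ASR confidence cutoff quoted above. The function name and prompt wording are illustrative.

```python
from typing import Optional

CONFIRM_BELOW = 0.85  # ASR confidence threshold from the description above


def confirmation_prompt(field_name: str, value: str, asr_confidence: float) -> Optional[str]:
    """Return a confirmation question for uncertain values, or None to accept silently."""
    if asr_confidence < CONFIRM_BELOW:
        label = field_name.replace("_", " ")
        return f"Did you say {label} {value}?"
    return None


uncertain = confirmation_prompt("case_number", "24-12345", asr_confidence=0.62)
confident = confirmation_prompt("case_number", "24-12345", asr_confidence=0.93)
```

Only `uncertain` produces a question. The high-confidence value returns `None` and flows through without interrupting the caller.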

The live document scanner works. Hold an eviction notice up to your phone camera, and the system detects document edges, dewarps the perspective, runs structured OCR, and returns per-field results with confidence scores — all through the same LiveKit WebRTC infrastructure as the voice agent. No app install, no upload button. Just point and read.

The self-training loop closes the circle. Sentinel vetoes become DPO pairs, filtered by Gemini-as-Judge, fine-tuned via QLoRA, and hot-swapped into the running refinery. The system gets better at exactly the things it got wrong.

What we learned

Formal reasoning changes what LLMs can do. A chatbot with the same Gemini model would summarize the eviction notice and suggest calling a lawyer. The same model, grounded in a knowledge graph with OWL-RL inference, identifies the specific statutory violation and drafts a defense letter. The model didn't get smarter — the infrastructure around it did.

Manifests are a better abstraction than code for domain logic. When the domain expert is a legal aid attorney, not a software engineer, you need a configuration surface they can reason about. JSON schemas with inference rules and safety vetoes turned out to be that surface.

Self-correction is harder than generation. Getting Gemini to produce a good first draft was straightforward. Getting the system to reliably detect when that draft was wrong, veto it, and produce a better one — that took most of the engineering effort.

What's next for Unsprawl

Live data sources. The adapter layer (REST, GraphQL, scraper, NATS stream) is built and the transit module already speaks GTFS, but most missions still run on fixture data for court records and shelter databases. Closing the last mile to real APIs is the primary gap between demo and deployment.

Multilingual pipeline. The voice interface works in English with a Spanish phonetic romanizer for names and addresses. Full multilingual support — where the entire response pipeline works in Spanish, Mandarin, Arabic — is next.

Multi-cell federation in the wild. The federation code is structurally complete but has only run as a single cell. Deploying multiple cells across organizations and validating the Byzantine-robust training aggregation under real adversarial conditions is the next major milestone.

Persistent knowledge graph. The graph is currently ephemeral per session. Cross-session entity resolution would enable pattern detection: a landlord filing illegal evictions across dozens of tenants, detected not by one user but by the collective intelligence of the network.

Adversarial safety testing. Every mission has critical safety blocks defined as toxic patterns in manifests — don't call USCIS, don't open the door, don't contact the abuser. These need systematic adversarial testing to verify the sentinel holds under pressure, not just under polite prompts.
