TerraGuard

Inspiration

When disasters strike, responders drown in unstructured reports: social posts, 911 transcripts, volunteer notes, radio traffic. We wanted a tool that turns that chaos into concise, actionable triage—offline, fast, and trustworthy.
We were inspired by teams in the field who need low-latency, resilient workflows, not a fragile cloud dependency. That led us to a local-first design.

What it does

Extracts structured incident reports from free text into a clean JSON format (location_text, time_iso, severity, needs, notes).
Scores risk, suggests destinations from known facilities, and computes a confidence score from an ensemble.
Generates ICS-213 and SITREP summaries, plus EN/ES radio phrasing.
Runs locally against:
- A base model (e.g., gpt-oss:20b via an Ollama-compatible API),
- An ensemble for robustness and confidence,
- Or a fine-tuned LoRA adapter model.
Streams tokens with a JSON parser that tolerates imperfect outputs and recovers valid JSON.
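
To make the tolerant parsing concrete, here is a minimal sketch of the recovery idea: try the raw text, then the contents of a code fence, then the first brace-balanced object. The function names are hypothetical and the strategy order is an assumption; it illustrates the technique rather than reproducing the project's actual parser.

```python
import json
import re
from typing import Any, Optional

# Matches the body of a fenced block such as ```json ... ```
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL | re.IGNORECASE)


def first_balanced_object(text: str) -> Optional[str]:
    """Return the first brace-balanced {...} span, ignoring braces inside strings."""
    start = text.find("{")
    while start != -1:
        depth = 0
        in_string = False
        escaped = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return text[start : i + 1]
        # No balanced span from this brace; try the next opening brace.
        start = text.find("{", start + 1)
    return None


def recover_json(raw: str) -> Optional[Any]:
    """Try progressively looser strategies; return None if nothing parses."""
    candidates = [raw.strip()]
    fenced = FENCE_RE.search(raw)
    if fenced:
        candidates.append(fenced.group(1).strip())
    balanced = first_balanced_object(raw)
    if balanced:
        candidates.append(balanced)
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None
```

A truncated generation such as `{"severity": "hi` returns None here, so the caller can decide whether to retry or surface an error.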

How we built it

Frontend: Next.js (App Router), TypeScript, Material UI (MUI), Redux Toolkit for state, streaming UI components, and facility-based suggestions.
Backend API routes: /api/extract, /api/extract/stream, /api/extract/ensemble, /api/ics213, /api/radio, /api/sitrep.
Model serving:
- FastAPI server exposing Ollama-compatible /api/generate for base + adapter.
- Works with standard Ollama at http://127.0.0.1:11434 or our Python server on an alternate port via OLLAMA_BASE.
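
As a rough illustration of how a client can switch between the two servers, the sketch below builds a non-streaming request to /api/generate and reads the base URL from OLLAMA_BASE. The helper names and the timeout are assumptions; the payload fields (model, prompt, stream) and the "response" key follow the standard Ollama generate API.

```python
import json
import os
import urllib.request

DEFAULT_BASE = "http://127.0.0.1:11434"  # standard Ollama address


def build_generate_request(prompt: str, model: str = "gpt-oss:20b") -> urllib.request.Request:
    """Build a POST to /api/generate on whichever server OLLAMA_BASE points at."""
    base = os.environ.get("OLLAMA_BASE", DEFAULT_BASE).rstrip("/")
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gpt-oss:20b", timeout: float = 120.0) -> str:
    """Send the request and return the generated text (requires a running server)."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]
```

Because both servers expose the same route, pointing OLLAMA_BASE at the Python server is the only change needed to move from the base model to the adapter.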
Fine-tuning:
- QLoRA with PEFT on a 4-bit base using TRL SFTTrainer; compact LoRA ranks for single-GPU feasibility.
- Prompts enforce “JSON only” outputs with a schema-like instruction.
- Data pipeline merging synthetic and curated samples.
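
A "JSON only" prompt with a schema-like instruction might look like the sketch below. The five field names come from the extraction format described earlier; the field descriptions, the severity label set, the wording of the instruction, and the prompt/completion record layout are all illustrative assumptions.

```python
import json
from typing import Any, Dict

# Field names are from the write-up; descriptions and the severity
# label set are assumptions made for this example.
FIELDS: Dict[str, str] = {
    "location_text": "string, the location as written in the report",
    "time_iso": "string, ISO 8601 timestamp, or null if unknown",
    "severity": "string, one of: low, medium, high, critical",
    "needs": "array of short strings",
    "notes": "string, anything else relevant",
}


def build_prompt(report: str) -> str:
    """Wrap a free-text report in a schema-like, JSON-only instruction."""
    schema_lines = "\n".join(f'  "{name}": {desc}' for name, desc in FIELDS.items())
    return (
        "Extract one incident report from the text below.\n"
        "Respond with JSON only: a single object, no prose, no code fences.\n"
        "Use exactly these keys:\n"
        f"{{\n{schema_lines}\n}}\n\n"
        f"Text:\n{report.strip()}\n\nJSON:"
    )


def build_training_example(report: str, target: Dict[str, Any]) -> Dict[str, str]:
    """Pair the prompt with its gold JSON so training and inference share one format."""
    missing = set(FIELDS) - set(target)
    if missing:
        raise ValueError(f"target is missing keys: {sorted(missing)}")
    return {
        "prompt": build_prompt(report),
        "completion": json.dumps(target, ensure_ascii=False),
    }
```

Using the same prompt builder for fine-tuning data and for live requests keeps the instruction the model sees at inference identical to the one it was trained on.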
Robustness features:
- NDJSON streaming and incremental concatenation.
- Heuristics to strip code fences, parse between tags, and recover balanced JSON.
- Ensemble mode to reduce variance and provide confidence.
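
One simple way to turn several generations into a single answer with a confidence value is majority voting, sketched below. This is an assumed aggregation scheme for illustration (vote on severity, keep needs that a strict majority of runs agree on, report the agreement fraction as confidence); the write-up does not specify how the ensemble actually combines outputs.

```python
from collections import Counter
from typing import Any, Dict, List


def aggregate(outputs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine parsed outputs from several runs into one result plus a confidence."""
    if not outputs:
        raise ValueError("need at least one output to aggregate")

    # Majority vote on severity; ties go to the label seen first.
    severity_votes = Counter(o.get("severity") for o in outputs)
    severity, top_count = severity_votes.most_common(1)[0]

    # Count each need at most once per run, then keep strict-majority needs.
    need_votes = Counter(n for o in outputs for n in set(o.get("needs", [])))
    majority = len(outputs) / 2
    needs = sorted(n for n, count in need_votes.items() if count > majority)

    return {
        "severity": severity,
        "needs": needs,
        "confidence": top_count / len(outputs),  # share of runs agreeing on severity
    }
```

With three runs where two say "high" and one says "medium", the result is "high" with a confidence of 2/3.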
Evaluation:
- Severity accuracy and needs Jaccard similarity, driven by a simple HTTP eval script.
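
The two metrics are small enough to sketch directly. The code below covers only the scoring step, not the HTTP calls that collect predictions; the function names and the convention that two empty sets score 1.0 are assumptions.

```python
from typing import Any, Dict, Iterable, List


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity of two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # assumed convention: two empty sets count as a perfect match
    return len(set_a & set_b) / len(set_a | set_b)


def evaluate(preds: List[Dict[str, Any]], golds: List[Dict[str, Any]]) -> Dict[str, float]:
    """Severity accuracy and mean needs Jaccard over aligned prediction/gold pairs."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and the same length")
    correct = sum(p.get("severity") == g.get("severity") for p, g in zip(preds, golds))
    overlap = sum(jaccard(p.get("needs", []), g.get("needs", [])) for p, g in zip(preds, golds))
    return {
        "severity_accuracy": correct / len(golds),
        "needs_jaccard": overlap / len(golds),
    }
```

Jaccard rewards partial overlap, so a prediction that finds one of two gold needs scores 0.5 instead of counting as a flat miss.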

Challenges we ran into

Enforcing strict JSON outputs: we built multi-step cleanup and balanced-brace recovery.
Running large models locally: quantization and LoRA helped, but CPU inference is tight.
Adapter/base compatibility: PEFT adapters are base-specific, and mismatches break loading.
Tooling fragmentation: Ollama prefers GGUF, while our fine-tuning artifacts are PEFT adapters.
Streaming UX: handling partial tokens while ensuring the final JSON parses.
Data curation: small datasets needed augmentation and clear labeling.
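
The streaming challenge comes down to two buffers: one for bytes that have not yet formed a complete NDJSON line, and one for the text generated so far. The sketch below assumes the Ollama-style stream format, where each line is a JSON object with a "response" fragment and a "done" flag; the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed object per complete line, buffering partial lines."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    # Flush a final line that arrived without a trailing newline.
    if buffer.strip():
        yield json.loads(buffer)


def accumulate(chunks: Iterable[bytes]) -> str:
    """Concatenate streamed "response" fragments until the server signals done."""
    parts = []
    for event in iter_ndjson(chunks):
        parts.append(event.get("response", ""))
        if event.get("done"):
            break
    return "".join(parts)
```

Buffering raw bytes and decoding only whole lines also avoids errors when a network chunk ends in the middle of a multi-byte UTF-8 character. The accumulated text is parsed as JSON only after the stream ends, while the UI can render fragments as they arrive.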

Accomplishments that we're proud of

Local-first triage workflow that streams results in real time.
JSON extraction resilient to imperfect generations, yielding valid typed outputs.
Clean UI with one-click ICS-213/SITREP/radio generation.
Ensemble mode that improves stability and provides usable confidence signals.
Lightweight LoRA fine-tune that improves structured extraction quality.

What we learned

Instruction design matters as much as fine-tuning; concise schema prompts improve JSON validity.
Quantization and LoRA are essential to make large models practical on constrained hardware.
Streaming plus tolerant parsing beats strict decoding for UX, especially with schema recovery.
Simple ensembles deliver valuable confidence estimates for end users.
Simple metrics go far; for set-like fields, Jaccard similarity is intuitive: \[ J(A, B) = \frac{|A \cap B|}{|A \cup B|} \]
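
As a worked example with made-up need sets (illustrative values, not taken from our evaluation data), suppose the gold needs are A and the predicted needs are B:

\[ A = \{\text{water}, \text{shelter}, \text{medical}\}, \quad B = \{\text{water}, \text{medical}, \text{evacuation}\} \]

\[ J(A, B) = \frac{|\{\text{water}, \text{medical}\}|}{|\{\text{water}, \text{shelter}, \text{medical}, \text{evacuation}\}|} = \frac{2}{4} = 0.5 \]

Two of the four distinct needs are shared, so the prediction gets half credit.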

What's next for TerraGuard

Merge-and-quantize path: merge LoRA into base and export GGUF to create a native Ollama model/tag.
Constrained decoding: JSON grammars or structured decoders to eliminate post-hoc cleanup.
Active learning loop: capture low-confidence cases for rapid relabeling and continual fine-tuning.
Multilingual expansion and better time normalization; safer geocoding with ambiguity handling.
Packaging for edge devices; containerized GPU images for field deployment.
RAG over facilities/critical infrastructure with freshness guarantees and provenance.
Expanded evaluation across disaster domains and languages.
