-
-
Landing/Home Screen
-
General Location Safety Analysis Screen
-
Route Safety Analysis Feature + Realtime Route Tracking & Segmentation
-
Emergency Agentic Voice AI
-
Lumos Bot Assistance
-
Realtime Live Incident Citizen API Showcase
-
Nearest Safety Centers Aggregation
-
VULTR Cloud Compute Backend Deployment
-
Firebase Functions + Webhooks
-
Google Cloud Realtime Database | Storing 911 Call Data every 30s
🔦 Lumos | Safety Intelligence and Predictive Analysis Platform
💡 Inspiration
The idea for Lumos came from a simple fear we've all felt: walking at night in an unfamiliar place, wondering is this street safe?
Existing safety tools give you a vague "high crime area" label with no context — no breakdown of what crimes, when they happen, or how the risk shifts hour by hour. We wanted to build something that actually answers the question a traveler, a group of college students walking home, or a solo jogger is really asking:
Right here, right now — how safe am I, and what specifically should I watch out for?
We also considered emergencies: if you're in danger and can barely speak, or you're in a foreign country and don't know the local emergency number, or you're too panicked to explain your medical conditions to a dispatcher — what then? We wanted Lumos to not just tell you about risk, but to actively protect you when things go wrong.
⚙️ What It Does
Lumos is a real-time, location-aware safety intelligence platform that fuses 48 gigabytes of raw federal crime data and over a dozen live APIs into a single, actionable safety score for any location on Earth.
🗄️ Massive Data Foundation
Before a single line of application code was written, we built a data pipeline that downloaded and processed 771 state-year NIBRS data packages from the FBI, spanning all 51 U.S. jurisdictions (50 states + D.C.) from 1991 to 2024. That's 32,553 individual CSV files across 37,610 total files, containing roughly 59 million crime incident records covering a population of over 235 million people.
Each state-year package contains 33 relational tables — incidents, offenders, victims, arrestees, weapons, injuries, property descriptions, bias motivations, drug types, and more. This 48 GB of raw data was distilled through our precompute_nibrs.py pipeline into curated agency profiles for 7,717 law enforcement agencies — and that's just the offline data, before any of our 11+ live API integrations even fire.
🔐 Core Safety Analysis
Enter any address, landmark, or coordinate and Lumos returns a 0–100 safety score computed by a 25-feature XGBoost machine learning model trained on real FBI NIBRS profiles. The score comes with a breakdown of predicted crime types ranked by probability (e.g., "Theft from Vehicle = 24%, Aggravated Assault = 18%"), each adjusted for the current time of day using Bureau of Justice Statistics temporal crime curves across 40+ NIBRS offense codes.
📈 24-Hour Risk Timeline
A continuous hourly risk chart shows how safety fluctuates over the next 24 hours, with peak danger windows and safest travel times extracted automatically. The temporal model uses BJS-derived multipliers that know shoplifting peaks during business hours while aggravated assault peaks after 10 PM.
🗺️ Interactive Crime Heatmap
A Mapbox GL globe renders a crime density heatmap seeded from real incident data — first from 30+ Socrata open data portal integrations (each with a hand-built API adapter for that city's unique schema), then augmented with NIBRS agency offense distributions, and enriched by Gemini AI with contextual neighborhood descriptions.
📡 Live Incident Feed
Lumos pulls real-time incidents from the Citizen app API, computing a Citizen Incident Adjustment (CIA) penalty using Haversine distance decay, recency decay (incidents lose weight over 6 hours), severity weighting (shootings penalized 8× more than noise complaints), and source credibility scoring — all capped at 25 points to prevent score collapse.
🛣️ Route Safety Analysis
Compare up to three route alternatives between any origin and destination — Lumos evaluates each via the Google Routes API, scoring safety along the path and recommending the safest option, not just the fastest.
🤖 AI Safety Tips & Chat
A context-aware Gemini-powered assistant generates tips specific to your exact location, time, weather, and crime profile — with a persistent chat widget for follow-up questions.
🚨 Emergency Voice AI System
The most ambitious feature: Lumos can call 911 on your behalf using an AI voice agent. Built on VAPI's telephony infrastructure with a custom Gemini 2.5 Flash LLM endpoint and ElevenLabs text-to-speech:
- Pre-fills your emergency profile (name, medical conditions, allergies, emergency contacts, medications)
- Streams your live GPS coordinates to the AI agent in real-time via Firebase Realtime Database
- Lets you type messages during the call that are injected into the conversation
- Works internationally with emergency numbers for 50+ countries
- Features an active call bar overlay with duration, live transcript, and one-tap hang-up
🏗️ How We Built It
📦 Data Collection & Processing Pipeline
The foundation of Lumos is a lot of data. We downloaded 48 GB of raw FBI NIBRS crime data: 771 state-year packages, each containing up to 33 relational CSV tables. Our precompute_nibrs.py pipeline streams these 32,553 CSV files across all 51 jurisdictions, deduplicates agencies, normalizes inconsistent offense codes, and computes per-agency statistical profiles: offense mix distributions, crime rates per 100K, violent/property crime ratios, weapon usage rates, hourly and day-of-week distributions, victim demographics, and officer density.
The result: 7,717 curated agency profiles and 51 state temporal profiles — 48 GB compressed down to 43 MB of production-ready intelligence. And that's just the offline dataset, not including the live data from our 11+ API integrations.
On top of the NIBRS pipeline, we built a separate FBI UCR data layer covering 8,986 cities (Table 8), 676 universities (Table 9), and 2,364 counties (Table 10), plus a 136,626-city municipal vectors dataset (28 MB) for location feature enrichment. We also hand-integrated 30 individual Socrata/open data endpoints — each city with its own API schema, field names, date formats, and coordinate encodings — plus a dynamic discovery mechanism that scrapes the Open Data Network to find datasets for cities we haven't manually configured.
🧠 Machine Learning Model
A 25-feature XGBoost regression model trained on the NIBRS agency profiles. Ground-truth safety labels use a sigmoid curve calibrated to real-world crime rates (800/100K → 0.93 safety, 2400 → 0.64, 4003 → 0.42), with contextual adjustments for weapon prevalence, stranger-crime ratio, group size, gender, weather, and officer density.
Hyperparameters: max_depth=8, learning rate 0.05, 500 boosting rounds with early stopping at 20. At inference time, the XGBoost prediction is blended 60/40 with a formula-based fallback for graceful degradation when features are sparse.
☁️ Backend & Cloud Infrastructure
Our backend is powered by Python 3.11 with FastAPI, deployed on Vultr Cloud Compute — giving us a fast, reliable, globally-distributed infrastructure to handle real-time safety scoring at scale.
We chose Vultr's Cloud Compute instances to host our backend API, taking advantage of low-latency SSD-backed virtual machines to ensure rapid response times even under load. Vultr's straightforward deployment model let us spin up and configure our production environment quickly during the hackathon, with flexible scaling options for the intensive parallel processing our safety pipeline demands.
The FastAPI backend is fully async, using httpx and asyncio.gather() to fire 11 parallel external API calls per request. It is rate-limited at 30 req/min/IP with ML LRU caching (10,000 entries). A Gemini AI refinement layer post-processes raw ML scores as a sanity check.
Backend Stack Summary:
- 🖥️ Hosting: Vultr Cloud Compute (SSD-backed virtual machines)
- 🐍 Runtime: Python 3.11 + FastAPI (fully async)
- ⚡ Concurrency:
httpx+asyncio.gather()— 11 parallel API calls per request - 🔒 Rate Limiting: 30 req/min/IP
- 💾 Caching: ML LRU cache with 10,000 entries
- 🤖 AI Layer: Gemini refinement for score validation
🖥️ Frontend
React 18 with TypeScript, Vite, Tailwind CSS, and shadcn/ui. Mapbox GL JS powers the interactive 3D globe with custom heatmap layers, route polylines, and POI markers. Framer Motion handles animations. PWA-ready with a service worker.
🆘 Emergency System
Firebase Cloud Functions (v2) bridge the frontend to VAPI's telephony API. A custom LLM endpoint intercepts VAPI's conversation flow and injects Gemini 2.5 Flash responses enriched with real-time Firebase RTDB context (GPS updates, user messages). ElevenLabs provides TTS.
🧩 Challenges We Ran Into
Taming 48 GB of federal crime data. Processing NIBRS data for 51 jurisdictions spanning 33 years — each with different CSV schemas, encoding issues, missing fields, and inconsistent offense codes — was a massive data engineering challenge. Some states have data going back to 1991; others only started reporting in 2021. We built a resilient streaming pipeline that handles partial data, deduplicates agencies, and normalizes everything into a unified profile format.
Calibrating the safety score. Getting a safety score to "feel right" was harder than training the model. A purely statistical score didn't match human intuition: a neighborhood with high petty theft but zero violent crime shouldn't score the same as one with frequent assaults. We spent significant effort tuning the sigmoid curve parameters, contextual adjustment weights, and the 60/40 blend ratio.
Socrata endpoint heterogeneity. Each of the 30+ city data portals uses different field names, date formats, coordinate encodings, and query syntaxes. Philadelphia uses Carto SQL, Boston uses CKAN, DC uses ArcGIS REST, and Dallas embeds coordinates in a nested geocoded_column JSON object. We built a universal adapter layer that normalizes all of these.
Real-time emergency call orchestration. Making an AI call 911 required solving VAPI's strict latency windows, Firebase RTDB propagation timing, mid-call GPS injection, and clean dual-teardown (VAPI control URL + REST API fallback).
Temporal crime modeling. Raw crime data doesn't tell you when crimes happen. We manually derived temporal multipliers for 40+ NIBRS offense codes from BJS Criminal Victimization supplementary tables, then built an hourly risk curve with 2.5× temporal amplification and sliding 4-hour window smoothing.
🏆 Accomplishments We're Proud Of
Processing 48 GB of federal crime data into a working ML pipeline in a hackathon. We didn't use a toy dataset or synthetic data — we downloaded, parsed, and processed nearly 59 million real crime incidents from the FBI's NIBRS system across 33 years and every U.S. jurisdiction. The pipeline produces 7,717 agency profiles with offense distributions, temporal patterns, and demographic breakdowns, all feeding a production XGBoost model.
136,626 cities covered. Between our NIBRS agency profiles, FBI UCR lookups (8,986 cities, 676 universities, 2,364 counties), municipal vectors dataset, and 30+ live Socrata integrations, Lumos has data for virtually every populated location in the United States — and the 4-tier fallback chain ensures meaningful predictions even for the smallest towns.
An AI that can call 911 for you. A working emergency voice AI system with live GPS streaming, medical profile relay, real-time text injection, and international emergency number support.
Sub-second safety scoring from 11+ parallel data sources. Despite the breadth of data fusion (FBI, NIBRS, Socrata, Census, NWS, Google, Ticketmaster, Citizen, Astronomy, OpenWeatherMap), the async architecture deployed on Vultr Cloud Compute returns fully enriched safety responses in under 2 seconds.
📚 What We Learned
We learned that crime data is messy at scale. Building a model that works requires as much data engineering and domain knowledge (BJS victimization surveys, NIBRS coding standards, UCR reporting hierarchies) as it does ML expertise.
We also gained deep appreciation for the open data ecosystem's complexity — 30+ city portals, each a snowflake. The hardest part of data engineering is often not the algorithms, it's the plumbing.
And finally, we're all happy to have learned to integrate new technologies into our workflows: from Vultr's cloud infrastructure for scalable backend hosting, to VAPI's agentic phone call APIs — expanding what we thought was possible to build in a hackathon.
🚀 What's Next for Lumos
| Roadmap Item | Description |
|---|---|
| 🌍 Expanded international coverage | Crime data from UK Police API, Eurostat, and other international sources |
| 📊 Predictive temporal modeling | Time-series models on historical NIBRS trends to predict crime rate trajectories |
| 👥 Community safety network | Real-time "Walk With Me" live location sharing, crowd-sourced safety ratings, peer alert propagation |
| 📱 On-device ML inference | Porting XGBoost to ONNX/WebAssembly for offline safety scoring |
| 🗣️ Smarter dispatcher conversations | Training the model with 911 dispatch-specific data and NLP for more accurate, reliable emergency services |
📊 Data At A Glance
| Metric | Value |
|---|---|
| Raw NIBRS data | 48 GB |
| State-year packages | 771 (51 jurisdictions × up to 33 years, 1991–2024) |
| CSV files | 32,553 |
| Total files | 37,610 |
| Tables per package | 33 relational CSVs |
| Total crime incidents | ~59 million |
| Law enforcement agencies profiled | 7,717 |
| Cities covered | 136,626 |
| Live API integrations | 11+ |
| Compressed production dataset | 43 MB |
Built With
- citizen
- cloud-functions
- elevenlabs
- fastapi
- fbi-crime-data-explorer-api
- fbi-nibrs
- firebase-(auth
- firestore
- framer-motion
- geocoding)
- google-gemini-2.5-flash
- google-maps-platform-(places
- mapbox-gl
- national-weather-service-api
- openweathermap
- python
- react
- realtime-database)
- routes
- scraping
- shadcn/ui
- socrata-open-data
- tailwind-css
- ticketmaster-api
- twilio
- typescript
- us-census-bureau-api
- vapi
- vite
- vultr
- webhook
- xgboost





Log in or sign up for Devpost to join the conversation.