🌍 CrisisLens — Hackathon Submission Write-Up
Inspiration
Every year, the UN publishes data showing billions of dollars flowing into humanitarian aid, yet children still die of preventable causes in crises the world has simply forgotten about. We kept asking: why does Yemen get 10x the media coverage of South Sudan? Why does Chad receive a fraction of the pooled fund support that comparably severe crises receive? The answer isn't malice. It's the absence of a systematic, data-driven way to detect funding mismatches before the window to act closes. UN coordinators are overwhelmed, donors follow headlines, and the most overlooked crises stay overlooked. We built CrisisLens because we believe that if you can see the gap clearly, you can close it.
What It Does
CrisisLens is a humanitarian funding intelligence platform that answers three questions the UN needs answered every day:
- Which crises are most overlooked? We calculate an Overlook Score for every active humanitarian crisis, combining people in need, OCHA severity classification, and funding coverage rate into a single 0–100 ranking. A score of 100 means the world has looked away.
- Where are pooled funds failing to fill the gap? We analyze CBPF (Country-Based Pooled Fund) allocations against funding gaps and surface mismatches — crises with catastrophic need that pooled funds have barely touched.
- Which humanitarian projects have suspicious efficiency ratios? Using Spark ML, we flag projects whose beneficiary-to-budget ratio is a statistical outlier compared with similar projects in the same sector. High outliers are candidates for scale-up. Low outliers need investigation.

The output is an interactive map where every crisis is a clickable circle, color-coded by Overlook Score, plus a full Databricks SQL Dashboard with rankings, regional breakdowns, and project benchmarks.
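The outlier check in the third question can be sketched in plain Python. This is a minimal illustration, not the project's Spark ML code: the field names (`code`, `sector`, `beneficiaries`, `budget_usd`), the cutoff of 2 standard deviations, and the label strings are all assumptions, and budgets are assumed to be positive.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flag_ratio_outliers(projects, threshold=2.0):
    """Label each project by comparing its beneficiaries-per-dollar
    ratio with other projects in the same sector (Z-score per sector)."""
    # Group the efficiency ratios by sector so food projects are only
    # compared with food projects, shelter with shelter, and so on.
    ratios_by_sector = defaultdict(list)
    for p in projects:
        ratios_by_sector[p["sector"]].append(p["beneficiaries"] / p["budget_usd"])

    flags = {}
    for p in projects:
        ratios = ratios_by_sector[p["sector"]]
        mu, sigma = mean(ratios), pstdev(ratios)
        ratio = p["beneficiaries"] / p["budget_usd"]
        # A sector with no spread has no outliers by definition.
        z = 0.0 if sigma == 0 else (ratio - mu) / sigma
        if z >= threshold:
            flags[p["code"]] = "scale-up candidate"
        elif z <= -threshold:
            flags[p["code"]] = "needs investigation"
        else:
            flags[p["code"]] = "typical"
    return flags
```

Note that with a population standard deviation, a sector needs at least six projects before any single one can reach a Z-score of 2, so small sectors will never produce flags under this cutoff.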
How We Built It
We used the full Databricks stack on real UN OCHA data.
Data: Ingested from HUMDATA (the UN's open data exchange): Humanitarian Needs Overview, Humanitarian Response Plans, OCHA Financial Tracking Service, and CBPF Pooled Fund data. The dataset covers 25 active crises, 160+ humanitarian projects, and real 2023 funding figures.
Architecture (Databricks Medallion):
- 🥉 Bronze: Raw data landed as Delta Lake tables with full versioning and time travel
- 🥈 Silver: Cleaned and enriched with 11 derived metrics per crisis, including Overlook Score, Z-scores, CBPF gap coverage, and estimated unfunded population
- 🥇 Gold: Business-ready rankings, regional summaries, CBPF mismatch analysis, executive summaries
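To illustrate one Silver-layer metric, here is a minimal sketch of an Overlook Score in plain Python. The write-up confirms the three inputs, the quadratic severity weighting, and the min-max normalization to 0–100, but it does not give the exact formula. The product used below (people in need × severity² × unfunded share) and the field names are assumptions, and the 40% threshold mentioned under What We Learned is left out because its role is not described.

```python
def overlook_scores(crises):
    """Return {crisis name: score in 0..100}; higher means more overlooked.

    Assumed raw score: people_in_need * severity**2 * (1 - funding_coverage),
    then min-max normalized across all crises in the input.
    """
    raw = {}
    for c in crises:
        # Clamp coverage to [0, 1] so overfunded appeals do not go negative.
        coverage = min(max(c["funding_coverage"], 0.0), 1.0)
        raw[c["name"]] = c["people_in_need"] * c["severity"] ** 2 * (1.0 - coverage)

    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # All crises look identical, so there is nothing to rank.
        return {name: 0.0 for name in raw}
    return {name: round(100 * (v - lo) / (hi - lo), 1) for name, v in raw.items()}
```

Because the normalization is relative, the most overlooked crisis in any input always scores 100 and the least overlooked scores 0, so scores are only comparable within a single run.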
ML Pipeline (Spark MLlib):
- StringIndexer → VectorAssembler → StandardScaler → KMeans(k=6)
- Evaluated with Silhouette Score
- Produces 6 benchmark groups; for any project, finds the 5 most comparable projects by cosine distance
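The comparable-project lookup in the last step can be sketched without Spark. This is an illustration under assumptions: `features` is a hypothetical dict mapping project codes to already-scaled feature vectors, and the search runs over all projects; whether the real pipeline restricts the search to the project's own KMeans group is not stated.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Treat an all-zero vector as unrelated to everything.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def most_comparable(target_code, features, k=5):
    """Return the codes of the k projects closest to target_code."""
    target = features[target_code]
    ranked = sorted(
        (cosine_distance(target, vec), code)
        for code, vec in features.items()
        if code != target_code
    )
    return [code for _, code in ranked[:k]]
```

Cosine distance compares the direction of two feature vectors rather than their magnitude, so a small project and a large one with the same proportions count as comparable.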
Visualization:
- Folium interactive map with clickable crisis popups showing full funding cards
- 4 Matplotlib charts: bubble chart, funding gap waterfall, CBPF mismatch scatter, outlier distributions
- 8-panel Databricks SQL Dashboard for ongoing operational use
Infrastructure: Databricks Workflows DAG connecting all 6 notebooks end-to-end. GitHub Repo integration. CI/CD via GitHub Actions.
Challenges We Ran Into
Live data access. HUMDATA's API occasionally times out or returns malformed CSVs, especially for the HRP project-level data, which spans dozens of country-specific files. We built a robust fetcher that falls back to embedded real-world data so the pipeline never breaks.

Comparing apples to apples. You can't compare a food distribution project to a shelter construction project, because their cost structures are completely different. Getting the cluster-level Z-score normalization right, so that we only flagged genuine outliers within the same sector, took several iterations.

The CBPF mismatch problem is harder than it looks. A country receiving "a lot" of CBPF support might still be severely misallocated if its funding gap is enormous. We had to define the Mismatch Score carefully: it's not about absolute CBPF allocation, it's about gap coverage rate weighted by overlook severity.

Making it actionable, not just academic. Early versions produced a ranked list and stopped there. We pushed to add the auto-generated "Key Insight" sentence per crisis, the recommended additional CBPF figure, and the benchmark comparison engine, because data without a clear "so what" doesn't move the needle.
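The fallback behaviour described under "Live data access" could look roughly like the sketch below. It is not the project's fetcher: the `fetch` callable, the function names, and the required column names are hypothetical, chosen only to show the pattern of validating a live CSV and switching to embedded data when the download fails or the file is malformed.

```python
import csv
import io

# Hypothetical schema; the real HUMDATA files use their own column names.
REQUIRED_COLUMNS = frozenset({"country", "requirements_usd", "funding_usd"})


def parse_csv(text, required=REQUIRED_COLUMNS):
    """Parse CSV text into a list of dicts, rejecting malformed input."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(required) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = list(reader)
    if not rows:
        raise ValueError("no data rows")
    return rows


def load_rows(fetch, fallback_csv, required=REQUIRED_COLUMNS):
    """Try the live source first; use the embedded CSV if it fails.

    `fetch` is any zero-argument callable returning CSV text. Timeouts and
    network errors are OSError subclasses; malformed files raise ValueError.
    Returns (rows, source) where source is "live" or "fallback".
    """
    try:
        return parse_csv(fetch(), required), "live"
    except (OSError, ValueError):
        return parse_csv(fallback_csv, required), "fallback"
```

Returning the source label alongside the rows lets downstream notebooks record whether a given run used live or embedded data.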
Accomplishments That We're Proud Of
- Built a complete end-to-end pipeline, from raw OCHA API to interactive map, in under an hour using Databricks
- The Overlook Score algorithm is genuinely novel: weighting severity quadratically and normalizing across all active crises produces a ranking that matches humanitarian professionals' intuition but is fully reproducible and auditable
- Every line of code is commented, so this isn't a black box. A UN data team could pick this up, understand it, and extend it without us
- The interactive map looks like something a real UN agency would ship: dark theme, clickable popups with funding progress bars, color scale legend, top-5 crisis labels
- The benchmarking engine is directly useful: paste any HRP project code, get 5 comparable projects and a performance assessment in seconds
- Production-quality architecture: Delta Lake versioning means every run is auditable, you can time-travel to any previous data state, and the pipeline is fully idempotent (safe to re-run anytime)
What We Learned
- UN humanitarian data is richer than most people realize. HUMDATA has project-level detail, cluster breakdowns, and pooled fund data that almost nobody outside the sector knows exists. The bottleneck isn't data availability, it's tooling to make sense of it at scale.
- Databricks Medallion Architecture isn't just for enterprise. Even for a 25-country dataset, the discipline of Bronze/Silver/Gold made our analysis dramatically more reliable and debuggable than a flat notebook would have been.
- The hardest part of humanitarian analytics is defining "overlooked", and there's no neutral answer. Our Overlook Score embeds choices (severity², 40% threshold, min-max normalization) that should be debated and refined with UN domain experts. The tool is a starting point, not the final word.
- Spark ML is powerful but opaque. KMeans gives you clusters, but labeling them meaningfully requires human judgment. We learned to always pair ML outputs with interpretable summaries.
What's Next for CrisisLens
Short term (next 30 days):
- Ingest the full live HUMDATA dataset automatically on a daily schedule via Databricks Workflows
- Add time-series tracking to watch a crisis's Overlook Score deteriorate over months as donor fatigue sets in
- Build a donor-facing view: "Your $10M has highest marginal impact in these 3 crises"
Medium term:
- Integrate media coverage data (GDELT) to quantify the gap between crisis severity and public attention, the true "overlooked" signal
- Add natural language alerts: "Sudan's Overlook Score increased 15 points this month. Automated brief generated"
- Provide an API endpoint so UN OCHA systems can query CrisisLens scores directly
Long term:
- Partner with UN OCHA to run CrisisLens on their internal data (including non-public HRP project data) for higher fidelity
- Expand to sub-national level: which regions within a crisis country are most overlooked?
- Open-source the Overlook Score methodology for peer review by the humanitarian research community
CrisisLens is a starting point. The goal is a world where no crisis stays invisible simply because the data wasn't organized well enough to see it.
Built With
- delta