Inspiration

I grew up in Gopalganj, Bangladesh — the epicenter of what the WHO calls "the largest mass poisoning in history." Over 140 million people across Bangladesh drink arsenic-contaminated groundwater from tube wells. The poison is invisible, tasteless, and odorless — families drink it for decades without knowing. By the time skin lesions and cancers appear, the damage is done.

The brutal truth: 240 million people worldwide are exposed to arsenic above WHO limits. Every child born into these communities inherits the exposure. This isn't a one-time disaster — it's a continuous, intergenerational crisis.

The question that haunted me: the data exists. Geochemical surveys are collected constantly — by mining companies looking for minerals, by environmental agencies tracking pollution. Why isn't that same data being screened for the human cost?

The same geological formations that leach arsenic into drinking water also host copper, lithium, and gold. One dataset. Two missions. So I built GeoIQ: one engine that maps both — and never confuses them.

What It Does

GeoIQ is a C++17 geostatistical engine that ingests multi-element geochemical data and produces three architecturally separated intelligence tiers:

Tier 1 — Public Safety (Community Health)

A health worker opens an offline HTML dashboard. They see arsenic, lead, mercury, and fluoride hazard alerts anchored to real GPS coordinates. Every alert references WHO/EPA thresholds and explains exactly which rule triggered it. They export a field report and dispatch a testing team. No Z-scores. No mineral targets. No raw data confusing the message.

Tier 2 — Advanced Research (Geologists & Scientists)

The same geography, but with full geochemical evidence: Z-scores, Cheng singularity indices, multi-element anomaly classification across 10 mineral targets (gold, copper, lithium, rare earths, zinc-lead, and more). Every number carries its formula derivation — 42,711 evidence entries, every one traceable.

Tier 3 — Full Audit (Scientists & Regulators)

Everything visible — geological targets, health assessments, and mine cross-references showing, for example, that a copper anomaly sits 0.33 km from an active copper mine. A WHERE-style query box lets auditors filter the dataset in real time.

The key innovation: Tier 1 physically contains zero geological data in its payload. Not CSS hiding — architectural separation at the JSON level. A public health officer cannot accidentally (or deliberately) access mineral exploration intelligence. That data simply isn't in their file.
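A minimal sketch of what stripping at the JSON level could look like. The `SiteRecord` fields, the `Tier` enum, and the JSON key names are hypothetical, chosen for illustration rather than taken from GeoIQ's source. The point it shows is that the mineral fields are never written for Tier 1, so nothing is left for the browser to hide.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical tier identifiers, following the three tiers described above.
enum class Tier { PublicSafety = 1, AdvancedResearch = 2, FullAudit = 3 };

// Hypothetical record layout; the real engine's fields may differ.
struct SiteRecord {
    double lat;
    double lon;
    std::string healthAlert;    // plain-language hazard text
    std::string mineralTarget;  // e.g. "copper"
    double zScore;
};

// Escapes quotes and backslashes only; assumes no control characters.
std::string jsonEscape(const std::string& in) {
    std::string out;
    for (char c : in) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Writes one record. Geological fields are emitted only above Tier 1.
std::string serializeSite(const SiteRecord& s, Tier tier) {
    std::ostringstream out;
    out << std::setprecision(8);
    out << "{\"lat\":" << s.lat << ",\"lon\":" << s.lon
        << ",\"health_alert\":\"" << jsonEscape(s.healthAlert) << "\"";
    if (tier != Tier::PublicSafety) {
        out << ",\"mineral_target\":\"" << jsonEscape(s.mineralTarget) << "\""
            << ",\"z_score\":" << s.zScore;
    }
    out << '}';
    const std::string json = out.str();
    // Debug-time guard: the public payload must not carry the mineral key.
    assert(tier != Tier::PublicSafety ||
           json.find("mineral_target") == std::string::npos);
    return json;
}
```

Calling `serializeSite(record, Tier::PublicSafety)` yields a payload with coordinates and the health alert only, while the other two tiers receive the extra fields.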

How I Built It

Backend Engine (GeoIQ.exe) — ~5,000 lines of pure C++17

Every algorithm is implemented from scratch — no ML frameworks, no cloud APIs, no black boxes.

KD-Tree Spatial Indexing — A memory-stable 2D binary tree resolving neighborhood queries in $O(\log N)$ average time across 14,571 sample points. A brute-force scan of every sample for every query would stall; the KD-Tree completes in milliseconds.
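For readers unfamiliar with the structure, here is a minimal 2D KD-tree sketch with a nearest-neighbor query. It is an illustration under assumptions, not GeoIQ's implementation: the `Point` type and `KDTree2D` class are hypothetical, and the tree is stored implicitly in a single array so that no per-node allocation is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical sample type: planar coordinates only.
struct Point { double x, y; };

class KDTree2D {
public:
    explicit KDTree2D(std::vector<Point> pts) : pts_(std::move(pts)) {
        build(0, pts_.size(), 0);
    }

    // Returns the stored point closest to q (Euclidean distance).
    Point nearest(Point q) const {
        assert(!pts_.empty());
        std::size_t best = 0;
        double bestD2 = std::numeric_limits<double>::infinity();
        search(0, pts_.size(), 0, q, best, bestD2);
        return pts_[best];
    }

private:
    static double coord(const Point& p, int axis) { return axis == 0 ? p.x : p.y; }

    // The median of [lo, hi) on the current axis becomes the node;
    // the two halves are built recursively on the other axis.
    void build(std::size_t lo, std::size_t hi, int axis) {
        if (hi - lo <= 1) return;
        const std::size_t mid = lo + (hi - lo) / 2;
        std::nth_element(pts_.begin() + lo, pts_.begin() + mid, pts_.begin() + hi,
                         [axis](const Point& a, const Point& b) {
                             return coord(a, axis) < coord(b, axis);
                         });
        build(lo, mid, 1 - axis);
        build(mid + 1, hi, 1 - axis);
    }

    void search(std::size_t lo, std::size_t hi, int axis, Point q,
                std::size_t& best, double& bestD2) const {
        if (lo >= hi) return;
        const std::size_t mid = lo + (hi - lo) / 2;
        const Point& p = pts_[mid];
        const double dx = p.x - q.x, dy = p.y - q.y;
        const double d2 = dx * dx + dy * dy;
        if (d2 < bestD2) { bestD2 = d2; best = mid; }

        // Visit the side containing q first; cross the splitting line only
        // if a closer point could still exist on the far side.
        const double delta = coord(q, axis) - coord(p, axis);
        if (delta < 0) {
            search(lo, mid, 1 - axis, q, best, bestD2);
            if (delta * delta < bestD2) search(mid + 1, hi, 1 - axis, q, best, bestD2);
        } else {
            search(mid + 1, hi, 1 - axis, q, best, bestD2);
            if (delta * delta < bestD2) search(lo, mid, 1 - axis, q, best, bestD2);
        }
    }

    std::vector<Point> pts_;
};
```

The pruning test on `delta` is what gives the logarithmic average behavior: most subtrees are skipped because they cannot contain anything closer than the current best.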

Inverse Distance Weighting (IDW) — Interpolates virtual grid compositions:

$$\hat{Z}(x) = \frac{\sum_{i=1}^{k} w_i Z(x_i)}{\sum_{i=1}^{k} w_i}, \quad w_i = d(x, x_i)^{-p}$$

Simple averaging smooths over contamination hotspots. IDW weights nearby samples more heavily, so local spikes survive interpolation, which is critical for detecting the contamination that actually poisons people.
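The formula translates almost directly into code. This is a sketch under assumptions: the `Sample` struct is hypothetical, the neighbors are presumed to have been selected already (for example by a KD-tree query), and the default power $p = 2$ is a common choice rather than a value stated above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sample: planar position plus a measured concentration.
struct Sample { double x, y, value; };

// IDW estimate at (qx, qy) from the k neighbors passed in.
// p is the distance exponent; 2.0 is an assumed default.
double idwEstimate(const std::vector<Sample>& neighbors,
                   double qx, double qy, double p = 2.0) {
    assert(!neighbors.empty());
    double weightedSum = 0.0;
    double weightTotal = 0.0;
    for (const Sample& s : neighbors) {
        const double d = std::hypot(s.x - qx, s.y - qy);
        if (d == 0.0) return s.value;  // query sits on a sample: use it directly
        const double w = std::pow(d, -p);
        weightedSum += w * s.value;
        weightTotal += w;
    }
    return weightedSum / weightTotal;
}
```

With samples of 10 at (0, 0) and 30 at (2, 0), the midpoint estimate is 20, while a query at (0.5, 0) gives 12: the closer sample dominates, which is the behavior that keeps a hotspot from being averaged away.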

Cheng (2007) Power-Law Local Singularity — Isolates true geochemical anomalies from natural background noise using scale-invariant regression:

$$\log(C(r)) = (\alpha - d)\log(r) + c$$

Where $d = 2$ is the dimension of the map, $\alpha < 2.0$ signals anomalous enrichment (a potential ore body or contaminant source), $\alpha \approx 2.0$ indicates background, and $\alpha > 2.0$ indicates depletion.
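A sketch of how the index can be estimated, assuming the standard 2D convention in which mean concentration scales as $C(r) \propto r^{\alpha - 2}$ (the convention under which $\alpha < 2$ means enrichment). The function name and the idea of passing precomputed window means are assumptions for illustration: the slope of an ordinary least-squares fit of $\log C$ against $\log r$ is $\alpha - 2$.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// radii[i]    : half-width of the i-th window around the location
// meanConc[i] : mean concentration of samples inside that window
// Returns alpha, estimated as (least-squares slope of log C vs log r) + 2.
double singularityIndex(const std::vector<double>& radii,
                        const std::vector<double>& meanConc) {
    assert(radii.size() == meanConc.size() && radii.size() >= 2);
    const double n = static_cast<double>(radii.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < radii.size(); ++i) {
        assert(radii[i] > 0.0 && meanConc[i] > 0.0);  // logs must be defined
        const double x = std::log(radii[i]);
        const double y = std::log(meanConc[i]);
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    const double denom = n * sxx - sx * sx;
    assert(denom != 0.0);  // requires at least two distinct radii
    const double slope = (n * sxy - sx * sy) / denom;
    return slope + 2.0;
}
```

A location whose mean concentration rises as the window shrinks produces a negative slope and therefore $\alpha < 2$; a flat profile returns exactly 2.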

8-Sector Compass Dispersion Analysis — Scans neighboring samples in 8 compass directions, detects spatial gradients, and vectors back toward probable geological sources using exponential decay: $C(d) = C_0 e^{-\lambda d}$.
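Two small pieces of this step can be sketched in isolation. The helper names, the 45-degree sector layout centered on north, and the two-point decay estimate are all assumptions made for illustration, not a description of GeoIQ's internals. The first function bins a neighbor into a compass sector; the others invert $C(d) = C_0 e^{-\lambda d}$.

```cpp
#include <cassert>
#include <cmath>

// Sector index for a neighbor offset (east, north) from the center sample.
// 0 = N, 1 = NE, 2 = E, 3 = SE, 4 = S, 5 = SW, 6 = W, 7 = NW.
int compassSector(double dxEast, double dyNorth) {
    const double kPi = 3.14159265358979323846;
    // Bearing measured clockwise from north, in degrees.
    double bearing = std::atan2(dxEast, dyNorth) * 180.0 / kPi;
    if (bearing < 0.0) bearing += 360.0;
    // Each sector spans 45 degrees centered on its compass direction.
    return static_cast<int>(std::floor((bearing + 22.5) / 45.0)) % 8;
}

// Decay constant from two readings along one direction:
// lambda = ln(c1 / c2) / (d2 - d1).
double decayConstant(double c1, double d1, double c2, double d2) {
    assert(c1 > 0.0 && c2 > 0.0 && d1 != d2);
    return std::log(c1 / c2) / (d2 - d1);
}

// Extrapolates back to the source: C0 = C(d) * exp(lambda * d).
double sourceConcentration(double c, double d, double lambda) {
    return c * std::exp(lambda * d);
}
```

In practice more than two readings per sector would be fitted by regression; the two-point form is used here only because it makes the algebra visible.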

Multi-Criteria Rule Engine — Expert geochemical logic classifying targets. Each score shows its full derivation:

$$\text{score} = \text{mean}([0.55, 0.30, 0.70]) \times \text{depth\_mod}(1.4) \times \text{confidence}(1.00) = 0.723$$

Nothing hidden. Nothing unexplained.
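The worked score above can be reproduced in a few lines. This is a sketch: the `ScoredTarget` struct, the function name, and the exact derivation-string format are hypothetical, but the arithmetic matches the example, since the mean of the three criteria is about 0.5167 and multiplying by 1.4 and 1.00 gives 0.723.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: the number plus the text explaining it.
struct ScoredTarget {
    double score;
    std::string derivation;
};

// score = mean(criteria) * depthMod * confidence, with the derivation
// recorded alongside so that every score can be traced.
ScoredTarget scoreTarget(const std::vector<double>& criteria,
                         double depthMod, double confidence) {
    assert(!criteria.empty());
    const double mean =
        std::accumulate(criteria.begin(), criteria.end(), 0.0) /
        static_cast<double>(criteria.size());
    const double score = mean * depthMod * confidence;

    std::ostringstream out;
    out << std::fixed << std::setprecision(2) << "mean([";
    for (std::size_t i = 0; i < criteria.size(); ++i) {
        out << (i ? ", " : "") << criteria[i];
    }
    out << "]) x depth_mod(" << std::setprecision(1) << depthMod
        << ") x confidence(" << std::setprecision(2) << confidence
        << ") = " << std::setprecision(3) << score;
    return {score, out.str()};
}
```

Calling `scoreTarget({0.55, 0.30, 0.70}, 1.4, 1.00)` returns a score of roughly 0.7233 together with the string `mean([0.55, 0.30, 0.70]) x depth_mod(1.4) x confidence(1.00) = 0.723`.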

Frontend Generator (HTMLWriter.exe) — Standalone C++17

Pairs geology and health predictions within a 5 km Haversine radius, serializes data with tier-aware filtering (Tier 1 strips all mineral targets at the data level), and generates three standalone offline Leaflet.js dashboards.
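The pairing test rests on the Haversine great-circle distance. A minimal sketch follows, assuming a spherical Earth with a mean radius of 6371 km; the function names and the way the 5 km radius is passed as a default parameter are illustrative choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Great-circle distance in kilometers between two points given in degrees.
double haversineKm(double lat1, double lon1, double lat2, double lon2) {
    const double kEarthRadiusKm = 6371.0;  // mean radius, spherical assumption
    const double kDegToRad = 3.14159265358979323846 / 180.0;
    const double phi1 = lat1 * kDegToRad;
    const double phi2 = lat2 * kDegToRad;
    const double dPhi = (lat2 - lat1) * kDegToRad;
    const double dLambda = (lon2 - lon1) * kDegToRad;
    const double sinPhi = std::sin(dPhi / 2.0);
    const double sinLambda = std::sin(dLambda / 2.0);
    const double a = sinPhi * sinPhi +
                     std::cos(phi1) * std::cos(phi2) * sinLambda * sinLambda;
    // Clamp to 1 to guard against rounding just above the asin domain.
    return 2.0 * kEarthRadiusKm * std::asin(std::min(1.0, std::sqrt(a)));
}

// True when a geology prediction and a health prediction are close enough
// to be paired. The 5 km default follows the radius quoted above.
bool withinPairingRadius(double lat1, double lon1, double lat2, double lon2,
                         double radiusKm = 5.0) {
    assert(radiusKm >= 0.0);
    return haversineKm(lat1, lon1, lat2, lon2) <= radiusKm;
}
```

One degree of latitude comes out at about 111.2 km, so two points 0.03 degrees apart in latitude (about 3.3 km) pair, while points 0.06 degrees apart (about 6.7 km) do not.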

Why C++17 and static compilation?

Because the people who need this tool don't have fiber internet or a Python environment. A single .exe that runs on any Windows machine with no DLLs, no dependencies, no installation — that's what makes it usable in a rural Bangladeshi health outpost or a field exploration camp.

Responsible AI Guardrails

This is the part I'm most proud of.

  1. No false complacency. The string "SAFE" is forbidden across the entire codebase and database. Baseline readings are labeled "LOW RISK — Background level. Continue routine monitoring." A tool that says "you're safe" can stop people from testing. GeoIQ never does.

  2. Architectural information isolation. The same copper anomaly that excites a geologist could trigger dangerous illegal mining if published to a rural community. So mineral data is architecturally withheld from the public tier — not hidden with CSS, stripped from the JSON payload.

  3. Clinical handoff, never diagnosis. GeoIQ is a screening tool, not an oracle. Water warnings advise laboratory testing. Health warnings advise consulting a professional. The AI surfaces; the human decides.
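The first guardrail lends itself to an automated check. The sketch below is an assumption about how such a rule might be enforced, not GeoIQ's actual code: `isLabelAllowed` rejects any label containing the forbidden word in any letter case, and `arsenicRiskLabel` is a hypothetical labeler that uses the WHO drinking-water guideline of 10 µg/L for arsenic as its single cutoff. The low-risk wording follows the label quoted above (with a plain hyphen), while the elevated wording is invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// True if the label does not contain "SAFE" in any letter case.
// Note that this also rejects words such as "unsafe".
bool isLabelAllowed(const std::string& label) {
    std::string upper(label);
    std::transform(upper.begin(), upper.end(), upper.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return upper.find("SAFE") == std::string::npos;
}

// Hypothetical two-level labeler for arsenic in micrograms per liter.
std::string arsenicRiskLabel(double arsenicUgPerL) {
    const double kWhoGuidelineUgPerL = 10.0;  // WHO guideline value for arsenic
    assert(arsenicUgPerL >= 0.0);
    const std::string label =
        arsenicUgPerL > kWhoGuidelineUgPerL
            ? "ELEVATED - Above WHO guideline (10 ug/L). Laboratory testing advised."
            : "LOW RISK - Background level. Continue routine monitoring.";
    assert(isLabelAllowed(label));  // every outgoing label passes the guardrail
    return label;
}
```

Running the same check over every string in the output files as a build step would turn the rule from a convention into something the build enforces.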

Challenges

  • The data gap. Public geochemical surveys rarely include the full element panel needed for mineral targeting. The 14,571-sample USGS dataset only had As, Fe, Mn — enough to prove health screening at scale, but not multi-target mineral work. A separate 60-sample dataset with full coverage proved the mineral engine. Same pipeline, two datasets, two missions.

  • Enforcing tier separation honestly. Hiding data with CSS isn't security — anyone can open dev tools. The real challenge was stripping geological intelligence at the serialization level, so Tier 1's file physically contains zero mineral targets. The separation is architectural, not cosmetic.

  • Honest explainability without clutter. Every Z-score carries its formula string. But a health officer shouldn't need to know what a Z-score is. Tier 1 strips the math and shows plain risk levels with WHO context. Transparency for the researcher, simplicity for the community.

What I Learned

Dual-use AI is hard. The same reading that looks "low arsenic, probably fine" could cause dangerous complacency. The same copper anomaly that helps an exploration team could trigger illegal mining if leaked. Good architecture doesn't just flag risks in the data — it restricts who sees what.

C++17 static compilation is unforgiving but worth it. A single binary that runs anywhere, no dependencies, no internet — that's what makes a tool actually usable in the places that need it most.

The "rumor" isn't misinformation. It's the absence of data. Families assume water is safe because it looks clear. The fix isn't fact-checking — it's making the invisible visible.

What's Next

  • Integrate Bangladesh-specific datasets (DPHE/BAMWSP arsenic surveys) to map my own community
  • Add fluoride mapping for East African Rift Valley communities (different poison, same engine)
  • Deploy Tier 1 dashboards to community health workers via offline SD cards
  • Publish the Cheng singularity methodology as a peer-reviewed geostatistics paper
