Inspiration

287,000 mothers die every year from preventable pregnancy complications, 95% of them in low-income countries. Teenage mothers face 3x higher mortality, yet clinical AI systems routinely misclassify or deprioritize them. We wanted to build something that could actually be deployed in a rural clinic with no specialist access: enter six vitals, get a risk score, know why, and know what to change.

What it does

MaternaAI is a maternal health triage API with a clinical-grade web interface. A health worker enters six patient vitals (age, systolic and diastolic blood pressure, blood glucose, temperature, heart rate). The system returns:

  • A 3-class risk classification (Low / Medium / High) with a 90% bootstrap confidence interval
  • A LIME local explanation showing which vitals drove the prediction and by how much
  • A counterfactual: the smallest single vital change that would drop the patient one risk class, with a clinical note on how to achieve it
  • A 3-sentence clinical brief with immediate action
  • A live differential privacy budget tracker showing how much of the total privacy budget (epsilon = 10.0) has been consumed across queries
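As a rough illustration of the 90% bootstrap interval in the first bullet, here is a minimal percentile-bootstrap sketch. It assumes the interval is taken over a set of High-class probabilities for one patient (for example, from models trained on resampled data). That input, the function name `bootstrap_ci`, and the sample numbers are all hypothetical; the actual MaternaAI procedure may differ.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, level=0.90, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample with replacement and record the resampled mean.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(sample))
    means.sort()
    alpha = (1.0 - level) / 2.0
    lo = means[int(alpha * n_resamples)]
    hi = means[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return lo, hi


# Hypothetical High-class probabilities for a single patient.
probs = [0.71, 0.64, 0.80, 0.69, 0.75, 0.66, 0.78, 0.72, 0.70, 0.74]
low, high = bootstrap_ci(probs)
print(f"High-risk probability, 90% CI: [{low:.3f}, {high:.3f}]")
```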

How we built it

IBM AI Fairness 360 -- three-strategy bias pipeline. We audit teen vs adult mother disparity using five metrics (DI, SPD, EOD, AOD, Theil Index), then apply three independent mitigation strategies: Reweighing (adjusts sample weights at training time), DisparateImpactRemover (repairs the feature space), and CalibratedEqOddsPostprocessing (adjusts output thresholds post-prediction). All three strategies run and all results are surfaced in the UI.
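To make two of the five audit metrics concrete, the sketch below computes Statistical Parity Difference (SPD) and Disparate Impact (DI) by hand, following the usual definitions: the difference and the ratio of favorable-outcome rates between the unprivileged and privileged groups. It does not call AIF360. The group labels, the choice of which prediction counts as "favorable", and the toy data are assumptions for illustration only.

```python
def favorable_rate(preds, groups, group, favorable=1):
    """Share of patients in `group` who received the favorable prediction."""
    selected = [p for p, g in zip(preds, groups) if g == group]
    return sum(1 for p in selected if p == favorable) / len(selected)


def disparity(preds, groups, unprivileged="teen", privileged="adult"):
    """SPD is a rate difference, DI a rate ratio (unprivileged vs privileged)."""
    r_u = favorable_rate(preds, groups, unprivileged)
    r_p = favorable_rate(preds, groups, privileged)
    return {"SPD": r_u - r_p, "DI": r_u / r_p}


# Toy predictions: 1 = favorable outcome (assumed), 0 = unfavorable.
preds = [1, 0, 0, 1, 1, 1, 0, 1]
groups = ["teen", "teen", "teen", "teen", "adult", "adult", "adult", "adult"]

metrics = disparity(preds, groups)
print(metrics)
```

An SPD of 0 and a DI of 1 would indicate equal favorable rates; here the teen group receives the favorable outcome less often than the adult group.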

IBM Adversarial Robustness Toolbox -- two attack classes against high-risk patients: Gaussian noise (baseline measurement error) and an iterative black-box attack (targeted vital manipulation to cause a missed referral). We then apply ART's FeatureSqueezing defense -- discretizing each vital into 256 bins -- and report the defended attack success rate alongside the undefended rate.
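The squeezing step can be sketched without ART as a per-feature normalize, quantize, denormalize pass. This is a plain-Python stand-in for the idea, not ART's FeatureSqueezing class, and the feature ranges in `LOWER` and `UPPER` are hypothetical placeholders rather than the bounds used in the project.

```python
# Hypothetical clinical ranges for: age, systolic BP, diastolic BP,
# glucose, temperature, heart rate.
LOWER = [10.0, 70.0, 49.0, 6.0, 98.0, 40.0]
UPPER = [70.0, 160.0, 100.0, 19.0, 103.0, 140.0]


def squeeze(x, lower, upper, bit_depth=8):
    """Snap each feature onto 2**bit_depth evenly spaced levels in its range."""
    levels = 2 ** bit_depth - 1  # 255 steps give 256 distinct values
    out = []
    for v, a, b in zip(x, lower, upper):
        v = min(max(v, a), b)              # clip into the feature's range
        unit = (v - a) / (b - a)           # normalize to [0, 1]
        q = round(unit * levels) / levels  # quantize
        out.append(a + q * (b - a))        # map back to original units
    return out


patient = [17.0, 151.3, 96.7, 7.84, 99.1, 88.0]
squeezed = squeeze(patient, LOWER, UPPER)
print(squeezed)
```

Because each value moves by at most half a quantization step, small adversarial nudges tend to collapse onto the same level as the clean input, which is the intuition behind reporting a defended attack success rate.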

IBM AIX360 -- LIME tabular explainer gives per-patient local feature attributions. We pair it with a custom counterfactual engine that scans each vital's full clinical range to find the minimum change that shifts the predicted risk class, ranked by smallest relative change.
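A minimal sketch of the counterfactual scan might look like the following. It assumes a grid search over each vital's range, one vital at a time, and measures "relative change" as a fraction of the range width. The rule-based `toy_predict` function, its thresholds, and the ranges are invented stand-ins so the example runs on its own; they are not the project's model and carry no clinical meaning.

```python
def toy_predict(p):
    """Hypothetical stand-in classifier: 0 = Low, 1 = Medium, 2 = High."""
    return int(p["systolic"] >= 140) + int(p["glucose"] >= 8.0)


def find_counterfactual(patient, predict, ranges, steps=200):
    """Smallest single-vital change that lowers the risk class by one."""
    base = predict(patient)
    best = None  # (vital name, new value, change relative to range width)
    for name, (lo, hi) in ranges.items():
        for i in range(steps + 1):
            value = lo + (hi - lo) * i / steps
            candidate = dict(patient, **{name: value})
            if predict(candidate) == base - 1:
                rel = abs(value - patient[name]) / (hi - lo)
                if best is None or rel < best[2]:
                    best = (name, value, rel)
    return best


# Hypothetical ranges and patient.
ranges = {"systolic": (70.0, 160.0), "glucose": (6.0, 19.0)}
patient = {"systolic": 150.0, "glucose": 9.0}

best = find_counterfactual(patient, toy_predict, ranges)
print(best)
```

Normalizing by range width is one way to compare changes across vitals with different units; normalizing by the patient's current value would be another reasonable choice.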

IBM diffprivlib -- a DP-GaussianNB model (epsilon=1.0) serves as a differentially private high-risk detector. A BudgetAccountant (epsilon=10.0 total) tracks cumulative privacy spend across all API queries at 0.05 epsilon each, displayed live in the interface.
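The budget arithmetic is simple: under basic sequential composition, epsilon costs add up, so a total of 10.0 at 0.05 per query allows 200 queries. The sketch below is a plain-Python stand-in for that bookkeeping. It is not diffprivlib's BudgetAccountant, and the class name `QueryBudget` is hypothetical.

```python
class QueryBudget:
    """Tracks cumulative epsilon spend with a fixed cost per query."""

    def __init__(self, total=10.0, per_query=0.05):
        self.total = total
        self.per_query = per_query
        self.queries = 0

    @property
    def spent(self):
        # Counting queries avoids drift from repeated float addition.
        return self.queries * self.per_query

    @property
    def remaining(self):
        return max(0.0, self.total - self.spent)

    def charge(self):
        """Record one query, or refuse if it would exceed the budget."""
        if (self.queries + 1) * self.per_query > self.total + 1e-9:
            raise RuntimeError("privacy budget exhausted")
        self.queries += 1
        return self.remaining


budget = QueryBudget()
allowed = 0
try:
    while True:
        budget.charge()
        allowed += 1
except RuntimeError:
    pass
print(allowed, budget.remaining)
```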

The backend is a Flask REST API. The frontend is a single-page app with no framework -- Inter font, dark theme, tabbed layout for Risk, Fairness, Security, Privacy, and Population views.

Challenges

Getting IBM AIF360's CalibratedEqOddsPostprocessing to accept predictions without raising a feature-array mismatch error required using dataset.copy() and replacing labels/scores in-place rather than constructing a new BinaryLabelDataset. DisparateImpactRemover requires a separate BlackBoxAuditing installation not listed in the main aif360 dependencies. ART's FeatureSqueezing expects scalar clip values but our features have heterogeneous ranges -- we solved this by normalizing to [0,1], squeezing, then denormalizing per-feature. diffprivlib's GaussianNB requires bounds as a tuple of (lower_array, upper_array), not a list of pairs.
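The bounds shape in the last point is easy to get wrong, so here is a small sketch of converting per-feature (lower, upper) pairs into the (lower_array, upper_array) tuple described above. The ranges are hypothetical, and plain lists are used so the snippet runs without dependencies; wrapping each list in a NumPy array before passing it to the model is assumed to be a straightforward extra step.

```python
# Hypothetical per-feature ranges as (lower, upper) pairs, in the order:
# age, systolic BP, diastolic BP, glucose, temperature, heart rate.
pairs = [
    (10.0, 70.0),
    (70.0, 160.0),
    (49.0, 100.0),
    (6.0, 19.0),
    (98.0, 103.0),
    (40.0, 140.0),
]

# Transpose: one sequence of all lower bounds, one of all upper bounds.
lower, upper = (list(column) for column in zip(*pairs))
bounds = (lower, upper)
print(bounds)
```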

What we learned

Integrating IBM's responsible AI stack end-to-end reveals real tensions: fairness mitigation can trade accuracy for equity in ways that aren't obvious until you see all three strategies side by side. A differential privacy budget is a finite resource that forces you to think about query design. Adversarial robustness on tabular clinical data is different from image attacks: perturbations are clinically bounded, which constrains both the attacker and the defender.

What's next

Integration with DHIS2 and CommCare community health worker apps via the REST API. Expanding to additional sensitive attributes (geography, parity). Training on larger multi-site datasets. Formal clinical validation study.

Built With

  • blackboxauditing
  • datasets
  • flask
  • hugging-face-inference-api
  • ibm-adversarial-robustness-toolbox
  • ibm-ai-fairness-360
  • ibm-aix360
  • ibm-diffprivlib
  • lime
  • numpy
  • pandas
  • python
  • scikit-learn
  • whomaternalhealth