Inspiration

Machine-learning models increasingly decide who gets a loan, an interview, or a benefit — and they're judged on a single number: accuracy. But a model can be highly accurate and still systematically deny opportunities to women or to a racial group. The harm is invisible in the headline metric, so nobody notices. We wanted to make that invisible harm visible, measurable, and fixable — for anyone, not just ML researchers.

What it does

FairLens audits a real income classifier trained on the UCI Adult / Census Income dataset (48,842 people). It:

  • trains a strong gradient-boosting model (87.6% accuracy, 0.93 ROC-AUC);
  • audits it for bias by sex and race using legally-grounded fairness metrics (the U.S. EEOC four-fifths rule and equalized odds);
  • shows it fails the four-fifths rule badly: a disparate-impact ratio of 0.32 by sex and 0.21 by race (under the rule, a ratio below 0.80 is treated as evidence of "adverse impact");
  • applies a mitigation that closes most of the gap at a cost of about 1 accuracy point;
  • wraps it in an interactive dashboard where you slide the decision threshold and toggle the fix to watch fairness change in real time.
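
For readers new to the four-fifths rule, the disparate-impact ratio mentioned above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the project's code: the function names, the group labels "A" and "B", and the toy predictions are all made up for the example, and the ratio is taken as the lowest group selection rate divided by the highest.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Fraction of positive (1) predictions within each group."""
    selected = defaultdict(int)
    total = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        total[group] += 1
        selected[group] += int(pred == 1)
    return {g: selected[g] / total[g] for g in total}


def disparate_impact_ratio(y_pred, groups):
    """Lowest group selection rate divided by the highest."""
    rates = selection_rates(y_pred, groups)
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group received any positive prediction")
    return min(rates.values()) / highest


# Toy data (hypothetical): 10 people per group.
groups = ["A"] * 10 + ["B"] * 10
y_pred = [1] * 5 + [0] * 5 + [1] * 2 + [0] * 8  # A: 5 selected, B: 2 selected

rates = selection_rates(y_pred, groups)
ratio = disparate_impact_ratio(y_pred, groups)
passes_four_fifths = ratio >= 0.8  # threshold from the four-fifths rule

print(rates, round(ratio, 2), passes_four_fifths)
```

In this toy case group B is selected at 20% against 50% for group A, so the ratio falls well below 0.80 and the check fails, which is the same shape of result the audit reports for the real model.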

Figure: Selection rate by sex, before vs after mitigation

How we built it

  • Data cleaning (pandas): normalized three different spellings of "missing," recovered 3,620 hidden missing values, mode-imputed, and dropped a leak column (fnlwgt) and a redundant one (education).
  • Modeling (scikit-learn): HistGradientBoostingClassifier on one-hot-encoded features, 70/30 stratified split.
  • Fairness audit (Fairlearn): per-group selection / true-positive / false-positive rates, disparate-impact ratio, and equalized-odds difference.
  • Mitigation (Fairlearn ThresholdOptimizer): group-aware thresholds for equalized odds — no retraining required.
  • Dashboard (Streamlit + Plotly): live threshold slider, attribute selector, and a before/after comparison that shows the exact accuracy cost of fairness.
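
The audit and mitigation steps above can be illustrated with a small standard-library sketch. It is a simplified stand-in and not Fairlearn's implementation: the helper names, scores, and thresholds are hypothetical, and the per-group thresholds are hand-picked here, whereas ThresholdOptimizer searches for them. The equalized-odds difference is computed as the larger of the between-group gap in true-positive rate and the gap in false-positive rate.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group true-positive rate (TPR) and false-positive rate (FPR)."""
    counts = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts.setdefault(group, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if truth == 1:
            c["tp" if pred == 1 else "fn"] += 1
        else:
            c["fp" if pred == 1 else "tn"] += 1
    return {
        g: {
            "tpr": c["tp"] / (c["tp"] + c["fn"]),
            "fpr": c["fp"] / (c["fp"] + c["tn"]),
        }
        for g, c in counts.items()
    }


def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the between-group TPR gap and FPR gap."""
    rates = group_rates(y_true, y_pred, groups)
    tprs = [r["tpr"] for r in rates.values()]
    fprs = [r["fpr"] for r in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


def predict(scores, groups, thresholds):
    """Apply a (possibly group-specific) decision threshold to model scores."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


# Toy data (hypothetical): 4 true positives then 4 true negatives per group.
groups = ["A"] * 8 + ["B"] * 8
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 2
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2,   # group A
          0.7, 0.6, 0.45, 0.35, 0.4, 0.3, 0.2, 0.1]  # group B

# One shared threshold versus hand-picked group-aware thresholds.
shared = predict(scores, groups, {"A": 0.5, "B": 0.5})
group_aware = predict(scores, groups, {"A": 0.65, "B": 0.42})

before = equalized_odds_difference(y_true, shared, groups)
after = equalized_odds_difference(y_true, group_aware, groups)
print(before, after)
```

Because only the decision thresholds change, the underlying scores (and therefore the trained model) stay the same, which is why this family of post-processing fixes needs no retraining.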

A fully-executed Jupyter notebook documents the entire data-science narrative.

Figure: Ground-truth income disparity in the data

Challenges we ran into

The sneakiest bug was in cleaning: stripping whitespace from stringified columns before handling nulls silently turned every NaN into the literal string 'nan' (.str.strip() alone leaves NaN untouched; the damage comes from a string cast ahead of it). That hid 3,620 missing values and made the data look clean when it wasn't, a good reminder that data-ethics work starts with honest data. Deployment surfaced another: the host provisioned Python 3.14, which had no prebuilt wheels for our pinned libraries, so we loosened the pins to releases that ship wheels.
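
The mechanism behind this kind of cleaning bug can be reproduced in plain Python, without pandas: once a float NaN is converted to text it becomes the ordinary string 'nan' and no longer registers as missing. The sketch below is illustrative only. The sample values, the helper names, and the "?" placeholder are assumptions for the example, not the project's actual data or code.

```python
import math

# Hypothetical raw column: real values, true nulls, and a placeholder token.
raw = [" Private", float("nan"), "State-gov ", None, " ? "]


def is_null(value):
    """True for None and for float NaN, the usual 'real' missing markers."""
    return value is None or (isinstance(value, float) and math.isnan(value))


# Buggy order: stringify everything, then strip whitespace.
# NaN becomes 'nan' and None becomes 'None', so nothing looks missing.
buggy = [str(v).strip() for v in raw]
buggy_missing = sum(is_null(v) for v in buggy)

# Safer order: detect nulls first, strip only genuine strings,
# then map placeholder tokens to a single missing marker.
PLACEHOLDERS = {"?"}  # assumption: the spelling(s) used for "missing"


def clean(value):
    if is_null(value):
        return None
    text = value.strip()
    return None if text in PLACEHOLDERS else text


cleaned = [clean(v) for v in raw]
safe_missing = sum(v is None for v in cleaned)

print(buggy, buggy_missing)
print(cleaned, safe_missing)
```

Counting missing values before and after each cleaning step is a cheap guard: a count that drops to zero right after a string operation is a sign that nulls were converted to text rather than handled.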

Accomplishments that we're proud of

Turning an abstract "AI ethics" concern into concrete, defensible numbers tied to actual U.S. anti-discrimination law (the four-fifths rule) — and shipping it as a tool a non-expert can actually operate.

What we learned

  • Accuracy and fairness are separate properties: a high score on one says nothing about the other, so you have to measure fairness on purpose.
  • "Fairness through unawareness" (just dropping sex/race) fails, because proxies like relationship and marital-status re-encode the protected attribute.
  • There is no single definition of fairness — the honest move is to show the trade-off, not hide it.

What's next for FairLens

Upload-your-own-CSV auditing, more mitigation methods (reweighing, exponentiated gradient), and an exportable PDF "fairness report" for compliance teams.
