Project Story: Bias Bounty Competition - Loan Approval Bias Detection

About the Project

Inspiration

The Bias Bounty competition inspired us to tackle bias in automated loan approval systems, a domain where unfair decisions can deepen systemic inequalities. Motivated by the real-world consequences of biased AI in financial services, we set out to build a model that balances predictive accuracy with fairness across the sensitive attributes Gender, Race, and Zip_Code_Group. Two reference notebooks shaped our approach: a polynomial regression notebook introduced bias-variance trade-off analysis, which guided our model evaluation, and an XGBoost notebook with K-fold cross-validation informed our preprocessing and training strategy. Our goal was an ethical, interpretable, and production-ready pipeline that addresses the disparities we measured (Gender DPD of 0.4167, Non-binary recall of 0.3125) while remaining competitive on accuracy.

What We Learned

Developing this pipeline deepened our understanding of fairness in AI, particularly through fairlearn's Demographic Parity Difference (DPD) and Equalized Odds Difference (EOD) metrics. DPD is the largest gap in approval rate between groups; EOD is the larger of the gaps in true-positive rate and false-positive rate. We learned to apply bias mitigation with ExponentiatedGradient under a DemographicParity constraint, which aims to narrow approval-rate gaps between groups. The polynomial regression notebook taught us to quantify bias and variance, which revealed underfitting in Logistic Regression (accuracy: 0.6284). The XGBoost notebook guided our preprocessing (e.g., Box-Cox transformation) and cross-validation. We also gained experience handling imbalanced data with SMOTE and creating interpretable visualizations (e.g., SHAP, fairness plots) for stakeholder communication.
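To make the two metrics concrete, here is a minimal, dependency-free sketch of how DPD and EOD can be computed from predictions and a sensitive attribute. It is not the project's code: the pipeline relies on fairlearn (whose `fairlearn.metrics` module provides `demographic_parity_difference` and `equalized_odds_difference`), and the toy labels and the group names "A" and "B" below are invented for illustration.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Fraction of positive predictions per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(y_pred, groups):
    """Largest gap in selection rate between any two groups."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


def _rate_gap(y_true, y_pred, groups, label):
    """Gap across groups in P(pred = 1 | true = label)."""
    subset = [(p, g) for t, p, g in zip(y_true, y_pred, groups) if t == label]
    preds = [p for p, _ in subset]
    grps = [g for _, g in subset]
    rates = selection_rates(preds, grps)
    return max(rates.values()) - min(rates.values())


def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the true-positive-rate gap and the false-positive-rate gap."""
    tpr_gap = _rate_gap(y_true, y_pred, groups, label=1)
    fpr_gap = _rate_gap(y_true, y_pred, groups, label=0)
    return max(tpr_gap, fpr_gap)


# Toy data, invented for illustration only (not from loan_access_dataset.csv).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
gender = ["A", "A", "A", "A", "B", "B", "B", "B"]

dpd = demographic_parity_difference(y_pred, gender)
eod = equalized_odds_difference(y_true, y_pred, gender)
print(f"DPD = {dpd:.2f}, EOD = {eod:.2f}")
```

A value of 0 on either metric means the groups are treated identically by that criterion; larger values indicate a wider gap.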

How We Built the Project

The project was built as a modular Python pipeline (loan_model.py) with distinct components:

  • DataPreprocessor: Engineered features (e.g., Income_to_Loan_Ratio), applied Box-Cox transformation for skewed features (inspired by the XGBoost notebook), and used OneHotEncoder for categorical variables.
  • ModelTrainer: Trained Logistic Regression and XGBoost with 5-fold cross-validation and hyperparameter tuning via GridSearchCV, incorporating ExponentiatedGradient for bias mitigation.
  • Fairness Auditing: Used fairlearn to compute DPD (e.g., Gender: 0.4167) and EOD, excluding groups with fewer than 10 samples so that the metrics are not dominated by noise.
  • Visualizations: Generated SHAP plots, bias-variance plots (inspired by the polynomial regression notebook), approval rate bar plots, and Gender-Race heatmaps using matplotlib and seaborn.
  • Production Features: Added logging, timing (from the XGBoost notebook), and error handling for NaN values and unknown categories.
  • Outputs: The pipeline writes a submission file (submission_5fold_xgb_*.csv), visualizations in charts/, and an AI Risk Report (ai_risk_report.md). The report was also converted to .docx and .tex formats for accessibility.
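As a rough illustration of the DataPreprocessor step, the sketch below builds the Income_to_Loan_Ratio feature and applies a Box-Cox transform. It is a simplified, dependency-free stand-in rather than the project's implementation: the function names, the sample records, and the fixed lambda of 0.5 are all assumptions. When no lambda is supplied, `scipy.stats.boxcox` estimates one from the data instead.

```python
import math


def income_to_loan_ratio(income, loan_amount):
    """Ratio feature; returns NaN when the loan amount is missing or zero."""
    if loan_amount is None or loan_amount == 0:
        return float("nan")
    return income / loan_amount


def box_cox(values, lam):
    """Box-Cox transform with a fixed lambda; inputs must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The lambda -> 0 limit of (v**lam - 1) / lam is the natural log.
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Hypothetical applicant records (income, loan amount); not from the dataset.
applicants = [(40_000, 10_000), (85_000, 20_000), (120_000, 15_000)]
ratios = [income_to_loan_ratio(inc, loan) for inc, loan in applicants]

# lam = 0.5 is an arbitrary choice for illustration.
transformed = box_cox(ratios, lam=0.5)
print([round(t, 3) for t in transformed])
```

Because Box-Cox is only defined for strictly positive inputs, the sketch raises an error on zero or negative values; a real pipeline would need to shift or filter such features first.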

Challenges Faced

  • Bias Mitigation: High Gender DPD (0.4167) and low Non-binary recall (0.3125) required careful tuning of ExponentiatedGradient, with DemographicParity constraints only partially addressing disparities.
  • Data Imbalance: Small sample sizes for the Non-binary and Native American groups led to unstable metrics. SMOTE helped, but further oversampling of these groups is still needed.
  • Error Handling: fairlearn raised an AssertionError ("data can be loaded only once") when a mitigator was fit a second time; creating a new ExponentiatedGradient instance for final retraining resolved it. Pandas dtype warnings were fixed by setting dtype=object for sensitive features.
  • Performance: Modest accuracies (0.6284 for Logistic Regression, 0.6200 for XGBoost) suggested underfitting. Hyperparameter tuning helped only partially, and ensemble methods are left for future work.
  • Visualization: Balancing technical detail with stakeholder clarity in plots (e.g., SHAP, fairness metrics) was challenging but achieved through iterative design.
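The instability caused by small groups can be illustrated with a per-group recall report that skips groups below a minimum size. This is a hedged sketch, not the project's auditing code: the function name, the group labels "X" and "Y", and the toy labels are invented, and only the threshold of 10 samples comes from the write-up. In fairlearn, a similar per-group breakdown is available through `MetricFrame`.

```python
from collections import Counter

MIN_GROUP_SIZE = 10  # threshold stated in the write-up


def recall_by_group(y_true, y_pred, groups, min_size=MIN_GROUP_SIZE):
    """Recall per group, skipping groups with fewer than min_size samples."""
    sizes = Counter(groups)
    true_pos = Counter()
    actual_pos = Counter()
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            actual_pos[group] += 1
            true_pos[group] += int(pred == 1)
    report = {}
    for group, size in sizes.items():
        if size < min_size:
            continue  # too few samples for a stable estimate
        if actual_pos[group] == 0:
            continue  # recall is undefined without positive examples
        report[group] = true_pos[group] / actual_pos[group]
    return report


# Toy data, invented for illustration: group "X" has 12 samples, "Y" has 4.
y_true = [1] * 8 + [0] * 4 + [1] * 2 + [0] * 2
y_pred = [1] * 6 + [0] * 2 + [0] * 4 + [1, 0] + [0] * 2
groups = ["X"] * 12 + ["Y"] * 4

report = recall_by_group(y_true, y_pred, groups)
print(report)
```

With only two positive examples in group "Y", a single changed prediction would move its recall by 0.5, which is why such groups are excluded from the report rather than shown with a misleadingly precise number.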

Built With

  • Languages: Python 3.8+
  • Frameworks/Libraries:
    • pandas, numpy: Data manipulation
    • scikit-learn: Logistic Regression, OneHotEncoder, StandardScaler, GridSearchCV, KFold
    • xgboost: XGBoost model
    • fairlearn: Fairness metrics (DPD, EOD) and ExponentiatedGradient
    • imbalanced-learn: SMOTE for oversampling
    • shap: Feature importance visualization
    • matplotlib, seaborn: Plotting (approval rates, fairness metrics, bias-variance)
    • scipy: Box-Cox transformation
    • joblib: Model serialization
  • Platforms: Local Python environment (e.g., Jupyter, VSCode)

Brief Description of the Model

The model is a machine learning pipeline that predicts loan approvals on loan_access_dataset.csv while mitigating bias. It trains Logistic Regression and XGBoost with 5-fold cross-validation, reaching accuracies of 0.6284 and 0.6200, respectively. Bias is mitigated with ExponentiatedGradient under DemographicParity constraints, targeting the high Gender DPD (0.4167) and low Non-binary recall (0.3125); these disparities were only partially reduced. Visualizations (SHAP, fairness plots, bias-variance) and an AI Risk Report support interpretability and stakeholder communication.
