Machine learning is a powerful tool for automating tasks that do not scale when done by hand. However, when decisions can critically affect people's lives, it is important that our models do not learn biases. A well-known example is Amazon's automated recruiting tool, which learned a bias against women. Still, rejecting machine learning outright as a way to automate such tasks would be extreme. Fairness has become one of the most active research topics in machine learning in recent years, and we decided to apply these recent results to build an automated recruiting tool that enforces fairness.
Suppose we want to train a machine learning model that automatically determines whether job candidates should advance to the interview stage using factors such as GPA, school, and work experience, and that we have data on which past candidates received interviews. But what if, in the past, women were less likely to receive an interview than men, all other factors being equal, and certain predictors are correlated with the candidate's gender? Even though the data is biased, we do not want our model to learn these biases. This is where the concept of fairness comes in.
Promoting fairness has been studied in other contexts, such as credit loan approval, criminal recidivism prediction, and healthcare management. Here, we focus on gender diversity in recruiting.
What is fairness?
There are numerous possible metrics for fairness in the machine learning literature. In this setting, we measure fairness with the average odds difference: the average of the difference in false positive rate and the difference in true positive rate between the unprivileged and privileged groups (in this case, women and men, respectively). A value of zero means both groups are treated alike on these rates. A large magnitude indicates disparate treatment; in particular, a lower true positive rate for the unprivileged group means the model is statistically more likely to wrongly reject promising candidates from that group.
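As a rough illustration of the metric, the sketch below computes the average odds difference in plain Python. The function names, the "F"/"M" group codes, and the toy data are hypothetical, and the unprivileged-minus-privileged sign convention is an assumption (the writeup does not state a sign); the project itself relied on AIF360 for this calculation.

```python
def rates(y_true, y_pred):
    """Return (false positive rate, true positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    fpr = fp / neg if neg else 0.0
    tpr = tp / pos if pos else 0.0
    return fpr, tpr


def average_odds_difference(y_true, y_pred, group,
                            unprivileged="F", privileged="M"):
    """Average of the FPR gap and the TPR gap (unprivileged minus privileged)."""
    def subset(g):
        idx = [i for i, v in enumerate(group) if v == g]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]

    fpr_u, tpr_u = rates(*subset(unprivileged))
    fpr_p, tpr_p = rates(*subset(privileged))
    return 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))


# Toy data (hypothetical): 4 women followed by 4 men.
group = ["F", "F", "F", "F", "M", "M", "M", "M"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]   # 1 = deserved an interview
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]   # 1 = model recommends an interview

aod = average_odds_difference(y_true, y_pred, group)
print(aod)  # -0.5: women have both a lower TPR and a lower FPR than men
```

Under this sign convention, a negative value means the model is less likely to recommend women for interviews, whether or not they are qualified.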
What our app does
jobFAIR is a web application that helps human resources personnel keep track of and visualize job candidate information, and that provides interview recommendations by training a machine learning algorithm on past interview data. The app shows a side-by-side comparison of the model trained without and with a reweighing algorithm applied as a preprocessing step to enforce fairness.
If the data were unbiased, we would expect being accepted and being a woman to be independent, so the probability of both would equal the product of the two individual probabilities. By carefully choosing weights for each example, we can de-bias the data without having to change any of the labels. We measure the actual probability of being a woman and being accepted, then set the weight for the woman + accepted category to the expected probability divided by the actual probability. In other words, if the actual data has a much smaller probability than expected, examples from this category are given a higher weight (>1). Otherwise, they are given a lower weight. The same formula is applied to the other three combinations of gender and acceptance. The reweighed sample is then used for training.
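The weight calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration under assumed names and toy counts (the `reweighing_weights` function, the "F"/"M" codes, and the ten candidates are all hypothetical), not the AIF360 implementation the project used.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Map each (group, label) pair to expected / actual probability."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))

    weights = {}
    for (g, y), joint in joint_counts.items():
        # Probability we would see if group and label were independent.
        expected = (group_counts[g] / n) * (label_counts[y] / n)
        # Probability actually observed in the data.
        actual = joint / n
        weights[(g, y)] = expected / actual
    return weights


# Toy data (hypothetical): 4 women (1 accepted), 6 men (4 accepted).
groups = ["F"] * 4 + ["M"] * 6
labels = [1, 0, 0, 0] + [1, 1, 1, 1, 0, 0]

weights = reweighing_weights(groups, labels)
# One weight per training example, looked up by its (group, label) pair.
sample_weight = [weights[(g, y)] for g, y in zip(groups, labels)]

for key in sorted(weights):
    print(key, round(weights[key], 3))
```

In this toy data, accepted women are under-represented relative to independence, so they receive a weight of about 2, while accepted men receive a weight below 1. The weights are constructed so that the total weight still sums to the number of examples.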
How we built it
We trained two classifiers on the same bank of resumes, one with fairness constraints and the other without. We used IBM's AIF360 library to train the fair classifier. Both classifiers are built with the sklearn Python library. We run a Python Django server on an AWS EC2 instance. At prediction time, the server loads the trained model from the filesystem and classifies the candidate, and the results are sent via a callback to the frontend, which displays the metrics for the unfair and the fair classifier.
Challenges we ran into
The main challenge was training and choosing models with appropriate fairness constraints. After reading the relevant literature and experimenting, we chose the reweighing algorithm (Kamiran and Calders 2012) for fairness, logistic regression for the classifier, and average odds difference for the fairness metric.
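To show how reweighing and logistic regression fit together, here is a small sketch of a logistic regression whose loss is weighted per example. It is a hand-written gradient-descent stand-in so that it runs without dependencies, not the sklearn model the project used; with sklearn, the equivalent step would be passing the weights through the `sample_weight` argument of `LogisticRegression.fit`. All function names, hyperparameters, and data below are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_weighted_logistic(X, y, sample_weight=None, lr=0.5, epochs=500):
    """Fit logistic regression by gradient descent on a weighted log loss."""
    n_features = len(X[0])
    if sample_weight is None:
        sample_weight = [1.0] * len(X)
    total = sum(sample_weight)
    coef = [0.0] * n_features
    intercept = 0.0
    for _ in range(epochs):
        grad_coef = [0.0] * n_features
        grad_intercept = 0.0
        for xi, yi, si in zip(X, y, sample_weight):
            p = sigmoid(intercept + sum(c * v for c, v in zip(coef, xi)))
            err = si * (p - yi)  # each example's error is scaled by its weight
            for j, v in enumerate(xi):
                grad_coef[j] += err * v
            grad_intercept += err
        coef = [c - lr * g / total for c, g in zip(coef, grad_coef)]
        intercept -= lr * grad_intercept / total
    return coef, intercept


def predict_proba(model, x):
    coef, intercept = model
    return sigmoid(intercept + sum(c * v for c, v in zip(coef, x)))


# Two candidates with identical features but different past outcomes.
X = [[1.0], [1.0]]
y = [1, 0]

plain = fit_weighted_logistic(X, y)
reweighed = fit_weighted_logistic(X, y, sample_weight=[3.0, 1.0])

p_plain = predict_proba(plain, [1.0])
p_reweighed = predict_proba(reweighed, [1.0])
print(p_plain, p_reweighed)
```

Without weights the model cannot separate the two identical candidates and predicts 0.5. Upweighting the accepted example pulls the predicted probability toward acceptance, which is how reweighing changes what the classifier learns without altering any label.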
Accomplishments that we're proud of
We are proud that we saw tangible differences between the fairness metrics of the unmodified classifier and those of the fair one, while retaining the same level of prediction accuracy. We also found a specific example in which the unmodified classifier rejects a highly qualified female candidate, whereas the fair classifier accepts her.
What we learned
Machine learning can be made socially aware; applying fairness constraints helps mitigate discrimination and promote diversity in important contexts.
What's next for jobFAIR
We hope to make the machine learning more transparent to those without a technical background, for example by showing which features are the most important for prediction. There is also room to incorporate more fairness algorithms and metrics.