Counterfactuals
Counterfactuals for Machine Learning Models
A counterfactual explanation describes a situation in the form: “If X had not occurred, Y would not have occurred”. For example: “If I hadn’t taken a sip of this hot coffee, I wouldn’t have burned my tongue”. Event Y is that I burned my tongue; cause X is that I took a sip of hot coffee.
In interpretable machine learning, counterfactual explanations can be used to explain predictions of individual instances. The “event” is the predicted outcome of an instance, the “causes” are the particular feature values of this instance that were input to the model and “caused” a certain prediction.
Example
ML powers quite a lot of automated decision-making software these days: repeat offender analysis, credit scoring, grant decisions, and so on. Peter applies for a loan and gets rejected by the (machine learning powered) banking software. He wonders why his application was rejected and how he might improve his chances of getting a loan. The question of “why” can be formulated as a counterfactual: What is the smallest change to the features (income, number of credit cards, age, …) that would change the prediction from rejected to approved? One possible answer could be: If Peter earned 10,000 Euro more per year, he would get the loan. Or if Peter had fewer credit cards and hadn’t defaulted on a loan 5 years ago, he would get the loan. Peter will never know the reasons for the rejection, as the bank has no interest in transparency, but that’s another story.
What I built
A script that takes a particular instance and a machine learning model, and finds the example closest to the given instance for which the model’s prediction changes. The notebook contains one example using the very popular iris dataset.
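To make the idea concrete, here is a minimal sketch of that kind of search: randomly perturb the instance and keep the closest perturbation whose predicted class differs. This is only an illustration of the general approach, not the notebook’s actual code; the function name find_counterfactual and the parameters (sampling scale, number of samples) are my own assumptions.

```python
# Sketch of a counterfactual search (illustrative, not the notebook's implementation):
# sample random perturbations around the instance and return the closest one
# whose predicted class differs from the original prediction.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def find_counterfactual(model, x, n_samples=5000, scale=0.5, rng=None):
    """Return the perturbed copy of x closest to x whose prediction differs."""
    rng = np.random.default_rng(rng)
    original_class = model.predict(x.reshape(1, -1))[0]
    # Sample random candidates in a neighbourhood of x.
    candidates = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    preds = model.predict(candidates)
    flipped = candidates[preds != original_class]
    if len(flipped) == 0:
        return None  # no prediction change found in the sampled neighbourhood
    # Keep the candidate with the smallest Euclidean distance to x.
    distances = np.linalg.norm(flipped - x, axis=1)
    return flipped[np.argmin(distances)]

# Example on the iris dataset, as in the notebook.
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)
instance = X[0]
cf = find_counterfactual(model, instance, rng=0)
print("original prediction:", model.predict(instance.reshape(1, -1))[0])
if cf is not None:
    print("counterfactual:", np.round(cf, 2),
          "-> prediction:", model.predict(cf.reshape(1, -1))[0])
```

The random-search strategy is just the simplest possible choice; the same interface could be backed by a grid search or an optimization-based method that penalizes the distance to the original instance.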
Built With
- jupyter-notebook
- python
- scikit-learn