Whereas traditional programs consist of lines of code written explicitly by the programmer, machine learning programs are "written" in human-unfriendly terms, such as the weights of a neural network. Because people can't fully interpret these weights, a programmer can unintentionally introduce bias into an algorithm simply by training it on a dataset they don't realize is biased.
If these weights could be reinterpreted in a meaningful way, the programmer could better understand what areas of her model could be improved.
What it does
Fairify is a command line utility and a companion iOS app that automatically de-biases word embeddings.
Word embeddings are one of the most important ideas in natural language processing: they are vectorized representations of words. Intuitively, two words have similar vectors if they share similar semantics (for example, "mother" and "mom").
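The intuition above can be sketched numerically. This is a minimal illustration using toy 3-dimensional vectors (real embeddings typically have hundreds of dimensions, and these particular numbers are made up for demonstration):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors: close to 1
    for semantically similar words, lower for unrelated ones."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical toy vectors, not from a real model.
embeddings = {
    "mother": np.array([0.9, 0.1, 0.3]),
    "mom":    np.array([0.85, 0.15, 0.25]),
    "car":    np.array([0.1, 0.9, 0.6]),
}

print(cosine_similarity(embeddings["mother"], embeddings["mom"]))  # high, ~0.99
print(cosine_similarity(embeddings["mother"], embeddings["car"]))  # much lower
```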
Fairify takes in a user's word embedding model and runs it through 200+ analogy completion tasks that test for bias. For example, a user's model might complete the analogy brother : brilliant :: sister : ____ with the word "gorgeous". This completion is problematic, as it reinforces unhealthy stereotypes about intelligence, so Fairify penalizes the model by reducing its score.
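Analogy completion on word embeddings is standardly done with vector arithmetic: the answer to a : b :: c : ? is the vocabulary word closest to b - a + c. Here is a minimal sketch of that mechanic, using hypothetical toy vectors chosen so the classic man : king :: woman : queen analogy works out (this is not Fairify's actual test harness):

```python
import numpy as np

def complete_analogy(a, b, c, embeddings):
    """Complete a : b :: c : ? by finding the vocabulary word whose
    vector is closest (by cosine) to b - a + c, excluding the query words."""
    target = embeddings[b] - embeddings[a] + embeddings[c]
    best_word, best_score = None, -np.inf
    for word, vec in embeddings.items():
        if word in (a, b, c):
            continue  # never return one of the query words itself
        score = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Toy 2-dimensional vectors contrived so the analogy resolves exactly.
toy = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([-1.0, 0.0]),
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([-1.0, 1.0]),
    "apple": np.array([0.0, -1.0]),
}
print(complete_analogy("man", "king", "woman", toy))  # → queen
```

Fairify applies this same completion step to bias-probing analogies like brother : brilliant :: sister : ____ and scores the model on what it returns.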
The biased words (and the analogy that uncovered their bias) are sent to the companion iOS app. From there, the user can simply press a "fix" button, and the utility will de-bias these words in the user's model using the de-biasing technique of Bolukbasi et al.
A user might want not just to identify examples of bias in their model, but also to understand the societal impact of that bias. For this reason, we used Taboola's API to help suggest sources of bias and surface articles relevant to the type of bias a given analogy demonstrates. For example, a bias about immigrants might link to a trending article about the border wall.
Once the user quits the command-line utility, the fixed model is automatically saved to disk.
How we built it & challenges we ran into
There were two main challenges we had to tackle in developing Fairify:
Identifying biased words in the user's model, and fixing them. To find bias, we run analogies with words that typically carry a biased connotation. To tell whether an analogy is actually biased, we examine how the words' projections onto the relevant axis differ; for example, when examining gender bias, we look at how far a given word sits along the gender axis. To correct a biased example, we essentially neutralize the gender-axis component of the offending words, thereby removing the gender bias from the words found in the analogy. There was a small learning curve with the linear algebra involved in working with these 1000+ dimensional vectors.
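The neutralization step described above corresponds to the "neutralize" operation from Bolukbasi et al.: subtract a word vector's projection onto the bias direction, leaving it with no component along that axis. A minimal sketch, assuming numpy and hypothetical toy vectors (the bias direction is estimated here from a single definitional pair, "he" - "she", whereas real pipelines average over several pairs):

```python
import numpy as np

def neutralize(word_vec, bias_direction):
    """Remove word_vec's component along the (normalized) bias direction,
    so the word carries no information on that axis."""
    g = bias_direction / np.linalg.norm(bias_direction)
    return word_vec - np.dot(word_vec, g) * g

# Hypothetical 3-dimensional vectors; real models use hundreds of dimensions.
he, she = np.array([1.0, 0.2, 0.5]), np.array([-1.0, 0.3, 0.4])
gender_axis = he - she

biased = np.array([0.8, 0.4, 0.1])    # stand-in vector for a word like "brilliant"
debiased = neutralize(biased, gender_axis)
print(np.dot(debiased, gender_axis))  # ≈ 0: no gender component remains
```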
Giving a visual representation to Fairify's examples of bias. We accomplished this in several ways throughout our UI design. One of the primary ways is by displaying 2D projections of the relevant axes in the app; once the user indicates they want to fix an instance of bias, they can watch the vectors readjust as Fairify removes it. Another way is our Interactive Model feature: at any point, the user can run their own analogy and see what their current model would generate. This proved tedious, as the back-and-forth communication introduced many synchronization issues.
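One simple way to compute such a 2D projection is to plot each word's coordinate along the bias axis against its coordinate along a second, orthogonalized reference axis. This is a sketch of that idea under our own assumptions (the function name and example vectors are hypothetical, not Fairify's actual plotting code):

```python
import numpy as np

def project_2d(word_vecs, bias_axis, ref_axis):
    """Map high-dimensional word vectors to (x, y) plot coordinates:
    x is the component along the bias axis, y along a reference axis
    made orthogonal to it via Gram-Schmidt."""
    a1 = bias_axis / np.linalg.norm(bias_axis)
    a2 = ref_axis - np.dot(ref_axis, a1) * a1
    a2 = a2 / np.linalg.norm(a2)
    return {w: (float(np.dot(v, a1)), float(np.dot(v, a2)))
            for w, v in word_vecs.items()}

# Hypothetical 3-dimensional vectors for illustration.
vecs = {"he": np.array([1.0, 0.2, 0.5]),
        "she": np.array([-1.0, 0.3, 0.4])}
gender_axis = vecs["he"] - vecs["she"]
coords = project_2d(vecs, gender_axis, np.array([0.0, 0.0, 1.0]))
```

After de-biasing, a neutralized word's x-coordinate collapses toward 0, which is what the app animates when the user presses "fix".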
What's next for Fairify
Categorizing bias. We would love to give further metrics about the different kinds of bias that appear in the model. This would prove useful for pinpointing the exact problems in a user's dataset.