ResumeBias

Blind Web App

Disrupt the District 2018

Demo

resumeblind.com

Purpose

To support equality and diversity in the hiring process. Based on their names alone, white male applicants receive up to 36% more callbacks than their peers. Blind strips the name from the resume to reduce racial or gender bias during the first round of hiring.

How

Blind uses natural language processing to identify the name and email on the resume. It then generates an unbiased resume where both pieces of information are blocked out.
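The blocking step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `redact`, the `name_tokens` argument, and the `#` mask character are all hypothetical, and the email pattern is a simplified one in the spirit of the regular-expressions.info reference listed under Sources Used. The sketch assumes the NLP step has already produced the name tokens and only shows how the name and email could then be blocked out of the extracted text.

```python
import re

# Simplified email pattern (an assumption, not the project's exact regex).
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def redact(text, name_tokens, mask="#"):
    """Return text with emails and the given name tokens blocked out.

    name_tokens is assumed to come from an earlier name-detection step
    (for example, a named-entity tagger); it is passed in explicitly here.
    """
    def block(match):
        # Replace each matched character with the mask so length is preserved.
        return mask * len(match.group(0))

    # Mask emails first, since an address often contains the applicant's name.
    redacted = EMAIL_RE.sub(block, text)

    # Mask each name token as a whole word, ignoring case.
    for token in name_tokens:
        pattern = r"\b" + re.escape(token) + r"\b"
        redacted = re.sub(pattern, block, redacted, flags=re.IGNORECASE)
    return redacted


sample = "Jane Doe\njane.doe@example.com\nPython developer"
print(redact(sample, ["Jane", "Doe"]))
```

Masking character by character keeps the text the same length, which may make it easier to map the blocked regions back onto the PDF layout; whether the original tool does this is not stated in this README.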

Usage

Upload your resume in PDF format on the demo website. Choose how much information you want to remove by clicking the buttons below the submission box. The new PDF will automatically download onto your computer.

Dependencies

Python 3.6+

nltk (Stanford NLP module optional)

numpy

pdfminer.six

PyPDF2

unidecode

Note:

Run nltk.download() and choose the "book" collection before running the program.

Challenges

Coordinating all of the dependencies among our computers and AWS. Identifying and removing non-standard characters. Finding both a person's first and last name.
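As an illustration of the non-standard character problem, the sketch below folds accented characters to plain ASCII using only the Python standard library. It is a stand-in, not the project's code: the project lists unidecode as a dependency, which handles a wider range of characters, and the helper name `to_ascii` is hypothetical.

```python
import unicodedata


def to_ascii(text):
    """Fold accented characters to their closest ASCII form.

    Characters with no ASCII decomposition are dropped, which is a
    simplification compared with a transliteration library.
    """
    # NFKD splits a character like "é" into "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with errors="ignore" discards the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("José Núñez"))
```

Normalizing text this way before matching names may help a plain-ASCII name list recognize accented spellings, at the cost of losing the original characters.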

Future Improvements

We would have liked to include the improved version of the natural language processing, but it was too large for AWS in its current state. We would also like to remove other potential sources of bias on resumes, such as age or location.

Sources Used:

NLTK

Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O'Reilly Media Inc. (nltk.org)

Email Regex

http://www.regular-expressions.info/email.html

Stanford NLP with NLTK

https://blog.manash.me/configuring-stanford-parser-and-stanford-ner-tagger-with-nltk-in-python-on-windows-f685483c374a

NFL Players Dataset

https://raw.githubusercontent.com/theliamcrawford/6-Degrees-of-NFL-Players/master/names.txt
