Inspiration

Gender disparity in the workforce is the unequal treatment of men and women in employment opportunities, wages, and career progression. Despite real progress in recent decades, a substantial gender gap persists in many workplaces, with men typically earning more, receiving more promotions, and occupying more senior positions than women.

Employers and employees both need to be aware of this disparity and take steps to address it. Each can play an active role by advocating for themselves and others, speaking up against discrimination and harassment, and supporting policies and initiatives that promote gender equality.

Awareness is the first step in taking action. We propose DiversityDetective, a machine learning tool that helps employers and employees detect and respond to gender disparity in the workforce. We believe that DiversityDetective can help promote gender equality and create a more inclusive and diverse workplace.

What it does

DiversityDetective is a quantitative tool designed to help employers verify that they are not treating their female and male employees differently. The tool uses a machine learning clustering model to group employees into clusters based on similar merit. By visualizing the mean features for each cluster, as well as the percent gender makeup of the clusters, employers can identify groups of employees that are treated similarly or differently by their employer.

The tool's clustering model allows for a more in-depth analysis of employee information, enabling employers to see beyond the surface level and identify potential instances of bias. For example, the tool may group together a cluster of lower-paid employees who have less experience than other employees, which would not indicate gender bias. However, the tool may also identify a lower-paid cluster with similar experience to other clusters but with a higher percentage of women, indicating potential gender bias.
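The cluster summary described above can be sketched in a few lines of pandas. This is a minimal illustration, not the code from our notebook: the toy rows, the column names (`cluster`, `gender`, `base_pay`, `seniority`), and the helper `summarize_clusters` are all hypothetical, and the cluster labels are assumed to have been assigned already.

```python
import pandas as pd

# Toy data; column names are hypothetical, not the real dataset's schema.
employees = pd.DataFrame({
    "cluster":   [0, 0, 0, 1, 1, 1, 1],
    "gender":    ["F", "M", "M", "F", "F", "F", "M"],
    "base_pay":  [98000, 102000, 100000, 71000, 69000, 70000, 72000],
    "seniority": [4, 4, 5, 4, 5, 4, 4],
})

def summarize_clusters(df):
    """Return mean numeric features and percent female for each cluster."""
    means = df.groupby("cluster")[["base_pay", "seniority"]].mean()
    pct_female = (
        df["gender"].eq("F")          # True where the employee is female
        .groupby(df["cluster"])
        .mean()                       # share of True values per cluster
        .mul(100)
        .rename("pct_female")
    )
    return means.join(pct_female)

summary = summarize_clusters(employees)
print(summary.round(1))
```

In this toy example both clusters have similar seniority, but the lower-paid cluster is 75% female. That is the kind of pattern an employer would want to look into further.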

By identifying these clusters, employers can spot potential unconscious bias and take steps to address it. For instance, they can review their hiring and promotion practices, evaluate their compensation structures, and provide training to their managers on how to avoid unconscious bias.

DiversityDetective also incorporates SHAP feature attribution results to show how much each feature contributes to a model’s predictions. Interestingly, SHAP ranked gender among the five most important features, which highlights the significance of gender in the tool's analysis and emphasizes the need for employers to actively monitor and address potential gender bias in their workplace.

How we built it

Our project was built using a range of data science tools, including NumPy, pandas, scikit-learn, SciPy, Matplotlib, seaborn, and SHAP. The initial stage of our analysis involved preprocessing the data with pandas, NumPy, and scikit-learn to prepare it for both clustering visualizations and model training. Our preprocessing decisions included label encoding and one-hot encoding for categorical data and normalization for continuous data. We used the resulting dataset for K-means clustering and split it further as a preprocessing step for our Random Forest Classifier.
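The preprocessing steps above might look roughly like the following. This is a sketch under assumptions rather than our exact pipeline: the column names and the helper `preprocess` are hypothetical, and min-max scaling is only one possible reading of "normalization".

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Toy rows; column names are hypothetical.
raw = pd.DataFrame({
    "gender":   ["Female", "Male", "Male", "Female"],
    "dept":     ["Sales", "Engineering", "Sales", "Operations"],
    "age":      [25, 40, 55, 31],
    "base_pay": [60000, 90000, 120000, 75000],
})

def preprocess(df):
    """Label-encode gender, one-hot encode dept, rescale numeric columns."""
    out = df.copy()
    # LabelEncoder sorts classes alphabetically: Female -> 0, Male -> 1.
    out["gender"] = LabelEncoder().fit_transform(out["gender"])
    # One indicator column per department.
    out = pd.get_dummies(out, columns=["dept"], dtype=float)
    # Rescale continuous columns to the [0, 1] range (assumed normalization).
    num_cols = ["age", "base_pay"]
    out[num_cols] = MinMaxScaler().fit_transform(out[num_cols])
    return out

features = preprocess(raw)
print(features)
```

Putting every feature on a comparable scale matters for K-means in particular, because it relies on Euclidean distance and would otherwise be dominated by large-valued columns such as pay.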

To conduct feature attribution experiments, we used scikit-learn's Random Forest classifier and cross-validation module to fit the data. We then used the SHAP module's explainer to generate feature attributions based on Shapley values, which highlighted gender as a significant factor in the model's prediction of base income.

For clustering experiments, we plotted the sum of squared errors (SSE) against the number of clusters in an elbow plot to determine the optimal number of clusters. We ultimately clustered the data into five clusters and generated statistics using pandas and NumPy. To help users visualize the gender disparity, we created three visualization templates based on our clustering results: a PCA Clustering Summary Template, a Linear Regression Visualization Template, and a Pairwise Clustering Visualization Template.
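The elbow-plot procedure and the PCA projection can be sketched as follows. This is an illustrative example under assumptions, not our notebook code: it uses synthetic blobs from `make_blobs` in place of the employee features, and `sse_curve` is a hypothetical helper. scikit-learn exposes the SSE of a fitted K-means model as `inertia_`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in for the preprocessed employee features.
X, _ = make_blobs(n_samples=300, centers=5, n_features=4,
                  cluster_std=0.6, random_state=0)

def sse_curve(X, k_values):
    """Fit K-means for each k and return the SSE (inertia) of each fit."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        for k in k_values
    ]

k_values = list(range(1, 9))
sse = sse_curve(X, k_values)
for k, err in zip(k_values, sse):
    print(f"k={k}: SSE={err:.1f}")

# Cluster with the chosen k, then project to 2D for plotting.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
coords = PCA(n_components=2).fit_transform(X)
```

Plotting `k_values` against `sse` with Matplotlib gives the elbow plot; the "elbow" is the point after which adding clusters yields only small reductions in SSE. A scatter plot of `coords` colored by `labels` gives a 2D view of the clusters.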

All results and visualizations are included in our Colaboratory notebook.

Challenges we ran into

In our project's initial stage, we set out to gather data on diversity and women's inclusion in the workplace. It was disheartening to discover how little comprehensive data exists on the subject. Although we found some small datasets containing equality indices, they did not meet our needs. After an extensive search, we came across a dataset on Kaggle by Glassdoor that provided information on the gender pay gap.

Model selection was also a challenge. Our initial attempt at predicting employee performance evaluations using a Random Forest Classifier did not yield significant insights. After a thorough discussion, we decided to shift our focus to grouping employees instead. Because we wanted to observe trends and gender inequalities among the features, we selected the K-means algorithm.

Accomplishments that we're proud of

We’re proud of all that we were able to extract from the data using K-means clustering, PCA, linear regression, Random Forest classifiers, and Shapley values. We created a self-contained visualization tool for the gender pay gap that will hopefully inspire its users to work to shatter the glass ceiling.

What we learned

We learned a lot about gender-based pay disparities in the workplace from this dataset. Our analysis showed that women had higher performance evaluations but lower base pay than their male counterparts, and that men received higher base pay regardless of age, seniority, job category, or department. We made an interesting observation when examining the relationship between base pay and bonuses: women tend to be paid less than men, but receive more bonuses. Furthermore, when we plotted bonuses against age, we found that young women, specifically those under 35, received more bonuses than any other demographic. We think our findings highlight the need for more equitable compensation practices in the workplace to address the issue of gender-based pay discrimination.

What's next for DiversityDetective

Due to the time constraints of the hackathon, we were unable to build all of the capabilities we wanted for DiversityDetective. In particular, we would like to integrate our findings into a web app so that users can easily access and interact with our visualization templates. We look forward to pursuing this after the hackathon!
