Tech Sector Employment Diversity in Silicon Valley
AIM: To understand workplace diversity in the technology sector through an observational study of employment statistics from technology companies centered around Silicon Valley.
OBJECTIVES:
- Observe workplace diversity in terms of:
  - Gender: Male vs. Female (sex ratio)
  - Race: White/Caucasian vs. Non-White (non-white ratio)
  - Job Category: White Collar Jobs vs. Blue Collar Jobs (blue-collar ratio)
- Draw bar graphs/pie charts of the factors:
  - Company Diversity
  - Racial Diversity
  - Gender Diversity
  - Job Role Diversity
- Deduce summary statistics on the figures of the aforementioned criteria:
  - Mean
  - Median
  - Standard Deviation
  - Quantile Ranges
- Draw histogram and plots of the factors' spread:
  - Sex Ratio
  - Non-White Ratio
  - Blue-Collar Ratio
- Generate and plot the model(s) to prove linear independence of the diversity factors.
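The summary statistics and histogram named in the objectives can be sketched in base R as follows. This is only a minimal illustration: the `sex_ratio` vector holds made-up per-company values, not figures from the dataset, and the 0.1-wide bins are an arbitrary choice.

```r
# Made-up per-company ratios, for illustration only (not from the dataset).
sex_ratio <- c(0.30, 0.45, 0.25, 0.50, 0.35, 0.40)

# Mean, median and standard deviation of the ratio across companies.
stats <- c(
  mean   = mean(sex_ratio),
  median = median(sex_ratio),
  sd     = sd(sex_ratio)
)

# Quartiles (25%, 50%, 75%) describe the spread of the ratio.
quartiles <- quantile(sex_ratio, probs = c(0.25, 0.5, 0.75))

# Histogram bins over [0, 1]; plot = FALSE returns the counts without drawing.
h <- hist(sex_ratio, breaks = seq(0, 1, by = 0.1), plot = FALSE)

print(stats)
print(quartiles)
print(h$counts)
```

Calling `hist()` without `plot = FALSE` draws the histogram directly; the same calls apply unchanged to the non-white and blue-collar ratios.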
THEORY:
- Sex Ratio: In this context, the sex ratio is the number of female employees divided by the total number of employees. Its complement (1 minus the sex ratio) gives the male ratio, i.e. the number of male employees divided by the total number of employees.
- Non-White Ratio: In this context, the non-white ratio is the number of non-white employees (American-Indian/Alaskan Native, Asian, Black/African-American, Hispanic/Latino, Native Hawaiian/Pacific Islander, Multiracial) divided by the total number of employees. Its complement gives the white ratio (the number of white employees divided by the total number of employees).
- Blue-Collar Ratio: In this context, the blue-collar ratio is the number of blue-collar employees (Craft Workers, Laborers/Helpers, Operatives, Service Workers, Technicians) divided by the total number of employees. Its complement gives the white-collar ratio (the number of white-collar employees divided by the total number of employees).
While it is intuitive to assume that these three factors are directly proportional or otherwise linearly related, this project aims to show that they are linearly independent of each other, which is especially relevant for the Silicon Valley data.
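As a rough sketch of how the three ratios could be computed per company, the base R snippet below uses a small toy data frame in the same long format as the cleaned data (company, race, gender, job_category, count). The counts are invented, and `ratio_by_company` is a hypothetical helper, not a function from the project.

```r
# Toy rows in the same long format as the cleaned data (counts are made up).
toy <- data.frame(
  company      = c("A", "A", "A", "A", "B", "B", "B", "B"),
  race         = c("White/Caucasian", "Asian", "White/Caucasian", "Asian",
                   "White/Caucasian", "Asian", "White/Caucasian", "Asian"),
  gender       = c("male", "male", "female", "female",
                   "male", "male", "female", "female"),
  job_category = c("Professionals", "Technicians", "Professionals", "Operatives",
                   "Managers", "Professionals", "Craft Workers", "Professionals"),
  count        = c(40, 10, 30, 20, 50, 30, 5, 15),
  stringsAsFactors = FALSE
)

# Job categories treated as blue-collar, as listed in the theory section.
blue_collar <- c("Craft Workers", "Laborers/Helpers", "Operatives",
                 "Service Workers", "Technicians")

# Hypothetical helper: share of each company's headcount whose row matches `flag`.
ratio_by_company <- function(df, flag) {
  matched <- sapply(split(df$count * flag, df$company), sum)
  total   <- sapply(split(df$count, df$company), sum)
  matched / total
}

ratios <- data.frame(
  sex_ratio         = ratio_by_company(toy, toy$gender == "female"),
  non_white_ratio   = ratio_by_company(toy, toy$race != "White/Caucasian"),
  blue_collar_ratio = ratio_by_company(toy, toy$job_category %in% blue_collar)
)
print(ratios)
```

Each complementary ratio (male, white, white-collar) is then simply `1 - ratios$...` for the corresponding column.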
DATASET METADATA:
Source: https://github.com/cirlabs/Silicon-Valley-Diversity-Data/blob/master/Reveal_EEO1_for_2016.csv (Data is available under the Open Database License)
Credits: "Reveal from The Center for Investigative Reporting." https://www.revealnews.org/svdiversity
Cleaned Working Data:
| | company | race | gender | job_category | count |
|---|---|---|---|---|---|
| 1: | 23andMe | Hispanic/Latino | male | Executives | 0 |
| 2: | 23andMe | Hispanic/Latino | male | Managers | 1 |
| 3: | 23andMe | Hispanic/Latino | male | Professionals | 7 |
| 4: | 23andMe | Hispanic/Latino | male | Technicians | 0 |
| 5: | 23andMe | Hispanic/Latino | male | Sales Workers | 0 |
| … | … | … | … | … | … |
| 4121: | Sanmina | Overall Totals | NA | Operatives | 1660 |
| 4122: | Sanmina | Overall Totals | NA | Laborers/Helpers | 4 |
| 4123: | Sanmina | Overall Totals | NA | Service Workers | 57 |
| 4124: | Sanmina | Overall Totals | NA | Totals | 5205 |
| 4125: | Sanmina | Overall Totals | NA | Managers | 591 |
- company : the various companies centered around Silicon Valley
  - 25 levels: "23andMe", "Adobe", "Airbnb", "Apple", "Cisco", "eBay", "Facebook", "Google", "HP Inc.", "HPE", "Intel", "Intuit", "LinkedIn", "Lyft", "MobileIron", "NetApp", "Nvidia", "PayPal", "Pinterest", "Salesforce", "Sanmina", "Square", "Twitter", "Uber", "View"
- race : the race-wise distribution of employees
  - 8 levels: "American-Indian/Alaskan Native", "Asian", "Black/African-American", "Hispanic/Latino", "Native Hawaiian/Pacific Islander", "Overall Totals", "Multiracial", "White/Caucasian"
- gender : the gender-wise distribution of employees
  - 3 levels: "male", "female", NA
- job_category : the job type classifications of employees
  - 11 levels: "Administrative Support", "Craft Workers", "Executives", "Laborers/Helpers", "Managers", "Operatives", "Professionals", "Sales Workers", "Service Workers", "Technicians", "Totals"
Notes:
- All of the data is from the year 2016. This is noted here because the year column was removed during the cleaning of the data.
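The levels above include the aggregate values "Overall Totals" (race) and "Totals" (job_category). Assuming these rows repeat counts that also appear in the detail rows, summing the count column without filtering would count employees more than once. The sketch below shows one way to separate the two on invented toy rows; the variable names are illustrative only.

```r
# Toy rows mimicking the cleaned data, including aggregate rows (counts are made up).
toy <- data.frame(
  company      = rep("A", 6),
  race         = c("Asian", "Asian", "Asian",
                   "Overall Totals", "Overall Totals", "Overall Totals"),
  gender       = c("male", "male", "male", NA, NA, NA),
  job_category = c("Managers", "Operatives", "Totals",
                   "Managers", "Operatives", "Totals"),
  count        = c(3, 7, 10, 3, 7, 10),
  stringsAsFactors = FALSE
)

# Flag rows that are aggregates over race or over job category.
is_aggregate <- toy$race == "Overall Totals" | toy$job_category == "Totals"
detail <- toy[!is_aggregate, ]

# Summing everything counts the same employees several times.
naive_total <- sum(toy$count)

# Summing only detail rows should agree with the reported grand total.
detail_total <- sum(detail$count)
reported_total <- toy$count[toy$race == "Overall Totals" &
                            toy$job_category == "Totals"]

print(c(naive = naive_total, detail = detail_total, reported = reported_total))
```

Comparing `detail_total` with `reported_total` per company is a cheap consistency check before computing any ratios.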
OBSERVATIONS & CONCLUSIONS:
- Categorical Data
- Statistical Summaries
- Discrete Distributions of Sex Ratio, Non-White Ratio and Blue-Collar Jobs Ratio
- Linear Models to Prove Linear Independence
  - Sex Ratio ~ Non-White Ratio
  - Sex Ratio ~ Blue-Collar Jobs Ratio
  - Non-White Ratio ~ Blue-Collar Jobs Ratio
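The three pairwise models listed above can be sketched with base R's `lm()`. The per-company ratios below are invented for illustration and the column names are assumptions, so the output says nothing about the real data. A slope and R-squared near zero would suggest little linear association between two factors in the sample, which is weaker than a formal proof of independence.

```r
# Made-up per-company ratios, for illustration only (not from the dataset).
ratios <- data.frame(
  sex_ratio         = c(0.30, 0.45, 0.25, 0.50, 0.35, 0.40),
  non_white_ratio   = c(0.40, 0.35, 0.55, 0.50, 0.30, 0.60),
  blue_collar_ratio = c(0.05, 0.20, 0.10, 0.02, 0.30, 0.08)
)

# Response/predictor pairs matching the three models in the list above.
model_pairs <- list(
  c("sex_ratio", "non_white_ratio"),
  c("sex_ratio", "blue_collar_ratio"),
  c("non_white_ratio", "blue_collar_ratio")
)

# Fit a simple linear regression per pair and collect slope, R-squared and Pearson r.
fits <- lapply(model_pairs, function(p) {
  fit <- lm(reformulate(p[2], response = p[1]), data = ratios)
  data.frame(
    model     = paste(p[1], "~", p[2]),
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared,
    pearson_r = cor(ratios[[p[1]]], ratios[[p[2]]])
  )
})
results <- do.call(rbind, fits)
print(results)
```

For a single-predictor model, R-squared equals the squared Pearson correlation, so either column can be used to judge the strength of the linear relationship.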

Figure 1: Company Diversity Pie Chart

Table 1: Company Diversity Table

Figure 2: Gender Diversity Pie Chart

Table 2: Gender Diversity Table

Figure 3: Racial Diversity Pie Chart

Table 3: Racial Diversity Table

Figure 4: Job Diversity Pie Chart

Table 4: Job Diversity Table

Figure 5: Bar Chart for Company Diversity

Figure 6: Bar Chart for Gender Diversity

Figure 7: Bar Chart for Racial Diversity

Figure 8: Bar Chart for Job Role Diversity
Built With
- ggplot
- r
- tidyverse