Inspiration
Large organizations rarely, if ever, release exact salary figures for specific job positions. This leaves prospective applicants relying on self-reported data from sources such as Glassdoor. Must that be the de facto source? We found that it need not be. When a company, university, or other organization files a labor condition application (a public record) on behalf of a prospective H-1B visa holder (a non-US national seeking specialized employment in the United States), it is required by law to disclose the salary offered, the work location, and other conditions of employment. We combine this information with company sentiment scores and cost-of-living indexes to build a rich resource for job seekers, regardless of US citizenship.
What it does
A front-end map supports dynamic visual querying of company job offers across multiple industry sectors. Users can instantly retrieve real salary offers made by companies such as Google, Facebook, and Uber for specific positions such as Software Developer. Alternatively, they can view an interactive map of the entire United States or of a single state, with salary information aggregated by industry sector and adjustable for factors such as cost of living. When querying by company, users can also overlay the latest public sentiment, aggregated from Google News streams and pipelined through machine learning models trained for sentiment analysis.
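As a rough illustration of the cost-of-living adjustment, the sketch below assumes each offer has already been annualized and that the cost-of-living index is scaled so that 100 equals the national average. Both are assumptions, since the writeup does not state its formula. Each salary is deflated by its state's index and then aggregated per sector. Using the median, and the name `col_adjusted_medians`, are illustrative choices, not the project's confirmed method.

```python
from collections import defaultdict
from statistics import median


def col_adjusted_medians(records, col_index):
    """Median cost-of-living-adjusted salary per sector.

    records:   iterable of (sector, state, annual_salary_usd) tuples (hypothetical shape)
    col_index: dict mapping state -> cost-of-living index, assumed 100 = national average
    """
    by_sector = defaultdict(list)
    for sector, state, salary in records:
        index = col_index.get(state)
        if index is None:
            continue  # skip offers with no cost-of-living data for their state
        # Express the salary in "national-average dollars": an index of 100 leaves it unchanged,
        # an index of 150 (50% more expensive) shrinks it by a third.
        by_sector[sector].append(salary * 100.0 / index)
    return {sector: median(values) for sector, values in by_sector.items()}


if __name__ == "__main__":
    offers = [("Tech", "CA", 150000), ("Tech", "TX", 110000), ("Finance", "NY", 140000)]
    index = {"CA": 150.0, "TX": 90.0, "NY": 140.0}
    print(col_adjusted_medians(offers, index))
```

Dividing rather than subtracting keeps the adjustment proportional, so a state that is 50% more expensive reduces every salary there by the same fraction regardless of its size.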
How we built it
A MongoDB instance hosts over 3 million H-1B visa applications spanning multiple economic sectors and covering the United States geographically. A WebGL front end communicates with a Flask server that queries the MongoDB instance and marshals the results. Sparse geolocated cost-of-living data is stored on a MongoDB cloud instance accessed through MongoDB Stitch, which allows nearest-neighbor lookups via MongoDB geospatial queries. Data cleaning and preprocessing were done with Apache Spark. spaCy and a pre-trained Naive Bayes model from TextBlob perform sentiment classification on Google News articles filtered by company.
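The nearest-neighbor cost-of-living lookup might look roughly like the sketch below. It assumes, hypothetically, that each cost-of-living document stores a GeoJSON point in a `location` field backed by a `2dsphere` index. `near_query` only builds the `$near` filter, which could be passed to PyMongo as `collection.find(near_query(lng, lat)).limit(1)`; since the project went through MongoDB Stitch, its actual client code likely differed. `nearest_col` is a hypothetical brute-force haversine fallback for small in-memory datasets, handy for testing without a database.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def near_query(lng, lat, max_km=None):
    """Build a MongoDB $near filter on a GeoJSON 'location' field (assumed schema).

    $near returns matching documents sorted from nearest to farthest, so
    limiting the result to one document yields the nearest neighbor.
    """
    near = {"$geometry": {"type": "Point", "coordinates": [lng, lat]}}  # GeoJSON order: [lng, lat]
    if max_km is not None:
        near["$maxDistance"] = max_km * 1000  # meters, for GeoJSON points
    return {"location": {"$near": near}}


def haversine_km(lng1, lat1, lng2, lat2):
    """Great-circle distance in kilometers between two (lng, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_col(lng, lat, col_points):
    """Brute-force nearest neighbor over hypothetical (name, lng, lat, index) tuples."""
    return min(col_points, key=lambda p: haversine_km(lng, lat, p[1], p[2]))


if __name__ == "__main__":
    points = [
        ("San Francisco", -122.42, 37.77, 190.0),
        ("Austin", -97.74, 30.27, 100.0),
        ("New York", -74.0, 40.71, 180.0),
    ]
    print(nearest_col(-122.08, 37.39, points))  # lookup for a point near Mountain View, CA
    print(near_query(-122.08, 37.39, max_km=50))
```

The brute-force fallback is linear in the number of points; the `2dsphere` index is what lets the database answer the same question efficiently at scale.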
Application data in raw form can be found here: https://www.foreignlaborcert.doleta.gov/performancedata.cfm#dis
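The raw records need normalizing before they can be compared. Below is a minimal sketch of per-record cleaning. It assumes hypothetical field names (`employer`, `job_title`, `state`, `wage`, `wage_unit`), since the real disclosure files use their own headers, which vary across fiscal years. It also assumes unit labels such as "Year", "Hour", and "Bi-Weekly". Currency formatting is stripped from wage strings, and non-yearly wages are annualized assuming a 40-hour, 52-week full-time year. The project did its cleaning in Apache Spark; this pure-Python version only illustrates the kind of normalization involved.

```python
import re

# Multipliers to annualize a wage by unit of pay (assumed full-time: 40 h x 52 wk = 2080 h).
ANNUAL_FACTOR = {"year": 1, "month": 12, "bi-weekly": 26, "week": 52, "hour": 2080}


def parse_wage(raw):
    """Turn strings like '$85,000.00' or '42.5' into a float; None if unparseable."""
    if raw is None:
        return None
    cleaned = re.sub(r"[$,\s]", "", str(raw))  # drop dollar signs, thousands separators, spaces
    try:
        return float(cleaned)
    except ValueError:
        return None


def annual_salary(raw_wage, raw_unit):
    """Annualize a wage given its unit of pay; None if either field is unusable."""
    wage = parse_wage(raw_wage)
    factor = ANNUAL_FACTOR.get(str(raw_unit).strip().lower())
    if wage is None or factor is None:
        return None
    return wage * factor


def clean_record(rec):
    """Normalize one hypothetical raw row into (employer, job_title, state, annual_salary)."""
    salary = annual_salary(rec.get("wage"), rec.get("wage_unit"))
    if salary is None:
        return None  # unusable wage: let the caller filter this row out
    # Collapse runs of whitespace and upper-case names so variants group together.
    employer = " ".join(str(rec.get("employer", "")).upper().split())
    title = " ".join(str(rec.get("job_title", "")).upper().split())
    state = str(rec.get("state", "")).strip().upper()
    return employer, title, state, salary


if __name__ == "__main__":
    row = {"employer": "  google  llc", "job_title": "software developer",
           "state": "ca ", "wage": "$120,000", "wage_unit": "Year"}
    print(clean_record(row))
```

In PySpark, a function like `clean_record` could be applied with `rdd.map(...)` followed by a filter that drops `None` results, which keeps the per-row logic independent of the cluster setup.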
Challenges we ran into
Data cleaning is a pain.