- A data science win that can no longer be announced as a winner, due to a teammate dropping out and too few submissions overall. How sad!
- The reality: que sera, sera. Still proud to represent as a first-generation college student who is Asian and female!
- Pivot table analysis: this one offered insights into pay by industry
- Pivot table analysis: this one offered insights into pay by state and job title
- Bar graph of where data scientist jobs are located within the scraped dataset
- Word cloud of data scientist job descriptions in the scraped dataset
Inspiration
Data scientist is one of the hottest jobs of the 21st century, and demand for data scientists will only grow as organizations increasingly rely on data-driven insights. For college students interested in becoming one, one of the first things we do is look up which roles are open, and where, on a site like Glassdoor. And since most of us graduate with student debt, we care about how much we'll be paid.
Many factors influence a data scientist's salary package. If money is a deciding factor in which languages and tools we learn and where we see ourselves after college, what insights can we gain from data scraped from Glassdoor? For example, can we predict data scientist salaries from factors like industry sector, location, and more?
What it does
I provide a Jupyter notebook that explores data using histograms, box plots, pivot tables, and a word cloud. I also built a model for predicting data scientist salary based on different factors like sector, location, and more.
How we built it
I built this in Python with various data science libraries, working in a Jupyter notebook.
Challenges we ran into
There was a lot to learn and digest to get to this point! Python is a very accessible programming language, but manipulating the data was difficult: figuring out which values were outliers or errors, and whether they needed to be addressed for better analysis or clarity (e.g. Los Angeles was listed as a state!). That's probably why I hear data cleaning called the unglamorous part of any data scientist's job - and for some, it's most of their job. Limited computing power also slowed progress, since any sort of analysis took a long time to run.
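The two cleaning problems described above can be sketched in plain Python. This is a hypothetical illustration, not the notebook's code: it assumes locations look like `"City, ST"`, uses a made-up correction table (`STATE_FIXES`) for city names that end up in the state column, and flags outliers with the 1.5 x interquartile-range rule that box-plot whiskers are typically drawn from. The helper names `extract_state` and `iqr_outliers` are also hypothetical.

```python
from statistics import quantiles

# Hypothetical correction table: values that show up in the state column but are
# really cities. Only "Los Angeles" comes from the write-up; extend as needed.
STATE_FIXES = {"Los Angeles": "CA"}


def extract_state(location):
    """Take the text after the last comma in 'City, ST' and fix known mistakes."""
    state = location.split(",")[-1].strip()
    return STATE_FIXES.get(state, state)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot whisker rule."""
    # method="inclusive" matches the linear interpolation NumPy uses by default.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print(extract_state("Los Angeles, CA"))  # CA
print(extract_state("Los Angeles"))  # CA (city mistaken for a state, corrected)
print(iqr_outliers([90, 95, 100, 105, 110, 300]))  # [300]
```

Whether a flagged value should be dropped, capped, or kept is still a judgment call; the rule only tells you where to look.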
Accomplishments that we're proud of
Doing the data science piece on my own (since usually I delegate anything related to machine learning and data science to others more experienced than I am)!
What we learned
A whole lot about data science: how to clean data, how to visualize it in various ways, how to build models, and the various algorithms in our toolbox!
What's next for So you want to be a data scientist?
Create a front-end interface so that anyone can go to a website, enter various factors, and predict the salary they would earn as a data scientist with those factors. It'd be great to create a dashboard to showcase interesting insights. I'd also like to build a more robust dataset by scraping information from other sites like salary.com and LinkedIn.
Built With
- jupyter-notebook
- python