Last year, all three of us changed our major to Data Science, a new major in an up-and-coming field that everyone is still figuring out. As second years, we felt like we had to learn 25 new skills in a 24-hour day, and we struggled to figure out which ones were most worth our time. So we created a website to answer exactly that question!

What it does

Our program scrapes and parses job listings and isolates the skills employers list in them. It then processes these skills, removes "filler" words, and ranks the remaining skills by how often each one appears across job listings. Our easy-to-use website lets users enter a skill and see how important it is to different jobs.

How we built it

We wrote a program in Python to parse job listings and used the NLTK package to tokenize the isolated skills. We then used Python dictionaries to count, sort, and rank the cleaned skills for each job. We implemented a user-friendly search engine by integrating the Algolia API, which we then linked to our website built with HTML and CSS.
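To make the ranking step concrete, here is a minimal sketch of the tokenize, filter, and count idea. It is not our exact code: the regex tokenizer is a standard-library stand-in for NLTK's tokenizer, and the `FILLER_WORDS` list, the function names, and the sample listings are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical filler-word list, for illustration only.
FILLER_WORDS = {"a", "an", "and", "experience", "in", "knowledge",
                "of", "or", "the", "to", "with"}


def tokenize(text):
    """Split text into lowercase tokens (stand-in for an NLTK tokenizer).

    '+' and '#' are kept so names like 'c++' and 'c#' survive.
    """
    return re.findall(r"[a-z0-9+#]+", text.lower())


def rank_skills(listings):
    """Rank skills by the number of listings that mention them."""
    counts = Counter()  # Counter is a dict subclass: skill -> listing count
    for listing in listings:
        # Use a set so a skill is counted once per listing.
        skills = {tok for tok in tokenize(listing) if tok not in FILLER_WORDS}
        counts.update(skills)
    # Most frequent first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample_listings = [
    "Experience with Python and SQL",
    "Python, R, and Tableau",
    "Knowledge of SQL and Python",
]
for skill, count in rank_skills(sample_listings):
    print(skill, count)
```

Counting each skill once per listing keeps a single wordy listing from inflating a skill's rank.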

Challenges we ran into

The first and biggest challenge we ran into was understanding how to deal with the discrepancies between job listings. Since each employer writes their own job listing, the formats vary wildly. Another challenge was integrating the Algolia API and creating the website, since we were completely new to HTML and CSS as well as APIs.

Accomplishments that we're proud of

Learning so much! Creating a prototype that works! Working with resources we've never used before! Being awake for such a long time!

What we learned

We learned about things we didn't even know existed before we came here. We learned what an API is and how to integrate one (don't reinvent the wheel!) and we learned how to create a website using HTML and CSS, which we did not have any prior exposure to. Honestly, we could go on forever about what we learned. It was a lot.

What's next for hsl.

So much! We hope to implement this idea on a larger scale and group jobs with similar skills together on our website, so users can search for a skill and see how important it is across jobs. We also want to increase the number of listings we scrape and the number of skills we display on our site, further optimize our tokenization process, and implement multi-word tokenization.
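As a rough idea of what multi-word tokenization could look like, the sketch below merges adjacent tokens that form a known two-word skill. This is only one possible approach and nothing here is implemented yet: the `MULTI_WORD_SKILLS` list and the `merge_multiword` function are hypothetical. NLTK also provides an `MWETokenizer` class that may be a better fit.

```python
# Hypothetical list of two-word skills, for illustration only.
MULTI_WORD_SKILLS = {("machine", "learning"), ("data", "visualization")}


def merge_multiword(tokens, phrases=MULTI_WORD_SKILLS):
    """Merge adjacent token pairs that match a known two-word skill."""
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in phrases:
            merged.append(" ".join(pair))  # e.g. "machine learning"
            i += 2  # skip both tokens of the phrase
        else:
            merged.append(tokens[i])
            i += 1
    return merged


print(merge_multiword(["machine", "learning", "and", "python"]))
```

With this in place, "machine learning" would be ranked as one skill instead of two unrelated words.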
