Time spent - How many hours did you put into this project? (Can be a rough estimate)
We spent 2 days working on this project.
Inspiration - What is the thinking behind your project?
As students who will be graduating in Spring 2021, we find it hard to search for new graduate roles on LinkedIn, Indeed, and Monster. Most search results for a new graduate position mix new graduate roles with entry-level roles that require a certain number of years of experience. As a result, it is difficult to find opportunities that match your skill set and are strictly for new graduates. We wanted to build a platform that caters to new graduates, especially during Covid-19, when many of us are trying to find opportunities in the field once we graduate.
What it does
It is a job-board platform that aims to assist new graduates in their job hunt. Our platform offers a centralized location for new graduate job opportunities and filters out entry-level job posts that require a certain number of years of industry experience. It also allows new graduates to keep track of their applications and compare the offers they have received or information about the companies.
How we built it
Originally, we were trying to find a way to pull job posting data from Indeed, LinkedIn, and Monster. We found that certain APIs, such as LinkedIn's, were not open for public access, so we went with the alternative of scraping the pages with BeautifulSoup. We chose BeautifulSoup specifically for its ability to extract specific data from elements of a webpage. We paired BeautifulSoup with urllib3 to retrieve the web pages from the internet.
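The pairing of urllib3 and BeautifulSoup can be sketched roughly as follows. This is a minimal illustration, not our actual scraper: the `job-card` and `company` class names, the sample HTML, and the function names are hypothetical, and each real job board uses its own markup that has to be inspected page by page.

```python
from typing import Dict, List

import urllib3
from bs4 import BeautifulSoup

# Hypothetical markup; real job boards each use their own structure.
SAMPLE_HTML = """
<div class="job-card">
  <h2>Software Engineer, New Grad</h2>
  <span class="company">Example Corp</span>
  <a href="/jobs/123">Details</a>
</div>
"""


def fetch_html(url: str) -> str:
    """Download a page with urllib3 and return its body as text."""
    http = urllib3.PoolManager()
    response = http.request("GET", url)
    return response.data.decode("utf-8", errors="replace")


def _text(tag) -> str:
    """Return the stripped text of a tag, or an empty string if it is missing."""
    return tag.get_text(strip=True) if tag is not None else ""


def parse_job_cards(html: str) -> List[Dict[str, str]]:
    """Extract title, company, and link from each (assumed) job-card element."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for card in soup.find_all("div", class_="job-card"):
        link = card.find("a", href=True)
        jobs.append({
            "title": _text(card.find("h2")),
            "company": _text(card.find("span", class_="company")),
            "url": link["href"] if link is not None else "",
        })
    return jobs
```

Calling `parse_job_cards(SAMPLE_HTML)` exercises the parsing step without touching the network. For a live page, the HTML would come from `fetch_html(url)` instead, and the extracted link could be fetched the same way to follow embedded links.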
Why Natural Language Toolkit?
We used Natural Language Toolkit because the custom model we were trying to create in Google’s AutoML was taking too long to train. We decided to create our own algorithm to sift through the data we scraped and extract new graduate job postings. The algorithm searches the lemmatized descriptions for (pre-lemmatized) keywords that relate to a new graduate position. Our reasoning is that lemmatizing the description and keywords before matching makes the algorithm more efficient: rather than having to match every variation of a word, we match words by their base form.
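The matching idea can be sketched as below. This is an illustration under assumptions rather than our exact algorithm: the keyword phrases and function names are hypothetical, and we assume NLTK's `WordNetLemmatizer`, which needs the `wordnet` corpus downloaded once via `nltk.download("wordnet")`. The lemmatizer is passed in as a plain function so the matching logic can be tried with any lemmatizer.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Hypothetical keyword phrases, already reduced to their base (lemmatized) form.
NEW_GRAD_KEYWORDS: List[Tuple[str, ...]] = [
    ("new", "graduate"),
    ("recent", "graduate"),
    ("new", "grad"),
]


def make_nltk_lemmatizer() -> Callable[[str], str]:
    """Return NLTK's WordNet lemmatizer as a plain word -> lemma function.

    Assumes nltk is installed and the "wordnet" corpus has been downloaded.
    """
    from nltk.stem import WordNetLemmatizer

    return WordNetLemmatizer().lemmatize


def lemmatize_text(text: str, lemmatize: Callable[[str], str]) -> List[str]:
    """Lowercase the text, split it into alphabetic tokens, and lemmatize each."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [lemmatize(token) for token in tokens]


def contains_phrase(tokens: Sequence[str], phrase: Sequence[str]) -> bool:
    """Check whether the phrase appears as a contiguous run of tokens."""
    n = len(phrase)
    return any(
        tuple(tokens[i:i + n]) == tuple(phrase)
        for i in range(len(tokens) - n + 1)
    )


def is_new_grad_posting(
    description: str,
    lemmatize: Callable[[str], str],
    keywords: Sequence[Sequence[str]] = NEW_GRAD_KEYWORDS,
) -> bool:
    """Return True if any pre-lemmatized keyword phrase occurs in the description."""
    tokens = lemmatize_text(description, lemmatize)
    return any(contains_phrase(tokens, phrase) for phrase in keywords)
```

With the corpus installed, a posting would be checked with `is_new_grad_posting(description, make_nltk_lemmatizer())`. Because both sides are reduced to base forms, a single keyword such as ("new", "graduate") covers "new graduate" and "new graduates" alike.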
The data scraped from Indeed, LinkedIn, and Monster is unstructured. Since Firebase is a NoSQL database, it makes data ingestion easier to handle. Firebase also has an authentication system, which we can build on as we expand our features in the future. Furthermore, Firebase offers scalability for users and a place to store documents and images. In the future, users will have the option to upload their resumes and cover letters to the platform.
On the front-end, we used Figma, a tool that allows multiple users to share files and weigh in on the website’s design. Figma allows us to collaborate in real time and gives us a general idea of how our app will look. We were able to work on and view multiple designs for interactive and pleasing pages and decide on a colorful theme that fits our overall goal.
Challenges you ran into (From start to end, what problems did you run into?)
The biggest challenge I faced was understanding how to scrape the data from LinkedIn and Monster. I needed to go through each webpage to see how everything was formatted and coded. In addition, I had to figure out where embedded links led and how to pull data from those locations too.
Jia Yu Lin
One of the biggest challenges I faced was identifying new graduate job posts in an accurate and efficient way. Nothing seemed better suited to that than Google’s AutoML. However, because of the model training time, I had to shift gears and figure out an alternative way of identifying new graduate job posts.
As the frontend developer of the project, I also had to find a way to connect the backend with the frontend. Despite having all of the data, I struggled to work out how to retrieve it from Firebase. To an extent, the fix was as simple as using matching versions of the libraries we imported.
The main challenge I ran into as a frontend developer was deploying to Firebase. I was new to deploying a website on Firebase, and I learned that I needed to create a project on the Firebase website, run a series of commands, create a directory, and deploy. I was then able to display the front-end on a working URL.
What we learned and accomplishments that we're proud of
This project taught me a lot about scraping data off websites and filtering useful data from it. It was the first time I had done a project requiring BeautifulSoup, and it gave me a lot of firsthand experience.
Jia Yu Lin
I learned a lot about Google’s AutoML and Natural Language API, from how data is trained and analyzed to the basics of implementing them in a project. I’ve also learned how to set up and use Firebase.
I’ve learned how to make a responsive website to complement a mobile-first design. I’ve also learned more about Bootstrap v4.5.2, making interactive pages, and deploying our website through Firebase.
What's next? (What are your next steps / how can you extend this project in the future?)
- Improve accuracy of the data models
- Automating the pipeline of web-scraping data
- User Authentication and Google (and other mediums) login
- Adding job requirement descriptions to job cards
- Navigate the application
- Saving notes to the database
- Adding a search function to filter job searches and criteria
- Ability to bookmark job listings
- Automating saved job listings to applications page
- Animations and transitions for a more interactive webpage
- Adding images and logos from companies
- Order data by most recent dates
- Dark mode