Inspiration
CIMB, a leading bank in Malaysia, presented this Human Resources problem. The Group currently receives a huge number of resumes every day, and hiring the right talent is crucial. Manually screening the resumes is tedious, and substantial time is wasted when the resumes received are not relevant to the position. The challenge is to expedite the company's screening process by using automation to screen a large number of resumes.
What it does
The program takes in a job description document (PDF), screens a repository of resumes (also in PDF), and extracts the top 10 resumes that are most relevant to the job description.
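The ranking step can be illustrated with a minimal sketch. It is not the code from the project's Notebook: it assumes the text has already been extracted from the PDFs, and it uses TF-IDF weighting with cosine similarity as one common way to score relevance. The names `tokenize`, `tfidf_vectors`, `cosine` and `rank_resumes` are hypothetical helpers introduced for this illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency keeps every weight positive.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_resumes(job_text, resumes, top_n=10):
    """Return up to top_n (name, score) pairs, most relevant first.

    `resumes` maps a resume name to its already-extracted plain text
    (an assumption of this sketch; PDF extraction is not shown).
    """
    names = list(resumes)
    docs = [tokenize(job_text)] + [tokenize(resumes[name]) for name in names]
    vectors = tfidf_vectors(docs)
    job_vector = vectors[0]
    scored = [(name, cosine(job_vector, vec))
              for name, vec in zip(names, vectors[1:])]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Calling `rank_resumes(job_text, {"alice.pdf": alice_text, "bob.pdf": bob_text})` returns the resume names paired with their similarity scores, best match first. A production version would likely also remove stop words and weight sections such as skills more heavily.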
How we built it
The entire solution is built in Python using Azure Notebook in Azure Machine Learning Studio. The detailed code can be found in the Notebook linked in this project submission.
Challenges we ran into
Deciding what tools to use
Even within Azure itself, there is a myriad of tools we could build the solution on. Examples are Azure Functions, Azure Machine Learning, Azure Web App, Azure Logic Apps and Azure Cognitive Services. While it's good to have so many choices, it is also confusing. Short on time and knowledge, I made a quick decision to go with Azure Machine Learning, since the Azure Notebook closely resembles Jupyter Notebook, a tool I am familiar with.
Inconsistent format for documents
The provided PDF documents have varying formats. This posed a challenge when parsing the documents into columns with meaningful data. Some resume sections appeared in a different sequence or were missing altogether. PDFs with fancier formatting, such as logos and multi-column tables, are generally more difficult to parse.
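One way to cope with sections that appear in any order, or not at all, is to key the parsed text by heading rather than by position. The sketch below is an illustration only, not the project's actual parser: it assumes the PDF text has already been extracted, that each heading sits on its own line, and that the heading names in `SECTION_HEADINGS` (a hypothetical list) match the resumes being processed.

```python
import re

# Hypothetical heading names; a real resume set would need a longer list.
SECTION_HEADINGS = ("summary", "experience", "education", "skills")


def split_sections(text, headings=SECTION_HEADINGS):
    """Group resume lines under the heading they follow.

    Headings may appear in any order. A heading that never appears
    yields an empty string, so every resume produces the same columns.
    """
    collected = {name: [] for name in headings}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        # Normalize "Skills:" or "EXPERIENCE" to a bare lowercase key.
        key = re.sub(r"[^a-z ]", "", stripped.lower()).strip()
        if key in collected:
            current = key
            continue
        # Lines before the first recognized heading are ignored.
        if current is not None and stripped:
            collected[current].append(stripped)
    return {name: "\n".join(lines) for name, lines in collected.items()}
```

Because the result always has the same keys, it can be loaded directly into a table with one column per section. Multi-column layouts would still need handling at the text-extraction stage, which this sketch does not cover.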
Learning curve for Azure family of products
I had limited knowledge of Azure's products before I attempted this hackathon. As a result, I needed to quickly learn about each Azure resource, its use cases and its costs before I could decide which resource to use for this solution. I also took some time to get used to Azure Machine Learning Studio's interface, even though I am familiar with Jupyter Notebook and Python.
Cost concerns
I was concerned about the running costs, as most Azure resources are charged per use. I had to try different workarounds to make my credits last long enough for the hackathon's assessment. I also had difficulty estimating the costs my solution would incur. However, I was pleasantly surprised that the cost of hosting my solution in Azure was minimal (less than USD 1 per day).
Accomplishments that we're proud of
- Building a solution that will greatly improve productivity in HR recruitment
- Being able to build the entire solution in Azure
What we learned
- The use cases of the Azure products
- How to estimate the costs of Azure resources more accurately
What's next for Resume Recommendation Engine
- Improve it to the extent that it adapts to the formats of most job descriptions and resumes
- Productize it using Azure Web App

