AI Resume Analyzer

Inspiration

The job application process can be overwhelming, especially for fresh graduates or junior professionals who aren’t sure whether their resume is strong enough. Recruiters often spend less than 10 seconds skimming a resume, so tailoring it to each role is essential.

I was inspired to build the AI Resume Analyzer after noticing how often job seekers struggle with resume formatting, keyword optimization, and grammar — all of which can lead to lost opportunities.

What We Learned

This project gave us hands-on experience with:

  • Natural Language Processing (NLP)
  • PDF parsing and text extraction
  • Machine learning classification (Logistic Regression, Random Forest)
  • Data visualization using Matplotlib and Seaborn
  • Building and deploying web apps with Streamlit

We also deepened our understanding of:

  • How resumes are structured
  • The importance of keyword matching in applicant tracking systems (ATS)
  • Using grammar-check APIs (like language_tool_python)

How We Built It

The project was built in Python using the following key libraries:

  • PyPDF2 and PyMuPDF for PDF text extraction
  • nltk for tokenization and text processing
  • language_tool_python for grammar checking
  • scikit-learn for resume classification
  • matplotlib and seaborn for visual feedback
  • streamlit for the web interface

Workflow Overview

  1. Resume Upload: The user uploads a .pdf resume via Streamlit.
  2. Text Extraction: We extract clean text from the resume using PyMuPDF.
  3. Skill Matching: The app compares user-inputted required skills against the resume content.
  4. Grammar Analysis: We use LanguageTool to identify common grammar issues.
  5. Structure & Keywords: We check for the presence of key resume sections like Education, Experience, Projects, and common action verbs.
  6. Machine Learning Classification:
    • Used RandomForestClassifier (after testing with Logistic Regression).
    • Labeled resumes into categories like Software Developer, Data Scientist, Business Analyst, etc.
    • Vectorized resume content using TfidfVectorizer.
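
As a rough illustration of steps 3 and 5, the skill match and the section check can be sketched with plain string matching. This is a minimal sketch, not the app's actual code: the function names, the sample action-verb list, and the word-boundary rule are all assumptions made for the example.

```python
import re

# Section names come from the workflow above; the verb list is a
# hypothetical sample, not the app's actual list.
SECTIONS = ("Education", "Experience", "Projects")
ACTION_VERBS = ("built", "designed", "led", "implemented", "improved")


def contains_term(text: str, term: str) -> bool:
    """Case-insensitive match that does not fire inside longer words."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def skill_match(resume_text: str, required_skills: list[str]) -> dict:
    """Return matched skills, missing skills, and the match percentage."""
    matched = [s for s in required_skills if contains_term(resume_text, s)]
    missing = [s for s in required_skills if s not in matched]
    percent = 100.0 * len(matched) / len(required_skills) if required_skills else 0.0
    return {"matched": matched, "missing": missing, "percent": percent}


def structure_report(resume_text: str) -> dict:
    """Report which expected sections and sample action verbs appear."""
    return {
        "sections": {s: contains_term(resume_text, s) for s in SECTIONS},
        "action_verbs": [v for v in ACTION_VERBS if contains_term(resume_text, v)],
    }
```

The lookarounds keep a required skill such as "Java" from matching inside "JavaScript", while still allowing skills that contain punctuation, such as "C++".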

Sample ML Classification Equation

Let:

  • \( x \) be the vectorized resume text
  • \( y \in \{0, 1, 2\} \) represent resume classes

Then:

\[ \hat{y} = \arg\max_c \; P(y = c \mid x) \quad \text{using Random Forest} \]
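
The decision rule above can be sketched with scikit-learn. This is a minimal sketch under stated assumptions: the training texts are toy data invented for illustration, the mapping of 0, 1, 2 to job categories is assumed, and `predict_category` is a hypothetical helper name. It requires scikit-learn and NumPy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy training data invented for illustration only.
texts = [
    "python java rest api backend microservices git",
    "javascript react frontend backend unit testing",
    "machine learning pandas statistics model training",
    "deep learning numpy regression feature engineering",
    "stakeholder requirements excel reporting dashboards",
    "business process analysis sql reporting kpi",
]
# Assumed mapping: 0 = Software Developer, 1 = Data Scientist, 2 = Business Analyst
labels = [0, 0, 1, 1, 2, 2]

# TF-IDF turns each resume into the vector x; the forest estimates P(y = c | x).
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(texts, labels)


def predict_category(resume_text: str) -> tuple[int, np.ndarray]:
    """Return the argmax class and the per-class probabilities."""
    proba = model.predict_proba([resume_text])[0]
    predicted = model.classes_[int(np.argmax(proba))]
    return int(predicted), proba
```

Returning the full probability vector alongside the label makes it possible to show how confident the classifier is, not only which category it picked.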

Challenges Faced

1. Inconsistent Resume Formatting

Parsing resume content was tricky because structures varied widely. Some resumes used tables and others used unusual fonts, both of which complicated text extraction.

2. PDF Extraction Accuracy

I initially used PyPDF2 but found it unreliable for some multi-column resumes, so I switched to PyMuPDF (fitz), which worked more consistently.
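
A minimal sketch of page-by-page extraction with PyMuPDF is shown here. The function names and the whitespace cleanup rules are assumptions for illustration; the app's real cleanup may differ.

```python
import re


def extract_text(pdf_bytes: bytes) -> str:
    """Extract plain text from a PDF held in memory, page by page."""
    import fitz  # PyMuPDF; imported lazily so the helper below works without it

    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        pages = [page.get_text("text") for page in doc]
    return normalize_whitespace("\n".join(pages))


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces and tabs, and allow at most one blank line."""
    text = text.replace("\r\n", "\n")
    text = re.sub(r"[ \t]+", " ", text)      # runs of spaces/tabs -> one space
    text = re.sub(r" ?\n ?", "\n", text)     # drop spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)   # at most one blank line in a row
    return text.strip()
```

In a Streamlit app, the bytes would typically come from the uploaded file object, for example by calling `read()` on the value returned by `st.file_uploader`.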

3. Model Performance

Training the resume classifier was challenging due to limited labeled data. I manually collected and labeled more than 150 resumes across three categories.

4. Grammar Detection Speed

Grammar checks were slow on large resumes. I sped them up by checking smaller token windows at a time and restricting the checks to key sentences.
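
One way this kind of optimization could look is sketched below. The sentence splitter, the "key sentence" heuristic (a minimum word count and a cap on the number of sentences), and all function names are assumptions for illustration, not the app's actual logic. The grammar checker is passed in as a callable so the sketch does not depend on a specific library.

```python
import re
from typing import Callable, Iterable


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def select_key_sentences(
    sentences: Iterable[str], min_words: int = 5, limit: int = 40
) -> list[str]:
    """Keep the first `limit` sentences that have at least `min_words` words."""
    kept = [s for s in sentences if len(s.split()) >= min_words]
    return kept[:limit]


def check_in_chunks(
    sentences: list[str], check: Callable[[str], list], chunk_size: int = 10
) -> list:
    """Run `check` on small groups of sentences and concatenate the results."""
    issues = []
    for start in range(0, len(sentences), chunk_size):
        chunk = " ".join(sentences[start:start + chunk_size])
        issues.extend(check(chunk))
    return issues
```

With language_tool_python, `check` could be the `check` method of a `language_tool_python.LanguageTool("en-US")` instance, which returns a list of matches for the text it is given.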

5. Deployment Hurdles

Hosting on Streamlit Cloud required Python 3.10 compatibility. Some packages (e.g., matplotlib, language_tool_python) had version conflicts, which I resolved by manually tweaking requirements.txt.


Final Outcome

The AI Resume Analyzer provides:

  • Skill match percentage
  • Grammar suggestions
  • Action verb usage analysis
  • Resume structure feedback
  • Predicted job category (via ML)
  • Suggestions for improvements

This tool is helpful for students, professionals, and even career coaches who want quick feedback on resumes.

