Visa‑Smart Job Match

Inspiration

LinkedIn job‑hunting felt like Groundhog Day:

  • 30‑minute scroll marathons,
  • constant visa‑eligibility guesswork,
  • manual résumé tweaking for each role.

We wanted a one‑click assistant that scrapes only the latest listings, respects OPT/H‑1B constraints, and ranks jobs by true résumé fit, all locally and privately.


What it does

  1. Initial set‑up

    • git clone → npm install → node server.js
    • Upload resume.pdf, then enter your visa status and OpenAI API key in the local UI (localhost:3000).
  2. Fully automated session

    1. User opens a “Past 24 hours” LinkedIn search tab.
    2. The extension (or Playwright) auto‑scrolls and scrapes job cards plus full job descriptions (JDs).
    3. Python service embeds résumé & JDs (text‑embedding‑3‑small), filters visa keywords, and scores cosine similarity.
    4. Top N roles appear in the UI and export to top_jobs.csv.
    5. (Optional) CSV is e‑mailed via local SMTP.
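
The scoring and export steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the résumé and JD embeddings have already been fetched, uses only the standard library (the real stack lists numpy, which would vectorize the same math), and the names cosine, rank_jobs, and export_csv as well as the "embedding" key are hypothetical.

```python
import csv
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_jobs(resume_vec, jobs, n=10):
    """Score each job against the résumé and return the top n, best first.

    Each job is a dict carrying an "embedding" list (hypothetical key);
    the embedding is dropped from the output so rows stay CSV-friendly.
    """
    scored = []
    for job in jobs:
        row = {k: v for k, v in job.items() if k != "embedding"}
        row["score"] = round(cosine(resume_vec, job["embedding"]), 4)
        scored.append(row)
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:n]


def export_csv(rows, path="top_jobs.csv"):
    """Write ranked rows to a CSV file; does nothing for an empty list."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Toy 3-dimensional vectors stand in for real embeddings.
sample_jobs = [
    {"title": "Data Engineer", "embedding": [1.0, 0.2, 0.8]},
    {"title": "Chef", "embedding": [0.0, 1.0, 0.0]},
    {"title": "ML Engineer", "embedding": [1.0, 0.0, 1.0]},
]
ranked = rank_jobs([1.0, 0.0, 1.0], sample_jobs, n=2)
# export_csv(ranked) would then write the rows to top_jobs.csv
```

Keeping the ranking pure (vectors in, rows out) means the embedding call and the CSV export can be swapped or mocked independently.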

How we built it

Layer           Tech                            Highlights
Front‑end       React + Tailwind                Minimal form, live progress status
Local server    Node / Express                  Hosts UI & REST endpoint
Scraper         Playwright (headless Chromium)  Auto‑scroll + throttled requests
Ranking engine  Python 3.10                     pdfminer.six, openai, numpy, pandas
Data hand‑off   JSON contract                   Shared TypeScript interface for Node ↔ Python
Optional cloud  S3 + Lambda                     Presigned‑URL upload → serverless ranking
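
As an illustration of the Node ↔ Python hand‑off, the Python side could mirror the shared interface with a small dataclass and reject malformed payloads early. The sketch below is an assumption about how that might look: the JobPosting class and its field names (job_id, title, company, description) are hypothetical and not taken from the project's actual contract.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class JobPosting:
    """Python mirror of a hypothetical shared job record."""
    job_id: str
    title: str
    company: str
    description: str

    @classmethod
    def from_dict(cls, raw):
        """Build a posting from a decoded JSON object, failing on missing keys."""
        names = [f.name for f in fields(cls)]
        missing = [name for name in names if name not in raw]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        # Extra keys are ignored so the scraper can add fields without breaking us.
        return cls(**{name: str(raw[name]) for name in names})


def parse_payload(text):
    """Decode a JSON array of job objects sent by the Node server."""
    return [JobPosting.from_dict(item) for item in json.loads(text)]


payload = json.dumps([
    {"job_id": "1", "title": "ML Engineer", "company": "Acme",
     "description": "Visa sponsorship available.", "extra": "ignored"},
])
postings = parse_payload(payload)
```

Validating at the boundary keeps schema drift between the two runtimes visible as a clear error instead of a silent ranking bug.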

Challenges we ran into

  • LinkedIn anti‑bot throttling – solved with randomized delays & session cookies.
  • Visa keyword chaos – built a YAML taxonomy with positive/negative regexes (“sponsorship available”, “GC holder only”, etc.).
  • Large PDF résumés – first three pages only, plus embedding cache to avoid repeat cost.
  • Manifest V3 quirks – needed message‑passing between content script and service‑worker for authenticated fetch.
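
Two of the fixes above, the visa keyword taxonomy and the embedding cache, lend themselves to a short sketch. The code below is a simplified illustration under stated assumptions: the taxonomy is an inline dict shaped like what a YAML file might load to, the regex patterns are examples rather than the project's real list, and classify_visa, cached_embedding, and embed_fn are hypothetical names (embed_fn stands in for whatever function calls the embedding API).

```python
import hashlib
import re

# Example taxonomy; a real one would be loaded from the YAML file.
VISA_TAXONOMY = {
    "positive": [
        r"\bsponsorship\s+(is\s+)?available\b",
        r"\bwill\s+sponsor\b",
    ],
    "negative": [
        r"\b(gc|green\s*card)\s+holders?\s+only\b",
        r"\bno\s+(visa\s+)?sponsorship\b",
    ],
}

COMPILED = {
    label: [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
    for label, patterns in VISA_TAXONOMY.items()
}


def classify_visa(jd_text):
    """Return "negative", "positive", or "unknown" for a job description.

    Negative patterns are checked first, so "No visa sponsorship available"
    is not mistaken for a positive match on "sponsorship available".
    """
    if any(p.search(jd_text) for p in COMPILED["negative"]):
        return "negative"
    if any(p.search(jd_text) for p in COMPILED["positive"]):
        return "positive"
    return "unknown"


def cached_embedding(text, embed_fn, cache):
    """Embed text once per unique content, keyed by its SHA-256 hash."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = embed_fn(text)  # only pay for unseen text
    return cache[key]
```

For example, classify_visa("US citizens or GC holder only") falls in the negative bucket, while a JD with no visa language at all comes back as "unknown" and can be left for the user to judge.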

Accomplishments that we're proud of

  • Local‑first privacy – résumé never leaves the user’s machine unless they opt in to S3.
  • 90 % time‑saving – beta users cut job‑screening from ~30 min to < 3 min/day.
  • Plug‑and‑play – no persistent servers; everything spins up on demand.

What we learned

  • Small embeddings are plenty – text‑embedding‑3‑small is 5× cheaper and still nails semantic matching for 30 JDs in <2 s.
  • UX matters – a single “Run” button and output CSV drove adoption far more than a fancy dashboard.
  • Scraper resilience – centralizing selectors lets us patch DOM changes in minutes.

What's next for Papa

Chrome Extension → S3 → Lambda pipeline
