Visa‑Smart Job Match

Inspiration
LinkedIn job‑hunting felt like Groundhog Day:
- 30‑minute scroll marathons,
- constant visa‑eligibility guesswork,
- manual résumé tweaking for each role.
We wanted a one‑click assistant that scrapes only the latest listings, respects OPT/H‑1B constraints, and ranks jobs by true résumé fit, all locally and privately.
What it does
Initial set‑up
- git clone → npm install → node server.js
- Upload resume.pdf, visa status, and OpenAI key via a local UI (localhost:3000).
Fully automated session
- User opens a “Past 24 hours” LinkedIn search tab.
- Extension/Playwright autoscrolls & scrapes job cards + full JDs.
- Python service embeds résumé & JDs (text‑embedding‑3‑small), filters visa keywords, and scores cosine similarity.
- Top N roles appear in the UI and export to top_jobs.csv.
- (Optional) CSV is e‑mailed via local SMTP.
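The scoring step can be illustrated with a minimal sketch. It assumes the résumé and each JD have already been turned into embedding vectors (by text‑embedding‑3‑small in the project; toy 3‑dimensional vectors here). The names `cosine_similarity` and `rank_jobs` are hypothetical, and the sketch uses only the standard library so it runs anywhere; numpy, which the ranking engine lists, would vectorize the same math.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as "no similarity"
    return dot / (norm_a * norm_b)


def rank_jobs(resume_vec, job_vecs, top_n=5):
    """Return the top_n (job_id, score) pairs, best match first.

    job_vecs maps a job id to its embedding vector (hypothetical shape).
    """
    scored = [
        (job_id, cosine_similarity(resume_vec, vec))
        for job_id, vec in job_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors standing in for real embeddings.
resume = [1.0, 0.0, 1.0]
jobs = {
    "job_a": [1.0, 0.0, 1.0],  # same direction as the résumé
    "job_b": [0.0, 1.0, 0.0],  # orthogonal to the résumé
    "job_c": [1.0, 1.0, 1.0],  # partial overlap
}
ranking = rank_jobs(resume, jobs, top_n=2)
print(ranking)
```

With these toy vectors, `job_a` ranks first and `job_c` second, while the orthogonal `job_b` falls outside the top two.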
How we built it
| Layer | Tech | Highlights |
|---|---|---|
| Front‑end | React + Tailwind | Minimal form, live progress status |
| Local server | Node / Express | Hosts UI & REST endpoint |
| Scraper | Playwright (headless Chromium) | Auto‑scroll + throttled requests |
| Ranking engine | Python 3.10 | pdfminer.six, openai, numpy, pandas |
| Data hand‑off | JSON contract | Shared TypeScript interface for Node ↔ Python |
| Optional cloud | S3 + Lambda | Presigned‑URL upload → serverless ranking |
Challenges we ran into
- LinkedIn anti‑bot throttling – solved with randomized delays & session cookies.
- Visa keyword chaos – built a YAML taxonomy with positive/negative regexes (“sponsorship available”, “GC holder only”, etc.).
- Large PDF résumés – we process only the first three pages and cache embeddings to avoid repeat cost.
- Manifest V3 quirks – needed message‑passing between content script and service‑worker for authenticated fetch.
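The positive/negative regex idea behind the visa taxonomy could look roughly like the sketch below. The patterns are illustrative assumptions built around the two phrases quoted above (the project keeps its real taxonomy in YAML), and `classify_visa` with its three labels is a hypothetical helper. Negative patterns are checked first so that a phrase like "no sponsorship available" is not mistaken for a positive signal.

```python
import re

# Illustrative patterns only; the real taxonomy lives in a YAML file.
POSITIVE_PATTERNS = [
    re.compile(r"\bsponsorship available\b", re.IGNORECASE),
]
NEGATIVE_PATTERNS = [
    re.compile(r"\bGC holders? only\b", re.IGNORECASE),
    re.compile(r"\bno (visa )?sponsorship\b", re.IGNORECASE),  # assumed extra pattern
]


def classify_visa(jd_text):
    """Label a job description as 'exclude', 'include', or 'unknown'."""
    # Negative signals win, because they often contain a positive phrase.
    if any(p.search(jd_text) for p in NEGATIVE_PATTERNS):
        return "exclude"
    if any(p.search(jd_text) for p in POSITIVE_PATTERNS):
        return "include"
    return "unknown"


samples = [
    "Visa sponsorship available for this role.",
    "GC holder only, onsite in Austin.",
    "No sponsorship available at this time.",
    "Great team and free snacks.",
]
for text in samples:
    print(classify_visa(text), "|", text)
```

Keeping an explicit "unknown" label lets the ranking step decide whether to keep or drop listings that say nothing about visas.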
Accomplishments that we're proud of
- Local‑first privacy – résumé never leaves the user’s machine unless they opt in to S3.
- 90 % time‑saving – beta users cut job‑screening from ~30 min to < 3 min/day.
- Plug‑and‑play – no persistent servers; everything spins up on demand.
What we learned
- Small embeddings are plenty – text‑embedding‑3‑small is 5× cheaper and still nails semantic matching for 30 JDs in <2 s.
- UX matters – a single “Run” button and output CSV drove adoption far more than a fancy dashboard.
- Scraper resilience – centralizing selectors lets us patch DOM changes in minutes.