Project Code

13F2F9451F4C3E5C

Inspiration

We were inspired to build ProfMatch because there is no centralized way for students to find research opportunities. Most research positions still come from cold emailing professors, but the process is fragmented and inefficient: students have to go through separate faculty pages for each college, read through bios, track down each professor's research or personal website, and only then write the email, which is tedious and time-consuming.

What it does

ProfMatch addresses the lack of a centralized research discovery platform by embedding both student profiles and professor research information using Gemini, and then using vector similarity search in MongoDB to find the most relevant matches. A student can upload their resume or search by areas they are interested in, and ProfMatch automatically surfaces the professors whose research aligns most closely with their background and interests. Our platform then provides a detailed overview of each professor, along with a customizable cold email draft that students can copy into their email with a single click.
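
To illustrate the matching idea, here is a minimal in-memory sketch of ranking professors by cosine similarity between embeddings. It is not our production code: in ProfMatch the embeddings come from Gemini and the search runs inside MongoDB, while the `ProfessorRecord` shape, the function names, and the tiny two-dimensional vectors below are assumptions made purely for illustration.

```typescript
// Hypothetical shape of a stored professor record; field names are illustrative.
interface ProfessorRecord {
  name: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k professors whose embeddings are most similar to the student's,
// ordered from most to least similar.
function topMatches(
  studentEmbedding: number[],
  professors: ProfessorRecord[],
  k: number
): { name: string; score: number }[] {
  return professors
    .map((p) => ({
      name: p.name,
      score: cosineSimilarity(studentEmbedding, p.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A database-backed vector index does the same kind of ranking without scanning every record, which is why we delegated this step to MongoDB rather than computing it in application code.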

How we built it

We built the front end with React, Next.js, and TypeScript. We used the ORCID and OpenAlex APIs to scrape data on UVA and Georgetown professors, the Gemini API for multiple use cases (disambiguating professors, parsing resumes, generating embeddings, writing professor summaries, etc.), and MongoDB to store professor and paper information along with the embeddings used for vector search.

Challenges we ran into

One of the biggest challenges we ran into was automating the scraping process. Because information on professors is not kept in a centralized place, you essentially need to build a custom scraper for each department's page on each university's website (which is why we were only able to cover two schools' CS departments so far). Furthermore, professor names were often ambiguous, with multiple potential matches for a single professor in a research database. We therefore had to build a fairly robust scraping system that uses multiple points of reference to disambiguate professors, combining the ORCID and OpenAlex APIs and falling back to Gemini when needed, so that we could be confident each match was correct.
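
The disambiguation idea can be sketched roughly as follows. This is a simplified, hypothetical illustration rather than our actual pipeline: the `FacultyEntry` and `Candidate` types, the scoring weights, and the `minScore` threshold are all assumptions, and the real system works with responses from the ORCID and OpenAlex APIs rather than hand-built objects. The point is only to show how several signals can be combined, with unclear cases set aside for a further check (such as asking Gemini).

```typescript
// Hypothetical candidate returned by a research database lookup.
interface Candidate {
  id: string;
  displayName: string;
  institution: string;
  orcid?: string;
}

// What is known from the faculty page (all fields are illustrative).
interface FacultyEntry {
  name: string;
  institution: string;
  orcid?: string;
}

type Decision =
  | { kind: "match"; candidate: Candidate; score: number }
  | { kind: "needs_review"; candidates: Candidate[] };

// Lowercase, trim, and collapse whitespace so comparisons ignore formatting.
const normalize = (s: string): string =>
  s.trim().toLowerCase().replace(/\s+/g, " ");

// Assumed weights: name and institution count 1 each, a matching ORCID iD counts 2.
function scoreCandidate(entry: FacultyEntry, c: Candidate): number {
  let score = 0;
  if (normalize(entry.name) === normalize(c.displayName)) score += 1;
  if (normalize(entry.institution) === normalize(c.institution)) score += 1;
  if (entry.orcid !== undefined && entry.orcid === c.orcid) score += 2;
  return score;
}

// Accept the best candidate only if it clears the threshold and strictly
// beats the runner-up; otherwise flag the entry for a secondary check.
function disambiguate(
  entry: FacultyEntry,
  candidates: Candidate[],
  minScore = 2
): Decision {
  const scored = candidates
    .map((c) => ({ candidate: c, score: scoreCandidate(entry, c) }))
    .sort((a, b) => b.score - a.score);
  const best = scored[0];
  const runnerUp = scored[1];
  if (
    best !== undefined &&
    best.score >= minScore &&
    (runnerUp === undefined || best.score > runnerUp.score)
  ) {
    return { kind: "match", candidate: best.candidate, score: best.score };
  }
  return { kind: "needs_review", candidates };
}
```

Requiring a strict win over the runner-up means that two same-named researchers at the same institution are never resolved automatically, which is the kind of case where an additional reference point is needed.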

Accomplishments that we're proud of

Neither of us had worked with embeddings or MongoDB vector search before this project, so we had to learn how to research, design, and integrate a semantic matching system from scratch, which we are proud of.

What we learned

We learned a lot about how to integrate AI tools like Gemini into our projects and how to use them thoughtfully as one part of a larger system rather than relying on them as a black box, for example by being very deliberate with prompting. We also learned how critical good system architecture is, and how having a solid plan from the start makes it much easier to iterate and scale.

What's next for ProfMatch

We hope to explore ways to further automate the data scraping process so that our pipeline can scale to more universities and departments and include more professors in the matching process.
