Nexus

When I started the NYU MS in Data Science, I knew I wanted research experience, but I did not yet have a sharp way to describe what I was looking for. The first semester felt less like a lack of motivation and more like a translation problem. I could point to courses and skills, but I struggled to turn that into a research direction I could explain clearly, or to map it onto faculty who were actually doing the kind of work I meant. The public information is all there in principle: faculty pages, papers, lab sites. In practice it is scattered, unevenly updated, and hard to compare against a paragraph of genuine curiosity. I spent a long time opening tabs, re-reading abstracts, and second-guessing whether I was aiming at the right people. The hardest part was not writing the email. It was choosing who should receive it.

I did secure a research assistant role, but the timeline did not match the story I had imagined when I arrived. The fit came together in my second semester, after I had done the slower work of clarifying interests, narrowing targets, and learning how opportunities actually surface in a department. LabLens is the project I wish had existed during that first semester: a way to move from a plain-language description of interests to a smaller set of plausible matches, with evidence drawn from live public sources and a similarity score that can be inspected rather than treated like a black-box verdict.

The system is intentionally split between what the web can show and what language models should be allowed to say. We retrieve real material from the open web, structure the student’s intent into something we can compare against research text, embed those texts and measure alignment with standard vector similarity, then generate summaries and next steps only when they can be tied back to what was retrieved. The goal is not to automate judgment or replace mentorship. It is to reduce the time students spend wandering between incompatible options and to make a careful first contact easier to do well.
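
To make the scoring step concrete, here is a minimal sketch of what "standard vector similarity" could look like, assuming it means cosine similarity over embedding vectors. The function names, lab labels, and three-dimensional toy vectors are all hypothetical; real embeddings would come from whatever model the service uses, and this is not the repository's actual scoring code.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so treat it as no alignment.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_matches(
    student_vec: List[float], lab_vecs: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Score every lab against the student vector, highest score first."""
    scored = [
        (name, cosine_similarity(student_vec, vec))
        for name, vec in lab_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption for illustration).
student = [1.0, 0.0, 1.0]
labs = {
    "lab_a": [1.0, 0.0, 1.0],
    "lab_b": [0.0, 1.0, 0.0],
    "lab_c": [1.0, 1.0, 0.0],
}
ranking = rank_matches(student, labs)
```

Because the ranking keeps the raw score next to each name, a student can see how close two candidates actually are instead of receiving only an ordered list, which is the point of an inspectable score.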

That is the story behind this repository. The code is a working sketch of that idea: a Next.js front end for the student journey, a FastAPI service for retrieval, verification, scoring, and grounded generation, and a long build specification under prompt/ that records how we wanted the hackathon version to behave. If the project has a single thesis, it is that the bottleneck for many students is early navigation, not talent, and that better tooling at that step can change who even gets to the conversation.