About the project

Inspiration

Most student networking tools optimize for existing relationships (“who you know”). This project, built for McGill student networking, flips that axis to intent (“what you’re trying to do”), so collaboration forms around concrete outcomes: passing a course, finding a study group, building a startup, getting tutoring, etc. The graph UI makes those goal-based neighborhoods visible, rather than hidden as they are in today’s major networking tools.

What we built

A full-stack web app in a monorepo, with a React + Vite frontend and a Flask + MongoDB backend.

Frontend (React 19 + Vite + Tailwind): Interactive 2D/3D graph visualization (react-force-graph / Three.js) with node interaction for inspecting profiles; an auth gate with a Guest Mode bypass so users can explore without registering; and a search UI for finding relevant connections and viewing match context.

Backend (Flask 3 + MongoDB Atlas): REST API for auth, profiles, graph data, seeding, and CV parsing. MongoDB persistence for users/connections; a seed endpoint generates a usable graph dataset for demos. Passwords are hashed with werkzeug.security.
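
The snippet below is a minimal sketch of that API surface; the route names, payload shapes, and collection handling are illustrative assumptions, not the project’s actual blueprint layout:

```python
# Minimal sketch; route names and payloads are assumptions.
from flask import Flask, jsonify, request
from werkzeug.security import generate_password_hash

app = Flask(__name__)

@app.post("/api/auth/register")
def register():
    data = request.get_json()
    # Never store plain text: werkzeug salts and hashes the password.
    data["password"] = generate_password_hash(data["password"])
    # users.insert_one(data)  # MongoDB write elided in this sketch
    return jsonify({"status": "ok"}), 201

@app.get("/api/graph")
def graph():
    # Nodes are users, links carry match scores for the force graph.
    return jsonify({"nodes": [], "links": []})
```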

Developer experience: A single run.js script standardizes setup and local startup. It checks the Node version (Node 22+, with Volta support), installs missing frontend/backend dependencies, cleans ports (5000/5173), runs Flask + Vite in parallel, and supports a --clean option to reset .venv and node_modules.

Matching algorithm

The matching score is designed to be explainable and weight-driven so it can later power a “why you matched” UI.

Data model

The algorithm works through the data on a user-by-user basis, using two dataclasses (sketched after the list):

  • Experience: one internship/job entry (company, industry, dates, location, etc.)
  • UserProfile: one user’s academic + professional info
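
A minimal sketch of the two dataclasses; the field names are illustrative assumptions based on what the algorithm compares, not the project’s exact schema:

```python
# Illustrative data model; field names are assumptions.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Experience:
    company: str
    industry: str                     # single tag, or list-like in practice
    start_date: Optional[str] = None  # e.g. "2023-05"
    end_date: Optional[str] = None    # None for ongoing roles
    duration_months: Optional[float] = None
    country: Optional[str] = None

@dataclass
class UserProfile:
    faculty: str
    major: str
    minor: Optional[str] = None
    preferred_country: Optional[str] = None
    internships: list[Experience] = field(default_factory=list)
    jobs: list[Experience] = field(default_factory=list)
```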

Normalization and field matching

For categorical fields (faculty, major, minor, preferred work country), values are normalized (trimmed and lowercased) and compared as exact matches:

$$ s(x,y)=1\ \text{if } x=y,\quad \text{otherwise }0. $$

This was done to remove formatting inconsistencies and keep precise control over how data is treated within the algorithm. Because our data is mixed-type and we aggregate with Extended Gower, exact categorical matching fits naturally: Gower combines per-field similarities of different kinds into a single score.
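
As a sketch, the normalization and exact-match comparison amount to a few lines (the helper names here are ours):

```python
from typing import Optional

def normalize(value: Optional[str]) -> Optional[str]:
    # Trim + lowercase so "Computer Science " and "computer science" agree.
    return value.strip().lower() if value else None

def exact_match(x: Optional[str], y: Optional[str]) -> float:
    x, y = normalize(x), normalize(y)
    return 1.0 if x is not None and x == y else 0.0

exact_match("Computer Science ", "computer science")  # 1.0
exact_match("Math", None)                             # 0.0
```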

Experience parsing and duration

For each experience, the parser extracts start_date and end_date. If an explicit duration is missing, it is computed in months from the dates, keeping all experiences in a consistent structure for comparison.
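
A sketch of that duration inference, assuming "YYYY-MM" date strings and the Experience dataclass from the earlier sketch (the real parser may accept more formats):

```python
from datetime import date

def parse_month(s: str) -> date:
    year, month = map(int, s.split("-"))
    return date(year, month, 1)

def duration_months(exp: Experience) -> float:
    # Prefer an explicit duration; otherwise derive it from the dates.
    if exp.duration_months is not None:
        return exp.duration_months
    start = parse_month(exp.start_date)
    end = parse_month(exp.end_date) if exp.end_date else date.today()
    return (end.year - start.year) * 12 + (end.month - start.month)
```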

Recency decay (half-life weighting)

Older experiences contribute less than recent ones via exponential decay: $$ w_{\text{recency}}(t)=2^{-t/h} $$ where \( t \) is the time in months since the experience ended (or since its start for ongoing roles) and \( h \) is the half-life in months. This makes the score reflect current trajectory rather than lifetime history. We also keep the half-life relatively long, at 3 years (36 months), to retain a context window on the order of a college education.
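
In code the decay is a one-liner; the constant below matches the 3-year half-life described above:

```python
HALF_LIFE_MONTHS = 36  # ~3 years: a context window on the order of a degree

def recency_weight(months_ago: float, half_life: float = HALF_LIFE_MONTHS) -> float:
    return 2.0 ** (-months_ago / half_life)

recency_weight(0)    # 1.00 (ongoing or just ended)
recency_weight(36)   # 0.50 (one half-life ago)
recency_weight(72)   # 0.25 (two half-lives ago)
```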

Per-experience similarity / best-match alignment

For each experience in user A, the algorithm finds the best-matching experience in user B (a max-over-candidates alignment) using weighted components. Recency is applied as a weight on each experience’s contribution rather than as a similarity component (see the sketch after this list):

  1. company: exact match
  2. industry: exact match or Jaccard overlap if list-like $$ J(A,B)=\frac{|A\cap B|}{|A\cup B|} $$
  3. duration: numeric distance mapped to similarity (closer durations score higher)
  4. country: exact match
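
A sketch of the per-experience similarity and the max-over-candidates alignment, building on the helpers above; the component weights are illustrative, not the tuned values:

```python
# Illustrative weights and helper names; builds on the sketches above.
from datetime import date

def jaccard(a, b) -> float:
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def months_since(exp: Experience) -> float:
    # Time since the experience ended; 0 for ongoing roles.
    if not exp.end_date:
        return 0.0
    end, today = parse_month(exp.end_date), date.today()
    return max(0, (today.year - end.year) * 12 + (today.month - end.month))

def experience_similarity(e1: Experience, e2: Experience,
                          w=(0.3, 0.3, 0.2, 0.2)) -> float:
    d1, d2 = duration_months(e1), duration_months(e2)
    components = (
        exact_match(e1.company, e2.company),    # 1. company
        jaccard({e1.industry}, {e2.industry}),  # 2. industry (set overlap)
        1.0 - abs(d1 - d2) / max(d1, d2, 1),    # 3. closer durations score higher
        exact_match(e1.country, e2.country),    # 4. country
    )
    return sum(wi * ci for wi, ci in zip(w, components)) / sum(w)

def aligned_score(exps_a, exps_b) -> float:
    # Max-over-candidates: each of A's experiences takes its best match in B,
    # and recency scales the contribution rather than the similarity itself.
    total = weight_sum = 0.0
    for ea in exps_a:
        best = max((experience_similarity(ea, eb) for eb in exps_b), default=0.0)
        w = recency_weight(months_since(ea))
        total += w * best
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```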

Aggregation across internships and jobs

Internships and jobs are compared separately, then combined into a single professional similarity (the mean of the two category scores). This prevents one category from dominating when the other is sparse.
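
One plausible reading of that guard, sketched below, is to average only over the categories where both users actually have entries:

```python
def professional_similarity(a: UserProfile, b: UserProfile) -> float:
    # Skip a category entirely when either side has no entries, so an
    # empty category cannot drag the combined score toward zero.
    scores = []
    if a.internships and b.internships:
        scores.append(aligned_score(a.internships, b.internships))
    if a.jobs and b.jobs:
        scores.append(aligned_score(a.jobs, b.jobs))
    return sum(scores) / len(scores) if scores else 0.0
```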

Final weighted Gower-style aggregation

All components are combined into a single score in \( [0,1] \) via a weighted average: $$ S=\frac{\sum_i w_i\, s_i}{\sum_i w_i} $$ where the \( s_i \) include the faculty/major/minor/preference matches and the aggregated professional-experience similarity. The weights are tunable, so the product can prioritize academics vs. career alignment depending on the use case (study buddies vs. startup cofounders vs. tutoring).
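
Putting it together, the final aggregation might look like this; the weight values are illustrative, and tuning them shifts the product between academic and career alignment:

```python
WEIGHTS = {"faculty": 1.0, "major": 2.0, "minor": 0.5,
           "preferred_country": 0.5, "professional": 3.0}  # illustrative

def match_score(a: UserProfile, b: UserProfile, weights=WEIGHTS) -> float:
    s = {
        "faculty": exact_match(a.faculty, b.faculty),
        "major": exact_match(a.major, b.major),
        "minor": exact_match(a.minor, b.minor),
        "preferred_country": exact_match(a.preferred_country, b.preferred_country),
        "professional": professional_similarity(a, b),
    }
    # Weighted Gower-style average: S = sum(w_i * s_i) / sum(w_i), in [0, 1].
    return sum(weights[k] * v for k, v in s.items()) / sum(weights.values())
```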

What we learned

We learned how to design a full-stack system where the frontend experience and backend data model stay tightly coupled instead of drifting apart. In practice, this meant making sure that every UX feature in the graph interface (exploration, search, profile viewing, guest access) mapped cleanly onto real backend primitives like user schemas, scoring outputs, and graph endpoints, rather than relying on frontend-only mock logic.

We also learned how to build a similarity algorithm that is both technically sound and explainable. Instead of using black-box heuristics, we structured the matching system around normalization, weighted similarity metrics, and a Gower-style aggregation framework, with professional experience treated as its own structured scoring subsystem. The recency decay mechanism was especially important, since it forced the algorithm to reflect a user’s current trajectory rather than treating all past experiences as equally relevant.

Finally, we learned the importance of developer workflow automation for making a project usable beyond its original authors. The run.js script was not just convenience code: it was a practical solution to dependency drift, broken environments, and inconsistent setup across machines, allowing the entire stack (Flask backend + React frontend) to be reliably launched with one command.

Challenges

One of the hardest parts was full end-to-end integration. Even when the frontend and backend were individually correct, small inconsistencies in authentication state, token handling, request formatting, or CORS configuration could break the user flow entirely. Keeping React state management aligned with Flask API expectations and MongoDB persistence required constant iteration, especially as new endpoints and profile fields were added.

Another major challenge was CV ingestion and profile extraction. Parsing PDFs into clean text is already unreliable, but the real difficulty was converting unstructured resume information into a consistent schema that the scoring algorithm could actually use. The extraction pipeline needed to be robust enough to handle messy formatting while still producing normalized fields like experience timelines, industries, and location metadata.

Graph usability was also a nontrivial problem. Rendering an interactive force-directed network is easy at small scale, but keeping it readable and responsive as node density increases requires careful tuning of interaction design and 2D/3D rendering options. The graph had to remain interpretable rather than turning into an unusable ball of nodes.

The algorithm itself introduced constant design tradeoffs. We needed the matching score to be realistic, but also deterministic and explainable. Aligning experiences by “best match” improves accuracy, but it risks overfitting if weights are poorly tuned. Similarly, recency decay improves relevance, but must be calibrated carefully so that older but meaningful experiences do not become irrelevant noise.

Built With

React 19 · Vite · Tailwind CSS · react-force-graph / Three.js · Flask 3 · MongoDB Atlas · Node.js · Python
