CareerGraph

What Inspired Us

We were inspired by a problem that every student and young professional faces: "What should I do next?"

Too often, we're forced to fly blind, guessing which skills, courses, or projects will actually help us land our dream job. We saw countless peers (and ourselves) spend time on skills that didn't pay off, while job descriptions kept demanding new, unfamiliar technologies.

We wanted to replace that career guesswork with a data-driven, intelligent system. The core idea was to build a "GPS for your career" that analyzes thousands of real-world paths from successful alumni to show everyone else the way.

How We Built It

This project was a full-stack effort, integrating a graph database, a GNN, an LLM, and a web frontend.

The Foundation (Knowledge Graph): We started with the data. We used Neo4j to build a heterogeneous knowledge graph. This graph consists of nodes (like Person, Skill, Role, Course, Project) and the relationships that connect them (like LEARNED, REQUIRES, HAS_ROLE).
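Here is a minimal sketch of how facts like these can be written to Neo4j. The helper builds a parameterized Cypher MERGE for one (node)-[relationship]->(node) fact. The name property, the endpoint labels assigned to each relationship type, and the merge_triple helper are assumptions for illustration, not the project's actual loader.

```python
# Hypothetical schema: the relationship types come from the write-up, but the
# (source label, target label) pair for each one is an assumption.
# Course and Project nodes would follow the same pattern.
REL_TYPES = {
    "LEARNED": ("Person", "Skill"),
    "REQUIRES": ("Role", "Skill"),
    "HAS_ROLE": ("Person", "Role"),
}


def merge_triple(rel_type: str, src_name: str, dst_name: str) -> tuple[str, dict]:
    """Build a parameterized Cypher MERGE for one (src)-[rel]->(dst) fact.

    Cypher cannot parameterize labels or relationship types, so they are
    checked against a whitelist before being put into the query text.
    Node names are passed as query parameters.
    """
    if rel_type not in REL_TYPES:
        raise ValueError(f"unknown relationship type: {rel_type}")
    src_label, dst_label = REL_TYPES[rel_type]
    query = (
        f"MERGE (a:{src_label} {{name: $src}}) "
        f"MERGE (b:{dst_label} {{name: $dst}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"src": src_name, "dst": dst_name}


query, params = merge_triple("REQUIRES", "Data Engineer", "Python")
print(query)
print(params)
```

MERGE creates a node or relationship only if it does not already exist, so reloading the same data does not duplicate it. With the official neo4j Python driver, the pair would be passed as session.run(query, params).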

The "Brain" (Graph Neural Network): To find hidden patterns, simple queries aren't enough. We built a Graph Neural Network (GNN) using PyTorch Geometric. This model learns an "embedding" (a dense numeric vector) for every node in the graph by repeatedly aggregating information from each node's neighbors. As a result, the vectors capture multi-hop patterns rather than just direct connections, which lets us find similar people or predict future career paths.
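The write-up doesn't show the PyTorch Geometric model, so here is a dependency-free sketch of the core idea instead. Each round replaces a node's vector with the average of its own vector and the mean of its neighbors' vectors, and cosine similarity between the resulting vectors then surfaces "similar people". This is an untrained, weight-free version of mean-aggregation message passing. Real layers such as PyG's SAGEConv add learned weight matrices and nonlinearities. The toy graph and the starting features are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy graph: people linked to the skills they learned.
edges = [("alice", "python"), ("alice", "sql"),
         ("bob", "python"), ("bob", "sql"),
         ("carol", "figma")]
# Hypothetical 2-d starting features (a real model would use richer inputs).
features = {"alice": [1.0, 0.0], "bob": [1.0, 0.0], "carol": [1.0, 0.0],
            "python": [0.0, 1.0], "sql": [0.0, 0.8], "figma": [0.0, -1.0]}


def build_neighbors(edge_list):
    """Undirected adjacency lists."""
    nbrs = defaultdict(list)
    for a, b in edge_list:
        nbrs[a].append(b)
        nbrs[b].append(a)
    return nbrs


def propagate(feats, nbrs, rounds=2):
    """Mix each node's vector with the mean of its neighbors' vectors."""
    h = {n: list(v) for n, v in feats.items()}
    for _ in range(rounds):
        new_h = {}
        for node, vec in h.items():
            msgs = [h[m] for m in nbrs.get(node, [])]
            mean = ([sum(col) / len(msgs) for col in zip(*msgs)]
                    if msgs else [0.0] * len(vec))
            new_h[node] = [0.5 * s + 0.5 * m for s, m in zip(vec, mean)]
        h = new_h
    return h


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(node, emb, candidates):
    """Candidate (other than node) whose embedding is closest by cosine."""
    others = [c for c in candidates if c != node]
    return max(others, key=lambda c: cosine(emb[node], emb[c]))


emb = propagate(features, build_neighbors(edges))
best = most_similar("alice", emb, ["alice", "bob", "carol"])
print(best)  # bob
```

Alice and Carol start with identical features, but after propagation Alice ends up closest to Bob because they share neighbors. The graph structure is what lets the embeddings capture patterns beyond direct connections.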

The "Mentor" (AI Chatbot): We used Google's Gemini model as the conversational AI. The key was its function-calling ability: a user can ask, "What skills am I missing for Data Engineer?", and the LLM translates that into a structured function call, like analyze_skill_gap(target_role="Data Engineer").
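Once the model emits a function call (a name plus arguments), the app has to route it to real code and hand the result back. This minimal sketch shows that routing step. The schema below is written in the JSON-schema style that function-calling APIs such as Gemini's accept, but the exact field names, the {"name": ..., "args": ...} shape of the incoming call, and the stub's data are all assumptions. The SDK-specific loop that sends the result back to the model is omitted.

```python
import json

# Hypothetical declaration that would be registered with the model.
ANALYZE_SKILL_GAP = {
    "name": "analyze_skill_gap",
    "description": "List the skills the user is missing for a target role.",
    "parameters": {
        "type": "object",
        "properties": {
            "target_role": {"type": "string", "description": "Role to compare against."}
        },
        "required": ["target_role"],
    },
}


def analyze_skill_gap(target_role: str) -> dict:
    # Hypothetical stub with hard-coded data; a real version would query Neo4j.
    required = {"Data Engineer": ["Python", "SQL", "Spark"]}
    user_skills = {"Python"}  # hypothetical user profile
    missing = [s for s in required.get(target_role, []) if s not in user_skills]
    return {"target_role": target_role, "missing_skills": missing}


# Registry mapping function names the model may call to local Python code.
TOOLS = {"analyze_skill_gap": analyze_skill_gap}


def dispatch(call: dict) -> str:
    """Run the local function named in a model's function call; return JSON."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown function {call['name']}"})
    return json.dumps(fn(**call.get("args", {})))


print(dispatch({"name": "analyze_skill_gap", "args": {"target_role": "Data Engineer"}}))
```

Keeping an explicit registry means the model can only trigger functions the app has chosen to expose.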

The Interface (Web App): We tied it all together with Streamlit. We created a two-tab application:

AI Mentor: A chat interface that talks to the Gemini model. Its function calls are answered by our "fake" (Hollywood demo) functions, which return hard-coded responses so the experience stays seamless.

Graph Explorer: A live tool using streamlit-agraph that connects directly to our Neo4j database, allowing us to run real Cypher queries and visualize the underlying graph data.
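The Graph Explorer has to turn Cypher results into something a graph widget can draw. Here is a minimal sketch that assumes rows shaped like the output of MATCH (a)-[r]->(b) RETURN a.name AS source, type(r) AS rel, b.name AS target. The column names and the rows_to_graph helper are our assumptions. The helper de-duplicates nodes so each appears once and keeps one edge per row.

```python
def rows_to_graph(rows):
    """Turn (source, rel, target) rows into de-duplicated node and edge lists."""
    nodes, edges, seen = [], [], set()
    for row in rows:
        for name in (row["source"], row["target"]):
            if name not in seen:  # each node is drawn once
                seen.add(name)
                nodes.append({"id": name, "label": name})
        edges.append({"source": row["source"], "target": row["target"],
                      "label": row["rel"]})
    return nodes, edges


# Hypothetical query result.
rows = [
    {"source": "Data Engineer", "rel": "REQUIRES", "target": "Python"},
    {"source": "Data Engineer", "rel": "REQUIRES", "target": "SQL"},
    {"source": "Alice", "rel": "LEARNED", "target": "Python"},
]
nodes, edges = rows_to_graph(rows)
print(len(nodes), len(edges))  # 4 3
```

These dicts map onto streamlit-agraph's Node and Edge objects, which are then passed to agraph() together with a Config. Check the package README for the exact constructor arguments.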

Challenges We Faced

This project was a masterclass in debugging. Our biggest challenges were almost all related to data and environment mismatches.

The Great Data Mismatch: Our biggest hurdle was realizing our GNN code and our AI mentor were built for a tech-career dataset ("Data Engineer," "Python"), but our Neo4j database was accidentally loaded with a generic career dataset ("Lexicographer," "Sales_executive"). This caused every single one of our "real" queries to fail.
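A cheap guard against this kind of mismatch is a startup check that the values the code depends on actually exist in the database. The following is a minimal sketch in which missing_vocabulary is a hypothetical helper and found is hard-coded to mimic the generic dataset. In a real app, found would be filled from queries such as MATCH (r:Role) RETURN r.name.

```python
def missing_vocabulary(expected, found):
    """Return, per category, the expected values absent from the database."""
    report = {}
    for category, names in expected.items():
        missing = sorted(set(names) - set(found.get(category, [])))
        if missing:
            report[category] = missing
    return report


# What the GNN code and the AI mentor assume exists (tech-career data).
expected = {"Role": ["Data Engineer"], "Skill": ["Python", "SQL"]}
# Hard-coded stand-in for what the generic dataset actually contained.
found = {"Role": ["Lexicographer", "Sales_executive"]}

report = missing_vocabulary(expected, found)
if report:
    print("Data mismatch, missing from the database:", report)
```

Failing loudly at startup turns "every query silently fails" into one clear error message.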

The "Hollywood Demo": Because of the data mismatch, we had to quickly pivot. We "faked" all our AI's functions to return hard-coded, perfect JSON. This taught us how to build a robust-looking demo even when the backend is broken.
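One way to keep such a demo honest is to route the fake answers through a single switch rather than hard-coding them inside each function. The following is a minimal sketch in which DEMO_MODE and with_demo_fallback are hypothetical names. Canned responses are tagged with demo=True so they can't be mistaken for real query results.

```python
import functools

DEMO_MODE = True  # hypothetical switch; could be read from an environment variable


def with_demo_fallback(canned: dict):
    """Return canned JSON in demo mode, or when the real call raises."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if DEMO_MODE:
                return dict(canned, demo=True)
            try:
                return fn(*args, **kwargs)
            except Exception:
                # The backend is broken, so fall back to the canned answer.
                return dict(canned, demo=True)
        return wrapper
    return decorator


@with_demo_fallback({"target_role": "Data Engineer", "missing_skills": ["SQL"]})
def analyze_skill_gap(target_role):
    # Stands in for the real graph query that failed on the wrong dataset.
    raise RuntimeError("query returned no rows")


print(analyze_skill_gap("Data Engineer"))
```

Flipping DEMO_MODE to False lets the real code run whenever the backend works, without any other edits.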

The "It Works on My Machine" Problem: Our queries worked perfectly in the Neo4j Browser but returned "no data" in Streamlit. Tracking this down was a maddening, multi-step debugging process. We first concluded that Streamlit was connected to the wrong database (the default neo4j instead of our populated CareerGraph). Later, we found that the data actually was in neo4j but had been loaded without any relationships, so queries that matched relationship patterns had nothing to return.
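Counting nodes and relationships separately, per database, tells these two failure modes apart quickly. Here is a minimal sketch in which describe_database takes a hypothetical run_count(cypher, database) callable, so the logic can be tried without a server. fake_run_count uses made-up counts that mimic a database loaded without relationships.

```python
from typing import Callable

# Hypothetical signature: run_count(cypher, database) -> int.
# With the official neo4j Python driver (5.x) it could be implemented as
# driver.execute_query(cypher, database_=database).records[0]["c"].
RunCount = Callable[[str, str], int]


def describe_database(run_count: RunCount, database: str) -> str:
    """Summarize one database by counting its nodes and relationships."""
    nodes = run_count("MATCH (n) RETURN count(n) AS c", database)
    rels = run_count("MATCH ()-[r]->() RETURN count(r) AS c", database)
    if nodes == 0:
        return f"{database}: no nodes (wrong database, or nothing loaded)"
    if rels == 0:
        return f"{database}: {nodes} nodes but no relationships"
    return f"{database}: {nodes} nodes, {rels} relationships"


def fake_run_count(cypher: str, database: str) -> int:
    # Made-up counts: nodes exist, relationships do not.
    return 0 if "[r]" in cypher else 120


print(describe_database(fake_run_count, "neo4j"))
```

Naming the database explicitly on every call, instead of relying on the driver's default, also removes the "which database am I on?" question entirely.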

Environment Hell: We hit nearly every small config error possible:

Neo4j server not running (Connection refused).

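PyTorch's new security default (torch.load now uses weights_only=True) blocking our saved model from loading (fixed with weights_only=False, which should only be used for files you trust).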

Trying to run Streamlit with python app.py instead of streamlit run app.py.
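The first of these errors, Connection refused, can be caught before the app ever creates a database driver, using a plain TCP check from the standard library. Here is a minimal sketch. port_is_open is a hypothetical helper, and 7687 is Neo4j's default Bolt port, so adjust it if your server is configured differently.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and timeouts
        return False


if not port_is_open("localhost", 7687):
    print("Neo4j is not reachable on localhost:7687. Is the server running?")
```

This only proves that something is listening on the port. Credentials and the database name still need their own checks.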

What We Learned

Data Is Everything: A beautiful model and a smart AI are 100% useless if they're pointed at the wrong data. Data integrity and synchronization between your database and your code are the most important part of any AI project.

Debug Layer by Layer: When an app fails, don't jump straight to blaming the code. We learned to work up the stack one question at a time: Is the server running? Is the connection working? Are the credentials right? Is the database name right? Is the query correct? Is the data in the database?
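That checklist can be encoded so it always runs in the same order and reports the lowest broken layer. Here is a minimal sketch. first_failing_layer is a hypothetical helper, and the lambdas are placeholders for real checks such as a port probe, a login attempt, or a count query. A check that raises is treated as a failure.

```python
from typing import Callable, Optional


def first_failing_layer(checks: list[tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run checks lowest layer first; return the name of the first failure."""
    for name, check in checks:
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            return name  # higher layers can't be trusted until this is fixed
    return None


# Placeholder checks; each would wrap a real probe in practice.
checks = [
    ("server running", lambda: True),
    ("connection works", lambda: True),
    ("credentials valid", lambda: True),
    ("database name correct", lambda: True),
    ("query correct", lambda: True),
    ("data present", lambda: False),  # e.g. a count query returned 0
]
print(first_failing_layer(checks))  # data present
```

Stopping at the first failure is deliberate: a "no data" result from the query layer means nothing if the database name below it is wrong.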

Full-Stack Integration is the Real Challenge: Connecting a GNN, a graph DB, an LLM, and a web app is incredibly complex. The real skill is managing the "translation" between these systems (PyTorch Tensors, Cypher queries, JSON, and Streamlit state).
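The most basic of these translations is between node names from Cypher and the integer indices a GNN works with. Here is a minimal sketch in which index_graph is a hypothetical helper. It assigns contiguous ids, builds an edge list in the 2 x num_edges layout PyTorch Geometric uses for edge_index (wrap it with torch.tensor in practice), and then maps a model's output index back to a name for a JSON response. The triples and the "recommended" index are made up.

```python
import json


def index_graph(triples):
    """Map node names to contiguous ids and build a [sources, targets] edge list."""
    name_to_id = {}
    for src, _, dst in triples:
        for name in (src, dst):
            if name not in name_to_id:
                name_to_id[name] = len(name_to_id)
    id_to_name = {i: n for n, i in name_to_id.items()}
    edge_index = [[name_to_id[s] for s, _, _ in triples],
                  [name_to_id[d] for _, _, d in triples]]
    return name_to_id, id_to_name, edge_index


triples = [("Alice", "LEARNED", "Python"),
           ("Data Engineer", "REQUIRES", "Python"),
           ("Data Engineer", "REQUIRES", "SQL")]
name_to_id, id_to_name, edge_index = index_graph(triples)
print(edge_index)  # [[0, 2, 2], [1, 1, 3]]

# Suppose the model scored node 3 highest; translate it back for the UI.
print(json.dumps({"recommended": id_to_name[3]}))
```

This flat version ignores node and relationship types. For a heterogeneous graph like the one described above, PyG's HeteroData keeps separate node indices per type and a separate edge_index per relationship type, but the name-to-index bookkeeping is the same.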