MyAdvisor

About the Project

As UW students, we often found ourselves overwhelmed by the complexity of degree requirements, course prerequisites, and scheduling. The official tools, while helpful, lacked the conversational, personalized guidance that would make academic planning less stressful, and advisors can only handle hundreds of students everyday. We wanted to build an AI-powered advisor that could supplement student's academic careers by answering questions like "What classes should I take next quarter?" or "Do I meet the prerequisites for CSE 373?" in a natural, intuitive way.

What We Learned

This project was our deep dive into Retrieval-Augmented Generation (RAG) and modern LLM applications:

Vector Databases: We implemented ChromaDB to store and efficiently search through thousands of course descriptions and degree requirements using semantic similarity
LLM Integration: We integrated Google's Gemini model to generate contextual, helpful responses based on retrieved information
Full-Stack Development: We built a complete application with FastAPI backend and React frontend, handling real-time chat interactions
Data Engineering: We parsed and structured course catalog data from multiple sources (CSV, JSON) into a queryable format

How We Built It

Backend (Python + FastAPI)

We created a RAG pipeline using ChromaDB for vector storage and semantic search
We integrated Google's Gemini LLM through LangChain for natural language responses
We built RESTful API endpoints for chat, profile management, and course search
We implemented context-aware prompting that considers the user's major, completed courses, and current quarter

Frontend (React + TypeScript)

We designed an intuitive chat interface for students to ask questions
We created components for user profiles, FAQs, and course search
We implemented real-time communication with the backend API

Data Pipeline

We scraped and parsed UW course catalog data across multiple quarters
We structured degree requirement documents for efficient retrieval
We generated embeddings for semantic search using ChromaDB's built-in models

Challenges We Faced

Data Quality: Course catalog data was inconsistent across quarters and campuses. We had to write robust parsers to handle various formats and edge cases.
Context Window Management: Balancing the amount of context sent to the LLM was tricky. Too much information led to slow responses; too little resulted in inaccurate answers. We settled on retrieving the top 5 courses and 3 requirement documents.
Semantic Search Tuning: Getting relevant results from vector search required experimentation with chunk sizes and metadata filtering. Some queries about specific course codes needed exact matching, while others benefited from semantic similarity.
Dependency Hell: BeautifulSoup's lxml parser requirement caused deployment headaches—a reminder to always document dependencies clearly!