Inspiration

Our inspiration came from the chaos of spring course registration at Duke University. Students were posting on multiple platforms, frantically seeking advice from seniors, emailing professors, and struggling with DukeHub’s advanced search. Although DukeHub supports keyword search, it often fails to surface relevant courses unless you already know the specific course codes, which most students don’t. Even finding this spring’s LLM courses was a challenge: two were offered, one by AIPI and another by the ECE department, yet neither appeared in typical searches. This experience highlighted the need for a streamlined, intelligent, goal-driven course recommendation tool that understands a student’s aspirations and makes course selection intuitive and relevant.

What it does

Our project is an AI-powered course recommendation and scheduling tool that helps students align their academic plans with their career aspirations. It combines a two-stage ranking model (domain-specific SciBERT embeddings in a bi-encoder for initial matching, then a cross-encoder for refined re-ranking) with few-shot prompting of GPT-3.5 Turbo, which interprets complex, free-form queries. Together, these components analyze each student’s goals, academic background, and coursework to produce tailored, context-aware recommendations. This addresses what standard course catalogs and keyword searches struggle with: finding relevant courses spread across departments, majors, and levels. Each recommendation is displayed in an intuitive calendar view, where students can visualize their schedule, sync it with Google Calendar, or export it as an Excel file for planning. Beyond helping students make informed, strategic course choices, the tool could help institutions improve enrollment rates and course fill efficiency. The AI behind our project also opens monetization opportunities, such as licensing to or integrating with educational technology providers, creating a scalable, revenue-generating path that appeals to both academic institutions and individual students seeking targeted learning experiences.

How we built it

Our approach integrates few-shot prompting (with no fine-tuning), domain-specific embeddings, a two-stage ranking model, and LangChain parsing capabilities to deliver high-quality, context-aware course recommendations.

  1. Data Input and Preprocessing
     User Input: Students upload a transcript PDF and specify their goals, department, major, and the semester they want recommendations for. The PDF is parsed and its text content is extracted.
  2. Course Data Collection and Storage
     Integration with Duke APIs: We use Duke University’s APIs (the Duke Dev Kit APIs) to gather up-to-date course information, including titles, descriptions, departments, instructors, and prerequisites. This data is stored in a structured SQLite database, making it easy to query during recommendation generation.
     SQL Query for Relevant Courses: Based on the user’s department, major, and semester, an SQL query fetches courses relevant to their area of study while excluding any courses the student has already taken, as determined from their transcript.
  3. Two-Stage Ranking Model
     Our recommendation system uses a two-stage ranking process that balances efficiency with contextual accuracy.
     Stage 1: Initial Candidate Selection with a Bi-Encoder (SciBERT)
     Domain-Specific Embeddings: We use gsarti/scibert-nli, a SciBERT model fine-tuned for natural language inference, to generate embeddings for both the user query and the course descriptions. Because SciBERT was pretrained on scientific text, it handles technical and academic language well, including educational terminology and specific course content.
     Semantic Similarity Calculation: The model encodes the user’s query and each course description independently into a shared semantic space, and we compute the cosine similarity between the query embedding and each course embedding. This yields an initial set of candidates that broadly match the user’s academic and career goals.
     Initial Candidate Pool: We keep the 100 highest-scoring courses as candidates, so only relevant options proceed to the more computationally intensive second stage.
     Stage 2: Deep Relevance Assessment with a Cross-Encoder
     Contextual Matching: A cross-encoder (cross-encoder/ms-marco-MiniLM-L-6-v2) re-ranks the top 100 candidates. Unlike the bi-encoder, the cross-encoder processes the user query and each course description jointly, so it can capture phrase-level interactions and subtle differences in meaning that indicate stronger alignment with the student’s goals. The trade-off is cost: it must run once per query-course pair, which is why it only scores the pre-filtered pool.
     Final Ranking: The re-ranked pool provides the top recommendations, which are both topically relevant and closely aligned with the user’s specific objectives and requirements.
  4. Recommendation Refinement with GPT-3.5 Turbo
     Few-Shot Prompting with LangChain: We prompt GPT-3.5 Turbo with a few-shot setup: examples of user queries paired with ideal course recommendations give the model contextual hints on how to interpret specific academic goals. This prompt-guided refinement helps the model select courses that better match the user’s future career aspirations, without any additional training data.
     No Fine-Tuning Required: GPT-3.5 Turbo is used as-is, with no fine-tuning on educational data. Beyond the handful of in-prompt examples, it generalizes from user queries to course descriptions on its own, so it can handle diverse queries and offer relevant recommendations even for unusual academic goals.
     Enhanced Contextual Matching: Combining GPT-3.5 Turbo with the embedding-based stages lets us refine recommendations with a deeper understanding of user goals, yielding a final course list that is both technically relevant and career-focused.
  5. Recommendation Output and User Interface
     DataFrame Output with Course Details: The final output is a DataFrame with detailed course information, including course ID, catalog number, title, description, and relevance score. It lists the 5 most relevant courses from the refined ranking.
     Interactive Calendar View: The recommended courses are displayed in a calendar-like interface showing class days and times. This visual schedule makes it easy to spot potential time conflicts or gaps in a student’s academic plan.
     Google Calendar and Excel Integration: Students can download their schedule as a Google Calendar file for syncing with personal calendars, or as an Excel sheet for their records.
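
Step 1 has to turn the extracted transcript text into a set of completed courses before the SQL filter in step 2 can exclude them. The sketch below assumes the PDF text has already been extracted and that course codes appear as an uppercase subject followed by a three-digit catalog number (for example "COMPSCI 201"). The regular expression and the name extract_taken_courses are hypothetical illustrations, not the project’s actual parser; real transcripts may need a stricter pattern.

```python
import re

# Hypothetical heuristic: a course code is an uppercase subject of 2-8 letters
# (hyphens allowed, e.g. "ECE" or "COMPSCI"), whitespace, then a three-digit
# catalog number with an optional one- or two-letter suffix (e.g. "212L").
COURSE_CODE = re.compile(r"\b([A-Z][A-Z-]{1,7})\s+(\d{3}[A-Z]{0,2})\b")


def extract_taken_courses(transcript_text: str) -> set[str]:
    """Return normalized codes such as 'COMPSCI 201' found in transcript text."""
    return {f"{subject} {number}" for subject, number in COURSE_CODE.findall(transcript_text)}


if __name__ == "__main__":
    sample = "Fall 2023\nCOMPSCI 201 Data Structures A\nMATH 212L Multivariable Calculus B+"
    print(sorted(extract_taken_courses(sample)))
```

Requiring exactly three digits followed by a word boundary keeps years such as "2023" and grade points such as "3.75" from being mistaken for course numbers.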
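
The SQLite step in item 2 can be sketched as a parameterized query that filters by department and semester and drops courses already on the transcript. The table layout, column names, and the candidate_courses helper are assumptions for illustration (the real schema built from the Duke APIs is not published), and the major-level filter is omitted for brevity.

```python
import sqlite3

# Hypothetical schema: one row per course offering in a term. The real table
# built from the Duke APIs would carry more columns (instructors, prerequisites).
SCHEMA = """
CREATE TABLE IF NOT EXISTS courses (
    course_id   TEXT PRIMARY KEY,
    subject     TEXT NOT NULL,
    catalog_nbr TEXT NOT NULL,
    title       TEXT,
    description TEXT,
    term        TEXT NOT NULL
)
"""


def candidate_courses(conn: sqlite3.Connection, subject: str, term: str,
                      taken: set[str]) -> list[tuple]:
    """Courses in `subject` offered in `term`, minus codes like 'COMPSCI 201' in `taken`."""
    sql = ("SELECT course_id, subject, catalog_nbr, title, description "
           "FROM courses WHERE subject = ? AND term = ?")
    params: list[str] = [subject, term]
    if taken:
        # One placeholder per taken course keeps every value parameterized.
        placeholders = ",".join("?" for _ in taken)
        sql += f" AND (subject || ' ' || catalog_nbr) NOT IN ({placeholders})"
        params.extend(sorted(taken))
    sql += " ORDER BY catalog_nbr"
    return conn.execute(sql, params).fetchall()
```

Only the number of placeholders is built with string formatting; the values themselves go through sqlite3’s parameter binding, so course codes from a parsed PDF cannot inject SQL.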
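
Stage 1 of the ranking model reduces to "embed everything once, then keep the k courses with the highest cosine similarity to the query". The sketch below shows only that selection step in plain Python and assumes the vectors are already computed; in the project they would come from the gsarti/scibert-nli model (for example via sentence-transformers’ SentenceTransformer(...).encode). The function names are hypothetical.

```python
import heapq
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_candidates(query_vec: Sequence[float],
                     course_vecs: dict[str, Sequence[float]],
                     k: int = 100) -> list[tuple[str, float]]:
    """Stage 1: keep the k courses whose embeddings are most similar to the query."""
    scored = ((course_id, cosine(query_vec, vec)) for course_id, vec in course_vecs.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

Because course embeddings do not depend on the query, they can be computed once and cached; each new query then costs one encoding plus a similarity pass, which is what makes the bi-encoder cheap enough to scan the whole catalog. With real embedding matrices, a vectorized routine such as sentence-transformers’ util.cos_sim would replace the Python loop.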
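
Stage 2 re-scores each surviving (query, course description) pair jointly and sorts by that score. The sketch below keeps the scorer pluggable so it runs without downloading a model; word_overlap_scorer is a toy stand-in, not a real relevance model, and rerank is a hypothetical helper name.

```python
from typing import Callable, Sequence

# A pair scorer maps (query, document) pairs to one relevance score per pair.
# sentence-transformers' CrossEncoder.predict accepts and returns data in this shape.
PairScorer = Callable[[list[tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: dict[str, str], scorer: PairScorer,
           top_n: int = 5) -> list[tuple[str, float]]:
    """Stage 2: score each (query, description) pair jointly and keep the best top_n."""
    course_ids = list(candidates)
    pairs = [(query, candidates[cid]) for cid in course_ids]
    scores = [float(s) for s in scorer(pairs)]  # float() also handles NumPy scalars
    ranked = sorted(zip(course_ids, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


def word_overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    """Toy stand-in for a cross-encoder: counts shared lowercase words."""
    return [float(len(set(q.lower().split()) & set(d.lower().split()))) for q, d in pairs]
```

In the pipeline described above, the scorer would be CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict from sentence-transformers, applied to the top 100 Stage 1 candidates.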
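
The few-shot refinement in item 4 comes down to a prompt that shows worked examples before the real query, plus a guard on the model’s answer. The project builds its prompt with LangChain; the sketch below uses plain string formatting so it runs without LangChain or an API key, and it leaves out the GPT-3.5 Turbo call itself. The example goal, course IDs, answer format, and helper names are all hypothetical.

```python
# Hypothetical few-shot example; the project's real prompt examples are not published.
EXAMPLES = [
    {
        "goal": "I want to work as a data engineer.",
        "candidates": [("DB-1", "Database systems and SQL"),
                       ("ART-2", "Renaissance painting"),
                       ("DATA-3", "Distributed data processing")],
        "picks": ["DB-1", "DATA-3"],
    },
]


def format_candidates(candidates: list[tuple[str, str]]) -> str:
    return "\n".join(f"- {cid}: {desc}" for cid, desc in candidates)


def build_prompt(goal: str, candidates: list[tuple[str, str]],
                 examples: list[dict] = EXAMPLES, n: int = 5) -> str:
    """Assemble instruction, worked examples, and the real query into one prompt."""
    parts = [f"You recommend university courses. From the candidate list, choose up to {n} "
             "course IDs that best fit the student's goal. Answer with comma-separated IDs only."]
    for ex in examples:
        parts.append(f"Goal: {ex['goal']}\nCandidates:\n{format_candidates(ex['candidates'])}\n"
                     f"Answer: {', '.join(ex['picks'])}")
    parts.append(f"Goal: {goal}\nCandidates:\n{format_candidates(candidates)}\nAnswer:")
    return "\n\n".join(parts)


def parse_picks(response: str, valid_ids: set[str], n: int = 5) -> list[str]:
    """Keep only IDs that were actually offered as candidates, in order, without duplicates."""
    picks: list[str] = []
    for token in response.split(","):
        cid = token.strip()
        if cid in valid_ids and cid not in picks:
            picks.append(cid)
    return picks[:n]
```

Restricting the model to IDs from the re-ranked candidate list, and discarding anything else it returns, keeps the LLM from recommending courses that do not exist or that the earlier stages filtered out.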
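
The calendar view in item 5 is meant to make time conflicts visible. Detecting them is a pairwise interval-overlap check on shared days, sketched below. The Meeting structure and day labels are assumptions, since the write-up does not describe how meeting times are stored.

```python
from dataclasses import dataclass
from datetime import time
from itertools import combinations


@dataclass(frozen=True)
class Meeting:
    """One weekly meeting pattern for a course (hypothetical structure)."""
    course_id: str
    days: frozenset[str]  # e.g. frozenset({"Mon", "Wed"})
    start: time
    end: time


def overlaps(a: Meeting, b: Meeting) -> bool:
    """True if the meetings share a day and their time ranges intersect.
    Back-to-back classes (one ends exactly when the other starts) do not conflict."""
    return bool(a.days & b.days) and a.start < b.end and b.start < a.end


def find_conflicts(meetings: list[Meeting]) -> list[tuple[str, str]]:
    """Return every pair of course IDs whose meetings overlap."""
    return [(a.course_id, b.course_id)
            for a, b in combinations(meetings, 2) if overlaps(a, b)]
```

Checking all pairs is quadratic, but with only five recommended courses that is a handful of comparisons.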

Challenges we ran into

One of the primary challenges was data extraction from Duke’s APIs. The course data was scattered across multiple API endpoints, each with nested JSON structures and inconsistent formats. Combining the data required a careful mapping process, as the various APIs did not use a common key for straightforward merging. This made it difficult to align data fields such as course descriptions, titles, departments, and scheduling information. To address this, we had to design a custom data integration layer that could parse, clean, and merge these diverse data sources into a cohesive format. This involved creating lookup tables and matching algorithms to combine data sets, ensuring that key information for each course was accurately represented in our SQLite database. The lack of a standardized, unified structure across the API endpoints added significant complexity and time to the development process, as we needed to reconcile inconsistencies manually. This challenge highlighted the importance of efficient data handling and integration for robust course recommendation systems.
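
The integration layer itself is not published. Its core idea can be sketched with the standard library: flatten each endpoint’s nested JSON, build a lookup table keyed on a normalized course code, and merge the tables. All field names below (subject, catalog_nbr, course.subj, course.number) and helper names are hypothetical placeholders, not the real Duke API schema.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value  # lists and scalars are kept as-is
    return flat


def course_key(subject: Any, number: Any) -> str:
    """Normalize spacing and case so 'compsci', '201 ' and 'COMPSCI', '201' match."""
    return f"{str(subject).strip().upper()} {str(number).strip().upper()}"


def index_by_course(records: list[dict[str, Any]], subject_field: str,
                    number_field: str) -> dict[str, dict[str, Any]]:
    """Build a lookup table keyed by normalized course code for one endpoint."""
    index: dict[str, dict[str, Any]] = {}
    for record in records:
        flat = flatten(record)
        key = course_key(flat[subject_field], flat[number_field])
        index.setdefault(key, {}).update(flat)
    return index


def merge_indexes(*indexes: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Combine per-endpoint lookups; later indexes win when field names collide."""
    merged: dict[str, dict[str, Any]] = {}
    for index in indexes:
        for key, fields in index.items():
            merged.setdefault(key, {"course_key": key}).update(fields)
    return merged
```

Passing the key fields per endpoint is what handles the "no common key" problem: each source names its subject and number differently, but both map to the same normalized code before merging.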

Accomplishments that we're proud of

One of our proudest achievements is bringing together a diverse set of technologies and data sources to build something genuinely useful for ourselves and our peers. By connecting all the individual components—advanced AI models, Duke’s complex API data, and seamless calendar integration—we created a powerful course recommendation system that we’ll actually use. Even if we don’t win, we’re excited to have built a tool that directly addresses the ongoing challenges of course selection for the upcoming spring semester. We’re thrilled to have developed a reliable, intelligent recommender that will help us and our fellow students make informed, strategic decisions about our courses.

What we learned

This project taught us a lot about the complexities and nuances of building a real-world AI-driven recommendation system. From integrating multiple data sources to handling inconsistencies in nested JSON structures and managing complex API calls, we gained valuable experience in data engineering and API management. Working with cutting-edge AI models like SciBERT and GPT-3.5 Turbo helped us deepen our understanding of NLP, embeddings, and two-stage ranking processes, which significantly enhanced our ability to create contextually relevant recommendations. We also learned the importance of designing user-centric solutions. Building a tool that aligns with our academic needs and integrates seamlessly with platforms like Google Calendar made us more aware of how usability and convenience factor into creating impactful software. This experience reinforced the importance of combining technical sophistication with practical value, leading to a solution that not only functions well but also makes a meaningful difference in our academic planning.

What's next for CourseCompass

To make CourseCompass even more robust, we plan to integrate student feedback and sentiment analysis from external sources like RateMyProfessors. This would provide insights into course difficulty, professors’ teaching styles, and overall student satisfaction, helping users make more informed decisions based on real student experiences. It would also help students who want to take courses with professors they hope to research with, giving them context about teaching styles and course expectations.

For broader commercial appeal, we could also add the following features:
  • Skill-Based Recommendations: Filtering that lets students search for courses that develop specific technical skills, like data visualization, machine learning, or Python programming. This would appeal to students focused on acquiring job-ready skills and to institutions interested in improving career readiness.
  • Career Pathway Recommendations: Guidance beyond individual courses, recommending sequences of courses that align with professions or fields such as data science, AI research, or biomedical engineering. This could appeal to both students and institutions looking to support targeted career paths.
  • Professor Research Collaboration Insights: Data about professors’ research interests and open research positions, helping students interested in specific research areas choose courses that align with potential research opportunities.
  • Personalized Notifications and Alerts: Notifications about registration deadlines, newly added courses, or open seats in high-demand classes. This would make the platform a go-to tool throughout the semester, increasing engagement and retention.
  • Analytics Dashboard for Institutions: Insights for universities on student course preferences, academic goals, and popular career paths. This data could support institutional planning, course offerings, and advising services, giving institutions a strong incentive to adopt CourseCompass.

Expanding these features would not only increase the platform’s commercial viability but also make it an invaluable tool for academic planning, career alignment, and university advising.

Built With

  • cross-encoder
  • dukeapi-webdevelopertoolkit
  • gpt-3.5
  • langchain
  • pandas
  • pydantic
  • python
  • scibert
  • sentence-transformers
  • sqlite
  • streamlit
  • turbo