StudyBean


Students often struggle with time management and organization. With a full schedule, it is even tougher to keep track of where everything is.

Come exam season, students waste time sifting through their class pages when they could be getting straight to studying.

Our solution was to create a program that will:

  • gather all class information and documents in one place to generate course notes, summaries, and study guides for exam time
  • reference where to find relevant information
  • be a helpful and cheerful guide through your exam prep

Thus, StudyBean was born. Your personal AI librarian.


Main Functionality

Our app lets a student:

  1. Download course materials from Canvas

    • Uses the Canvas LMS API to automatically download all of:
      • Module files
      • Files linked inside Canvas pages
      • Files and external resources linked from the Syllabus tab
  2. Load all text from downloaded files as AI context

    • A user may ask a question like: “What are the topics covered on the midterm for this class?”
    • Then, using only the given course context, the bot will:
      • Answer directly
      • Cite the filename(s) it used
      • Admit “I don’t know” if the answer isn’t present in the materials
  3. All with:

    • A custom theme and tweaked UI
    • Fully integrated course content and syllabus/schedule info per course

Implementation

Backend / Data pipeline

  • Python + canvasapi to talk to the Canvas REST API
  • A downloader that:
    • Iterates over course modules
    • Handles both direct File items and Page items
    • Parses page HTML to extract Canvas file links and external file links
    • Adds a separate pass for syllabus_body so files in the Syllabus tab are not missed
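
The link-extraction step above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual downloader. The names `extract_links`, `LinkCollector`, `CANVAS_FILE_RE`, and `FILE_EXTENSIONS` are hypothetical, and the sketch assumes that Canvas file links contain `/files/<numeric id>` in their path.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: Canvas file links contain "/files/<numeric id>" in their path.
CANVAS_FILE_RE = re.compile(r"/files/(\d+)")
# Assumption: the extensions that count as downloadable external files.
FILE_EXTENSIONS = (".pdf", ".docx", ".pptx", ".txt", ".html")


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, canvas_host):
    """Return (canvas_file_ids, external_file_urls) found in the HTML."""
    collector = LinkCollector()
    collector.feed(html or "")
    file_ids, external_urls = [], []
    for href in collector.hrefs:
        parsed = urlparse(href)
        match = CANVAS_FILE_RE.search(parsed.path)
        # Relative links and links on the Canvas host are treated as Canvas links.
        is_canvas = parsed.netloc in ("", canvas_host)
        if is_canvas and match:
            file_id = int(match.group(1))
            if file_id not in file_ids:
                file_ids.append(file_id)
        elif parsed.netloc and parsed.path.lower().endswith(FILE_EXTENSIONS):
            if href not in external_urls:
                external_urls.append(href)
    return file_ids, external_urls
```

The same function could be run over both the body of a module page and the syllabus HTML, so that one parser serves both passes. The returned IDs would then be fetched through the Canvas API, and the external URLs downloaded directly.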

UI

  • Streamlit frontend
  • Sidebar controls for:
    • System prompt choice: choose between neutral or cheerful tone
    • Class selector
    • “Force rebuild context” button that clears caches and rebuilds from disk

LLM/AI integration

  • We tested multiple language models and settled on Gemini 2.5 Flash paired with LangChain
  • We construct one holistic prompt containing:
    • serialized class context
    • internal assistant prompt
    • regular user/assistant message history
  • The model is explicitly told to:
    • Only answer from the provided context
    • Cite filenames where possible
    • Say it doesn’t know if the answer can’t be found
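
As a rough illustration of the prompt assembly described above, the sketch below builds a single prompt string from the three parts. It is plain Python and does not call LangChain or Gemini. The function name `build_prompt`, the `=== FILE: ... ===` delimiter, and the section labels are assumptions made for the example, not the project's actual format.

```python
def build_prompt(system_prompt, context_by_file, history, question):
    """Assemble one prompt string from instructions, course context, and chat history."""
    rules = (
        "Answer only from the course context below. "
        "Cite the filename(s) you used. "
        "If the answer is not in the context, say \"I don't know\"."
    )
    # Serialize each file under a filename header so the model can cite it.
    context = "\n\n".join(
        f"=== FILE: {name} ===\n{text.strip()}"
        for name, text in context_by_file.items()
    )
    # history is a list of (role, content) pairs, oldest first.
    turns = "\n".join(f"{role}: {content}" for role, content in history)
    parts = [system_prompt, rules, "COURSE CONTEXT:\n" + context]
    if turns:
        parts.append("CONVERSATION SO FAR:\n" + turns)
    parts.append(f"user: {question}")
    return "\n\n".join(parts)
```

Keeping each file under its own filename header is what makes citation possible: the model can only name a source it can see labeled in the context.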

Development Challenges

Some challenges we ran into while developing:

  • Canvas’ file structure

    • Files are found in multiple locations, and placement is not always consistent across courses. The kind of file and its location can vary among:
      • Modules
      • Pages inside modules
      • The Syllabus, which is attached to a course directly rather than being a separate page, and requires its own query compared to other Canvas files
      • Assignment pages
      • Links to external sites and embedded video files
  • Syllabus edge cases

    • Some classes simply use the syllabus page to upload a PDF of the syllabus and class schedule
    • Others contain .html files, or otherwise externally host their syllabus/schedule
    • We had to:
      • Allow for HTML support
      • Accommodate downloading external URLs when no ID exists
  • Context size

    • We swapped models mid-development when we realized our originally chosen model did not have the context window for dozens of files across multiple courses
    • To further mitigate context limits, we converted each file into text before feeding it to the model
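
The file-to-text conversion mentioned above might look something like the sketch below. It covers only plain-text and HTML files using the standard library. PDFs and other binary formats would need a third-party parser, which is left out here. The names `TextExtractor`, `file_to_text`, and `load_context` are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def file_to_text(path):
    """Return plain text for a supported file, or None if unsupported."""
    suffix = path.suffix.lower()
    if suffix in (".txt", ".md"):
        return path.read_text(encoding="utf-8", errors="replace")
    if suffix in (".html", ".htm"):
        extractor = TextExtractor()
        extractor.feed(path.read_text(encoding="utf-8", errors="replace"))
        extractor.close()  # flush any text still buffered by the parser
        return "\n".join(extractor.chunks)
    return None  # e.g. PDFs would need a third-party parser (not shown)


def load_context(folder):
    """Map each supported filename under folder to its extracted text."""
    context = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            text = file_to_text(path)
            if text:
                context[path.name] = text
    return context
```

Stripping markup before building the context means the model's input is spent on course content rather than on tags and styling.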
