Inspiration

The rise of Large Language Models in education has created a "shortcut culture." Most students use AI to simply generate answers, which bypasses the critical thinking process and leads to "copy-paste learning." I wanted to build something that doesn't just give the answer, but acts like a world-class tutor by guiding the students through the struggle so they actually understand what they are doing.

What it does

  • Socratic Guidance: Unlike ChatGPT, it refuses to give direct answers. Instead, it asks leading questions to help students arrive at the solution themselves.
  • Context Aware Learning: Students can upload PDFs or Word documents (like lecture notes or textbooks), and the AI will base its tutoring specifically on that material.
  • Multimodal Vision: Students can snap a photo of a handwritten math problem or a complex biology diagram, and the AI will analyze the image to start a tutoring session.

How we built it

  • LLM: I chose Google Gemini 2.5 Flash for its massive 1-million-token context window (allowing for - huge textbook uploads) and its multimodal capabilities.
  • Backend & UI: Streamlit allowed me to build a responsive, functional web interface entirely in Python.
  • Document Processing: PyPDF and python-docx to extract text from student materials.
  • Vision: The Pillow (PIL) library handles image processing before passing visual data to Gemini.

Challenges we ran into

The biggest hurdle was State Management. In a Socratic conversation, the AI needs to remember the previous hints it gave without getting confused by the "System Instructions." I had to carefully make the chat history logic to ensure the conversation followed a strict User-Model-User-Model sequence, or the API would return errors. I also spent significant time prompt to ensure AI was fulfilling the needs of the user.

Accomplishments that I am proud of

  • Successful Multimodal Integration: I managed to get text, images, and documents working in a single unified chat interface.
  • Speed: Using Gemini 2.5 Flash ensured that even with large PDF uploads, the response time remains near-instant.

What I learned

I learned that the System Instruction is the most powerful tool for shaping AI behavior. I also learned how to handle text extraction and the importance of a clean UI in educational tools to prevent distraction.

What's next for AI learning Assistant

  • Voice-to-Voice Tutoring: Using Gemini Live capabilities to allow students to "talk through" problems hands-free.
  • Quiz Generation: A feature that scans uploaded notes and generates a practice exam to test retention.
  • Analytics Dashboard: A way for students to see which topics they struggled with most based on their conversation history.
  • Make AI models for specific topics

Built With

Share this project:

Updates