Inspiration
The rise of Large Language Models in education has created a "shortcut culture." Most students use AI to simply generate answers, which bypasses the critical thinking process and leads to "copy-paste learning." I wanted to build something that doesn't just give the answer, but acts like a world-class tutor by guiding the students through the struggle so they actually understand what they are doing.
What it does
- Socratic Guidance: Unlike ChatGPT, it refuses to give direct answers. Instead, it asks leading questions to help students arrive at the solution themselves.
- Context Aware Learning: Students can upload PDFs or Word documents (like lecture notes or textbooks), and the AI will base its tutoring specifically on that material.
- Multimodal Vision: Students can snap a photo of a handwritten math problem or a complex biology diagram, and the AI will analyze the image to start a tutoring session.
How we built it
- LLM: I chose Google Gemini 2.5 Flash for its massive 1-million-token context window (allowing for - huge textbook uploads) and its multimodal capabilities.
- Backend & UI: Streamlit allowed me to build a responsive, functional web interface entirely in Python.
- Document Processing: PyPDF and python-docx to extract text from student materials.
- Vision: The Pillow (PIL) library handles image processing before passing visual data to Gemini.
Challenges we ran into
The biggest hurdle was State Management. In a Socratic conversation, the AI needs to remember the previous hints it gave without getting confused by the "System Instructions." I had to carefully make the chat history logic to ensure the conversation followed a strict User-Model-User-Model sequence, or the API would return errors. I also spent significant time prompt to ensure AI was fulfilling the needs of the user.
Accomplishments that I am proud of
- Successful Multimodal Integration: I managed to get text, images, and documents working in a single unified chat interface.
- Speed: Using Gemini 2.5 Flash ensured that even with large PDF uploads, the response time remains near-instant.
What I learned
I learned that the System Instruction is the most powerful tool for shaping AI behavior. I also learned how to handle text extraction and the importance of a clean UI in educational tools to prevent distraction.
What's next for AI learning Assistant
- Voice-to-Voice Tutoring: Using Gemini Live capabilities to allow students to "talk through" problems hands-free.
- Quiz Generation: A feature that scans uploaded notes and generates a practice exam to test retention.
- Analytics Dashboard: A way for students to see which topics they struggled with most based on their conversation history.
- Make AI models for specific topics
Built With
- ai
- geminiapi
- python
- streamlit

Log in or sign up for Devpost to join the conversation.