AI learning Assistant

Inspiration

The rise of Large Language Models in education has created a "shortcut culture." Most students use AI to simply generate answers, which bypasses the critical thinking process and leads to "copy-paste learning." I wanted to build something that doesn't just give the answer, but acts like a world-class tutor by guiding the students through the struggle so they actually understand what they are doing.

What it does

Socratic Guidance: Unlike ChatGPT, it refuses to give direct answers. Instead, it asks leading questions to help students arrive at the solution themselves.
Context Aware Learning: Students can upload PDFs or Word documents (like lecture notes or textbooks), and the AI will base its tutoring specifically on that material.
Multimodal Vision: Students can snap a photo of a handwritten math problem or a complex biology diagram, and the AI will analyze the image to start a tutoring session.

How we built it

LLM: I chose Google Gemini 2.5 Flash for its massive 1-million-token context window (allowing for - huge textbook uploads) and its multimodal capabilities.
Backend & UI: Streamlit allowed me to build a responsive, functional web interface entirely in Python.
Document Processing: PyPDF and python-docx to extract text from student materials.
Vision: The Pillow (PIL) library handles image processing before passing visual data to Gemini.

Challenges we ran into

The biggest hurdle was State Management. In a Socratic conversation, the AI needs to remember the previous hints it gave without getting confused by the "System Instructions." I had to carefully make the chat history logic to ensure the conversation followed a strict User-Model-User-Model sequence, or the API would return errors. I also spent significant time prompt to ensure AI was fulfilling the needs of the user.

Accomplishments that I am proud of

Successful Multimodal Integration: I managed to get text, images, and documents working in a single unified chat interface.
Speed: Using Gemini 2.5 Flash ensured that even with large PDF uploads, the response time remains near-instant.

What I learned

I learned that the System Instruction is the most powerful tool for shaping AI behavior. I also learned how to handle text extraction and the importance of a clean UI in educational tools to prevent distraction.

What's next for AI learning Assistant

Voice-to-Voice Tutoring: Using Gemini Live capabilities to allow students to "talk through" problems hands-free.
Quiz Generation: A feature that scans uploaded notes and generates a practice exam to test retention.
Analytics Dashboard: A way for students to see which topics they struggled with most based on their conversation history.
Make AI models for specific topics

Built With

ai
geminiapi
python
streamlit

Updates

Shubhansh Sharma started this project — Jan 05, 2026 02:44 AM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.