Inspiration
Students often struggle to explain visual homework problems to AI tools using only text. Study materials such as diagrams, charts, and handwritten notes are hard to describe accurately in words.
SnapTutor was inspired by the idea that an AI tutor should be able to see what the student sees and explain it naturally.
What it does
SnapTutor is a multimodal AI tutor that analyzes images of homework or study material and answers questions about them. Users can upload an image, ask a question, and receive a clear explanation.
The response is also spoken aloud using browser speech synthesis, creating a more interactive learning experience.
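The spoken-response step can be sketched with the browser's Web Speech API. The helper below is illustrative rather than SnapTutor's actual code: it chunks long answers into sentence-sized utterances (a common workaround, since some speech engines cut off very long utterances) before handing them to `speechSynthesis`.

```typescript
// Split a long answer into sentence-sized chunks so each one fits in a
// single SpeechSynthesisUtterance. Pure function, usable outside a browser.
function splitIntoUtterances(text: string, maxLen = 200): string[] {
  // Greedy sentence split on ., !, ? followed by whitespace.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks: string[] = [];
  let current = "";
  for (const s of sentences) {
    if ((current + s).length > maxLen && current) {
      chunks.push(current.trim());
      current = "";
    }
    current += s;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

// In the browser, each chunk becomes one utterance:
// for (const chunk of splitIntoUtterances(answer)) {
//   window.speechSynthesis.speak(new SpeechSynthesisUtterance(chunk));
// }
```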
How we built it
SnapTutor was built using Next.js with TypeScript for the frontend and API routes. The application sends the uploaded image and the user’s question to Google's Gemini multimodal model.
Gemini analyzes the visual content and generates a contextual explanation, which is displayed in the interface and spoken using the browser Speech Synthesis API.
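The server-side flow can be sketched roughly as below. This is a minimal illustration, not SnapTutor's implementation: the REST endpoint path and model name are assumptions (check the current Gemini API docs), and the handler would normally live in a Next.js API route with the key read from an environment variable.

```typescript
// Build the inline-data "part" shape the Gemini generateContent API expects:
// raw image bytes become a base64 string tagged with its MIME type.
function toInlineDataPart(bytes: Uint8Array, mimeType: string) {
  return {
    inlineData: {
      mimeType,
      data: Buffer.from(bytes).toString("base64"),
    },
  };
}

// Hypothetical route handler body: send the image part plus the question,
// and pull the first text part out of the first candidate. Model name and
// endpoint are assumptions; error handling is omitted for brevity.
async function askGemini(
  imageBytes: Uint8Array,
  mimeType: string,
  question: string,
  apiKey: string
): Promise<string> {
  const body = {
    contents: [
      { parts: [toInlineDataPart(imageBytes, mimeType), { text: question }] },
    ],
  };
  const res = await fetch(
    `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    }
  );
  const json: any = await res.json();
  return json.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
}
```

The interesting design point is that image and question travel together as sibling `parts` of one request, which is what lets the model answer the question in the context of the picture.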
Challenges we ran into
One challenge was handling multimodal input: uploaded images had to be converted to base64 before they could be sent to the Gemini API. Another was managing model quotas and selecting a model version that actually supports image input.
We also spent time on a responsive layout that keeps the image, the question input, and the AI response visible at the same time.
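On the client side, a `FileReader` hands back a data URL (`data:image/png;base64,...`), while the API wants only the raw base64 payload, so the prefix has to be stripped. The helper below is a sketch of that step, not SnapTutor's exact code:

```typescript
// Split a base64 data URL into its MIME type and raw base64 payload,
// the two pieces the Gemini inlineData part needs.
function stripDataUrlPrefix(dataUrl: string): { mimeType: string; base64: string } {
  const match = /^data:([^;]+);base64,(.*)$/.exec(dataUrl);
  if (!match) throw new Error("not a base64 data URL");
  return { mimeType: match[1], base64: match[2] };
}

// In the browser, the data URL itself comes from a FileReader:
// const reader = new FileReader();
// reader.onload = () => send(stripDataUrlPrefix(reader.result as string));
// reader.readAsDataURL(file);
```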
Accomplishments that we're proud of
We successfully built a working multimodal AI tutoring interface that allows users to learn directly from visual study material. The system integrates image analysis, natural language interaction, and voice output.
We are proud of creating a clean and responsive interface that demonstrates the power of Gemini for educational use cases.
What we learned
Through this project we learned how multimodal AI models combine image understanding with natural language queries, letting them ground their answers in visual context that text-only systems cannot see.
We also learned how to integrate the Gemini API into a modern Next.js application and design an intuitive user interface for AI interaction.
What's next for SnapTutor
Future improvements include real-time voice interaction using live speech recognition, allowing users to ask questions conversationally. We also plan to support more types of educational content such as math equations, handwritten notes, and diagrams.
Long term, SnapTutor could evolve into a full AI learning assistant that helps students study from any visual material.
Built With
- next.js
