Inspiration
As a student myself, I've always struggled with expensive tutoring and the frustration of getting stuck on homework late at night with no one to ask. Traditional tutoring costs $50-100/hour and isn't available 24/7. When Google released Gemini 2.5 Flash-Lite in July 2025, I realized this was my chance to build something revolutionary - a tutor that doesn't just chat, but truly sees, hears, and teaches like a real human.

What it does
OmniTutor is the first AI tutor that breaks the text-box paradigm:
- 🎤 Live Voice Conversations - Press and hold the mic, ask questions naturally, get instant spoken responses
- 📷 Visual Problem Solving - Upload photos of homework, and Gemini 2.5 analyzes handwritten math, diagrams, or printed problems
- 💬 Smart Text Chat - Ask complex questions, get detailed explanations with analogies
- 🧮 Mathematical Reasoning - Step-by-step solutions for algebra, calculus, physics
- 🔄 Real-time Communication - WebSocket for instant, conversational flow

How we built it
Frontend:
- Pure HTML/CSS/JavaScript (no frameworks for maximum compatibility)
- WebSocket API for real-time bidirectional communication
- MediaDevices API for microphone access
- FileReader API for image uploads
- Mobile-first responsive design

Backend:
- FastAPI (Python) with WebSocket support
- Google Gemini 2.5 Flash-Lite (July 2025 release) - single model handles text, voice, and images
- Pillow for image processing
- Uvicorn ASGI server

Infrastructure:
- Developed on Replit (runs on Google Cloud infrastructure)
- Ready for Google Cloud Run deployment
- All code open-source on GitHub

Challenges we ran into
- Model Name Issues - Initially used wrong Gemini model names, but discovered the correct models/gemini-2.5-flash-lite through API testing
- WebSocket Connection - Had to implement robust reconnection logic for mobile users
- Image Processing - Optimizing image size for Gemini API limits
- Voice Streaming - Handling real-time audio in the browser
- Billing Setup - Navigating Google Cloud billing for deployment
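
To make the image-size challenge above more concrete, here is a minimal sketch of how target dimensions for an uploaded photo could be computed before resizing. It is not OmniTutor's actual code: the function name fit_within and the 1024-pixel default are hypothetical, chosen only for illustration, and the default is not a documented Gemini limit.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Return (width, height) scaled down so the longer side is at most max_side.

    The aspect ratio is preserved, and images that are already small enough
    are returned unchanged. max_side=1024 is an assumed value for this sketch.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height, and max_side must be positive")

    longest = max(width, height)
    if longest <= max_side:
        # Never upscale: small images pass through as-is.
        return width, height

    scale = max_side / longest
    # Round to the nearest pixel, but never let a side collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # A 4000x3000 phone photo shrinks to fit a 1024-pixel bounding box.
    print(fit_within(4000, 3000))
```

On the backend, the resulting pair could be passed to Pillow's Image.resize, or Pillow's Image.thumbnail could be used to get a similar aspect-preserving downscale in one call.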

🏆 Accomplishments that we're proud of
- ✅ Successfully integrated Gemini 2.5 Flash-Lite (brand new July 2025 model)
- ✅ Built a fully functional multimodal AI tutor in under 2 weeks
- ✅ Achieved real-time voice conversations with WebSocket
- ✅ Created a beautiful, responsive mobile interface that works on any device
- ✅ Published complete, well-documented open-source code
- ✅ The app actually helps people learn - tested with real students

📚 What we learned
- How to work with Gemini 2.5's multimodal capabilities
- WebSocket programming with FastAPI
- Mobile-first design principles
- Prompt engineering for educational responses
- Google Cloud services and deployment
- The importance of error handling and reconnection logic
- How to optimize AI responses for learning, not just answering

🔮 What's next for OmniTutor
- User accounts to save conversation history
- Multiple language support for global students
- Code execution environment for programming help
- Interactive quizzes to test understanding
- Progress tracking and learning analytics
- React Native mobile app for app stores
- Offline mode with on-device AI
- Deploy to Cloud Run (once billing is set up)

Built With
- css3
- fastapi
- filereader
- google-cloud
- google-gemini-2.5-flash-lite
- html5
- javascript
- mediadevices-api
- pillow
- python
- replit
- uvicorn
- websocket