Inspiration
Ever since school, I have struggled with maths. The tutors I had could never explain things in a way that worked for me, and I was always hesitant to ask “silly” questions because I felt embarrassed. I often wished there was something I could ask anything, without judgment, and then actually understand the solution step by step. That experience stayed with me, and it’s exactly why I built VisionaryTutor: so no one else has to go through learning the way I did.
What it does
VisionaryTutor is a real-time AI tutor that helps students understand maths visually and interactively.
Users can:
- Draw a math problem on a digital whiteboard
- Show a handwritten problem through their camera
- Upload an image of a math question directly from their device
The AI analyzes the problem in real time, provides the correct final answer, and can explain the solution step by step in a clear, readable way. The tutor speaks back, listens continuously, and users can talk naturally, mute their microphone, or control audio playback at any time.
How we built it
VisionaryTutor is built entirely in the browser using React and TypeScript, with a custom real-time AI hook that connects directly to Google Gemini Live.
Key technical components include:
- Real-time audio streaming using the Web Audio API (PCM encoding, buffering, playback control)
- Live visual input via:
  - Camera frames
  - Digital whiteboard canvas
  - Uploaded images rendered to an offscreen canvas
- Browser APIs such as Canvas, MediaDevices, and AudioContext to handle speech, drawing, camera input, and image uploads
- Markdown + LaTeX rendering to ensure mathematical explanations are clean, readable, and properly formatted
The entire experience runs client-side, with no backend server required.
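To illustrate the PCM encoding step mentioned among the audio components, here is a minimal sketch of converting Web Audio float samples into 16-bit signed PCM. It is not the project's actual code. The function name `floatTo16BitPCM` is hypothetical, and the assumption that the streaming endpoint expects 16-bit PCM is ours rather than something stated above.

```typescript
// Hypothetical helper: converts Web Audio samples (floats in -1..1)
// into 16-bit signed PCM. Out-of-range samples are clamped first so
// loud input clips instead of wrapping around.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    out[i] = clamped < 0
      ? Math.round(clamped * 0x8000)
      : Math.round(clamped * 0x7fff);
  }
  return out;
}

// Usage: a short buffer containing silence, full scale, and clipping.
const pcm = floatTo16BitPCM(new Float32Array([0, 1, -1, 2, -2]));
console.log(Array.from(pcm)); // [0, 32767, -32768, 32767, -32768]
```

In a browser, the input would typically come from an audio processing callback, and the resulting bytes would then be encoded in whatever form the transport expects.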
Challenges we ran into
One of the biggest challenges was handling real-time audio streaming while keeping AI speech, live transcription, and user interruptions perfectly in sync. Managing latency, buffering, and smooth audio playback required careful control of the Web Audio pipeline.
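One common way to get gapless playback with interruptions is to keep a running "next start time" for queued audio chunks. The sketch below shows that general technique rather than the project's actual implementation. The `PlaybackScheduler` class and its method names are hypothetical, and the clock is injected so the logic can run outside a browser.

```typescript
// Hypothetical scheduler: decides when each incoming audio chunk
// should start so chunks play back-to-back without gaps or overlap.
class PlaybackScheduler {
  private nextStart = 0;
  private now: () => number;

  // `now` returns the current playback clock in seconds.
  constructor(now: () => number) {
    this.now = now;
  }

  // Returns the start time for a chunk of the given duration.
  // If playback has fallen behind the clock, it restarts from "now".
  schedule(durationSec: number): number {
    const start = Math.max(this.nextStart, this.now());
    this.nextStart = start + durationSec;
    return start;
  }

  // Called when the user interrupts: forget queued timing so the
  // next chunk starts immediately.
  interrupt(): void {
    this.nextStart = 0;
  }
}

// Usage with a fake clock.
let clock = 0;
const scheduler = new PlaybackScheduler(() => clock);
console.log(scheduler.schedule(0.5)); // 0
console.log(scheduler.schedule(0.5)); // 0.5
```

In a browser, the clock would typically be `() => audioContext.currentTime` and the returned value would be passed to `AudioBufferSourceNode.start()`. An interruption would also need to stop any source nodes that are already playing.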
Another challenge was efficiently capturing and sending visual input, especially when supporting multiple sources (camera, whiteboard, and uploaded images), without overwhelming the system or hurting performance on different devices.
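Two simple ways to keep visual input cheap are to downscale frames before sending them and to cap how often frames are sent. The sketch below shows both ideas under our own assumptions. The names `fitWithin` and `FrameThrottle`, the 768-pixel limit, and the one-second interval are illustrative and not taken from the project.

```typescript
interface Size {
  width: number;
  height: number;
}

// Hypothetical helper: scales a frame down so its longest side is at
// most `maxSide`, preserving aspect ratio. Smaller frames are untouched.
function fitWithin(src: Size, maxSide: number): Size {
  const longest = Math.max(src.width, src.height);
  if (longest <= maxSide) {
    return { width: src.width, height: src.height };
  }
  const scale = maxSide / longest;
  return {
    width: Math.max(1, Math.round(src.width * scale)),
    height: Math.max(1, Math.round(src.height * scale)),
  };
}

// Hypothetical throttle: allows at most one frame per interval,
// regardless of which source (camera, whiteboard, upload) produced it.
class FrameThrottle {
  private lastSentMs = -Infinity;
  private minIntervalMs: number;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  shouldSend(nowMs: number): boolean {
    if (nowMs - this.lastSentMs < this.minIntervalMs) return false;
    this.lastSentMs = nowMs;
    return true;
  }
}

// Usage: a 1080p camera frame, sent at most once per second.
const target = fitWithin({ width: 1920, height: 1080 }, 768);
console.log(target); // { width: 768, height: 432 }
const throttle = new FrameThrottle(1000);
console.log(throttle.shouldSend(0), throttle.shouldSend(500)); // true false
```

In a browser, the computed size would be used for the offscreen canvas that each source is drawn to, so all three input modes can share one capture path.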
Accomplishments that we're proud of
We’re proud of building a fully real-time, multimodal tutor that can see, listen, speak, and respond all inside the browser.
Key highlights:
- Seamless switching between camera, whiteboard, and image upload modes
- Natural, conversational tutoring without judgment
- Clear, step-by-step explanations that reduce anxiety around learning
- No backend infrastructure required
It feels less like using an app and more like learning with a patient, understanding tutor.
What we learned
We learned a lot about:
- Real-time AI systems and streaming architectures
- Audio processing and synchronization in the browser
- Designing learning experiences that make users feel safe asking questions

Most importantly, we learned how powerful education can be when students are allowed to learn without fear or embarrassment.
What's next for VisionaryTutor
Next, we want to:
- Personalize explanations based on how each student learns
- Expand beyond maths into other subjects
- Add progress tracking so learners can see their improvement over time

Our goal is to make VisionaryTutor a daily learning companion, not just a problem solver.
Built With
- ai-studio
- canva
- gemini
- google-genai
- node.js
- react
- typescript