Inspiration

Ever since school, I have struggled with maths. The tutors I had could never explain things in a way that worked for me, and I was always hesitant to ask “silly” questions because I felt embarrassed. I often wished for something I could ask anything, without judgment, and that would help me actually understand the solution step by step. That experience stayed with me, and it’s exactly why I built VisionaryTutor: so no one else has to go through learning the way I did.

What it does

VisionaryTutor is a real-time AI tutor that helps students understand maths visually and interactively.

Users can:
- Draw a math problem on a digital whiteboard
- Show a handwritten problem through their camera
- Upload an image of a math question directly from their device

The AI analyzes the problem in real time, provides the final answer, and can explain the solution step by step in a clear, readable way. The tutor speaks its responses and listens continuously, so users can talk naturally, mute their microphone, or control audio playback at any time.

How we built it

VisionaryTutor runs entirely in the browser. It is built with React and TypeScript, with a custom real-time AI hook that connects directly to Google Gemini Live.

Key technical components include:
- Real-time audio streaming using the Web Audio API (PCM encoding, buffering, playback control)
- Live visual input via:
  - Camera frames
  - Digital whiteboard canvas
  - Uploaded images rendered to an offscreen canvas
- Browser APIs such as Canvas, MediaDevices, and AudioContext to handle speech, drawing, camera input, and image uploads
- Markdown + LaTeX rendering to ensure mathematical explanations are clean, readable, and properly formatted
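To illustrate the PCM encoding step, here is a minimal sketch of converting Web Audio float samples to signed 16-bit PCM and back. The function names `floatTo16BitPCM` and `pcm16ToFloat` are hypothetical and not taken from the project's code, and the sample rate and transport format the service expects are not covered here.

```typescript
// Hypothetical helpers for illustration; not the project's actual code.

/** Convert Web Audio float samples (nominally -1..1) to signed 16-bit PCM. */
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale by 32768, positive values by 32767.
    pcm[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return pcm;
}

/** Convert signed 16-bit PCM back to floats in -1..1 for playback. */
function pcm16ToFloat(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 0x8000;
  }
  return out;
}
```

In a browser, the input would typically be the float data captured from the microphone, and the output of `pcm16ToFloat` could be copied into an `AudioBuffer` for playback.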

The entire experience runs client-side, with no backend server required.

Challenges we ran into

One of the biggest challenges was handling real-time audio streaming while keeping AI speech, live transcription, and user interruptions perfectly in sync. Managing latency, buffering, and smooth audio playback required careful control of the Web Audio pipeline.
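One common way to get gap-free playback with interruptions is to schedule each incoming audio chunk at a running "next start" time and reset that time when the user cuts in. The sketch below shows only that bookkeeping; the `PlaybackScheduler` class is a hypothetical illustration under those assumptions, not the project's implementation.

```typescript
// Hypothetical scheduler for illustration; not the project's actual code.
class PlaybackScheduler {
  // Time (in seconds, on the audio clock) at which the next chunk should begin.
  private nextStart = 0;

  /**
   * Returns when a chunk of the given duration should start playing.
   * Chunks queue back to back; after an underrun, playback resumes "now".
   */
  schedule(currentTime: number, duration: number): number {
    const startAt = Math.max(this.nextStart, currentTime);
    this.nextStart = startAt + duration;
    return startAt;
  }

  /** Drop the queue position so the next chunk starts immediately. */
  interrupt(): void {
    this.nextStart = 0;
  }
}
```

In the browser, `currentTime` would come from `AudioContext.currentTime` and `duration` from the chunk's `AudioBuffer.duration`, with the returned value passed to `AudioBufferSourceNode.start()`. On an interruption, any sources that are already playing or queued would also need to be stopped.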

Another challenge was efficiently capturing and sending visual input, especially when supporting multiple sources (camera, whiteboard, and uploaded images), without overwhelming the system or hurting performance on different devices.
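Two simple ways to keep visual input light are to cap the frame rate and to downscale frames before sending them. The sketch below assumes that approach; `FrameThrottle`, `fitWithin`, and the numbers used are hypothetical and not taken from the project.

```typescript
// Hypothetical helpers for illustration; not the project's actual code.

/** Allows at most one frame per `minIntervalMs`, whatever the source. */
class FrameThrottle {
  private lastSent = -Infinity;

  constructor(private readonly minIntervalMs: number) {}

  /** Returns true if a frame captured at `nowMs` should be sent. */
  shouldSend(nowMs: number): boolean {
    if (nowMs - this.lastSent >= this.minIntervalMs) {
      this.lastSent = nowMs;
      return true;
    }
    return false;
  }
}

/** Scale dimensions down so the longer side is at most `maxSide`; never upscale. */
function fitWithin(
  width: number,
  height: number,
  maxSide: number
): { width: number; height: number } {
  const scale = Math.min(1, maxSide / Math.max(width, height));
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

Because camera frames, the whiteboard, and uploaded images can all be drawn to a canvas, a single throttle and a single resize step could serve all three sources. The target size from `fitWithin` would be used for the canvas that each frame is drawn onto before encoding.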

Accomplishments that we're proud of

We’re proud of building a fully real-time, multimodal tutor that can see, listen, speak, and respond, all inside the browser.

Key highlights:
- Seamless switching between camera, whiteboard, and image upload modes
- Natural, conversational tutoring without judgment
- Clear, step-by-step explanations that reduce anxiety around learning
- No backend infrastructure required

It feels less like using an app and more like learning with a patient, understanding tutor.

What we learned

We learned a lot about:
- Real-time AI systems and streaming architectures
- Audio processing and synchronization in the browser
- Designing learning experiences that make users feel safe asking questions

Most importantly, we learned how powerful education can be when students are allowed to learn without fear or embarrassment.

What's next for VisionaryTutor

Next, we want to:
- Personalize explanations based on how each student learns
- Expand beyond maths into other subjects
- Add progress tracking so learners can see their improvement over time

Our goal is to make VisionaryTutor a daily learning companion, not just a problem solver.
