Inspiration

The intersection of pure pedagogy and technology has always fascinated me. Traditional AI tools act like sophisticated calculators: if a student asks one to solve a quadratic equation, it simply outputs the final answer. But true education requires a Socratic approach. I wanted to build an experience that mirrors a real human teacher sitting across the desk: a tutor that can look at a student's notebook, read their facial expressions, and guide them step by step through a problem until they truly understand the underlying logic.

What it does

LogicLens is a real-time, multimodal AI mathematics tutor. Powered by a custom persona named "Nova," the application uses the device's camera and microphone to establish a live, bi-directional stream with the user. Students can point their camera at a handwritten math problem on a piece of paper or a whiteboard, and Nova will read it aloud, break the complex equation down into bite-sized pieces, and ask guiding questions. It actively observes visual cues to adapt its emotional tone, offering encouragement when the student gets a step right or slowing down when they look confused.

How I built it

The project is built on a decoupled architecture optimized for real-time streaming:

- Frontend: Built with React and Tailwind CSS, the UI manages complex media streams using the browser's MediaDevices API and a custom AudioWorklet for Voice Activity Detection (VAD).
- Backend: A Python FastAPI server deployed on Google Cloud Run.
- AI engine: The core of the system is the Gemini Live API (gemini-2.5-flash-native-audio-preview), connected over native WebSockets.

Because large language models can hallucinate when doing complex arithmetic, I also engineered a secure function-calling pipeline. When asked to evaluate expressions such as $\sqrt{8464}$ or $\int_{0}^{5} x^2 \, dx$, the model pauses the conversation, sends the expression to the Python backend to compute the exact answer, and then speaks the verified result aloud to the student.
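The write-up doesn't show the backend side of the function-calling pipeline, so here is a minimal sketch of what such a tool handler could look like. It assumes the model issues function calls with hypothetical names (`evaluate_expression`, `integrate`) and JSON arguments, and that the backend returns a JSON result; wiring it into the Gemini Live session is omitted. Instead of calling `eval`, it walks the parsed syntax tree and accepts only whitelisted arithmetic and math functions, so the model can never make the server run arbitrary Python. The integral is computed numerically with composite Simpson's rule as a stand-in; a real deployment might use a computer algebra system instead.

```python
import ast
import math
import operator

# Whitelisted operators, functions, and constants (illustrative, not the author's list).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
          "exp": math.exp, "log": math.log}
_CONSTS = {"pi": math.pi, "e": math.e}


def _real(value: object) -> float:
    """Reject complex results such as (-8) ** (1/3), which Python would otherwise return."""
    if isinstance(value, complex):
        raise ValueError("result is not a real number")
    return float(value)


def _eval(node: ast.AST, env: dict[str, float]) -> float:
    """Recursively evaluate a whitelisted subset of Python expression syntax."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, env)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return float(node.value)  # floats make 9**9**9 overflow instead of hanging
    if isinstance(node, ast.Name):
        if node.id in env:
            return env[node.id]
        if node.id in _CONSTS:
            return _CONSTS[node.id]
        raise ValueError(f"unknown name: {node.id}")
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        op = _BIN_OPS[type(node.op)]
        return _real(op(_eval(node.left, env), _eval(node.right, env)))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand, env))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS and len(node.args) == 1 and not node.keywords):
        return _FUNCS[node.func.id](_eval(node.args[0], env))
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def evaluate_expression(expr: str) -> float:
    """Safely evaluate an arithmetic expression such as 'sqrt(8464)'."""
    return _eval(ast.parse(expr, mode="eval"), {})


def integrate(expr: str, var: str, a: float, b: float, n: int = 1000) -> float:
    """Definite integral of expr over [a, b] via composite Simpson's rule (n even)."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even number")
    tree = ast.parse(expr, mode="eval")

    def f(x: float) -> float:
        return _eval(tree, {var: x})

    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def handle_tool_call(name: str, args: dict) -> dict:
    """Dispatch a (hypothetical) function call from the model to a JSON-able result."""
    try:
        if name == "evaluate_expression":
            return {"result": evaluate_expression(args["expression"])}
        if name == "integrate":
            return {"result": integrate(args["expression"], args["variable"],
                                        float(args["lower"]), float(args["upper"]))}
        return {"error": f"unknown tool: {name}"}
    except (SyntaxError, ValueError, ZeroDivisionError, OverflowError, KeyError) as exc:
        return {"error": str(exc)}
```

With this sketch, `handle_tool_call("evaluate_expression", {"expression": "sqrt(8464)"})` returns `{"result": 92.0}`, and `integrate("x**2", "x", 0, 5)` matches the exact value $125/3 \approx 41.667$ to within floating-point rounding. Errors come back as data rather than exceptions, so the model can tell the student it couldn't evaluate something instead of the session crashing.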

Challenges I ran into

Building a production-ready WebRTC and WebSocket pipeline surfaced several hardware-level constraints:

- The mobile hardware lock: Smartphone browsers guard camera access strictly. Rapidly switching from the front-facing to the rear-facing camera threw an AbortError because the operating system had not yet released the previous camera. I had to engineer a "Nuclear Rebuild" workaround that explicitly tears down the video player, stops the active track, and waits 500 ms for the hardware to be released before initializing the new camera.
- WebSocket payload limits: Switching to a high-resolution 4K rear camera produced enormous Base64 image strings (upwards of 3 MB) that instantly crashed the FastAPI server by exceeding its default WebSocket message-size limits. I solved this by implementing a Canvas API interceptor on the frontend that dynamically downscales every outgoing video frame to a maximum width of 640 px, shrinking payloads to roughly 50 KB without sacrificing the model's ability to read the problem.
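The resizing itself happens in the browser with the Canvas API, but the arithmetic behind it is easy to check in isolation. This Python sketch computes the downscaled frame size under the 640 px cap, the Base64 overhead of 4 characters per 3 bytes, and a hypothetical server-side guard, `decode_frame`, that rejects oversized frames before decoding them. The function names and the 1 MiB default limit are assumptions for illustration, not values from the project.

```python
import base64

MAX_WIDTH = 640  # width cap described in the write-up


def downscaled_size(width: int, height: int, max_width: int = MAX_WIDTH) -> tuple[int, int]:
    """Fit (width, height) under max_width, keeping the aspect ratio; never upscale."""
    if width <= max_width:
        return width, height
    return max_width, max(1, round(height * max_width / width))


def base64_length(n_bytes: int) -> int:
    """Length of the padded Base64 encoding of n_bytes (4 characters per 3 bytes)."""
    return 4 * ((n_bytes + 2) // 3)


def decode_frame(payload: str, max_bytes: int = 1 << 20) -> bytes:
    """Hypothetical server-side guard: refuse oversized frames before decoding them."""
    if payload.startswith("data:"):
        payload = payload.partition(",")[2]  # strip a canvas.toDataURL() prefix
    if len(payload) > base64_length(max_bytes):
        raise ValueError(f"frame too large: {len(payload)} Base64 characters")
    return base64.b64decode(payload, validate=True)
```

For a 3840 x 2160 frame, `downscaled_size` returns (640, 360), which has 36 times fewer pixels. Because Base64 adds a third on top of the encoded image bytes, a payload of about 50 KB corresponds to roughly 37 KB of compressed image data. Checking the string length before decoding means an oversized frame is rejected cheaply instead of being materialized in memory.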

Accomplishments that I'm proud of

I am incredibly proud of successfully bridging the gap between an AI model and actual educational methodology. Getting the AI to stop acting like a search engine and start acting like a patient, conversational teacher—who knows when to stop speaking if the student interrupts—feels like a massive leap forward for Ed-Tech. Additionally, overcoming the strict hardware-level media constraints of mobile browsers to achieve a seamless, crash-free experience is a major technical win.

What I learned

I gained a deep understanding of browser media APIs, the intricacies of the AudioWorklet node for raw PCM audio processing, and how to keep React state consistent when it depends on asynchronous hardware streams. I also learned that prompting a voice-native AI is entirely different from prompting a text bot: you have to explicitly instruct the model to avoid markdown, bullet points, and complex formatting so that its output sounds natural when spoken aloud.
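The write-up doesn't say how its VAD decides when the student is speaking. The sketch below assumes a simple energy threshold with a hangover period (the threshold and block counts are arbitrary) and also shows the Float32-to-Int16 conversion needed to send raw 16-bit PCM. In the app this logic would live in JavaScript inside an `AudioWorkletProcessor`'s `process()` method, which receives audio in 128-sample blocks of 32-bit floats; it is written in Python here only for consistency with the other sketches, and resampling to the API's input rate is omitted.

```python
import math
import struct


def float_to_pcm16(samples: list[float]) -> bytes:
    """Convert Web Audio floats in [-1, 1] to little-endian signed 16-bit PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range samples
        # Asymmetric scaling maps -1.0 to -32768 and 1.0 to 32767.
        ints.append(int(s * 32768) if s < 0 else int(s * 32767))
    return struct.pack(f"<{len(ints)}h", *ints)


def rms(samples: list[float]) -> float:
    """Root-mean-square energy of one block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


class EnergyVAD:
    """Energy-threshold voice activity detector with a hangover period.

    Speech is reported while block RMS is at or above `threshold`, and for
    `hangover_blocks` further blocks after it drops, so brief pauses between
    words do not cut the utterance off.
    """

    def __init__(self, threshold: float = 0.02, hangover_blocks: int = 8) -> None:
        self.threshold = threshold
        self.hangover_blocks = hangover_blocks
        self._quiet_blocks = hangover_blocks + 1  # start in the "silent" state

    def process(self, block: list[float]) -> bool:
        """Feed one block (e.g. a 128-sample render quantum); return True while speaking."""
        if rms(block) >= self.threshold:
            self._quiet_blocks = 0
        else:
            self._quiet_blocks += 1
        return self._quiet_blocks <= self.hangover_blocks
```

When the detector flips to "speaking" while Nova is talking, the client can stop audio playback, which is one way to get the interruption behavior described earlier. A fixed threshold is the simplest choice; an adaptive noise floor is a common refinement for noisy classrooms.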

What's next for LogicLens

The immediate next step is expanding Nova's capabilities beyond mathematics into subjects like physics and chemistry, where visual diagram analysis is just as critical as equation solving. I also plan to refine the UI with a persistent, interactive digital whiteboard that both the student and the AI can "draw" on simultaneously, creating a truly shared workspace.
