Gemini Sensei

💡 Inspiration

Most educational AI tools suffer from the "Answer Trap": they provide the correct solution but ignore the student's internal logic. We were inspired by Bloom’s 2 Sigma Problem, which reported that students tutored one-on-one performed about two standard deviations better than those taught in a conventional classroom. We realized that to bridge this gap at scale, an AI shouldn't just be a calculator; it needs to be a Sensei, a mentor that watches how you think, listens to your doubts, and corrects your mental trajectory in real time.

🚀 What it does

Gemini Sensei is an AI-powered cognitive tutor that focuses on how learners think, not just what they answer.

Cognitive Logic Trace: It deconstructs student work to find the "Critical Deviation", the exact step where reasoning failed.

Visual Heatmaps: Using Gemini’s spatial intelligence, it overlays pulsing heatmaps directly onto the student’s uploaded work to highlight conceptual errors.

Live Mentorship: A "Sensei Voice" mode allows students to explain their logic out loud while the AI "sees" their paper through the camera, interrupting gently if the logic falters.

On-Demand Video Tutorials: If a student is stuck, Gemini Sensei uses Veo 3.1 to generate a custom conceptual animation on the fly.

Cognitive Fingerprinting: It tracks "Error Types" across sessions, building a profile of the student's long-term logical habits.

🛠️ How we built it

We leveraged a multi-model architecture within the Google GenAI ecosystem:

Gemini 3 Pro (Reasoning Engine): Used for deep, multimodal analysis of student work and complex JSON-structured diagnostic output.

Gemini 2.5 Flash (Live API): Powers the real-time voice and vision mentorship, streaming synchronized raw PCM audio and image frames for ultra-low latency.

Veo 3.1 (Video Generation): Dynamically generates custom MP4 educational animations based on the specific "Shortcut Method" identified during analysis.
Grounding: Integrated Google Search Grounding to verify educational sources and provide links to reputable textbooks and videos.

🚧 Challenges we ran into

The biggest technical hurdle was Synchronous Multimodal Streaming. Handling raw PCM audio bytes for the Live API while simultaneously processing 1 FPS camera frames required precise buffer management and state handling to ensure the "Sensei" felt human and responsive. Additionally, crafting a JSON Schema that could accurately map normalized bounding boxes (0-1000) from an AI's perspective back to a responsive CSS-based heatmap was a complex coordinate-geometry challenge.

🏆 Accomplishments that we're proud of

The "Judge-Safe" UI: We built a production-ready API key management system that allows anyone with a GCP project to use the app immediately without security risks.

Zero-Shot Diagnostic Accuracy: The model's ability to not just solve the problem, but successfully locate the human error in a messy, handwritten image.

The Veo Integration: Successfully bridging the gap between a text/image diagnostic and a fully generated video tutorial in a single workflow.

📖 What we learned

We discovered that "Thinking Budget" is critical for pedagogy. By allowing Gemini 3 Pro a higher thinking budget, the quality of the "Cognitive Insight" improved drastically, moving from surface-level corrections to deep logical interventions. We also learned that in education, latency matters as much as accuracy: the Live API's ability to interrupt a student mid-sentence is a game-changer for behavioral learning.

🔮 What's next for Gemini Sensei

LMS Integration: Connecting Sensei directly to Google Classroom so teachers can see "Logic Heatmaps" for their entire class.

Collaborative Logic: Peer-to-peer sessions where the Sensei moderates a debate between two students to find the best reasoning path.

Pedagogical Fine-Tuning: Training a lightweight version (Gemini Nano) to run locally on student devices for offline, privacy-first tutoring.
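
📐 Appendix: bounding-box mapping sketch

To make the coordinate-geometry challenge mentioned under "Challenges we ran into" more concrete, here is a minimal sketch of how a normalized 0-1000 bounding box might be turned into percentage-based CSS values for a responsive overlay. This is an illustration under stated assumptions, not the project's actual code: the names `HeatmapStyle` and `box_to_css` are hypothetical, the `[ymin, xmin, ymax, xmax]` ordering is assumed, and the clamping and corner-swapping steps are defensive additions not described above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class HeatmapStyle:
    """CSS percentages for an absolutely positioned overlay (hypothetical type)."""
    left: str
    top: str
    width: str
    height: str


def box_to_css(box: Sequence[int], scale: int = 1000) -> HeatmapStyle:
    """Map a normalized box to CSS percentages relative to the image container.

    Assumption: `box` is ordered [ymin, xmin, ymax, xmax] with values in 0..scale.
    """
    if len(box) != 4:
        raise ValueError("expected four coordinates")

    # Clamp each value so slightly out-of-range model output stays on the image.
    ymin, xmin, ymax, xmax = (min(max(v, 0), scale) for v in box)

    # Swap corners if they arrive in reversed order.
    if ymax < ymin:
        ymin, ymax = ymax, ymin
    if xmax < xmin:
        xmin, xmax = xmax, xmin

    def pct(value: int) -> str:
        # Percentages scale with the container, so no pixel sizes are needed.
        return f"{value * 100 / scale:g}%"

    return HeatmapStyle(
        left=pct(xmin),
        top=pct(ymin),
        width=pct(xmax - xmin),
        height=pct(ymax - ymin),
    )
```

For example, `box_to_css([100, 250, 300, 750])` yields `left="25%"`, `top="10%"`, `width="50%"`, `height="20%"`. Because the values are percentages of the image container rather than pixels, the overlay should keep its position as the image resizes, provided the overlay's parent element matches the rendered image's dimensions.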