Mindo: The Socratic AI Tutor
Inspiration
The inspiration for Mindo comes from a simple but powerful observation: children today are growing up in an era of "instant answers." With AI tools readily available, the risk is that we stop teaching children how to think and start teaching them how to ask for the result.
We wanted to build a digital mentor that respects the curiosity of children aged 4 to 12. Instead of just giving an answer, Mindo uses the Socratic method to guide them through a journey of discovery, turning every question into a lesson in logic and reasoning. It does this through a friendly, non-humanoid robot persona called "Grid-Bot".
What it does
Mindo is a specialized multimodal AI companion designed to transform the way children interact with technology. Instead of serving as a search engine that provides static facts, it acts as a real-time Socratic guide.
Key Features:
- Active Socratic Reasoning: When a child asks a question or shows a problem on camera, Mindo doesn't just give the solution. It asks guiding questions, provides age-appropriate analogies, and scaffolds the learning process.
- Multimodal Live Interaction: Powered by the Gemini Live API, Mindo "sees" through the camera at 1 FPS to identify homework and "hears" natural voice questions, responding with affective, low-latency audio dialogue.
- Interactive Living Whiteboard: A digital canvas where children can draw or write. Mindo reacts visually to the conversation, projecting giant emojis or reading text directly onto the board to keep the student engaged.
- Intelligent Parental Insights: Parents receive structured reports (generated by Gemini 3.1 Pro analyzing Firebase & BigQuery logs) detailing the child's emotional tone, strengths, and areas needing support.
- Safe & Age-Appropriate: All interactions are processed through Google Cloud Model Armor, ensuring strict content filtering to protect minors from inappropriate topics.
How we built it
Mindo is built on a modern, robust, and highly scalable Google Cloud ecosystem:
- Real-time AI Brain: We used the Gemini Live API (gemini-2.5-flash-native-audio) to achieve low-latency voice and vision interactions.
- Analytical Engine: We integrated Gemini 3.1 Pro to perform "Deep Thinking" on session logs to synthesize pedagogic reports for parents.
- Frontend Ecosystem: Developed with React (Vite), Tailwind CSS, and Framer Motion to create a highly responsive, animated "Grid-Bot" face and an interactive whiteboard that runs smoothly in the browser.
- Backend & Streaming: A Node.js WebSocket Proxy deployed on Google Cloud Run to securely bridge the browser's raw PCM audio and video frames directly to the Gemini Live API.
- Data & Identity: Firebase Authentication handles secure Google Sign-In for parents, while Cloud Firestore manages hierarchical user profiles and session history. Logs are synced to BigQuery for long-term analytics.
Challenges we ran into
Building a tutor that refuses to give direct answers while processing real-time audio and video is surprisingly complex.
The "Answer Leak" Problem: Standard large language models are tuned to be as helpful as possible, which usually means giving the answer immediately. We did extensive prompt engineering in the Gemini Live setup to reshape the model's behavior, so that it stays in character and offers foundational hints even when a child says "just tell me!".
Real-time Audio and Video Streaming: Transmitting raw audio and video from a web browser to an AI model with minimal latency was a major hurdle. We had to build a custom Node.js WebSocket proxy on Cloud Run, manually process PCM audio chunks via the browser's AudioContext, and send 1-FPS JPEG frames of the combined webcam and whiteboard canvas to the Gemini Live API.
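To make the PCM step concrete, here is a minimal sketch of how Float32 samples from the Web Audio API could be converted to 16-bit PCM and base64-encoded before going over the WebSocket. The function names are hypothetical and this is not Mindo's actual code; it assumes a little-endian platform (true of virtually all browsers and servers) and a runtime that provides the global `btoa`.

```typescript
// Hypothetical helpers, not Mindo's actual code.

// Convert Web Audio Float32 samples in [-1, 1] to signed 16-bit PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Base64-encode the raw PCM bytes so they can travel inside a JSON message.
// Byte order follows the platform (assumed little-endian here).
function pcmToBase64(pcm: Int16Array): string {
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}
```

In a browser, the input samples would typically come from an audio processing callback attached to the AudioContext, and each encoded chunk would be sent to the proxy as it is produced.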
Translating AI Output into UI Actions: We wanted Mindo to be expressive. We configured the Gemini Live API with explicitly defined Tools (Function Calling). When the AI wants to "smile" or "draw a star", it triggers a tool call. Parsing these asynchronous tool calls mid-audio-stream to instantly update the React UI (like changing the Grid-Bot's pixel eyes or projecting an emoji) required careful React state management.
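One way to keep this state management predictable is to funnel every tool call through a single pure reducer. The sketch below illustrates that idea only; the tool names (`set_expression`, `show_emoji`, `clear_whiteboard`), their arguments, and the `UiState` shape are all hypothetical and not taken from Mindo's code.

```typescript
// Hypothetical tool-call shapes; the real tool declarations may differ.
type Mood = "neutral" | "happy" | "thinking" | "surprised";

type ToolCall =
  | { name: "set_expression"; args: { mood: Mood } }
  | { name: "show_emoji"; args: { emoji: string } }
  | { name: "clear_whiteboard"; args: Record<string, never> };

interface UiState {
  mood: Mood;             // drives the Grid-Bot's pixel eyes
  emoji: string | null;   // emoji currently projected on the whiteboard
  boardVersion: number;   // bumped whenever the whiteboard is cleared
}

const initialUiState: UiState = { mood: "neutral", emoji: null, boardVersion: 0 };

// Pure reducer: the same state and tool call always give the same next state,
// so calls arriving mid-audio-stream can be applied in order without races.
function applyToolCall(state: UiState, call: ToolCall): UiState {
  switch (call.name) {
    case "set_expression":
      return { ...state, mood: call.args.mood };
    case "show_emoji":
      return { ...state, emoji: call.args.emoji };
    case "clear_whiteboard":
      return { ...state, emoji: null, boardVersion: state.boardVersion + 1 };
  }
}
```

A reducer of this shape can be passed to React's `useReducer`, with the WebSocket message handler dispatching each tool call as it arrives.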
Ensuring Child Safety: Handling live, unpredictable conversations with minors demands strict safety measures. Integrating Google Cloud Model Armor as a middleman let us sanitize prompts and block inherently harmful topics (violence, explicit content) before the AI even formulates a response.
Accomplishments that we're proud of
Mastering the Socratic Persona: Seeing the Gemini Live API consistently pivot from a direct question to an insightful visual or verbal hint, without ever breaking the "no-answer" rule, was a major milestone.
A "Living" Interface: We are incredibly proud of the "Grid-Bot" UI. By tying the audio output levels to the CSS animations of the robot's "mouth" and hooking its "eyes" to the AI's emotional tool calls, we created a digital entity that feels genuinely alive and empathetic.
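As an illustration of tying audio levels to the mouth animation, the sketch below computes the RMS level of an audio chunk and smooths it into a 0 to 1 "openness" value. The function names, the gain, and the smoothing factor are assumptions made for this example, not values from Mindo.

```typescript
// Hypothetical helpers, not Mindo's actual code.

// Root-mean-square level of a chunk of samples in [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Move the previous openness toward the new target so the mouth does not
// flicker. `gain` and `smoothing` are illustrative defaults.
function mouthOpenness(
  previous: number,
  level: number,
  gain = 2,
  smoothing = 0.5
): number {
  const target = Math.min(1, level * gain);
  return previous + (target - previous) * (1 - smoothing);
}
```

In the browser, the samples could come from an `AnalyserNode` via `getFloatTimeDomainData`, and the result could be written to a CSS custom property with `element.style.setProperty` so that the stylesheet scales the mouth.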
Closing the Loop for Parents: Converting raw, scattered session logs (durations, message counts, safety alerts) into beautifully written, actionable insights for parents using Gemini 3.1 Pro proves that AI can be an ally not just in teaching, but in parenting.
What we learned
- Audio-Native Prompting is Different: We learned that prompting for a voice-native model requires a different approach than text. We had to explicitly instruct Mindo to "keep responses concise" and "use an enthusiastic tone," as long-winded text responses translate to unnaturally long spoken monologues.
- The Power of Function Calling in UX: We discovered that AI tools aren't just for fetching data; they are incredible mechanisms for UI control. Letting the AI decide when to clear the whiteboard or show a specific graphic makes the application feel magically intelligent.
- Safety by Design: Integrating Model Armor taught us the importance of proactive, infrastructure-level safety guardrails when building EdTech, rather than relying solely on system prompts.
What's next for Mindo
Mindo is just the beginning of a revolution in AI-assisted education. Our roadmap includes:
- Collaborative Socratic Circles (Peer Mode): We aim to introduce a mode where two children can solve a problem together on the same interactive whiteboard, guided and moderated by Mindo.
- School Curriculum Integration: We plan to hook Mindo into standard educational APIs (like Google Classroom) so it can automatically and securely read the child's actual weekly assignments and tailor its Socratic exercises to match their school syllabus.
- Gamification and Rewards: We plan to implement a system where children earn digital badges or unlock new "skins" and colors for their Grid-Bot based on the reasoning paths they successfully complete, boosting long-term retention and engagement.
Built With
- antigravity