Inspiration

I always see my kids watching educational videos passively: they absorb facts but never truly engage. And it isn't just my kids. 1.5 billion children worldwide lack access to personalized tutoring, and traditional educational apps deliver the same static videos, the same worksheets, and the same pace for every learner.

I wanted to build something where learning starts with their curiosity, not a curriculum. Something voice-first, so even a 5-year-old who can't yet read can use it independently. What if a child could just *speak* a question and get a complete, personalized learning experience: not just an answer, but fun game modes like solving an investigation, predicting the outcome of a hypothetical scenario, or creating a customized story? And what if every topic they explored became a star in a living constellation map, showing how ideas connect, where gaps exist, and what to explore next?

What it does

ShowMe is a fully AI-native, voice-first learning platform. There is zero static content: every slide, diagram, narration, mystery, story, and quiz is generated on demand by Gemini 3. A child speaks a question like "What is Gemini 3?" and within 30 seconds Gemini generates a narrated visual slideshow with AI-illustrated diagrams.

But the real learning happens next. Three independent, pluggable game modes turn passive content into active exploration:

  • Mystery Lab: Gemini generates a detective case from the lesson. Kids scan crime scenes for clues, interview AI witnesses, rebuild timelines, and draft arrest warrants, all grounded in the lesson's concepts.
  • Wonder Lab: Kids predict "what if" outcomes, then Gemini generates illustrated consequence reveals with narration to show what would actually happen.
  • Story Studio: Kids choose branching story paths while Gemini illustrates each chapter in real-time.

Each mode is architecturally independent: it takes lesson content as input and uses Gemini to generate a completely different interactive experience. This means new AI-native modes can be added without touching existing ones, so the platform scales with Gemini's capabilities.
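
The pluggable-mode contract can be sketched as a small registry. This is an illustrative sketch, not ShowMe's actual API: `registerMode`, `startMode`, and the session shape are hypothetical names standing in for the real interfaces.

```javascript
// Hypothetical sketch of the pluggable-mode contract: each mode is a
// factory that takes lesson content plus a Gemini client and returns a
// self-contained interactive session with its own state machine.
const modes = new Map();

function registerMode(name, createSession) {
  modes.set(name, createSession);
}

async function startMode(name, lessonContent, gemini) {
  const createSession = modes.get(name);
  if (!createSession) throw new Error(`Unknown mode: ${name}`);
  return createSession(lessonContent, gemini);
}

// Adding a new AI-native mode never touches existing ones:
registerMode('mystery', async (lesson, gemini) => ({
  state: 'INTRO',                 // each mode owns its own state machine
  async next(input) { /* advance the case via Gemini calls */ },
}));
```

Because modes only share the `(lessonContent, gemini)` input shape, a new mode is a single `registerMode` call with no changes to existing code paths.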

Every topic becomes a star in the child's Knowledge Constellation: a living, AI-powered knowledge graph in which Gemini discovers relationships between topics, identifies gaps, and suggests what to learn next. A full gamification engine (XP, streaks, and trophies) keeps kids coming back.

How we built it

The entire platform runs on four Gemini models working in parallel:

  • gemini-3-flash-preview handles speech-to-text, script generation, mystery/what-if/story content generation, and Socratic Q&A
  • gemini-3-pro-image (Nano Banana Pro) renders educational diagrams, crime scenes, consequence reveals, and story illustrations
  • gemini-2.5-pro-preview-tts narrates every slide and learn mode interaction
  • gemini-2.5-flash-lite powers fast classification, knowledge graph operations, and topic clustering

The generation pipeline runs STT → script → (images || TTS) in parallel to hit the 30-second target. The frontend is React 18 + Vite + Tailwind with WebSockets for real-time progress updates. The backend is stateless Node.js + Express. Each learn mode uses its own state machine (8 states for Mystery Lab, 6 for Wonder Lab, 11 for Story Studio) and is fully self-contained, making the platform extensible to new AI-native learning modes. The entire app supports English and Simplified Chinese.
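
The pipeline above can be sketched as a couple of `await`s plus one `Promise.all`. The Gemini calls are stubbed here for illustration; names like `speechToText` and `renderDiagram` are assumptions, not ShowMe's actual functions.

```javascript
// Stubs standing in for the real Gemini calls (illustrative only).
const speechToText = async (audio) => 'What is a volcano?';
const generateScript = async (question) => ({
  slides: [{ imagePrompt: 'labeled volcano diagram', narration: `Intro: ${question}` }],
});
const renderDiagram = async (prompt) => `image:${prompt}`;
const synthesizeSlide = async (text) => `audio:${text}`;

async function generateLesson(audioBuffer) {
  // STT and script generation are inherently sequential: the script
  // depends on the transcribed question.
  const question = await speechToText(audioBuffer);
  const script = await generateScript(question);

  // Images and first-slide narration run concurrently; this overlap is
  // what pulls total latency under the 30-second target.
  const [images, firstAudio] = await Promise.all([
    Promise.all(script.slides.map((s) => renderDiagram(s.imagePrompt))),
    synthesizeSlide(script.slides[0].narration),
  ]);
  return { question, script, images, firstAudio };
}
```

Only the first slide's narration is synthesized here; the remaining slides' audio is deferred to playback time (see the JIT approach under Challenges).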

Challenges we ran into

  • Gemini 3 Pro Image consistency: Educational diagrams needed to be accurate, not just visually appealing. We iterated heavily on prompts to get labeled diagrams instead of artistic illustrations.
  • TTS rate limit at 10 RPM: Gemini TTS allows only 10 requests per minute, so naively parallelizing 4-5 slide narrations would burn half the budget in a single generation. We implemented a just-in-time (JIT) approach: generate the first slide's audio immediately for low latency, then generate each subsequent slide's audio just before it's needed during playback. The user hears first-slide narration instantly while requests spread across the playback window, staying well within the rate limit.
  • 30-second generation target: Running STT + script + 4-5 images + TTS sequentially took over 60 seconds. Parallelizing image and TTS generation with a resilient fallback chain (Pro Image → Flash Image, Pro TTS → Flash TTS → Cloud TTS) got it under 30 seconds.
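
The JIT narration scheduling can be sketched as a loop that keeps exactly one TTS request in flight ahead of playback. This is a minimal sketch, assuming a `synthesize(text)` call wrapping Gemini TTS and a `play(audio)` call that resolves when a slide finishes; both names are illustrative.

```javascript
// JIT narration: only the first slide's audio is requested up front.
// Each later request fires just before its slide plays, spreading TTS
// calls across the playback window instead of bursting past 10 RPM.
async function playWithJitTts(slides, synthesize, play) {
  let pending = synthesize(slides[0].narration);   // first slide: immediate
  for (let i = 0; i < slides.length; i++) {
    const audio = await pending;
    // Kick off the next slide's TTS while the current one plays.
    pending = i + 1 < slides.length ? synthesize(slides[i + 1].narration) : null;
    await play(audio);
  }
}
```

Since each slide plays for several seconds, at most a handful of TTS requests land in any given minute regardless of slideshow length.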

What we learned

Building AI-native means no fallback to static content: if Gemini can't generate it, it doesn't exist. This forced us to build robust normalization, validation, and fallback chains at every layer. The payoff is that every child's experience is unique: no two mysteries, stories, or constellations are the same. Gemini 3's multimodal capabilities make this possible: using one AI backbone for STT, text, images, TTS, and knowledge graphs means every component shares context, creating a coherent experience rather than stitched-together parts. The Knowledge Constellation is the best example of this coherence: Gemini doesn't just generate lessons, it understands how they relate to everything else a child has learned, building a personalized map that no static curriculum could replicate.

What's next for ShowMe - Voice-First AI Learning Platform

  • Cross-topic game modes: the Knowledge Constellation already maps relationships between everything a child has learned. Future modes will pull from multiple topics at once:
    • Dream Machine: Mash up two learned topics into one absurd scenario. "What if dinosaurs had intelligence?" "What if volcanoes erupted on the Moon?" Gemini generates a scientifically grounded world where both topics collide, with real science hidden in the chaos
    • Escape Room: Puzzles that require knowledge from different topics to solve. Learning about electricity AND water? The room combines both. The constellation determines which topic combinations create the best challenges
    • Comic Creator: Build a comic strip that connects two topics into one narrative. Gemini illustrates the panels, and the child sequences them. The result is a shareable artifact that proves cross-domain understanding
  • Constellation-driven difficulty: the knowledge graph already knows what a child understands well vs. poorly. Future modes will automatically blend strong topics with weak ones, using mastery in one area to scaffold learning in another
  • More languages beyond English and Chinese
  • Parent/teacher dashboard showing knowledge constellation progress
  • Collaborative learning: kids investigating the same mystery together
  • Offline mode with cached lessons for areas with limited connectivity

Built With

  • express.js
  • gemini-2.5-flash-lite
  • gemini-2.5-pro-preview-tts
  • gemini-3-flash
  • gemini-3-pro-image-(nano-banana-pro)
  • gemini-tts
  • javascript
  • node.js
  • react-18
  • tailwind-css
  • vite
  • web-audio-api
  • websocket