Problem: Students Hear About AI, But Don’t Use It for Real Studying

Most school students today know AI exists, but very few use it as a serious daily study tool.

In practice, they run into three frictions:

  • No syllabus fit: Explanations often don’t match NCERT/CBSE or their specific textbooks.
  • No memory: They upload PDFs, switch tabs, and next time the AI has forgotten everything.
  • Too much effort: After school, coaching, homework and family time, they don’t have the energy to manage prompts, files, and “sessions”.

The result: AI becomes something they “try once” instead of something they grow with throughout the school year.


Inspiration: A Tutor That Feels Like “Ours”, Not “Somewhere on the Internet”

This all became real for me while helping my nephews in India with their NCERT books.

They didn’t say “open an AI website.” They said things like:

  • “This chapter is confusing.”
  • “Can someone explain this diagram?”
  • “Can you tell this using cricket?”

At the same time, I discovered the Gemini Live API, which lets an agent see, hear, and speak in real time. That made me imagine a simple experience:

  • Upload your textbooks once
  • Come back any day, pick a chapter, talk to a tutor that remembers your books
  • Get explanations in your own words, with diagrams, short videos, and whiteboard math
  • Hear analogies in the things you already love (cricket, football, games)

That’s the spark behind Mama AI: a familiar, persistent tutor that lives with your curriculum instead of living in a blank chat box.


Idea: What Mama AI Actually Is

Mama AI is a voice‑first, multimodal AI learning companion designed for the Live Agents (Audio/Vision) category.

At a glance, Mama AI:

  • Lets students upload their textbooks once, and keeps them available until they are deleted
  • Uses multimodal RAG so explanations are grounded in those textbooks
  • Uses the Gemini Live API so students can speak, show, interrupt, and keep going in real time
  • Generates diagrams, short animations, and whiteboard formulas to match how STEM is actually taught

The experience is organised into four modes:

  1. Lab Mode – Camera‑Guided Experiments
    Students point the camera at their lab setup. Mama AI:

    • Recognises equipment
    • Walks through steps and concepts
    • Flags obvious mistakes or risky setups in real time
  2. Exam Mode – Active Recall, Not Just Answers
    Instead of solving everything for the student, Mama AI:

    • Asks exam‑style questions
    • Uses their hobbies (cricket, football, gaming) as metaphors
    • Nudges them toward the answer instead of handing it over
  3. Tutor / Study Mode – Textbook‑Grounded Explanations
    This is the core:

    • Students upload PDFs (e.g., NCERT / CBSE)
    • Mama AI parses, chunks, and embeds the content
    • Questions are answered with that textbook context in mind
      So “Explain this from Chapter 3” maps to their actual Chapter 3, not some random online explanation.
  4. My Notes – Turning Conversations into Study Material
    After each session, Mama AI:

    • Summarises key ideas and formulas
    • Captures important whiteboard steps
    • Links diagrams and animations used in the session
    • Stores everything as a revision‑friendly note

Underneath all this is a simple promise: come back tomorrow and we’ll pick up right where you left off.


How I Built It: From Idea to Running Live Agent

[Figure: Mama AI architecture diagram]

I built Mama AI in three main phases using the Google ecosystem.

Phase 1: Shaping the Concept with Gemini 3.1

I started with Gemini 3.1 as a thinking partner:

  • Turned raw thoughts into a product document:
    • Problem → students overwhelmed and tools forgetful
    • Users → exam‑focused school students using structured syllabi
    • Modes → Lab, Exam, Tutor, Notes
  • Sketched flows where:
    • The student always starts from subject/chapter, not from a blank chat
    • Voice and camera are first‑class, not optional add‑ons

This turned “I want to help my nephews” into a concrete, testable product design.

Phase 2: Prototyping Behaviour in Google AI Studio

Next, I used Google AI Studio to test how the agent should behave:

  • Live conversation:
    • Natural follow‑ups
    • Clarifications
    • Handling interruptions (barge‑in)
  • Grounded answers:
    • Prompts that say: “Answer only from this context; if it’s not here, say so.”
  • Teaching style:
    • Explaining STEM topics through cricket/football/gaming analogies
    • Switching between high‑level intuition and formula‑level detail

This phase confirmed that the behaviours I wanted were achievable before wiring up an entire backend.
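
As an illustration of the grounding instruction above, here is a minimal sketch of how a grounded prompt could be assembled. The `TextbookChunk` shape and the `buildGroundedPrompt` helper are hypothetical names used only for this example, not Mama AI's actual code; only the instruction wording comes from the prototyping notes.

```typescript
// Hypothetical shape of a retrieved textbook passage (illustrative field names).
interface TextbookChunk {
  chapter: string;
  page: number;
  text: string;
}

// Grounding instruction, worded as in the prototyping notes.
const GROUNDING_RULE =
  "Answer only from this context; if it's not here, say so.";

// Builds one prompt string: rule first, then numbered context, then the question.
function buildGroundedPrompt(question: string, chunks: TextbookChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.chapter}, p.${c.page}) ${c.text}`)
    .join("\n");
  return [
    GROUNDING_RULE,
    "",
    "Context:",
    // An explicit placeholder makes the "not in your book" case visible to the model.
    context.length > 0 ? context : "(no matching textbook passages)",
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Numbering the passages is a design choice for this sketch: it gives the model something concrete to cite, and it makes an empty context obvious.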

Phase 3: Full Implementation with Gemini, Firebase, and Cloud Run

For the production version, I moved to Google’s GenAI SDK + Firebase + Cloud Run.

Frontend (React + Vite + TypeScript):

  • Mobile‑first interface
  • Voice controls to start/stop Live sessions
  • Camera integration for:
    • Textbook pages
    • Homework
    • Lab setups
  • Whiteboard with LaTeX for formulas such as:
    • \( x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \)
    • \( v^2 = u^2 + 2as \), \( F = ma \)

Backend & Data (Firebase + Google Cloud):

  • Firebase Auth – user accounts and secure access
  • Cloud Firestore – stores:
    • User profiles
    • Textbooks and chapter metadata
    • Embeddings for textbook text and diagrams
    • Session logs, study notes, media job status
  • Firebase Cloud Storage – stores:
    • User‑uploaded PDFs
    • Generated images and videos
    • Cached media for reuse

AI & Agent Layer (Google GenAI SDK):

  • Live Conversation & Vision
    • gemini-2.5-flash-native-audio-preview-12-2025 for real‑time voice + camera input
  • Reasoning & Note‑Making
    • gemini-3.1-pro-preview for deep explanations and synthesis
    • gemini-3.1-flash-lite-preview for quick parsing and lightweight tasks
  • RAG & Embeddings
    • gemini-embedding-2-preview for per‑user, per‑textbook embeddings
  • Visual Generation
    • gemini-3.1-flash-image-preview (Nano Banana 2) for portrait‑orientation diagrams
    • veo-3.1-fast-generate-preview for short educational animations
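
One way to keep this task-to-model split in a single place is a small routing table. The sketch below is an assumption about how that could look: the `Task` names and the `modelFor` helper are hypothetical, while the model identifiers are the ones listed above.

```typescript
// Hypothetical task labels for this sketch; the model ids are those listed above.
type Task = "live" | "reasoning" | "parsing" | "embedding" | "image" | "video";

const MODEL_FOR_TASK: Record<Task, string> = {
  live: "gemini-2.5-flash-native-audio-preview-12-2025",
  reasoning: "gemini-3.1-pro-preview",
  parsing: "gemini-3.1-flash-lite-preview",
  embedding: "gemini-embedding-2-preview",
  image: "gemini-3.1-flash-image-preview",
  video: "veo-3.1-fast-generate-preview",
};

// Returns the model id for a task, letting callers override individual entries
// (for example from environment configuration) without editing the table.
function modelFor(task: Task, overrides: Partial<Record<Task, string>> = {}): string {
  return overrides[task] ?? MODEL_FOR_TASK[task];
}
```

Preview model ids tend to change, so a single table with an override hook limits how many files need touching when one is retired.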

Deployment:

  • Containerised the backend
  • Deployed on Google Cloud Run
  • Connected Gemini, Firebase, and storage via environment configuration
  • Verified the entire backend runs on Google Cloud, as required by the challenge

All of this is built so that one student action — “Open Mama AI and pick Physics Chapter 3” — triggers a full live agent workflow, not just a single model call.


Challenges: Where It Was Hard (and Fun)

1. Making “Upload Once, Reuse Always” Real

The promise sounds simple, but building it meant:

  • Designing Firestore to match how students think:
    /users/{userId}/textbooks/{textbookId}/chapters/{chapterId}/pages/{pageId}
  • Parsing PDFs into meaningful chunks (chapters, sections, pages)
  • Embedding and retrieving content fast enough for live audio sessions

Once that worked, the experience finally felt like a long‑term tutor, not a one‑time file upload tool.
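
The Firestore layout above can be captured in a small typed helper so that every read and write builds the same path. This is a sketch under assumptions: `PageRef` and `pageDocPath` are hypothetical names, and the helper returns the path without a leading slash.

```typescript
// Hypothetical identifier bundle for one textbook page.
interface PageRef {
  userId: string;
  textbookId: string;
  chapterId: string;
  pageId: string;
}

// Builds users/{userId}/textbooks/{textbookId}/chapters/{chapterId}/pages/{pageId}.
// Rejects empty ids and ids containing "/", which would change the path depth.
function pageDocPath(ref: PageRef): string {
  const segments = [
    "users", ref.userId,
    "textbooks", ref.textbookId,
    "chapters", ref.chapterId,
    "pages", ref.pageId,
  ];
  for (const segment of segments) {
    if (segment.length === 0 || segment.includes("/")) {
      throw new Error(`invalid path segment: "${segment}"`);
    }
  }
  return segments.join("/");
}
```

For example, `pageDocPath({ userId: "u1", textbookId: "physics9", chapterId: "ch3", pageId: "p41" })` yields `users/u1/textbooks/physics9/chapters/ch3/pages/p41`.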

2. Keeping Explanations Aligned with Textbooks

Students care about what the teacher and exam expect. So I had to:

  • Tune prompts so textbook context is the primary source of truth
  • Let Mama AI admit “this isn’t in your book” instead of guessing
  • Retrieve context at chapter/section level so responses feel familiar

This alignment is what makes the tool exam‑friendly instead of just “AI‑interesting.”
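
A minimal sketch of chapter-scoped retrieval follows, assuming embeddings are plain number arrays compared by cosine similarity. The `EmbeddedChunk` shape, the `retrieveFromChapter` helper, and the `minScore` threshold of 0.5 are all illustrative assumptions. An empty result is the signal for the tutor to say "this isn't in your book".

```typescript
// Hypothetical stored chunk: text plus its embedding and owning chapter.
interface EmbeddedChunk {
  chapterId: string;
  text: string;
  vector: number[];
}

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns up to topK chunks from one chapter, best match first.
// Chunks scoring below minScore are dropped, so the result can be empty.
function retrieveFromChapter(
  query: number[],
  chunks: EmbeddedChunk[],
  chapterId: string,
  topK = 3,
  minScore = 0.5,
): EmbeddedChunk[] {
  return chunks
    .filter((c) => c.chapterId === chapterId)
    .map((c) => ({ chunk: c, score: cosineSimilarity(query, c.vector) }))
    .filter((s) => s.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((s) => s.chunk);
}
```

Filtering by chapter before scoring keeps answers tied to the section the student picked, and the threshold prevents weak matches from being presented as textbook content.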

3. Live Audio, Context, and Interruptions

In real usage:

  • Sessions grow long
  • Students interrupt when something clicks (or doesn’t)
  • Audio streams must stay smooth

I tackled this with:

  • Session lifecycle management and context limits
  • Clear handling of barge‑in so the agent stops, listens, and continues gracefully
  • Careful work with the browser AudioContext and stream cleanup
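
The barge-in behaviour can be sketched as a small playback queue that flushes pending agent audio the moment the student starts talking. This is an assumption-laden illustration, not the actual implementation: `PlaybackQueue`, its method names, and the choice to drop agent audio while listening are all hypothetical, and the real browser AudioContext wiring is left out.

```typescript
type AgentState = "idle" | "speaking" | "listening";

// Generic over the chunk type so it can hold audio buffers or, in tests, strings.
class PlaybackQueue<T> {
  private pending: T[] = [];
  private state: AgentState = "idle";

  get currentState(): AgentState {
    return this.state;
  }

  // Queue agent audio, unless the student currently has the floor.
  enqueue(chunk: T): void {
    if (this.state === "listening") return;
    this.pending.push(chunk);
    this.state = "speaking";
  }

  // Hand the next chunk to the player; go idle once the queue drains.
  next(): T | undefined {
    const chunk = this.pending.shift();
    if (this.pending.length === 0 && this.state === "speaking") {
      this.state = "idle";
    }
    return chunk;
  }

  // Student interrupted: drop everything not yet played and start listening.
  // Returns how many chunks were discarded.
  bargeIn(): number {
    const dropped = this.pending.length;
    this.pending = [];
    this.state = "listening";
    return dropped;
  }

  // Student finished speaking: accept agent audio again.
  resume(): void {
    if (this.state === "listening") this.state = "idle";
  }
}
```

Keeping this state outside the audio code makes the stop, listen, and continue sequence easy to test without a microphone or speaker.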

4. Visual Generation Without Killing the Flow

Images and videos take seconds to generate. Blocking everything until they arrive felt wrong.

So I:

  • Treated each visual as an async job tracked in Firestore
  • Designed responses where Mama AI says it’s “drawing” or “preparing” something while talking
  • Auto‑injected the finished diagram or animation into the UI when ready

The agent keeps feeling live and conversational, even while heavy visual work happens in the background.
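
The async job pattern above can be sketched as a small status machine. Everything here is illustrative: the `MediaJob` shape, the status names, and the spoken phrases are assumptions, and a real version would persist each job as a Firestore document rather than pass plain objects around.

```typescript
type JobStatus = "queued" | "generating" | "ready" | "failed";

// Hypothetical job record; in practice this would live in a Firestore document.
interface MediaJob {
  id: string;
  kind: "image" | "video";
  status: JobStatus;
  url?: string;
}

// Which status changes are legal; "ready" and "failed" are terminal.
const ALLOWED_TRANSITIONS: Record<JobStatus, JobStatus[]> = {
  queued: ["generating", "failed"],
  generating: ["ready", "failed"],
  ready: [],
  failed: [],
};

// Returns an updated copy of the job, or throws on an illegal transition.
function advanceJob(job: MediaJob, next: JobStatus, url?: string): MediaJob {
  if (!ALLOWED_TRANSITIONS[job.status].includes(next)) {
    throw new Error(`invalid transition ${job.status} -> ${next}`);
  }
  if (next === "ready") {
    if (!url) throw new Error("a ready job needs a url");
    return { ...job, status: next, url };
  }
  return { ...job, status: next };
}

// What the tutor could say while the visual is still in progress.
function spokenStatus(job: MediaJob): string {
  switch (job.status) {
    case "queued":
    case "generating":
      return job.kind === "image"
        ? "I'm drawing that diagram now."
        : "I'm preparing a short animation.";
    case "ready":
      return "Here it is.";
    case "failed":
      return "I couldn't generate that, so let me describe it instead.";
  }
}
```

Because the UI only reacts to status changes, the conversation never has to wait on the image or video call itself.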


What I’m Proud Of

  • Curriculum that actually stays.
    Upload once, reuse across the entire term. No endless re‑upload loops.

  • A genuinely multimodal tutoring flow.
    Voice, vision, text, diagrams, animations, and whiteboard math work together like a real lesson.

  • Student‑first design.
    Everything is built around “Pick subject → Pick chapter → Talk,” not “Upload file → Engineer prompt.”

  • Shipping this end‑to‑end while learning.
    I went from exploring Gemini and Google Cloud to running a fully deployed Live Agent that my nephews — and other students — can actually use.


What I Learned

  • Access is not the main barrier — clarity is.
    Many students know AI exists; they just don’t have a tool that fits how they really study.

  • Grounding and persistence beat one‑off cleverness in education.
    A tutor that remembers your books and stays aligned with your syllabus is more valuable than a one‑time “wow” answer.

  • STEM is naturally multimodal.
    When you combine speech, text, formulas, diagrams, and experiments in one agent, it starts to look a lot like a real classroom.

  • The Gemini + Google Cloud stack is powerful for solo builders.
    With Live API, GenAI SDK, Firebase, and Cloud Run, it’s possible for one person to go from idea to a real, cloud‑hosted live tutor that runs on production infrastructure.

Mama AI is my way of turning those lessons into something students can actually open, talk to, and rely on — not just for one question, but for an entire school year.

Built With

  • better-sqlite3
  • cloudfirestore
  • css3
  • date-fns
  • docker
  • epub.js
  • express.js
  • firebaseauthentication
  • firebasecloudstorage
  • gemini-3.1-flash-image-preview
  • gemini-3.1-flash-lite
  • gemini-3.1-pro
  • geminiembedding-2
  • geminiliveapi
  • google-cloud
  • googlecloudrun
  • googlecloudstorage
  • googlegenai-sdk
  • html5
  • javascript
  • jszip
  • katex
  • lucidereact
  • motion
  • nanobanana-2
  • node.js
  • pdf.js
  • react-18
  • reactrouter-dom
  • tailwindcss
  • typescript
  • veo-3.1-fast
  • vite
  • webaudioapi
  • webrtc