🪞 Socratic Mirror Agent

🌟 Inspiration

The best teachers don’t give answers — they ask questions.

For centuries, the Socratic method has helped people truly understand concepts by guiding them with questions instead of explanations. But modern learning is mostly passive: videos, notes, and one-way instruction.

We noticed a clear gap:

  • Students consume content but don’t internalize it
  • Interview candidates know material but fail under pressure
  • Learners lack a real-time, adaptive human presence

So we asked:

What if AI didn’t just answer questions — but helped you think?

That idea became Socratic Mirror Agent — an AI that reflects your understanding back at you and helps you improve in real time.


🎯 Live Agents Track (Real-Time Interaction)

Socratic Mirror Agent is built specifically as a real-time, interruptible AI agent using the Gemini Live API and deployed on Google Cloud.

  • 🗣️ Natural voice conversations with streaming audio (not delayed responses)
  • Barge-in support — users can interrupt the AI mid-sentence, and it adapts instantly
  • Low-latency streaming pipeline (~50 ms audio chunks over WebSockets)
  • 🔄 Bidirectional interaction — continuous listening and responding

Unlike traditional chatbots, our system maintains a live conversational loop, making the interaction feel fluid, human, and responsive.


🚀 What it does

Socratic Mirror Agent is a multimodal AI coaching system with three modes:

🧠 Socratic Tutoring

  • Teaches using guided questions only (no direct answers)
  • Adapts to confusion in real time
  • Generates live visual aids (steps, equations, diagrams)
  • Reinforces learning through continuous checks

🎤 Public Speaking Coach

  • Real-time feedback on:

    • filler words
    • pacing
    • delivery
  • Timestamped improvement suggestions
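
To make the filler-word and pacing feedback concrete, here is a minimal sketch of how such metrics could be computed from a transcript. The filler list, the function name `speaking_stats`, and the plain-text transcript input are all assumptions for illustration; the project's actual analysis may work differently.

```python
import re
from typing import Dict

# Illustrative filler list (an assumption, not the project's actual list).
FILLER_WORDS = {"um", "uh", "like", "basically", "actually"}


def speaking_stats(transcript: str, duration_seconds: float) -> Dict[str, float]:
    """Count words and filler words, and estimate pace in words per minute."""
    words = re.findall(r"[a-z']+", transcript.lower())
    fillers = sum(1 for word in words if word in FILLER_WORDS)
    minutes = duration_seconds / 60
    return {
        "words": len(words),
        "fillers": fillers,
        "words_per_minute": len(words) / minutes if minutes > 0 else 0.0,
    }


stats = speaking_stats("Um, so I basically think, uh, this works", 6.0)
```

A real coach would likely work on word-level timestamps so that each suggestion can point at a moment in the recording.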


💼 Interview Prep

  • AI interviewer that adapts based on your answers
  • Covers:

    • behavioral
    • technical
    • follow-up questions
  • Uses job description + resume context


📊 Vibe Report

After each session:

  • Performance score
  • Strengths & weaknesses
  • Actionable improvements

🛠️ How we built it

⚙️ Architecture Overview

Frontend

  • Next.js + React + React Three Fiber
  • 3D avatar (Ready Player Me) with real-time lip sync

Backend

  • FastAPI deployed on Google Cloud
  • Gemini 2.0 Flash + Gemini Live API
  • WebSocket-based streaming system

🔴 Real-Time Audio Pipeline (Core to Track)

User Voice → AudioWorklet → WebSocket → Gemini Live API
→ Streaming AI Response → Viseme Timeline → Avatar Lip Sync

  • Audio is streamed in small chunks (~50ms) for low latency
  • Gemini Live enables continuous bidirectional streaming
  • Playback + animation are synchronized in real time
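
As a rough illustration of the chunking step, the sketch below splits raw PCM audio into ~50 ms frames of the kind that would be sent over the WebSocket. The 16 kHz, 16-bit mono format and the names `chunk_pcm` and `SAMPLE_RATE_HZ` are assumptions for illustration, not taken from the project code.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000   # assumed input sample rate
BYTES_PER_SAMPLE = 2      # assumed 16-bit mono PCM
CHUNK_MS = 50             # chunk duration described above


def chunk_pcm(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Yield consecutive chunks of roughly `chunk_ms` milliseconds of audio."""
    chunk_bytes = SAMPLE_RATE_HZ * chunk_ms // 1000 * BYTES_PER_SAMPLE
    for offset in range(0, len(pcm), chunk_bytes):
        # The final chunk may be shorter than chunk_bytes.
        yield pcm[offset:offset + chunk_bytes]


one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # one second of silence
chunks = list(chunk_pcm(one_second))
```

Under these assumptions each chunk is 1,600 bytes, so one second of audio becomes 20 WebSocket messages.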

🧩 Agentic Tutoring Engine (Core Innovation)

We built a state-driven decision engine that tracks:

  • Confusion signals
  • Correct answer streaks
  • Progress depth

Instead of relying on prompting alone, we inject hidden behavioral hints into the model's context:

Hint            | Trigger            | Result
----------------|--------------------|----------------------
re_explain      | Confusion detected | New explanation style
provide_example | No example yet     | Adds concrete example
ask_socratic    | High confidence    | Deeper probing
suggest_path    | Topic drift        | Guided exploration
👉 This makes the AI feel like a real adaptive tutor, not a chatbot.
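
A state-driven hint selector along these lines could look like the sketch below. The hint names come from the table above; the `TutorState` fields, the priority order, and the use of a correct-answer streak as a proxy for "high confidence" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TutorState:
    """Hypothetical per-session state tracked between turns."""
    confusion_detected: bool = False
    example_given: bool = False
    correct_streak: int = 0
    topic_drift: bool = False


def choose_hint(state: TutorState, streak_threshold: int = 3) -> Optional[str]:
    """Return the hidden hint to inject before the next model turn, or None."""
    if state.confusion_detected:
        return "re_explain"        # confusion takes priority (assumed ordering)
    if state.topic_drift:
        return "suggest_path"
    if not state.example_given:
        return "provide_example"
    if state.correct_streak >= streak_threshold:
        return "ask_socratic"      # streak used as a stand-in for confidence
    return None
```

The returned string would then be added to the model's hidden context for the next turn, so the learner never sees the hint itself.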


🎯 Real-Time Multimodal Inputs

  • Voice (speech + tone)
  • Optional biometric signals (stress, engagement)
  • Future-ready for facial expression analysis

⚡ Challenges we ran into

🚨 Performance Issues

  • CSS blur effects forced full repaints on the CPU
  • Fixed by shifting to GPU-friendly transforms
  • Result: stable 60 FPS UI

👄 Lip-Sync Drift

  • Audio and visemes were misaligned
  • Fixed by syncing timeline to actual audio playback start
  • Result: frame-accurate lip sync
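
The idea behind the fix can be sketched as follows: the active viseme is looked up from the time elapsed since playback actually started, rather than since the audio was received. The timeline format, the viseme labels, and the function name `active_viseme` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (start offset in ms from the beginning of the clip, viseme label)
Timeline = List[Tuple[int, str]]


def active_viseme(timeline: Timeline, playback_start_ms: float,
                  now_ms: float) -> Optional[str]:
    """Return the viseme that should be showing at `now_ms`."""
    elapsed = now_ms - playback_start_ms
    if elapsed < 0 or not timeline:
        return None  # audio has not started playing yet
    starts = [start for start, _ in timeline]
    index = bisect_right(starts, elapsed) - 1
    return timeline[index][1] if index >= 0 else None


timeline = [(0, "sil"), (120, "AA"), (260, "M")]  # hypothetical labels
current = active_viseme(timeline, playback_start_ms=1000, now_ms=1130)
```

Because the lookup is driven by the playback clock, buffering delays before playback begins no longer shift the mouth shapes relative to the sound.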

✋ Real-Time Interruption (Barge-In)

  • Interrupting AI mid-response required:

    • cancelling active audio streams
    • resetting state safely
    • resuming conversation seamlessly
  • This was critical to achieving natural conversation flow
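
The three steps above can be sketched with plain `asyncio` task cancellation. This is a simplified stand-in, not the project's implementation: `ResponsePlayer` is a hypothetical class, and the `sleep` call stands in for sending an audio chunk to the client.

```python
import asyncio
from typing import List, Optional


class ResponsePlayer:
    """Streams response chunks and can be interrupted mid-stream."""

    def __init__(self) -> None:
        self.played: List[str] = []
        self._task: Optional[asyncio.Task] = None

    async def _play(self, chunks: List[str]) -> None:
        for chunk in chunks:
            await asyncio.sleep(0.01)  # stand-in for sending audio to the client
            self.played.append(chunk)

    def start(self, chunks: List[str]) -> None:
        self._task = asyncio.create_task(self._play(chunks))

    async def barge_in(self) -> None:
        """Cancel the active response and reset state so a new turn can start."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
            try:
                await self._task  # wait until the cancellation has completed
            except asyncio.CancelledError:
                pass
        self._task = None


async def demo() -> List[str]:
    player = ResponsePlayer()
    player.start([f"chunk-{i}" for i in range(100)])
    await asyncio.sleep(0.035)  # the user starts speaking partway through
    await player.barge_in()
    return player.played


played = asyncio.run(demo())
```

Awaiting the cancelled task before resetting state is what keeps the reset safe: no stale chunk can be delivered after the new turn has begun.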


🎙️ Voice System Limitations

  • The Web Speech API exposes no gender metadata for its voices
  • Solved with a cross-platform mapping of known voice names
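
The lookup logic is language-independent, so it is sketched here in Python even though the real mapping would run in the browser. The table entries and the function name `guess_voice_gender` are illustrative assumptions, not the project's actual mapping.

```python
from typing import Dict, Optional

# Illustrative name fragments only; a real table would be larger.
VOICE_GENDER_HINTS: Dict[str, str] = {
    "samantha": "female",
    "zira": "female",
    "daniel": "male",
    "david": "male",
}


def guess_voice_gender(voice_name: str) -> Optional[str]:
    """Guess a voice's gender label from its name, or None if unknown."""
    name = voice_name.lower()
    # Some voice names state the gender directly. "female" must be checked
    # first because it contains the substring "male".
    if "female" in name:
        return "female"
    if "male" in name:
        return "male"
    for fragment, label in VOICE_GENDER_HINTS.items():
        if fragment in name:
            return label
    return None
```

Returning `None` for unknown names lets the caller fall back to a default voice instead of guessing.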

🔇 Noise Handling

  • Background sounds interfered with AI input
  • Added:

    • RMS-based filtering
    • Intent detection
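
An RMS gate of this kind can be sketched in a few lines. The 16-bit native-endian PCM format and the `0.02` threshold are assumptions chosen for illustration; in practice the threshold would be tuned per microphone and environment.

```python
import array
import math


def rms_int16(pcm: bytes) -> float:
    """Root-mean-square amplitude of 16-bit PCM, normalised to the range 0.0-1.0."""
    samples = array.array("h")  # signed 16-bit, native byte order (assumed)
    samples.frombytes(pcm)
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


def is_speech(pcm: bytes, threshold: float = 0.02) -> bool:
    """Treat a chunk as candidate speech only if it is loud enough."""
    return rms_int16(pcm) >= threshold


silence = array.array("h", [0] * 800).tobytes()
loud = array.array("h", [8000, -8000] * 400).tobytes()
```

Chunks below the threshold can be dropped before they reach the model. Loud but irrelevant audio still passes an energy gate, which is where a separate intent check comes in.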

🏆 Accomplishments that we’re proud of

  • Built a true real-time AI agent (not just a chatbot)
  • Seamless interruptible voice interaction (barge-in)
  • Integrated Gemini Live API + Google Cloud deployment
  • Real-time bidirectional voice + animation pipeline
  • Frame-accurate avatar lip sync
  • Smooth UI performance even on low-end devices

📚 What we learned

  • LLMs default to explaining — not guiding → Requires explicit behavioral control systems

  • Prompting alone isn’t enough → State + logic + prompts = real intelligence

  • Real-time AI requires:

    • streaming architectures
    • latency optimization
    • synchronization handling
  • Browser performance is often about rendering layers, not code


🔮 What’s next

  • 👁️ Facial expression–based confusion detection
  • 🧠 Persistent learning profiles
  • 👥 Multi-user collaborative learning mode
  • 📖 Auto-generated learning curricula
  • 🏫 LMS integrations (Canvas, Moodle, Google Classroom)

💡 Why it matters

Socratic Mirror Agent shifts AI from:

❌ Answer machine → ✅ Thinking partner

Instead of replacing learning, it amplifies how humans learn best — through questioning, reflection, and discovery.


🧑‍💻 Built With

  • Gemini Live API
  • Google Cloud
  • Next.js
  • React
  • Three.js / React Three Fiber
  • FastAPI
  • WebSockets
  • Web Audio API

Built With

  • asyncio
  • audioworklet-api
  • css-modules
  • fastapi
  • gemini-2.0-flash
  • gemini-live-api
  • google-cloud-text-to-speech-api
  • google-gemini-api
  • next.js
  • python
  • react
  • react-three-fiber
  • ready-player-me
  • tailwind-css
  • three.js
  • typescript
  • web-speech-api
  • webgl
  • websockets