Inspiration

64% of U.S. 4th-grade students scored below NAEP Proficient in math in 2022. University of Toronto research shows students receiving one-on-one tutoring outperformed their peers more than 80% of the time. But at $40–$80/hour, personalized instruction is out of reach for many families. And even when kids do access online learning, parents are left in the dark, with no way to know whether their child is engaged, struggling, or just staring at a screen. We wanted to fix all three problems at once: make expert-level tutoring affordable, make it genuinely adaptive, and give parents real visibility into what's actually happening.

What it does

Struggling in math? Vertex has you covered. Breezing through it? Vertex will push you further.

  • Kids have live 1-on-1 sessions with an AI clone of their own parent that adapts to their level
  • Parents upload their face and voice, add homework, and track everything
  • An attention engine monitors the child's focus in real time and intervenes when needed
  • Every session ends with a full recap

How we built it

Frontend & Backend

  • Next.js 16, TypeScript, React 19, Tailwind CSS 4, Framer Motion, Shadcn UI
  • Supabase (PostgreSQL, Auth, Storage), KaTeX, JSXGraph, Recharts, Lucide React

AI Tutoring

  • GPT-4o, GPT-4o mini, dynamic system prompts
  • Adaptive difficulty engine, Socratic response layer
  • Lesson plan generation, quiz generation, AI-generated session reports

Live Avatar

  • LiveKit (WebRTC), OpenAI Realtime API (gpt-4o-realtime-preview), Simli avatar streaming
  • ElevenLabs voice synthesis, Python LiveKit Agents framework (livekit-agents 1.4)
  • Semantic VAD turn detection, per-session agent dispatch

Attention Engine

  • MediaPipe Tasks Vision — gaze detection, head pose, blink tracking, client-side only, no video leaves device
  • Tab visibility API, response latency tracking, keyboard and mouse interaction scoring
  • 6-signal weighted formula with EMA smoothing
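As a rough sketch of how six signals can combine into one smoothed score (the production engine runs client-side in TypeScript; every weight and the smoothing factor below are illustrative placeholders, not the tuned values):

```python
# Illustrative 6-signal weighted focus formula with EMA smoothing.
# All weights and the smoothing factor are placeholder assumptions,
# not the values the production engine uses.

WEIGHTS = {
    "gaze_on_screen": 0.30,    # MediaPipe gaze vector aimed at the viewport
    "head_pose": 0.20,         # pitch/yaw/roll inside a forward-facing band
    "blink_rate": 0.10,        # blink frequency near the personal baseline
    "tab_visible": 0.20,       # Page Visibility API signal
    "response_latency": 0.10,  # answer time vs. the rolling average
    "interaction": 0.10,       # keyboard/mouse activity score
}

ALPHA = 0.3  # EMA smoothing factor (illustrative)


def raw_focus(signals: dict) -> float:
    """Weighted sum of per-signal scores, each normalized to 0..1,
    scaled to a 0..100 focus score."""
    return 100.0 * sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


def smooth(prev_ema: float, raw: float, alpha: float = ALPHA) -> float:
    """Exponential moving average: one noisy detection can't spike the score."""
    return alpha * raw + (1.0 - alpha) * prev_ema
```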

Infrastructure & Auth

  • Supabase Auth, Row Level Security, 6-digit access code system for kids
  • Supabase Realtime, PDF parsing via pdf-parse
  • Resend for parent alerts and session reports
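A 6-digit access code like the one described can be generated from a CSPRNG; this is a minimal sketch under our own assumptions (hypothetical helper name, and the real system presumably also enforces uniqueness in the database):

```python
import secrets


def generate_access_code() -> str:
    """Zero-padded 6-digit kid access code from a cryptographically
    secure random source (sketch; uniqueness checks not shown)."""
    return f"{secrets.randbelow(10**6):06d}"
```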

Challenges we ran into

1. Avatar & Real-Time Streaming

  • Streaming Simli lip-sync video while GPT-4o simultaneously ran answer evaluation, difficulty adjustment, and response generation without perceptible lag

  • Built a token-streaming pipeline so GPT-4o output fed directly into Simli before the full response was generated, cutting time to first spoken word

  • Coordinated render state between the avatar SDK and the question engine so the next prompt never fired until the current lip-sync buffer fully flushed

  • Handled mid-sentence barge-in and response interruption without corrupting the avatar render queue
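The token-streaming idea above can be sketched generically with asyncio — no Simli or OpenAI APIs here, just the shape of the pipeline: buffer LLM tokens, flush to the avatar at sentence boundaries, and honor barge-in via a cancel event (all names are hypothetical):

```python
import asyncio
import re

SENTENCE_END = re.compile(r"[.!?]\s*$")  # crude sentence-boundary check


async def stream_to_avatar(token_stream, speak, cancel_event):
    """Flush buffered LLM tokens to the avatar at sentence boundaries,
    so speech starts before the full response exists. `token_stream` is
    any async iterator of text chunks, `speak` stands in for the
    avatar/TTS call, and `cancel_event` models mid-sentence barge-in."""
    buffer = ""
    async for token in token_stream:
        if cancel_event.is_set():
            return  # user barged in: drop the rest of the response
        buffer += token
        if SENTENCE_END.search(buffer):
            await speak(buffer.strip())  # first sentence speaks early
            buffer = ""
    if buffer.strip() and not cancel_event.is_set():
        await speak(buffer.strip())  # flush any trailing partial sentence
```

Gating the next prompt on the lip-sync buffer then reduces to awaiting `speak` before yielding control back to the question engine.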

2. Attention Engine

  • Combined MediaPipe Face Mesh gaze vectors, head pose estimation (pitch, yaw, roll), blink rate via Eye Aspect Ratio, and tab visibility into a single weighted focus formula
  • Built a rolling 5-detection window with EMA smoothing so no single bad frame could spike the score and trigger a false parent alert
  • Calibrated a personal baseline multiplier in the first 2 minutes of each session so the engine scores against the kid's own behavior, not a global threshold
  • Tuned policy thresholds on real session data so check-ins, micro-task mode, and session end triggers match actual on-task versus distracted behavior
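A minimal sketch of the rolling window plus personal-baseline idea (window size, calibration length, and the multiplier scheme are illustrative assumptions, not the shipped logic):

```python
from collections import deque


class BaselineScorer:
    """Scores focus against the child's own calibration baseline instead
    of a global threshold, with a rolling window so no single bad
    detection spikes the output. All parameters are illustrative."""

    def __init__(self, calibration_samples: int = 4, window: int = 5):
        self.calibration = []
        self.calibration_samples = calibration_samples
        self.recent = deque(maxlen=window)  # rolling N-detection window

    def update(self, raw_score: float):
        if len(self.calibration) < self.calibration_samples:
            self.calibration.append(raw_score)
            return None  # still calibrating (first ~2 minutes in practice)
        baseline = sum(self.calibration) / len(self.calibration)
        self.recent.append(raw_score)
        windowed = sum(self.recent) / len(self.recent)
        # Personal baseline multiplier: matching your own baseline scores 100.
        return min(100.0, windowed * (100.0 / max(baseline, 1.0)))
```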

Accomplishments that we're proud of

Attention Engine

  • Built an Attention Engine using MediaPipe Face Mesh (gaze, blink, head pose) at 10fps combined with tab visibility, response latency, and interaction activity
  • Six signals feed into a weighted formula with EMA smoothing, outputting a 0 to 100 focus score every 30 seconds
  • The Policy Engine classifies severity and triggers a gentle check-in, micro-task mode, or session end accordingly
  • No raw frames ever leave the device
  • Tuned signal weights and policy thresholds on real session data to match actual on-task versus distracted behavior
  • Calibrated the Content Confidence formula to reflect what "mastered" versus "needs work" looks like in practice
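The severity tiers above can be sketched as a simple threshold ladder (threshold values here are illustrative, not the tuned production ones):

```python
def classify(focus_score: float) -> str:
    """Maps the smoothed 0-100 focus score to an intervention tier.
    Threshold values are illustrative placeholders."""
    if focus_score >= 70:
        return "none"             # on task, no intervention
    if focus_score >= 50:
        return "gentle_check_in"  # light re-engagement prompt from the avatar
    if focus_score >= 30:
        return "micro_task_mode"  # switch to shorter, easier tasks
    return "session_end"          # wind down and notify the parent
```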

(Diagram: Attention Engine Architecture)

What we learned

  • Simli AvatarSession requires a publicly reachable wss:// LiveKit URL to publish video tracks — local dev tunnels are not a substitute for a real deployment
  • The OpenAI Realtime API is silent on misconfiguration — a wrong model name or voice ID produces no error, just a broken session; always validate the full config object before connecting
  • Raw MediaPipe signal scores need EMA smoothing over a rolling window before feeding any policy engine, or noise spikes will trigger constant false interventions
  • Supabase RLS policies interact with every query at the database level — schema changes made after policies are live break silently in ways that are hard to debug; design the full access pattern upfront
  • Running Next.js and a Python LiveKit agent as two separate dev processes requires explicit env sync, port management, and process lifecycle handling, or sessions fail in non-obvious ways

What's next for Vertex

  • Run a beta with parents of elementary school students to gather feedback on the dashboard experience and test the attention engine against real kid behavior
  • Target elementary schools as a guided homework tool used at home or as an in-class activity that reinforces what the teacher just covered
  • Expand the parent dashboard into a teacher dashboard giving educators visibility across an entire classroom's engagement and progress
  • Math is just the starting point: the core infrastructure (adaptive avatar tutor, attention engine, session reporting) applies to any subject
  • Long-term expansion into higher grade levels and more advanced curricula
