Inspiration

64% of U.S. 4th-grade students scored below NAEP Proficient in math in 2022. University of Toronto research shows students receiving one-on-one tutoring outperformed their peers more than 80% of the time. But at $40–$80/hour, personalized instruction is out of reach for many families. And even when kids do access online learning, parents are left in the dark, with no way to know whether their child is engaged, struggling, or just staring at a screen. We wanted to fix all three problems at once: make expert-level tutoring affordable, make it genuinely adaptive, and give parents real visibility into what's actually happening.

What it does

Struggling in math? Vertex has you covered. Breezing through it? Vertex will push you further.

  • Kids have live 1-on-1 sessions with an AI clone of their own parent that adapts to their level
  • Parents upload their face and voice, add homework, and track everything
  • An attention engine monitors the child's focus in real time and intervenes when needed
  • Every session ends with a full recap

How we built it

Frontend & Backend

  • Next.js 16, TypeScript, React 19, Tailwind CSS 4, Framer Motion, Shadcn UI
  • Supabase (PostgreSQL, Auth, Storage), KaTeX, JSXGraph, Recharts, Lucide React

AI Tutoring

  • GPT-4o, GPT-4o mini, dynamic system prompts
  • Adaptive difficulty engine, Socratic response layer
  • Lesson plan generation, quiz generation, AI-generated session reports

Live Avatar

  • LiveKit (WebRTC), OpenAI Realtime API (gpt-4o-realtime-preview), Simli avatar streaming
  • ElevenLabs voice synthesis, Python LiveKit Agents framework (livekit-agents 1.4)
  • Semantic VAD turn detection, per-session agent dispatch

Attention Engine

  • MediaPipe Tasks Vision — gaze detection, head pose, blink tracking, client-side only, no video leaves device
  • Tab visibility API, response latency tracking, keyboard and mouse interaction scoring
  • 6-signal weighted formula with EMA smoothing
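As a rough sketch of how six signals can combine into one smoothed score (the production engine runs client-side in TypeScript; every weight and the smoothing factor below are illustrative placeholders, not the tuned values):

```python
# Illustrative 6-signal weighted focus formula with EMA smoothing.
# All weights and the smoothing factor are placeholder assumptions,
# not the values the production engine uses.

WEIGHTS = {
    "gaze_on_screen": 0.30,    # MediaPipe gaze vector aimed at the viewport
    "head_pose": 0.20,         # pitch/yaw/roll inside a forward-facing band
    "blink_rate": 0.10,        # blink frequency near the personal baseline
    "tab_visible": 0.20,       # Page Visibility API signal
    "response_latency": 0.10,  # answer time vs. the rolling average
    "interaction": 0.10,       # keyboard/mouse activity score
}

ALPHA = 0.3  # EMA smoothing factor (illustrative)


def raw_focus(signals: dict) -> float:
    """Weighted sum of per-signal scores, each normalized to 0..1,
    scaled to a 0..100 focus score."""
    return 100.0 * sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


def smooth(prev_ema: float, raw: float, alpha: float = ALPHA) -> float:
    """Exponential moving average: one noisy detection can't spike the score."""
    return alpha * raw + (1.0 - alpha) * prev_ema
```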

Infrastructure & Auth

  • Supabase Auth, Row Level Security, 6-digit access code system for kids
  • Supabase Realtime, PDF parsing via pdf-parse
  • Resend for parent alerts and session reports
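A 6-digit access code like the one described can be generated from a CSPRNG; this is a minimal sketch under our own assumptions (hypothetical helper name, and the real system presumably also enforces uniqueness in the database):

```python
import secrets


def generate_access_code() -> str:
    """Zero-padded 6-digit kid access code from a cryptographically
    secure random source (sketch; uniqueness checks not shown)."""
    return f"{secrets.randbelow(10**6):06d}"
```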

Challenges we ran into

1. Avatar & Real-Time Streaming

  • Streaming Simli lip-sync video while GPT-4o simultaneously ran answer evaluation, difficulty adjustment, and response generation without perceptible lag

  • Built a token-streaming pipeline so GPT-4o output fed directly into Simli before the full response was generated, cutting time to first spoken word

  • Coordinated render state between the avatar SDK and the question engine so the next prompt never fired until the current lip-sync buffer fully flushed

  • Handled mid-sentence barge-in and response interruption without corrupting the avatar render queue
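The token-streaming idea above can be sketched generically with asyncio — no Simli or OpenAI APIs here, just the shape of the pipeline: buffer LLM tokens, flush to the avatar at sentence boundaries, and honor barge-in via a cancel event (all names are hypothetical):

```python
import asyncio
import re

SENTENCE_END = re.compile(r"[.!?]\s*$")  # crude sentence-boundary check


async def stream_to_avatar(token_stream, speak, cancel_event):
    """Flush buffered LLM tokens to the avatar at sentence boundaries,
    so speech starts before the full response exists. `token_stream` is
    any async iterator of text chunks, `speak` stands in for the
    avatar/TTS call, and `cancel_event` models mid-sentence barge-in."""
    buffer = ""
    async for token in token_stream:
        if cancel_event.is_set():
            return  # user barged in: drop the rest of the response
        buffer += token
        if SENTENCE_END.search(buffer):
            await speak(buffer.strip())  # first sentence speaks early
            buffer = ""
    if buffer.strip() and not cancel_event.is_set():
        await speak(buffer.strip())  # flush any trailing partial sentence
```

Gating the next prompt on the lip-sync buffer then reduces to awaiting `speak` before yielding control back to the question engine.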

2. Attention Engine

  • Combined MediaPipe Face Mesh gaze vectors, head pose estimation (pitch, yaw, roll), blink rate via Eye Aspect Ratio, and tab visibility into a single weighted focus formula
  • Built a rolling 5-detection window with EMA smoothing so no single bad frame could spike the score and trigger a false parent alert
  • Calibrated a personal baseline multiplier in the first 2 minutes of each session so the engine scores against the kid's own behavior, not a global threshold
  • Tuned policy thresholds on real session data so check-ins, micro-task mode, and session end triggers match actual on-task versus distracted behavior
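A minimal sketch of the rolling window plus personal-baseline idea (window size, calibration length, and the multiplier scheme are illustrative assumptions, not the shipped logic):

```python
from collections import deque


class BaselineScorer:
    """Scores focus against the child's own calibration baseline instead
    of a global threshold, with a rolling window so no single bad
    detection spikes the output. All parameters are illustrative."""

    def __init__(self, calibration_samples: int = 4, window: int = 5):
        self.calibration = []
        self.calibration_samples = calibration_samples
        self.recent = deque(maxlen=window)  # rolling N-detection window

    def update(self, raw_score: float):
        if len(self.calibration) < self.calibration_samples:
            self.calibration.append(raw_score)
            return None  # still calibrating (first ~2 minutes in practice)
        baseline = sum(self.calibration) / len(self.calibration)
        self.recent.append(raw_score)
        windowed = sum(self.recent) / len(self.recent)
        # Personal baseline multiplier: matching your own baseline scores 100.
        return min(100.0, windowed * (100.0 / max(baseline, 1.0)))
```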

Accomplishments that we're proud of

Attention Engine

  • Built an Attention Engine using MediaPipe Face Mesh (gaze, blink, head pose) at 10fps combined with tab visibility, response latency, and interaction activity
  • Six signals feed into a weighted formula with EMA smoothing, outputting a 0 to 100 focus score every 30 seconds
  • The Policy Engine classifies severity and triggers a gentle check-in, micro-task mode, or session end accordingly
  • No raw frames ever leave the device
  • Tuned signal weights and policy thresholds on real session data to match actual on-task versus distracted behavior
  • Calibrated the Content Confidence formula to reflect what "mastered" versus "needs work" looks like in practice
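The severity tiers above can be sketched as a simple threshold ladder (threshold values here are illustrative, not the tuned production ones):

```python
def classify(focus_score: float) -> str:
    """Maps the smoothed 0-100 focus score to an intervention tier.
    Threshold values are illustrative placeholders."""
    if focus_score >= 70:
        return "none"             # on task, no intervention
    if focus_score >= 50:
        return "gentle_check_in"  # light re-engagement prompt from the avatar
    if focus_score >= 30:
        return "micro_task_mode"  # switch to shorter, easier tasks
    return "session_end"          # wind down and notify the parent
```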

(Diagram: Attention Engine Architecture)

What we learned

  • Simli AvatarSession requires a publicly reachable wss:// LiveKit URL to publish video tracks — local dev tunnels are not a substitute for a real deployment
  • The OpenAI Realtime API is silent on misconfiguration — a wrong model name or voice ID produces no error, just a broken session; always validate the full config object before connecting
  • Raw MediaPipe signal scores need EMA smoothing over a rolling window before feeding any policy engine, or noise spikes will trigger constant false interventions
  • Supabase RLS policies interact with every query at the database level — schema changes made after policies are live break silently in ways that are hard to debug; design the full access pattern upfront
  • Running Next.js and a Python LiveKit agent as two separate dev processes requires explicit env sync, port management, and process lifecycle handling, or sessions fail in non-obvious ways

What's next for Vertex

  • Run a beta with parents of elementary school students to gather feedback on the dashboard experience and test the attention engine against real kid behavior
  • Target elementary schools as a guided homework tool used at home or as an in-class activity that reinforces what the teacher just covered
  • Expand the parent dashboard into a teacher dashboard giving educators visibility across an entire classroom's engagement and progress
  • Math is just the starting point: the core infrastructure (adaptive avatar tutor, attention engine, session reporting) applies to any subject
  • Long-term expansion into higher grade levels and more advanced curricula
