Inspiration

Some of the best learning moments don't happen in classrooms. They happen at kitchen tables, with a patient person who just gets how you think. That person has always existed. They just charge by the hour and aren't available to most people.

We started thinking about what actually makes a great tutor great. It's not credentials. It's the ability to read the room, catch the moment something isn't landing, and try a completely different approach. To ask the right question at the right time. To make a concept feel obvious in hindsight.

We wanted to bottle that. ScholarOS is what came out of it.

What it does

ScholarOS is an AI-powered learning companion that teaches through Socratic questioning and interactive visualizations, dynamically adapting to each student's progress and understanding.

  • Engages through a real-time conversational avatar called Sage that listens, responds, and explains the way a person would, using browser-native Web Speech Recognition for input and Web Speech Synthesis for output
  • Accepts your course material, notes, or syllabus and uses AI to generate a structured learning plan tailored to you
  • Produces animated visual explainers on demand: 3Blue1Brown-style breakdowns generated live via a Dockerized Manim Python server
  • Renders live mathematical graphs and geometric visualizations through Desmos 2D/3D and GeoGebra on a unified whiteboard canvas
  • Creates custom interactive HTML/JS sandbox applets for concepts that need hands-on exploration
  • Maintains a memory of your sessions, tracking what's been covered, what tripped you up, and what needs revisiting
  • Reinforces understanding through follow-up questions and well-placed analogies rather than just monologuing
  • Adapts its pacing and tone based on how you're responding throughout the session
  • Generates clean AI-powered session summaries and progress reports accessible through a dedicated parent dashboard
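The session memory described above (what's been covered, what tripped you up, what needs revisiting) can be modeled as a small per-topic record. The sketch below is a hypothetical illustration, not ScholarOS's actual schema: the `TopicMemory` shape, the three status values, and the 7-day staleness rule are all assumptions chosen to make the idea concrete.

```typescript
// Hypothetical per-topic memory; field names and statuses are illustrative only.
type TopicStatus = "covered" | "struggled" | "needs_review";

interface TopicMemory {
  topic: string;
  status: TopicStatus;
  lastSeen: Date;
  misses: number; // number of incorrect answers on this topic so far
}

// Record the outcome of one check-in question on a topic.
// Wrong answer -> "struggled"; right answer after earlier misses -> "needs_review";
// right answer with no misses -> "covered".
function recordAnswer(
  memory: Map<string, TopicMemory>,
  topic: string,
  correct: boolean,
  now: Date = new Date(),
): void {
  const prev = memory.get(topic);
  const misses = (prev?.misses ?? 0) + (correct ? 0 : 1);
  const status: TopicStatus = !correct
    ? "struggled"
    : misses > 0
      ? "needs_review"
      : "covered";
  memory.set(topic, { topic, status, lastSeen: now, misses });
}

// Topics to revisit: anything not cleanly "covered", plus covered topics
// not seen for more than `staleDays` days. Most-missed topics come first.
function topicsToRevisit(
  memory: Map<string, TopicMemory>,
  now: Date,
  staleDays = 7,
): string[] {
  const staleMs = staleDays * 24 * 60 * 60 * 1000;
  return Array.from(memory.values())
    .filter(
      (m) =>
        m.status !== "covered" ||
        now.getTime() - m.lastSeen.getTime() > staleMs,
    )
    .sort((a, b) => b.misses - a.misses)
    .map((m) => m.topic);
}
```

Ranking by miss count is one simple heuristic; a spaced-repetition schedule would be a natural drop-in replacement for the staleness rule.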

How we built it

The frontend runs on Next.js 16.1 and React 19 with a custom design system built on Tailwind CSS v4 and shadcn/ui. Zustand 5 manages client-side state. Voice input is handled by browser-native Web Speech Recognition, which streams live transcriptions to the backend. Azure OpenAI GPT-4o processes the conversation and drives Sage's responses through the Vercel AI SDK (ai@6), with SSE streaming for real-time back-and-forth. Sage speaks responses back using Web Speech Synthesis.
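To keep Sage from waiting for a full GPT-4o reply before speaking, one option is to buffer the streamed text deltas and hand each complete sentence to speech synthesis as soon as it arrives. This is a minimal sketch, assuming the SSE stream yields plain text deltas; `SentenceChunker` is a hypothetical helper, not part of the Vercel AI SDK or the Web Speech API.

```typescript
// Hypothetical helper: turns a stream of text deltas (e.g. from an SSE
// response) into complete sentences that can be queued for speech synthesis.
class SentenceChunker {
  private buffer = "";

  // Append a delta and return any sentences that are now complete.
  push(delta: string): string[] {
    this.buffer += delta;
    const sentences: string[] = [];
    // A sentence ends at ".", "!" or "?" followed by whitespace, so "3.14"
    // is not split, and a trailing "." waits until the next delta arrives.
    const boundary = /[.!?]\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.buffer.slice(start, end).trim();
      if (sentence) sentences.push(sentence);
      start = end;
    }
    // Keep the unfinished tail for the next delta.
    this.buffer = this.buffer.slice(start);
    return sentences;
  }

  // Emit whatever is left when the stream ends.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest ? [rest] : [];
  }
}
```

In the browser, each returned sentence could be passed to `window.speechSynthesis.speak(new SpeechSynthesisUtterance(sentence))`, which queues utterances so they play in order. The punctuation rule is deliberately naive: abbreviations such as "e.g. " will still split early.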

The model has access to a suite of tools it can invoke mid-conversation. Manim animation generation runs as a Dockerized Flask server that uses GPT-4o Mini to write and execute Manim code, rendering 3Blue1Brown-style videos on the fly. Desmos 2D/3D and GeoGebra handle live graph and geometry visualizations inline on a unified whiteboard surface built with KaTeX, GSAP, and Rough.js, where equations write out character-by-character with hand-drawn annotations. Custom HTML/JS sandbox applets are generated and rendered in secure iframes when a concept calls for interactivity.
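One way to picture the unified whiteboard is as a single ordered list of items, one per tool result, where a slow result such as a Manim video first appears as a placeholder and is later replaced in place. The sketch below assumes that model; the `WhiteboardItem` variants and their fields are hypothetical. The iframe helper shows a standard way to isolate generated applets with the HTML `sandbox` attribute.

```typescript
// Hypothetical whiteboard model: every tool result becomes one item on a
// single scrollable surface. Variant names and fields are illustrative.
type WhiteboardItem =
  | { kind: "equation"; id: string; latex: string } // rendered with KaTeX
  | { kind: "desmos"; id: string; expressions: string[]; is3d: boolean }
  | { kind: "geogebra"; id: string; commands: string[] }
  | { kind: "manim"; id: string; status: "rendering" | "ready"; videoUrl?: string }
  | { kind: "sandbox"; id: string; html: string }; // generated HTML/JS applet

// Append a new item, or replace an existing one with the same id
// (e.g. a Manim placeholder that later becomes a finished video).
function upsertItem(board: WhiteboardItem[], item: WhiteboardItem): WhiteboardItem[] {
  const i = board.findIndex((b) => b.id === item.id);
  if (i === -1) return [...board, item];
  const next = board.slice();
  next[i] = item;
  return next;
}

// Props for rendering a generated applet in an isolated iframe.
// "allow-scripts" without "allow-same-origin" runs the applet in an opaque
// origin, so it cannot read the parent page's cookies or storage.
function sandboxIframeProps(item: Extract<WhiteboardItem, { kind: "sandbox" }>) {
  return { srcDoc: item.html, sandbox: "allow-scripts", title: `applet-${item.id}` };
}
```

Keeping the board as plain data makes it easy to hold in a Zustand store and persist between sessions; the renderer only has to switch on `kind`.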

Session state, student profiles, learning plans, and progress tracking are persisted in Supabase (PostgreSQL) via Drizzle ORM. Students log in via PIN-based authentication, and parents can access a dedicated dashboard to view AI-generated session summaries and track progress over time.
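The writeup doesn't say how PINs are stored. A common approach, shown here as a minimal sketch under that assumption, is to keep only a salted scrypt hash and compare in constant time using Node's built-in `node:crypto`. The `hashPin`/`verifyPin` helpers and the `salt:hash` storage format are hypothetical.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Hypothetical helpers: store "saltHex:hashHex" instead of the raw PIN.
function hashPin(pin: string): string {
  const salt = randomBytes(16);
  const hash = scryptSync(pin, salt, 32);
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

// Recompute the hash with the stored salt and compare in constant time.
function verifyPin(pin: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  if (!saltHex || !hashHex) return false;
  const expected = Buffer.from(hashHex, "hex");
  if (expected.length === 0) return false;
  const actual = scryptSync(pin, Buffer.from(saltHex, "hex"), expected.length);
  return timingSafeEqual(actual, expected);
}
```

Because a 4 to 6 digit PIN has a tiny keyspace, hashing alone won't stop an offline brute force if the table leaks; per-account attempt limits or lockouts on the login route matter at least as much.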

Challenges we ran into

The hardest problem was keeping the experience feeling like a conversation and not a loading screen. Real-time tutoring means every part of the pipeline (voice capture, transcription, inference, avatar sync, and tool execution) has to move fast and in harmony. Manim in particular was tricky to fit into a live session; generating a mathematically accurate animation mid-conversation without stalling the flow took a lot of work to get right. We ended up rearchitecting the tool-calling flow entirely to decouple speech streaming from tool execution so Sage keeps talking while visualizations render in the background.
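The decoupling described above can be sketched as a tool call that returns a placeholder immediately and delivers the real result through a callback once the job settles, so the speech stream never awaits the render. This is a minimal illustration, not ScholarOS's actual tool-calling flow; `startToolInBackground` and the `ToolResult` shape are hypothetical.

```typescript
// Hypothetical sketch: a tool call returns a "pending" placeholder at once,
// and the finished (or failed) result is pushed later via onUpdate.
type ToolResult = { id: string; status: "pending" | "done" | "failed"; output?: string };

function startToolInBackground(
  id: string,
  job: () => Promise<string>,
  onUpdate: (r: ToolResult) => void,
): ToolResult {
  // Start the job but do not await it; report the outcome when it settles.
  job().then(
    (output) => onUpdate({ id, status: "done", output }),
    () => onUpdate({ id, status: "failed" }),
  );
  return { id, status: "pending" };
}
```

The caller can drop the pending placeholder onto the whiteboard, keep streaming Sage's next sentences, and swap in the finished video when `onUpdate` fires. Handling the rejection branch explicitly matters here, since a failed render should degrade to a message rather than an unhandled promise.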

Accomplishments that we're proud of

Getting all the pieces to work together seamlessly (live voice, a responsive avatar, real-time visualizations, and adaptive Socratic teaching logic) in a single cohesive product is something we're genuinely proud of. The unified whiteboard where equations, graphs, simulations, and videos all render inline on one scrollable surface was a major UX win. More than the technical side though, we're proud that it actually feels like talking to a tutor. That quality is hard to manufacture and easy to lose.

What we learned

Orchestrating multiple systems in real time teaches you very quickly where your assumptions were wrong. We learned that the weakest link in a pipeline isn't always where you expect it. Sometimes a 200ms delay lives somewhere completely unglamorous. We also learned that the "intelligence" of a tutoring system matters far less than its rhythm. If the back-and-forth doesn't feel natural, the learning doesn't happen.
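Finding where a 200 ms delay lives usually comes down to measuring every stage instead of guessing. A minimal sketch of that kind of instrumentation, assuming each stage can be wrapped as an async function; the `StageTimer` class and the stage names are hypothetical.

```typescript
// Hypothetical instrumentation: time each pipeline stage per turn so the
// slow stage shows up in numbers. Stage names are illustrative.
type Stage = "transcription" | "inference_first_token" | "tts_start" | "tool_render";

class StageTimer {
  private samples = new Map<Stage, number[]>();

  // Time an async stage and record its duration in milliseconds,
  // even if the stage throws.
  async measure<T>(stage: Stage, fn: () => Promise<T>): Promise<T> {
    const start = performance.now();
    try {
      return await fn();
    } finally {
      const list = this.samples.get(stage) ?? [];
      list.push(performance.now() - start);
      this.samples.set(stage, list);
    }
  }

  // Median duration per stage, slowest first.
  report(): Array<{ stage: Stage; medianMs: number }> {
    return Array.from(this.samples, ([stage, xs]) => {
      const s = [...xs].sort((a, b) => a - b);
      const mid = Math.floor(s.length / 2);
      const medianMs = s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
      return { stage, medianMs };
    }).sort((a, b) => b.medianMs - a.medianMs);
  }
}
```

Medians are used rather than means so a single cold start doesn't dominate the report; tracking a high percentile alongside them would catch intermittent stalls.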

What's next for ScholarOS

The immediate focus is performance: tightening the pipeline further and making the experience even more fluid. Beyond that, we want to expand the subjects ScholarOS can handle with depth, build out longer-term learner profiles, and get it in front of real students. The goal from day one has been accessible, high-quality education. We're just getting started on delivering it.
