ione

the AI tutor that builds a grounded knowledge graph of every mistake you've ever made, and uses it to predict the next one before you commit it.

Inspiration

tutoring is gatekept by price. the kid whose parents pay for a Stanford grad on Wyzant gets a human watching them work in real time, catching the sign error, remembering next week that they always slip on negatives. the kid two desks over with the same homework and the same confusion gets Khan Academy and a YouTube comment section.

both kids end up at the same SAT. one had someone in the room. the other had themselves.

the gap isn't intelligence. it's who's sitting next to you at 9pm on a tuesday when you hit the wall on u-substitution. private tutoring is a $60B industry in the US. it works, and that's the whole problem. the thing that works costs more than the families who need it most can pay.

generative AI changes the supply side. attention, judgment, and memory that compounds across sessions are model capabilities now. but most AI tutors throw the memory part away; every conversation starts from zero. ione doesn't. every cycle, every uploaded exam, every confirmed slip becomes a typed claim in a per-student knowledge graph. session three knows things session one didn't. the tutor gets better at you over time, the way a real one does.

accessibility isn't a bonus feature. it's the whole thesis.

What it does

connect ione to your iPad and start solving. it watches every frame, runs a four-agent pipeline (OCR, reasoning, predictive, intervention), and stays silent until staying silent stops being the right call. when you slip, a hint card slides into the margin in pencil-script: never the answer, just the question that gets you unstuck.

what makes it different is what happens between sessions. every error you make, every uploaded exam, every confirmed pattern becomes a typed, cited fact in your personal knowledge graph. the next time you sit down, next problem, next session, next month, the tutor remembers. it predicts the sign error you're about to make because the graph already has four prior ones, each tied back to the source line that proved it. you can see the receipts. you can edit them. mark a claim resolved and the next hint shifts.

upload a failed exam and within thirty seconds the graph has claims with chunk citations powering your next study session. push-to-talk to ask a verbal question and a spoken answer plays back roughly a second later. open the dashboard and see every fact ione knows about you, where it came from, and toggle it on or off.

How we built it

three systems: a Vite + React 19 capture surface, a Hono + Node API running the four-agent pipeline, and a per-student knowledge graph in Postgres.

the knowledge graph (the part we're proudest of)

every fact is a subject-predicate-object triple with a hard pointer to the chunk it came from. no chunk, no claim. agents can't fabricate memory because they can't write a claim without citing a chunk uuid that exists.

the schema (supabase/migrations/0002_knowledge_graph.sql):

  • source_files — uploaded transcripts, exams, syllabi, essays, practice work. blobs in Supabase Storage.
  • chunks — the receipt primitive. byte offsets back to the source so the UI can prove where a fact came from. each chunk carries an embedding (text-embedding-3-small, 1536-dim) so retrieval can run on semantic similarity, not just exact matches on predicate.
  • claims(owner, subject, predicate, object, confidence, status, source_chunk_id, extracted_by). predicates come from a controlled vocabulary; agents do not invent them. dedup'd at the DB level via a unique index on (source_file_id, predicate, subject_entity).
  • entities + relationships — canonical things (topics, error types, courses) and the typed edges between them.
  • events — append-only log, broadcast via Supabase Realtime so the dashboard reacts without polling. claim lifecycle is pending → confirmed | rejected | superseded. mark made_sign_error as resolved from the dashboard and the orchestrator stops referencing it.
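the grounding invariant above can be sketched in TypeScript. this is illustrative, not the actual schema code: the field names and the write guard are assumptions based on the description, and the real enforcement lives in the SQL migration.

```typescript
// "no chunk, no claim" — a claim cannot be written without citing a chunk
// that actually exists, and its predicate must come from the controlled vocabulary.
type ClaimStatus = "pending" | "confirmed" | "rejected" | "superseded";

interface Claim {
  subject: string;
  predicate: string;       // controlled vocabulary; agents do not invent these
  object: string;
  confidence: number;      // 0..1
  status: ClaimStatus;
  sourceChunkId: string;   // required hard pointer back to the source chunk
}

// example vocabulary entries mentioned elsewhere in this write-up
const PREDICATES = new Set(["weak_at", "made_error", "prefers"]);

function assertWritable(
  claim: Claim,
  chunkExists: (id: string) => boolean
): void {
  if (!PREDICATES.has(claim.predicate)) {
    throw new Error(`unknown predicate: ${claim.predicate}`);
  }
  if (!claim.sourceChunkId || !chunkExists(claim.sourceChunkId)) {
    throw new Error("no chunk, no claim"); // the grounding invariant
  }
}
```

in the real system the same rule is backed by the DB: the unique index on (source_file_id, predicate, subject_entity) makes re-extraction an update, not a duplicate.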

how the graph gets fed. category-specific extractors in api/src/kg/ (transcript-reader, exam-reader, syllabus-reader, essay-reader, practice-work-reader). upload a failed midterm, get back claims like weak_at_factoring (0.82, cite: chunk #f3a2), each one editable, each one a click from the source line. live cycles also write claims with full provenance (session_id, cycle_id, predicted=true|false).

how retrieval works. the orchestrator's searchGraph() surface (landing/src/lib/graph/query.ts) is deliberately narrow so the underlying retrieval can swap between strategies. the primary path is structured filter on claims + chunks (predicate, source kind, status), but for free-text and "find me prior context that looks like this" queries — extracting a new claim and checking whether the graph already has a paraphrase, or pulling the most semantically relevant prior chunks for the predictive agent — we fall through to vector similarity search over chunk embeddings. cosine similarity, top-k scoped to the user via RLS, then we re-rank by claim confidence and recency before the agent sees it. embeddings are what make "the graph remembers something like this" work; the controlled-vocabulary triples are what keep retrieval explainable.
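the vector fall-through can be sketched roughly like this. everything here is an assumption about shape, not the actual query.ts internals: the function names, the recency weighting, and the in-memory similarity (the real path runs in Postgres, scoped by RLS).

```typescript
// top-k by cosine similarity, then re-rank by claim confidence and recency.
interface RetrievedChunk {
  id: string;
  embedding: number[];      // 1536-dim in the real system
  claimConfidence: number;  // confidence of the claim citing this chunk
  ageDays: number;          // how old the source is
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function retrieve(query: number[], chunks: RetrievedChunk[], k = 5): RetrievedChunk[] {
  return chunks
    .map((c) => ({ c, sim: cosine(query, c.embedding) }))
    .sort((x, y) => y.sim - x.sim)  // semantic similarity first
    .slice(0, k)
    // re-rank: favor high-confidence, recent claims (weighting is a placeholder)
    .sort((x, y) =>
      (y.c.claimConfidence - y.c.ageDays / 365) -
      (x.c.claimConfidence - x.c.ageDays / 365))
    .map((x) => x.c);
}
```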

how the graph drives intervention. every cycle's first move is getStruggleSnapshot(userId) — pulling the compiled struggle profile plus the actual claim references that built it. an SSE kg_lookup event fires before any agent runs, with predicate, count, dominant error, and source filenames. the AgentTrace UI renders these as receipts. memory isn't a vibes feature here; every claim is auditable, editable, grounded in source bytes.
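the kg_lookup event described above has roughly this shape (field names are assumed from the description; the SSE framing is the standard wire format):

```typescript
// the receipt event the AgentTrace UI renders before any agent runs
interface KgLookupEvent {
  type: "kg_lookup";
  predicate: string;       // e.g. "made_error"
  count: number;           // how many prior claims matched
  dominantError: string;   // e.g. "made_sign_error"
  sourceFiles: string[];   // filenames the claims trace back to
}

function formatSse(event: KgLookupEvent): string {
  // SSE wire format: named event line + JSON data line, blank-line terminated
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}
```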

the rest of the stack

  • capture — browser grabs WebP frames at a steady cadence, posts multipart + a JSON sidecar (session id, stall flag, last 5 frames of trajectory). response is an SSE stream of typed events: ocr, kg_lookup, confidence, hint, done. push-to-talk routes through ElevenLabs Scribe → orchestrator → ElevenLabs Flash v2.5 over MediaSource for sub-second voice playback.
  • API — Hono on Node 20, Fly.io. six route groups: /cycle, /sessions, /audio, /transcribe, /me, /sources. cost guardrails BEFORE every cycle. one-active-session-per-user enforced at the DB level via a partial unique index.
  • pipeline — OCR → (Reasoning ∥ Predictive) → Policy → (Intervention?). OCR runs Mathpix v3/text alongside Claude Sonnet 4.5 vision. reasoning caches the canonical solution per session. the predictive agent reads off the knowledge graph's struggle profile to predict the next error before the student commits it. policy is pure deterministic typescript: cooldown, dedup, severity floor, 0.7 predictive threshold. silence-biased.
  • infra — Vercel + Fly.io + Supabase. seven idempotent SQL migrations. eval harness gates every deploy at 13/15 orchestrator + 4/4 KG extractor fixtures green.
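the policy stage can be sketched as pure typescript. only the 0.7 predictive threshold is stated above; the cooldown window, severity floor, and field names here are placeholders:

```typescript
interface PolicyInput {
  severity: number;              // 0..1, from the reasoning agent
  predictedConfidence?: number;  // set only on predictive hints
  secondsSinceLastHint: number;
  hintKey: string;               // identifies the hint content
  lastHintKey?: string;
}

const COOLDOWN_S = 30;             // placeholder cooldown window
const SEVERITY_FLOOR = 0.4;        // placeholder floor
const PREDICTIVE_THRESHOLD = 0.7;  // from the text

// silence-biased: every gate must pass before a hint is shown
function shouldIntervene(p: PolicyInput): boolean {
  if (p.secondsSinceLastHint < COOLDOWN_S) return false;  // cooldown
  if (p.hintKey === p.lastHintKey) return false;          // dedup
  if (p.severity < SEVERITY_FLOOR) return false;          // severity floor
  if (p.predictedConfidence !== undefined &&
      p.predictedConfidence < PREDICTIVE_THRESHOLD) {
    return false;                                         // predictive threshold
  }
  return true;
}
```

because it's deterministic, this stage is trivially testable, which is what lets the eval harness gate deploys on it.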

Challenges we ran into

making the graph grounded. our first version let extractors write claims with no chunk citation. the model would invent plausible-sounding facts: "student struggles with completing the square" on a transcript that didn't mention it. we made source_chunk_id non-optional in extractor logic and added the unique index (source_file_id, predicate, subject_entity) so re-running an extractor updates instead of duplicating. the graph became immediately trustworthy: every claim has a chunk, every chunk has a byte offset, every fact is one click from its source.

OCR on dense pages. our first OCR agent had a 700-token output budget and had to return full problem text plus an array of every completed step's LaTeX. on real pages with five derivations, Sonnet was truncating mid-JSON and the parser was silently dropping completed_steps_latex. bumped to 1600 tokens and added a defensive normalizer.
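the defensive normalizer looks roughly like this. the key names and the recovery strategy are illustrative; the point is that a truncated response degrades to partial output instead of silently dropping the steps array:

```typescript
interface OcrResult {
  problemText: string;
  completedStepsLatex: string[];
}

function normalizeOcr(raw: string): OcrResult {
  try {
    // happy path: the model returned complete, valid JSON
    const parsed = JSON.parse(raw);
    return {
      problemText: typeof parsed.problem_text === "string" ? parsed.problem_text : "",
      completedStepsLatex: Array.isArray(parsed.completed_steps_latex)
        ? parsed.completed_steps_latex.filter((s: unknown) => typeof s === "string")
        : [],
    };
  } catch {
    // truncated mid-JSON: salvage every step string that closed cleanly
    const m = raw.match(/"completed_steps_latex"\s*:\s*\[([\s\S]*)/);
    const steps = m
      ? Array.from(m[1].matchAll(/"((?:[^"\\]|\\.)*)"/g), (x) => x[1])
      : [];
    return { problemText: "", completedStepsLatex: steps };
  }
}
```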

Accomplishments that we're proud of

a knowledge graph that compounds. session three knows things session one didn't. upload an exam and within thirty seconds the graph has claims with chunk citations, and those claims power hint phrasing on the next problem. the same student writing the same wrong answer in week 1 and week 4 gets two visibly different hints, because the second one knows it's a pattern. memory isn't a vibes feature here: every claim is auditable, editable, and grounded in source bytes.

a tutor that actually stays quiet. you write math, the ribbon tracks confidence in real time, and ione waits. you make a deliberate sign error and within two cycles a hint card slides into the margin in pencil-script, and it doesn't say "you made a mistake." it asks "what happens to the sign when you distribute the negative?" the answer comes from the student.

the price gap. at our cost guardrails, a focused study session runs pennies, not the $80/hour a human tutor charges. the gap between a kid with money and a kid without it stops being the difference between getting a tutor and not.

What we learned

the graph is the product. Sonnet, Mathpix, and ElevenLabs are incredible individually; none of them is the thing. the thing is the per-student knowledge graph that turns every cycle, every session, and every uploaded source into a typed, cited fact, and the orchestrator that reads off it. our first version was 90% LLM and 10% glue and it felt unreliable. the LLMs got smaller as the graph got bigger, and the product got better.

memory is only useful if it's auditable. "the AI remembers you" is a marketing claim, not a feature. memory becomes a feature when the student can see exactly what was remembered, where it came from in the source, and edit it. every claim in our graph has provenance. when a student marks a claim as resolved, the next hint phrasing changes.

What's next for ione

richer graph relations. right now claims are mostly student-scoped triples. the next layer is inter-claim relationships, made_sign_error linked to weak_at_distributing_negatives linked to a specific topic entity, so the orchestrator can plan forward (this student needs to see two more distribution problems before their unit test on factoring) instead of just reacting.

subjects beyond math. the agent pipeline is math-specific, the graph isn't. the same predicates (weak_at, made_error, prefers) work for essays, foreign language, intro CS. swap the OCR backend and the canonical solver and the same graph runs on every subject the student touches.

Built With
