Stellaris

Inspiration

Every year, over 50,000 Hong Kong students sit the HKDSE — the high-stakes exam that determines their university future. Private tutoring costs HKD 300–800 per hour, and most families can't afford consistent 1-on-1 support. We watched classmates fall behind — not because they lacked ability, but because they lacked personalised feedback. A teacher with 40 students simply can't diagnose that one student confuses sine and cosine in the third quadrant, or that another keeps forgetting to flip the inequality sign.

We asked: what if every student had two AI tutors — one that teaches, one that tests — and they talked to each other? Not a chatbot. Not a quiz app. A closed-loop learning system grounded in cognitive science. That's EduLoop.

What it does

EduLoop is a dual-agent mastery learning ecosystem for HKDSE Mathematics. Two specialised AI agents collaborate through an intelligent orchestrator:

Teaching Agent — Generates personalised, RAG-grounded lessons from 322 real DSE past-paper chunks. Delivers content as structured LaTeX, active-recall flashcards, AI-narrated audio (MiniMax TTS), and AI-generated video explanations.
Assessment Agent — Dynamically generates DSE-aligned questions, evaluates free-text answers against official marking schemes, diagnoses specific misconceptions, and produces diagnostic reports with actionable next steps.
AWS Bedrock Orchestrator — The "brain" that routes between agents using a state machine ($\text{TEACH} \to \text{ASSESS} \to \text{REVIEW} \to \text{TEACH}$). When the Assessment Agent detects a knowledge gap, it automatically forwards it to the Teaching Agent for targeted remediation — no manual intervention needed.
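
A minimal sketch of how validated state transitions could look in Python. The state names follow the longer chain given under Accomplishments (IDLE, TEACHING, ASSESSING, REVIEWING); the `Session` class and `ALLOWED` table are hypothetical names for illustration, not the project's actual orchestrator code.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    TEACHING = "TEACHING"
    ASSESSING = "ASSESSING"
    REVIEWING = "REVIEWING"


# Allowed moves: IDLE -> TEACHING -> ASSESSING -> REVIEWING -> TEACHING.
# (Hypothetical table; the real orchestrator may permit other edges.)
ALLOWED = {
    State.IDLE: {State.TEACHING},
    State.TEACHING: {State.ASSESSING},
    State.ASSESSING: {State.REVIEWING},
    State.REVIEWING: {State.TEACHING},
}


class Session:
    """Tracks one student's position in the learning loop."""

    def __init__(self):
        self.state = State.IDLE

    def transition(self, target):
        # Reject any move that is not listed in the transition table.
        if target not in ALLOWED[self.state]:
            raise ValueError(
                f"invalid transition {self.state.name} -> {target.name}"
            )
        self.state = target
        return self.state
```

Keeping the transitions in a lookup table means an unexpected routing decision fails loudly instead of silently skipping the assessment or review step.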

The platform implements four evidence-based learning science principles:

Spaced Repetition & Ebbinghaus Forgetting Curve — Memory retention is modelled as $R(t) = e^{-t/S}$ where $S$ is the stability factor. The system schedules reviews at optimal intervals (1, 3, 7, 14, 30 days) before retention decays below 50%.
Scaffolded Hints (Zone of Proximal Development) — Instead of showing answers, the system provides 3 progressive hints that guide students through their own reasoning.
Active Recall (Testing Effect) — Flashcards generated from lesson content force retrieval practice, which research shows is 2.5x more effective than passive re-reading.
Dual Coding Theory — Every topic is delivered simultaneously as text (LaTeX), audio (MiniMax TTS), and video (MiniMax Hailuo-2.3) to engage multiple cognitive channels.
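
The forgetting-curve model above can be worked through in a few lines. Setting $R(t) = 0.5$ in $R(t) = e^{-t/S}$ gives $t = S \ln 2 \approx 0.693\,S$, so a review must land before that point to keep retention above 50%. The sketch below is illustrative only: the function names are hypothetical, and it assumes the listed intervals are offsets in days from the first study date.

```python
import math
from datetime import date, timedelta

# Review offsets in days, as listed in the text.
REVIEW_INTERVALS_DAYS = [1, 3, 7, 14, 30]


def retention(t_days, stability):
    """Predicted retention R(t) = exp(-t / S)."""
    return math.exp(-t_days / stability)


def days_until_threshold(stability, threshold=0.5):
    """Solve exp(-t / S) = threshold for t, i.e. t = -S * ln(threshold)."""
    return -stability * math.log(threshold)


def review_dates(start, intervals=REVIEW_INTERVALS_DAYS):
    """Assumption: each interval is counted from the first study date."""
    return [start + timedelta(days=d) for d in intervals]
```

For example, `days_until_threshold(10)` equals `10 * math.log(2)`, a little under seven days, so a topic with stability 10 would need its next review within the first week.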

How we built it

Backend: FastAPI (Python) serving two agent classes — TeachingAgent and AssessmentAgent — both powered by MiniMax-M2.5 through the Anthropic-compatible SDK. A ChromaDB vector store holds 322 DSE chunks (past papers, marking schemes, syllabi) embedded with all-MiniLM-L6-v2. The BedrockOrchestrator wraps AWS Bedrock's invoke_model API for intent classification and session state management with validated state transitions.

Frontend: Next.js 14 (App Router) with Tailwind CSS, Recharts for data visualisation (radar charts, area charts, pie charts, bar charts), and KaTeX for LaTeX rendering. Seven pages: Dashboard, Learn, Practice, Chat, Progress, Evaluation, Settings.

Multimodal pipeline: Lesson content is paraphrased into natural spoken English via MiniMax LLM, then synthesised into audio via MiniMax speech-2.8-hd. Video explanations are generated via MiniMax Hailuo-2.3 with cinematic prompts, polled asynchronously until ready.
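
Polling a long-running generation job until it is ready can be sketched generically as follows. This is not MiniMax's API: `check_status` is a hypothetical coroutine supplied by the caller that returns `None` while the job is pending and the result (for example a video URL) once it is done, and the interval and timeout defaults are assumptions.

```python
import asyncio


async def poll_until_ready(check_status, interval_s=5.0, timeout_s=300.0):
    """Await check_status() repeatedly until it returns a non-None result."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while True:
        result = await check_status()
        if result is not None:
            return result
        # Give up if the next sleep would take us past the deadline.
        if loop.time() + interval_s > deadline:
            raise TimeoutError("generation job did not finish in time")
        await asyncio.sleep(interval_s)
```

Because the wait uses `asyncio.sleep`, other coroutines keep running while the video job is pending.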

Architecture: $$\text{Student} \xrightarrow{\text{message}} \text{Bedrock AgentCore} \xrightarrow{\text{route}} \begin{cases} \text{Teaching Agent (MiniMax + RAG)} \\ \text{Assessment Agent (MiniMax + RAG)} \end{cases}$$

The feedback loop is the core innovation: Assessment Agent outputs a diagnostic_report containing knowledge_gaps[], which the Orchestrator forwards to the Teaching Agent as student_profile.knowledge_gaps — triggering a remediation lesson at reduced difficulty.
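
The hand-off described above might look roughly like this. Only the field names `diagnostic_report`, `knowledge_gaps` and `student_profile` come from the text; the dataclasses, the 1 to 5 difficulty scale and the `forward_gaps` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DiagnosticReport:
    topic: str
    knowledge_gaps: list = field(default_factory=list)


@dataclass
class StudentProfile:
    difficulty: int = 3  # hypothetical scale: 1 (easiest) to 5 (hardest)
    knowledge_gaps: list = field(default_factory=list)


def forward_gaps(report, profile):
    """Merge detected gaps into the profile and lower difficulty by one step.

    Returns (profile, needs_remediation). The input profile is not mutated.
    """
    if not report.knowledge_gaps:
        return profile, False
    merged = list(profile.knowledge_gaps)
    for gap in report.knowledge_gaps:
        if gap not in merged:  # avoid duplicate entries
            merged.append(gap)
    updated = StudentProfile(
        difficulty=max(1, profile.difficulty - 1),
        knowledge_gaps=merged,
    )
    return updated, True
```

When `needs_remediation` is true, the orchestrator would pass the updated profile to the Teaching Agent for the next lesson.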

Challenges we ran into

The "Handshake" problem — Getting two independent LLM agents to exchange structured data reliably was the hardest engineering challenge. MiniMax-M2.5 sometimes wraps JSON in markdown fences or adds preamble text. We built a robust _safe_json_parse function that strips fences, finds the first { and last }, and handles edge cases gracefully.
Async/sync mismatch — The Anthropic SDK makes blocking HTTP calls, but FastAPI's async event loop doesn't tolerate blocking. We switched all endpoints to sync def so FastAPI automatically runs them in a thread pool — a subtle but critical fix.
LaTeX rendering at scale — Raw OCR text from DSE past papers is messy. We built a parallel formatting pipeline using ThreadPoolExecutor (6 workers) that sends each question through MiniMax for LaTeX cleanup, cutting wall-clock latency from about $N \times 15\,\text{s}$ to about $\lceil N/6 \rceil \times 15\,\text{s}$, or roughly 15 s for a batch of up to six questions.
iCloud path issues — Our development workspace lives on iCloud Drive, which causes Python's os.getcwd() to fail intermittently. We hardcoded os.chdir() to the project root at startup.
Autoplay restrictions — Browsers block audio.play() without user interaction. We wrapped every .play() call in .catch(() => {}) to prevent NotAllowedError crashes while still auto-playing when the browser permits it.
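
Two of the fixes above lend themselves to short sketches. The first approximates the fence-stripping parser (strip markdown fences, then take everything from the first `{` to the last `}`); the second shows order-preserving parallel formatting with six workers. Both are reconstructions from the description, not the project's actual `_safe_json_parse` or pipeline code, and `format_one` is a hypothetical callable standing in for the per-question MiniMax request.

```python
import json
import re
from concurrent.futures import ThreadPoolExecutor

FENCE = "`" * 3  # a markdown code-fence marker


def safe_json_parse(raw, default=None):
    """Best-effort extraction of one JSON object from an LLM reply."""
    # Drop fence markers, with or without a "json" language tag.
    text = re.sub(FENCE + r"(?:json)?", "", raw)
    # Keep only the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return default
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return default


def format_all(questions, format_one, max_workers=6):
    """Apply format_one to every question concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(format_one, questions))
```

With six workers and roughly 15 s per call, a batch of N questions takes about ceil(N / 6) rounds of 15 s rather than N sequential calls. Returning a default instead of raising lets the caller fall back to a retry or a canned response when the model's reply cannot be parsed.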

Accomplishments that we're proud of

A real closed-loop system — This isn't two disconnected chatbots. The Assessment Agent's gap analysis actually feeds back into the Teaching Agent's next lesson. The Bedrock Orchestrator manages the state machine with validated transitions: $\text{IDLE} \to \text{TEACHING} \to \text{ASSESSING} \to \text{REVIEWING} \to \text{TEACHING}$.
Learning science, not vibes — Every feature maps to peer-reviewed pedagogy. The forgetting curve isn't decorative — it drives the spaced repetition scheduler. The hints aren't lazy — they're scaffolded at three levels matching Vygotsky's ZPD. Active recall flashcards cite the testing effect.
Full multimodal delivery — A single topic generates text (LaTeX), audio (TTS with natural paraphrasing), video (AI-generated teacher), and interactive flashcards. Four modalities from one RAG query.
322 real DSE chunks in RAG — Not toy data. Real past papers, real marking schemes, real syllabus content from HKDSE Mathematics, embedded and retrievable with cosine similarity.
A polished product — Dashboard with live charts, animated transitions, dual-agent architecture showcase, personalised evaluation with diagnostic reports, and a progress system tracking mastery across topics.

What we learned

Agent orchestration is an engineering problem, not just a prompt problem. Getting two LLM agents to collaborate reliably requires structured communication protocols, state machines, fallback paths, and robust JSON parsing. The prompt is 20% of the work; the plumbing is 80%.
Learning science makes AI education credible. Judges and users immediately see the difference between "GPT wrapper" and "system grounded in Ebbinghaus, Vygotsky, and dual coding theory." The science elevates the story.
MiniMax's multimodal suite is genuinely creative. Using TTS, video generation, and LLM from a single provider enabled a seamless pipeline — paraphrase content, narrate it, and generate a teaching video — all from one API ecosystem.
AWS Bedrock provides production-grade orchestration. The session management, intent classification, and model invocation through Bedrock gave us the reliability layer that stitches the two agents together. The adaptive retries and timeout configuration were essential for a demo-quality experience.

What's next for EduLoop

Full AWS Bedrock AgentCore deployment — Move from invoke_model to Bedrock Agents with action groups, enabling the orchestrator to autonomously trigger multi-step learning workflows without API polling.
Real spaced repetition engine — Connect the forgetting curve model to a persistent database with next_review_at timestamps, so the system proactively schedules review sessions based on predicted retention decay.
Cantonese TTS — Hong Kong students think in Cantonese. Adding MiniMax Cantonese voice narration would make the tutor feel truly local.
Expand beyond Mathematics — The dual-agent architecture is subject-agnostic. With new RAG corpora, EduLoop can support DSE Physics, Chemistry, Economics, and more.
Mobile-first PWA — Students study on their phones. A Progressive Web App with offline flashcard support would make EduLoop accessible anywhere — on the MTR, in study halls, at home.