Inspiration
Nigeria has over 27,000 secondary schools and 270 universities. The educators inside them spend an average of 9.9 hours every week on manual grading — more than a full working day, every week, just marking papers. Students wait days or weeks for results. When feedback finally arrives, the learning context has passed.
GradrAI was built to solve this. We have been automating the grading of paper-based and computer-based tests for Nigerian and African educators since 2022. The platform handles handwritten script grading, CBT exam generation, and student practice through SmartPrep — our adaptive past-question practice module.
But there was a fundamental limitation in the original architecture: every AI grading decision was made in a single pass. One call in, one output out, nothing in between. No validation. No cross-referencing. No ability to catch what the model might have missed or misjudged.
For grading — where a number on a page can define a student's academic future — that was not good enough.
This hackathon gave us the forcing function to rebuild the engine the right way.
What We Built
We rebuilt GradrAI's entire AI layer as a coordinated multi-agent system using Google's Agent Development Kit (ADK), deployed to Google Cloud's Agent Runtime, and connected to MongoDB Atlas as the shared operational memory layer through the official MongoDB MCP server.
The system runs three dedicated agentic pipelines:
1. PBT Grading Pipeline
For paper-based handwritten scripts. An educator uploads their question paper, marking guide, and scanned student scripts. The pipeline:
- Retrieves stored marking guides from MongoDB via the MongoDB MCP server
- Extracts and normalises handwritten answers using multimodal AI
- Grades each question against the rubric with explicit step-by-step reasoning
- Validates every result through a dedicated Referee Agent that checks for hallucinations, numeric errors, and low-confidence decisions
- Generates per-question feedback and a narrative student report
- Persists structured results to MongoDB Atlas via the MCP server
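The Referee Agent's checks can be sketched in plain Python. This is a minimal, hypothetical version. `QuestionResult`, `referee_check` and `CONFIDENCE_THRESHOLD` are illustrative names, and the threshold value is assumed. The substring test is a crude stand-in for whatever hallucination check the real agent performs: it only confirms that the evidence the grader cites actually appears in the extracted script text.

```python
from dataclasses import dataclass, field

# Hypothetical cut-off; the real Referee Agent's threshold is not stated.
CONFIDENCE_THRESHOLD = 0.75


@dataclass
class QuestionResult:
    question_id: str
    awarded: float      # marks given by the grading agent
    max_marks: float    # maximum marks from the marking guide
    confidence: float   # grader's self-reported confidence, 0.0 to 1.0
    evidence: str       # answer text the grader says it relied on


@dataclass
class RefereeVerdict:
    question_id: str
    issues: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.issues


def referee_check(result: QuestionResult, extracted_answer: str) -> RefereeVerdict:
    verdict = RefereeVerdict(result.question_id)
    # Numeric errors: the awarded score must lie within the question's mark range.
    if not 0 <= result.awarded <= result.max_marks:
        verdict.issues.append("score_out_of_range")
    # Crude hallucination guard: cited evidence must appear in the extracted script text.
    if result.evidence and result.evidence.strip().lower() not in extracted_answer.lower():
        verdict.issues.append("evidence_not_in_script")
    # Low-confidence decisions are flagged for a human rather than silently accepted.
    if result.confidence < CONFIDENCE_THRESHOLD:
        verdict.issues.append("low_confidence")
    return verdict
```

A result with any issue would be sent back for regrading or flagged for review instead of being persisted as final.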
2. CBT Exam Generation Pipeline
For computer-based tests. An educator uploads lecture notes or past papers. The pipeline:
- Ingests knowledge base documents from Cloud Storage
- Extracts topics with concentration weights using a lightweight agent
- Generates a structured quiz (MCQ, essay, or hybrid) aligned to the topic distribution
- Produces a mathematically validated marking guide — point allocations are verified to sum correctly before the guide is saved
- Persists the exam and marking guide to MongoDB via the MCP server
- Returns a draft exam for educator review before publication
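Two of these steps are plain arithmetic and can be sketched without any model call. The sketch below is illustrative only. `allocate_questions` turns topic concentration weights into question counts using largest-remainder apportionment (a technique chosen here for the example, not confirmed as GradrAI's method). `validate_marking_guide` checks that criteria points sum to each question's marks and that question marks sum to the paper total. The guide's dictionary shape is assumed.

```python
import math


class MarkingGuideError(ValueError):
    """Raised when a generated marking guide's point allocations do not add up."""


def allocate_questions(topic_weights: dict[str, float], n_questions: int) -> dict[str, int]:
    """Largest-remainder apportionment: counts follow the weights and sum to n_questions."""
    total = sum(topic_weights.values())
    quotas = {t: n_questions * w / total for t, w in topic_weights.items()}
    counts = {t: math.floor(q) for t, q in quotas.items()}
    leftover = n_questions - sum(counts.values())
    # Hand the remaining questions to the topics with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda topic: quotas[topic] - counts[topic], reverse=True)
    for topic in by_remainder[:leftover]:
        counts[topic] += 1
    return counts


def validate_marking_guide(guide: dict) -> None:
    """Assumed shape: {"total_marks": int, "questions": [{"id", "max_marks", "criteria": [{"points"}]}]}."""
    errors = []
    for q in guide["questions"]:
        criteria_sum = sum(c["points"] for c in q["criteria"])
        if not math.isclose(criteria_sum, q["max_marks"]):
            errors.append(f"{q['id']}: criteria sum to {criteria_sum}, expected {q['max_marks']}")
    question_sum = sum(q["max_marks"] for q in guide["questions"])
    if not math.isclose(question_sum, guide["total_marks"]):
        errors.append(f"questions sum to {question_sum}, expected {guide['total_marks']}")
    if errors:
        # Refuse to save: the caller can ask the generating agent to repair the guide.
        raise MarkingGuideError("; ".join(errors))
```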
3. CBT Submission Grading Pipeline
Triggered asynchronously via BullMQ when a student submits. The pipeline:
- Grades MCQ questions deterministically (no AI: each answer is compared directly against the answer key)
- Evaluates essay responses against the stored rubric using the ReAct reasoning framework
- Generates holistic feedback incorporating the student's historical performance from MongoDB
- Flags low-confidence essay responses for human teacher review before finalising the grade
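The routing in this pipeline can be sketched as a single function. In this hypothetical version the ReAct essay evaluator is stubbed as a plain callable returning `(score, confidence)`, and the question dictionary shape and `REVIEW_THRESHOLD` value are assumptions. Only essay questions ever reach the evaluator; MCQs are settled by a normalised comparison with the answer key.

```python
from typing import Callable

# Hypothetical cut-off; the real review threshold is not stated.
REVIEW_THRESHOLD = 0.7

# Stand-in for the AI essay evaluator: (question, answer) -> (score, confidence).
EssayEvaluator = Callable[[dict, str], tuple[float, float]]


def grade_submission(questions: list[dict], answers: dict[str, str],
                     evaluate_essay: EssayEvaluator) -> dict:
    results = []
    for q in questions:
        answer = answers.get(q["id"], "")
        if q["type"] == "mcq":
            # Deterministic path: normalised comparison against the answer key, no model call.
            correct = answer.strip().upper() == q["answer_key"].strip().upper()
            results.append({"id": q["id"], "score": q["marks"] if correct else 0,
                            "needs_review": False})
        else:
            # Essay path: only these questions reach the reasoning agent.
            score, confidence = evaluate_essay(q, answer)
            results.append({"id": q["id"], "score": score,
                            "needs_review": confidence < REVIEW_THRESHOLD})
    return {"total": sum(r["score"] for r in results), "results": results}
```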
The Closed Loop: Weakness Detection → SmartPrep
This is the part of the architecture we are most proud of — and the part that no competing tool offers.
After every grading pipeline completes (both PBT and CBT), two shared agents run automatically:
WeaknessDetectionAgent reads the scored output, identifies questions where the student scored below 60%, and maps those questions to specific subject topics. It also runs a MongoDB aggregation across all students who attempted the same exam to surface class-wide weak areas.
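Both halves of that analysis can be sketched as follows, assuming (hypothetically) one result document per student per exam, with a `questions` array whose entries carry `topic`, `score` and `max_marks`. The per-student check runs in Python; the class-wide view is a standard MongoDB aggregation pipeline built as data, which an agent would pass to the aggregate tool. Field names are illustrative.

```python
# Below 60% counts as weak, as described above.
WEAK_THRESHOLD = 0.60


def student_weak_topics(questions: list[dict]) -> list[str]:
    """Topics where this student scored below the threshold on at least one question."""
    return sorted({q["topic"] for q in questions
                   if q["score"] / q["max_marks"] < WEAK_THRESHOLD})


def class_weakness_pipeline(exam_id: str, threshold: float = WEAK_THRESHOLD) -> list[dict]:
    """Aggregation: average score ratio per topic across every student who sat the exam."""
    return [
        {"$match": {"exam_id": exam_id}},
        {"$unwind": "$questions"},
        {"$group": {
            "_id": "$questions.topic",
            "avg_ratio": {"$avg": {"$divide": ["$questions.score", "$questions.max_marks"]}},
            "attempts": {"$sum": 1},
        }},
        {"$match": {"avg_ratio": {"$lt": threshold}}},
        {"$sort": {"avg_ratio": 1}},  # weakest topics first
    ]
```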
SmartPrepAgent takes the weakness profile, queries GradrAI's past question bank for relevant ALOC questions on each weak topic, and creates a personalised practice session in MongoDB — tagged to the student, the exam, and the specific subjects they need to revisit.
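The session the SmartPrepAgent writes back might look like the document built below. This is a hypothetical sketch: the field names, the `"READY"` status and the per-topic cap are assumptions, and `question_bank` stands in for questions already fetched from the ALOC-backed bank.

```python
from datetime import datetime, timezone


def build_practice_session(student_id: str, exam_id: str, weak_topics: list[str],
                           question_bank: dict[str, list[dict]], per_topic: int = 5) -> dict:
    questions = [
        {"topic": topic, "question_id": q["id"]}
        for topic in weak_topics
        for q in question_bank.get(topic, [])[:per_topic]  # cap each topic's share
    ]
    return {
        "student_id": student_id,   # who the session is for
        "exam_id": exam_id,         # which result it was built from
        "topics": list(weak_topics),
        "questions": questions,
        "status": "READY",
        "created_at": datetime.now(timezone.utc),
    }
```

Topics with no matching past questions simply contribute nothing, so the session still contains whatever practice material exists.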
The student opens SmartPrep and finds a targeted practice set waiting for them, built directly from their own exam result. The teacher did nothing extra. The system closed the loop.
MongoDB is not a passive data store in this architecture. It is the connective tissue that makes the loop possible — the grading results go in through the MCP server, the weakness analysis reads them out, and the practice sessions are written back. Every agent in every pipeline reads from and writes to Atlas through the official MongoDB MCP server.
How We Built It
The agentic layer is Python, built on Google ADK and deployed to Google Cloud Agent Runtime.
The GradrAI backend is Node.js (Express, CommonJS). After the agents complete their work, results flow back to the Node.js layer through Agent Runtime query calls. BullMQ workers handle the async grading queue — they now call the deployed agent pipelines rather than running AI logic directly.
We built a custom GradrAI MCP server deployed on Cloud Run, exposing domain-specific tools to the agents:
- parse_questions: structures raw question text into validated JSON
- parse_marking_guide: converts unstructured marking guides into quantifiable rubrics using AI, with mathematical validation of point allocations
- normalize_answers: pre-processes student inputs before grading
- get_resource_from_gcs: fetches educational documents from Cloud Storage for multimodal processing
- extract_handwritten_text: multimodal OCR tool for handwritten scripts, the core of PBT grading
- trigger_aloc_cache: integrates with the ALOC past questions API with built-in timeout and retry resilience
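The timeout-and-retry behaviour described for trigger_aloc_cache can be sketched as a small helper. This is a generic illustration, not the tool's actual code: `call_with_retry` is a hypothetical name, and `fetch` stands for any callable that performs one request with the given per-attempt timeout, for example a wrapper around an HTTP client call.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fetch: Callable[[float], T], *, attempts: int = 3, timeout: float = 5.0,
                    base_delay: float = 0.5,
                    retry_on: tuple = (TimeoutError, ConnectionError)) -> T:
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch(timeout)  # each attempt gets its own per-request timeout
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the last error to the agent
            # Exponential backoff: the delay doubles each attempt, scaled by random jitter.
            time.sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.0))
    raise RuntimeError("unreachable")
```

Jitter keeps many concurrent grading jobs from retrying against the external API in lockstep during an exam-period spike.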
The MongoDB MCP server handles all Atlas operations: find, insertOne, updateOne, aggregate. No agent manages database connections directly.
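Under the hood, each of these operations is an MCP `tools/call` request framed as JSON-RPC 2.0. An MCP client normally builds this message for the agent; the sketch below only shows its shape. The database and collection names are hypothetical, and the argument keys for the `find` tool are assumptions that should be checked against the server's `tools/list` response.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def mcp_tool_call(tool: str, arguments: dict) -> str:
    """Serialise an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical lookup of a stored marking guide for one exam.
request = mcp_tool_call("find", {
    "database": "gradrai",
    "collection": "marking_guides",
    "filter": {"exam_id": "exam-123"},
    "limit": 1,
})
```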
Observability runs on OpenTelemetry GenAI instrumentation, with telemetry streamed to BigQuery for latency profiling, token usage tracking, cost analysis, and grading consistency benchmarks.
Challenges
The seasonal usage problem. Grading is not evenly distributed. It spikes massively during exam periods and drops to near-zero in between. Building an agentic pipeline that handles 50-student batch grading inside a BullMQ async job — without blocking the HTTP request cycle and without wasting money on idle compute — required careful thought about where agents run and when.
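One way to keep a 50-script batch inside a single job without overwhelming model quotas is bounded concurrency. The sketch below assumes the fan-out happens on the Python side; `grade_batch`, `grade_one` and `MAX_CONCURRENT_GRADES` are hypothetical names, and the cap value is illustrative. A failed script is recorded rather than allowed to abort the rest of the batch.

```python
import asyncio
from typing import Awaitable, Callable

MAX_CONCURRENT_GRADES = 8  # hypothetical cap; tune to model quota and job timeout


async def grade_batch(student_ids: list[str],
                      grade_one: Callable[[str], Awaitable[dict]],
                      limit: int = MAX_CONCURRENT_GRADES) -> list[dict]:
    semaphore = asyncio.Semaphore(limit)

    async def guarded(student_id: str) -> dict:
        async with semaphore:  # at most `limit` grading calls in flight at once
            try:
                return await grade_one(student_id)
            except Exception as exc:  # one failed script should not sink the batch
                return {"student_id": student_id, "status": "FAILED", "error": str(exc)}

    # gather preserves input order, so results line up with student_ids.
    return await asyncio.gather(*(guarded(s) for s in student_ids))
```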
MCQ grading and AI. We had to be disciplined about not using AI where determinism is available. MCQ grading is a string comparison. Routing it through an AI agent would introduce latency, cost, and unnecessary uncertainty. The architecture separates MCQ (deterministic) from essay (AI-evaluated) explicitly, and only the essay path touches the reasoning agents.
MongoDB MCP as the memory layer. The insight that MongoDB could serve as the operational memory connecting educator grading results to student SmartPrep sessions — through the MCP server — was not obvious at the start. Getting the WeaknessDetectionAgent to read grading outputs from Atlas and the SmartPrepAgent to write practice sessions back to the same cluster, all through typed MCP tool calls, took significant iteration to get right. But it is what makes the closed loop possible.
Human-in-the-loop without breaking the flow. Flagging low-confidence grades for teacher review without stalling the entire pipeline required the Referee Agent to write a "PENDING_REVIEW" status to MongoDB instead of blocking. The educator is notified, and the pipeline continues for the other students. Getting this state management right through the MCP server was one of the more technically demanding parts of the build.
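The non-blocking pattern can be sketched as producing one update per student, in the filter/update shape an updateOne call takes, rather than pausing the batch. `review_updates`, the `"FINAL"` status, the `flagged_questions` field and the threshold are hypothetical; only the `PENDING_REVIEW` status comes from the description above.

```python
def review_updates(graded: list[dict], threshold: float = 0.7) -> list[dict]:
    """Build one updateOne-style operation per student instead of waiting on a reviewer."""
    ops = []
    for result in graded:
        low = [q["id"] for q in result["questions"] if q["confidence"] < threshold]
        status = "PENDING_REVIEW" if low else "FINAL"
        ops.append({
            "filter": {"student_id": result["student_id"], "exam_id": result["exam_id"]},
            # $set records the state; the batch moves on to the next student immediately.
            "update": {"$set": {"status": status, "flagged_questions": low}},
        })
    return ops
```

The teacher's dashboard can then query for documents in `PENDING_REVIEW`, and a confirmed review simply updates the same document again.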
What We Learned
A well-designed multi-agent system is not just a better version of a single prompt. It is a fundamentally different approach to reliability. When each agent has one job, owns its own tools, and produces a verified output before the next stage begins, the failure modes become predictable and recoverable. That matters enormously in a domain where the outputs affect students' academic records.
We also learned that MongoDB is a better fit for agentic operational memory than we initially anticipated. The combination of structured document storage, aggregation pipelines for class-level analytics, and real-time read/write access through the MCP server made it possible to build a feedback loop that spans educator grading and student remediation without any additional infrastructure.
What's Next for GradrAI
The agentic architecture is the foundation everything else builds on. With it stable, the immediate roadmap is:
Expanded evaluation pipelines. We are building automated benchmarks that compare agent grading decisions against human-marked gold-standard scripts across subjects and question types. This gives us a continuous quality signal and a public accuracy figure we can stand behind.
Mobile — Android first. A React Native offline-first SmartPrep app is in active development. Students in areas with unreliable connectivity will be able to practise from auto-generated weakness-based sessions without needing a stable internet connection.
Deeper institutional integration. Schools need results to flow into their existing student record systems. We are building broadsheet and report card generation directly into the post-grading pipeline — so the output of an agent grading run becomes a school's official term report without any manual assembly.
Pan-African expansion. GradrAI currently targets Nigeria. The ALOC question bank, the exam standards support (JAMB, WASSCE, NECO), and the offline-first architecture are all designed with the broader African market in mind. Ghana, Kenya, and South Africa are the next three markets on the expansion roadmap.
Agent Memory Bank. We are exploring Google Cloud's Agent Platform Memory Bank to give agents persistent context across grading sessions — so the system learns an institution's grading patterns and style preferences over time, not just within a single run.
Transparency Note
The gradr-agent repository contains commits dating to November and December 2025, prior to the official contest start date of May 5, 2026. Those early commits were exploratory work done after completing a Google x Kaggle AI Agents Intensive course (https://www.kaggle.com/certification/badges/johnfiewor/105) and do not reflect the architecture submitted here.
The multi-agent pipeline, MongoDB MCP integration, CBT and PBT grading pipelines, WeaknessDetectionAgent, and SmartPrep closed loop were all designed and built during the contest period. I have documented this openly in the hackathon discussion forum rather than altering the repository history.
I leave eligibility to the judges' discretion.
Built With
- aloc-api
- bigquery
- bullmq
- fastmcp
- google-agent-development-kit-(adk)
- google-cloud
- google-cloud-agent-runtime
- google-cloud-run
- mongodb-atlas
- mongodb-mcp-server
- node.js
- opentelemetry
- paystack
- python
- react
- redis
- tailwind-css
- vite
