Inspiration

Writing feedback in schools is broken by time, not talent. A teacher with 30 students can spend an entire weekend marking essays and still return work with comments like "good structure, develop your ideas" — feedback so generic it teaches nothing. By the time it lands on a student's desk, the moment to improve has passed.

We wanted to build the thing that should already exist: a system that reads a student's essay the way a trained marker does — criterion by criterion, with evidence from the actual text — and returns that feedback in seconds, not days. Not a grammar checker. Not a readability score. Real rubric-based assessment.

The harder version of that problem is what kept us up at night: how do you make AI feedback trustworthy enough that a teacher would actually show it to a student and stand behind it?

What it does

Markly is a full writing practice platform built around AI marking.

Students submit a piece of writing — narrative, persuasive, or expository — and get back:

A score on every rubric criterion, with evidence quoted directly from their text
What they did well, what to fix, and a specific micro-goal for next time
A corrected version of their essay with errors annotated, alongside a strength-highlighted clean draft
Sentence-level coaching hints on specific paragraphs
An integrity signal — a heuristic estimate of plagiarism risk and AI authorship likelihood

The practice system keeps students improving between submissions. Daily writing prompts are generated across four frameworks (NAPLAN, IB, Common Core US, AP Language/Literature) for every year level and genre. An adaptive planner analyses a student's last ten submissions, identifies their weakest criteria, and generates a personalised two-week activity schedule — each exercise paired with a model answer and annotated highlights explaining exactly why it scores full marks.
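The "weakest criteria over the last ten submissions" step could be sketched roughly as below. The `Submission` and `CriterionScore` shapes, the function name, and the pooled-percentage formula are all assumptions for illustration; the write-up does not show the real data model.

```typescript
// Hypothetical shapes: the real Prisma models are not shown in this write-up.
interface CriterionScore {
  criterion: string;
  awarded: number;
  max: number;
}

interface Submission {
  submittedAt: Date;
  scores: CriterionScore[];
}

interface CriterionAverage {
  criterion: string;
  percent: number;
}

// Pool each criterion's marks over the most recent `window` submissions and
// return the `count` lowest-scoring criteria, weakest first.
function weakestCriteria(
  submissions: Submission[],
  window = 10,
  count = 3,
): CriterionAverage[] {
  const recent = [...submissions]
    .sort((a, b) => b.submittedAt.getTime() - a.submittedAt.getTime())
    .slice(0, window);

  const totals = new Map<string, { awarded: number; max: number }>();
  for (const submission of recent) {
    for (const score of submission.scores) {
      if (score.max <= 0) continue; // skip malformed criteria
      const running = totals.get(score.criterion) ?? { awarded: 0, max: 0 };
      running.awarded += score.awarded;
      running.max += score.max;
      totals.set(score.criterion, running);
    }
  }

  return Array.from(totals.entries())
    .map(([criterion, t]) => ({
      criterion,
      percent: Math.round((t.awarded / t.max) * 100),
    }))
    .sort((a, b) => a.percent - b.percent)
    .slice(0, count);
}
```

The returned list is what a planner prompt would receive as the student's focus areas; pooling marks rather than averaging per-submission percentages is one of several reasonable choices.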

Teachers and parents get a separate management layer:

Assign tasks with deadlines and timer presets
Set score and streak goals
Build custom rubrics with AI-generated model answers
Create reward catalogs students can redeem with their credits
Receive weekly digest emails on student activity

When a student completes an assigned task, the marking credits are deducted from the teacher's pool, not the student's.

How we built it

The stack is Next.js 15 with React 19 on the frontend, deployed to Vercel. The backend is NestJS running on Fastify, hosted on a Google Cloud VM. PostgreSQL on Neon handles persistence through a 26-model Prisma schema. Firebase handles authentication. LLM calls go through an OpenAI-compatible SDK pointed at an Ollama endpoint running gemma4:31b-cloud.

The marking pipeline is the core of the system. When a submission comes in, we select the right rubric for the student's framework, year level, and genre — pulling from seeded NAPLAN scoring folder data, or from one of twelve built-in framework rubric variants. We compact that into a dense prompt snippet and run three parallel marking passes at different calibration levels:

Normal — balanced, follows descriptors exactly
Generous — awards higher adjacent bands when evidence is plausible
Picky — requires explicit sustained evidence before awarding higher bands

This gives teachers a score range rather than a falsely precise single number. The display score is the Normal pass; the range is [Picky, Generous].
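A minimal sketch of the three-pass idea, assuming a hypothetical `runPass` function that wraps the LLM call and returns a total score. The names and the choice to widen the range when the model does not respect the Picky ≤ Normal ≤ Generous ordering are assumptions of this sketch, not a description of the production code.

```typescript
type Calibration = "normal" | "generous" | "picky";

// Hypothetical signature: stands in for whatever sends the rubric snippet
// and essay to the LLM and parses a total score from the reply.
type MarkingPass = (essay: string, calibration: Calibration) => Promise<number>;

interface CalibratedScore {
  display: number; // the Normal pass
  low: number;
  high: number;
}

// The model is not guaranteed to respect the intended ordering, so the range
// is widened to cover all three results (an assumption of this sketch).
function buildRange(normal: number, generous: number, picky: number): CalibratedScore {
  return {
    display: normal,
    low: Math.min(picky, normal, generous),
    high: Math.max(picky, normal, generous),
  };
}

async function markWithCalibration(
  essay: string,
  runPass: MarkingPass,
): Promise<CalibratedScore> {
  // Run the three passes concurrently so latency is roughly one round trip.
  const [normal, generous, picky] = await Promise.all([
    runPass(essay, "normal"),
    runPass(essay, "generous"),
    runPass(essay, "picky"),
  ]);
  return buildRange(normal, generous, picky);
}
```

Keeping the range logic in a pure function makes it easy to unit-test without touching the model.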

The adaptive plan uses Jaccard similarity on tokenised submission text to build a criterion performance map across the student's history, then feeds their weakest areas into a prompt that generates a structured two-week schedule with model answers for each activity.
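For reference, Jaccard similarity on tokenised text can be computed as below. The tokeniser (lowercase, split on non-alphanumeric characters) is an assumption, and this sketch covers only the similarity measure itself, not how it feeds the criterion performance map.

```typescript
// Lowercase the text and keep runs of letters, digits, and apostrophes.
function tokenise(text: string): Set<string> {
  const tokens = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  return new Set(tokens);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, from 0 (disjoint) to 1 (identical).
function jaccard(a: string, b: string): number {
  const setA = tokenise(a);
  const setB = tokenise(b);
  // Two empty texts are treated as identical to avoid dividing by zero.
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  setA.forEach((token) => {
    if (setB.has(token)) shared += 1;
  });
  const union = setA.size + setB.size - shared;
  return shared / union;
}
```

For example, `jaccard("the cat sat", "the cat ran")` shares two of four distinct tokens and returns 0.5.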

The credit system prices marking dynamically by word count. The cost scales from 90 to 150 credits and reaches its peak at a tier-specific length of 500, 750, or 1000 words, so longer submissions cost more at every subscription level.
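One way the word-count pricing could work is sketched below. The tier names, the mapping of tiers to peaks, and the linear ramp are all assumptions; the write-up gives only the 90 and 150 credit bounds and the three peak word counts.

```typescript
// Hypothetical tier names; only the three peak word counts come from the write-up.
type Tier = "tierA" | "tierB" | "tierC";

const PEAK_WORDS: Record<Tier, number> = {
  tierA: 500,
  tierB: 750,
  tierC: 1000,
};

const MIN_CREDITS = 90;
const MAX_CREDITS = 150;

// Assumed curve: linear from MIN_CREDITS at 0 words to MAX_CREDITS at the
// tier's peak word count, then flat. The real curve may differ.
function markingCost(wordCount: number, tier: Tier): number {
  const peak = PEAK_WORDS[tier];
  const fraction = Math.min(Math.max(wordCount, 0) / peak, 1);
  return Math.round(MIN_CREDITS + fraction * (MAX_CREDITS - MIN_CREDITS));
}
```

Under these assumptions a 500-word essay costs 150 credits on the 500-word tier and 120 credits on the 1000-word tier.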

Challenges we ran into

Making the AI mark strictly

LLMs are optimistic by default. Getting consistent, rubric-anchored scores without inflated marks required significant prompt engineering. The system prompt has to be extremely explicit: keep each evidence quote under 20 words, award full marks only for flawless work, never invent criteria. Even then, scores drifted between model versions, so we added the three-pass scoring system as a structural correction rather than relying on a prompt-level fix.

Rubric fidelity across frameworks

NAPLAN narrative has "Character and Setting." NAPLAN persuasive has "Persuasive Devices." IB has four criteria scored out of 8. AP Literature has four different criteria scored out of 4. Each combination of framework, genre, and year level needs the right rubric slice, and getting that selection logic right, with sane fallbacks when data is missing, meant handling more edge cases than expected.
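Rubric selection with fallbacks might look something like this. The `Rubric` shape, the enum values, and the fallback order (exact match, then any-year rubric for the genre, then any rubric for the genre, then any rubric in the framework) are assumptions chosen for illustration.

```typescript
type Framework = "NAPLAN" | "IB" | "COMMON_CORE" | "AP";
type Genre = "narrative" | "persuasive" | "expository";

// Hypothetical shape; the real rubric records are not shown in the write-up.
interface Rubric {
  framework: Framework;
  genre: Genre;
  yearLevel?: number; // undefined means "applies to any year level"
  criteria: { name: string; max: number }[];
}

// Try the most specific match first, then relax one constraint at a time.
// Returns undefined only when the framework has no rubrics at all.
function selectRubric(
  rubrics: Rubric[],
  framework: Framework,
  genre: Genre,
  yearLevel: number,
): Rubric | undefined {
  const sameFramework = rubrics.filter((r) => r.framework === framework);
  return (
    sameFramework.find((r) => r.genre === genre && r.yearLevel === yearLevel) ??
    sameFramework.find((r) => r.genre === genre && r.yearLevel === undefined) ??
    sameFramework.find((r) => r.genre === genre) ??
    sameFramework[0]
  );
}
```

Returning `undefined` rather than throwing leaves the caller free to decide whether a missing framework is an error or a prompt to use a custom rubric.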

Trust

The hardest non-technical problem. A teacher won't show AI feedback to a student unless they believe it's fair. Every design decision — showing evidence quotes, displaying calibration ranges, flagging inapplicable criteria as N/A rather than penalising the student, surfacing the band descriptor the AI used — exists to make the reasoning auditable.

Auth domain fragmentation

Deploying across Vercel and a custom domain surfaced a Firebase auth iframe 404 that silently broke all sign-in methods. The fix was one environment variable, but diagnosing it required tracing through Firebase's auth handshake to find which domain was being used and why.

Accomplishments that we're proud of

A marking pipeline that returns criterion-level feedback with evidence quotes in under 30 seconds, across four international writing frameworks
Three-pass calibration that gives a principled score range instead of a single AI guess
An adaptive plan that doesn't say "practice more" — it says "your Cohesion is at 58% across your last ten submissions; here are five specific activities this week, each with a model answer showing you exactly what full marks looks like"
A credit system with dynamic word-count pricing, teacher-pool billing for assigned tasks, and daily tier-based refresh — all handled in atomic database transactions
Custom rubrics where teachers define their own criteria, the AI generates a model answer across all of them, and students can be marked against it
Inapplicable criteria handled correctly: if a narrative-only criterion appears on a persuasive submission, it gets full marks and is flagged as N/A — the student is never penalised for the rubric not fitting
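The N/A rule in the last item can be expressed as a small pure function. The `genres` field and the type names here are hypothetical; the sketch only illustrates "full marks plus an N/A flag when the criterion does not fit the genre".

```typescript
interface RubricCriterion {
  name: string;
  max: number;
  genres?: string[]; // hypothetical field: omitted means "applies to every genre"
}

interface CriterionResult {
  name: string;
  awarded: number;
  max: number;
  notApplicable: boolean;
}

// If a criterion does not apply to the submission's genre, award full marks
// and flag it, so the total is never dragged down by a rubric mismatch.
function resolveCriterion(
  criterion: RubricCriterion,
  genre: string,
  modelScore: number,
): CriterionResult {
  const applies = !criterion.genres || criterion.genres.indexOf(genre) !== -1;
  if (!applies) {
    return {
      name: criterion.name,
      awarded: criterion.max,
      max: criterion.max,
      notApplicable: true,
    };
  }
  // Clamp the model's score into the valid range for this criterion.
  const awarded = Math.min(Math.max(modelScore, 0), criterion.max);
  return { name: criterion.name, awarded, max: criterion.max, notApplicable: false };
}
```

Carrying the `notApplicable` flag through to the UI lets the report show "N/A" instead of a misleading perfect score.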

What we learned

Prompt engineering is product design. A single sentence in a system prompt changes scores by entire bands. We learned to treat the AI's instructions with the same rigour as a UI specification — iterated, tested against edge cases, and version-controlled.

We also learned that the ceiling on EdTech AI tools isn't model quality; it's teacher trust. The question isn't "can the AI mark accurately?" It's "would a teacher stake their professional reputation on showing this to a student?" That reframe changed almost every UI and feedback design decision we made.

Finally: dynamic pricing is deceptively hard to get right. Word-count-based credit scaling sounds simple until you're handling tier-specific peaks, atomic daily refresh, teacher-pool deductions, and refund logic for failed marking jobs simultaneously.

What's next for Markly Writing

Live classroom mode — teacher broadcasts a prompt, students write simultaneously, results appear on a live class dashboard as they come in
Voice feedback — text-to-speech narration of the marking report for younger students and accessibility
Exam simulation — timed, distraction-free writing mode that mirrors real test conditions, locks the interface, and auto-submits at the deadline
Richer LMS integrations — Google Classroom and Canvas sync so teachers can assign, collect, and return marked work without leaving their existing tools
Multi-language support — the framework architecture already handles different rubric systems; extending to non-English assessment frameworks is the natural next step

Built With

fastify
firebase
gemma4
google-cloud
gsap
nestjs
next.js
ollama
postgresql
prisma
radix-ui
react
recharts
stripe
tailwindcss
tanstack-query
typescript
vercel
zustand

Updates

Xiaowei Wang started this project — May 22, 2026 10:18 AM EDT
