MindStep — AI Dyslexia Early Detection & Training
Inspiration
Walking the hallways of schools in Aktobe, Kazakhstan, we kept seeing the same thing: bright, curious children being labeled as "slow learners" — not because they lacked intelligence, but because no one had the tools to see what was really happening.
The statistics hit us hard:
$$\frac{1}{5} \text{ children worldwide} = 700\text{M people with dyslexia}$$
Yet the average diagnosis age is 9–10 years — precisely when the intervention window is already closing:
| Age | Treatment Success Rate |
|---|---|
| 5–7 | 90% |
| 8–9 | 70% |
| 10+ | 45% |
Professional testing costs \$500–\$2,000 per child. For families in Kazakhstan and across Central Asia, that number is simply unreachable. 75% of at-risk children miss the window entirely.
The turning point came when we interviewed Amina Bekmuhambetova, a Speech-Language Pathologist at the Psychological-Pedagogical Correction Center in Aktobe, with 10+ years of experience and 500+ cases. She told us something we couldn't forget:
> "Every month we wait, intervention gets twice as hard. Age 6 = rewire the brain. Age 10 = teach coping."
She also revealed that children as young as 5 begin to compensate — memorizing texts instead of reading — and that teachers are physically unable to detect micro eye-movements without equipment.
That was our moment. We didn't want to write a report about the problem. We wanted to solve it.
What It Does
MindStep is a free, 5-minute, AI-powered dyslexia early detection platform for children aged 5–10. It runs entirely in the browser — no app download, no specialist required, no cost.
The assessment has four stages fused into one risk score:
1. 👁️ Eye-Tracking Assessment (45% weight)
Using WebGazer.js, the child reads a short passage while the webcam tracks their gaze. Our ensemble model extracts 86 engineered features across four categories:
- Saccades (13): regression count, amplitude, velocity
- Raw gaze (26): speed, pupil diameter, entropy, efficiency
- Fixations (11): count, duration, spatial distribution
- Task metrics (36): dwell time, re-reading rate, efficiency
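As a rough sketch, saccade-style features of this kind can be derived from raw gaze samples. The function below is illustrative only; the names and the exact feature set are assumptions, not the production extractor:

```python
import numpy as np

def saccade_features(xs, ys, ts):
    """Illustrative saccade statistics from raw gaze samples.

    xs, ys: gaze coordinates (pixels); ts: timestamps (seconds).
    """
    dx, dy, dt = np.diff(xs), np.diff(ys), np.diff(ts)
    amplitude = np.hypot(dx, dy)            # movement size per sample
    velocity = amplitude / np.maximum(dt, 1e-6)
    regressions = int(np.sum(dx < 0))       # leftward jumps, i.e. re-reading
    return {
        "regression_count": regressions,
        "mean_amplitude": float(amplitude.mean()),
        "mean_velocity": float(velocity.mean()),
        "velocity_std": float(velocity.std()),
    }
```

Aggregates like these are what the ensemble consumes, rather than raw coordinate streams.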
The model achieves:
$$\text{AUC}_{\text{syllables}} = 0.918, \quad \text{AUC}_{\text{meaningful}} = 0.916, \quad \text{AUC}_{\text{pseudo}} = 0.870$$
$$\text{Validation Accuracy} = 92.9\%, \quad \text{F1-score} = 92.8\%$$
2. ✍️ Handwriting Analysis (25% weight)
The child draws a word on the screen using mouse or touchscreen. The image is sent to a clinically-prompted Gemini 2.5 Flash model that detects:
- Mirror writing ($b \rightarrow d$, $p \rightarrow q$)
- Letter spacing irregularities
- Letter formation tremors
- Orthographic errors
Results are age-calibrated using factors $\alpha_{\text{age}}$:
$$\text{score}_{\text{adjusted}} = \text{score}_{\text{raw}} \times \alpha_{\text{spacing}} \times \alpha_{\text{formation}} \times \alpha_{\text{orientation}}$$
where for age 5: $\alpha = (0.7,\ 0.8,\ 0.7)$ and for age 10: $\alpha = (1.0,\ 1.0,\ 1.0)$.
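A minimal sketch of how such a calibration table could be applied, assuming linear interpolation between the age-5 and age-10 anchors (the interpolation rule is our illustrative assumption, not the published calibration matrix):

```python
# Anchor calibration factors: (spacing, formation, orientation)
AGE_FACTORS = {5: (0.7, 0.8, 0.7), 10: (1.0, 1.0, 1.0)}

def calibration_factors(age):
    """Linearly interpolate between the age-5 and age-10 anchors
    (interpolation between anchors is an illustrative assumption)."""
    t = (min(max(age, 5), 10) - 5) / 5   # 0.0 at age 5, 1.0 at age 10
    lo, hi = AGE_FACTORS[5], AGE_FACTORS[10]
    return tuple(l + t * (h - l) for l, h in zip(lo, hi))

def adjusted_score(raw, age):
    """Apply all three factors to the raw handwriting score."""
    a_spacing, a_formation, a_orientation = calibration_factors(age)
    return raw * a_spacing * a_formation * a_orientation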
3. 🤖 Cognitive Chatbot (20% weight)
Five audio questions — delivered via Web Speech API — test:
$$\text{Memory Score} = (Q_1 \times 0.4 + Q_5 \times 0.6) \times 10$$
$$\text{Attention Score} = Q_3 \times 10$$
$$\text{Comprehension Score} = (Q_2 \times 0.5 + Q_4 \times 0.5) \times 10$$
Children answer by voice — no reading or typing required.
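The three score formulas above translate directly into code; this sketch assumes each answer $Q_1$ through $Q_5$ is normalized to $[0, 1]$ (the 0–1 scale is an assumption for illustration):

```python
def cognitive_scores(q):
    """q maps question number (1-5) to a normalized answer score in [0, 1].
    The 0-1 answer scale is an illustrative assumption."""
    return {
        "memory": (q[1] * 0.4 + q[5] * 0.6) * 10,
        "attention": q[3] * 10,
        "comprehension": (q[2] * 0.5 + q[4] * 0.5) * 10,
    }
```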
4. 🔀 Fusion Algorithm
All signals combine into a final risk score $R \in [0, 100]$:
$$R = 0.45 \cdot S_{\text{eye}} + 0.25 \cdot S_{\text{hand}} + 0.20 \cdot S_{\text{chat}} + 0.10 \cdot S_{\text{raw}}$$
Output: Low / Moderate / High risk classification with a downloadable PDF report, explainable indicators, referral guidance, and personalized exercises.
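The fusion step reduces to a weighted sum; here is a minimal sketch with hypothetical band cutoffs (the real Low/Moderate/High thresholds are not published here):

```python
# Fusion weights from the final risk formula
WEIGHTS = {"eye": 0.45, "hand": 0.25, "chat": 0.20, "raw": 0.10}

def fuse(scores):
    """scores: dict with per-modality risk scores in [0, 100]."""
    r = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    # Hypothetical band cutoffs, illustrative only
    band = "Low" if r < 35 else ("Moderate" if r < 65 else "High")
    return r, band
```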
How We Built It
Frontend — React + TypeScript
- Multi-step assessment flow with real-time progress tracking
- OpenDyslexic font throughout the interface
- Web Speech API for full voice narration and speech-to-text input
- WebGazer.js integration with 5-point gaze calibration
- Canvas-based handwriting input with undo/clear/submit controls
- Responsive design tested on tablets and school computers
- Deployed on Vercel
Backend — FastAPI (Python)
- Receives handwriting canvas images as base64
- Constructs clinical-grade prompts for Gemini 2.5 Flash
- Applies age-calibration matrix before returning scores
- Lightweight, stateless, deployable in minutes
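For illustration, the first backend step, turning the canvas payload back into image bytes before the Gemini prompt is built, might look like this (the function name and exact data-URL handling are assumptions):

```python
import base64
import re

def decode_canvas_image(data_url):
    """Extract raw image bytes from a canvas 'data:image/png;base64,...'
    payload before forwarding to the vision model."""
    m = re.match(r"data:image/\w+;base64,(.+)", data_url, re.S)
    if m is None:
        raise ValueError("expected a base64-encoded data URL")
    return base64.b64decode(m.group(1))
```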
AI/ML Pipeline
- Trained on ETDD70 dataset: 630,000+ gaze coordinates, 840 CSV files, 3 reading tasks
- Pipeline:
$$\text{Raw CSV} \rightarrow \text{Feature Extraction} \rightarrow \text{VarianceThreshold} + \text{StandardScaler} \rightarrow$$ $$\text{CatBoost} + \text{XGBoost} + \text{LightGBM} \rightarrow \text{Average Probabilities} \rightarrow \text{Classification}$$
- Benchmark: Sedmidubsky et al. (SISAP 2024) achieved ~86% on ETDD70 using DTW+1NN. Our ensemble achieves 92.9% with interpretable features and no time-series alignment required.
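The probability-averaging step of the pipeline can be sketched as follows, assuming each fitted model exposes the scikit-learn-style `predict_proba` interface that CatBoost, XGBoost, and LightGBM all provide:

```python
import numpy as np

def ensemble_predict(models, X, threshold=0.5):
    """Soft-voting ensemble: average class-1 probabilities across models,
    then threshold into a binary at-risk / not-at-risk label."""
    probs = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
    return probs, (probs >= threshold).astype(int)
```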
Real-World Testing
Tested at School No. 4, Aktobe with 35 children:
$$\text{Sensitivity} = \frac{11}{12} = 91.7\%, \quad \text{Specificity} = \frac{20}{23} \approx 87.0\%$$
$$\text{Overall Accuracy} = 92.86\%, \quad \text{AUC} = 97.96\%$$
Challenges We Ran Into
🎯 Webcam Eye-Tracking Without Hardware
Consumer webcams are noisy. WebGazer.js drifts. Children move. Getting stable gaze data from a 6-year-old sitting at a school computer was genuinely hard. We solved it through aggressive feature engineering — instead of relying on raw coordinates, we built statistical aggregates robust to noise:
$$\sigma_{\text{saccade}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(v_i - \bar{v})^2}$$
Variance, entropy, and efficiency metrics proved far more stable than raw positional data across different webcam qualities.
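A quick worked instance of the $\sigma_{\text{saccade}}$ formula above, using hypothetical velocity values:

```python
import numpy as np

# Hypothetical saccade velocities (px/s) from three consecutive saccades
v = np.array([120.0, 80.0, 100.0])

# Population standard deviation, exactly as in the formula:
# sqrt of the mean squared deviation from the mean velocity
sigma_saccade = np.sqrt(np.mean((v - v.mean()) ** 2))
# equivalent to np.std(v)
```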
🧒 Designing for Ages 5–10
A UI/UX that works for a 5-year-old and a 10-year-old simultaneously is genuinely difficult. Text instructions became voice instructions. Keyboard input became voice input. We had to throw away our first two interface designs entirely after watching real children use them and get confused within 30 seconds.
⚖️ Age Calibration
A 5-year-old's handwriting looks like dyslexia by adult standards. Defining age-appropriate thresholds required combining Gemini's vision analysis with a custom calibration matrix — and validating it against real samples from Amina's clinical experience.
🔗 Multi-Modal Fusion Weights
How much should eye-tracking matter versus handwriting? We ran dozens of weight combinations and validated each against our test cohort. The final weights $(0.45, 0.25, 0.20, 0.10)$ were not guessed — they were earned.
🌐 Language & Accessibility
Supporting English, Russian, and Kazakh in the Speech API required careful voice selection and pitch/rate tuning (Pitch $= 1.2$, Rate $= 0.85$) to remain clear and calming for young children across all three languages.
Accomplishments That We're Proud Of
- 🏆 92.86% accuracy on real children — beating published academic benchmarks
- 🏫 Tested on 35 real children at School No. 4 in Aktobe — not just a demo, a real deployment
- 👩‍⚕️ Clinical validation from a practicing SLP with 500+ cases who said MindStep catches children she would never see
- 💸 $0 cost to families — the most important feature we shipped
- 🌍 Multi-language support in English, Russian, and Kazakh
- ♿ Full accessibility — voice narration, speech-to-text, OpenDyslexic font, touch-friendly canvas
- 📄 Downloadable PDF reports parents can bring directly to a specialist
- 💡 Outperforming Sedmidubsky et al. (SISAP 2024) by 6.9 percentage points without time-series alignment
What We Learned
Technically, we learned that interpretable features beat complex architectures when data is limited. An ensemble of gradient boosting models on well-engineered features outperformed deep learning approaches that required far more data and compute.
We learned that prompt engineering for clinical tasks is a discipline in itself. Getting Gemini to think like a speech-language pathologist — not just describe what it sees — required careful framing, structured output schemas, and validation against expert judgment.
About users, we learned that the hardest interface problems aren't technical. A child who can't read cannot navigate a text-heavy onboarding screen, no matter how good your AI is. Accessibility isn't a feature you add at the end. It's the foundation you build on.
About systems, we learned that dyslexia isn't just a reading problem — it's a system failure. Schools lack tools. Specialists are expensive. Parents don't know the signs. No single fix works. You have to address the whole pipeline: detection, reporting, referral, and support.
And perhaps most importantly — we learned that talking to experts early saves months of building the wrong thing. Amina's three insights reshaped our entire technical architecture before we wrote a single line of production code.
What's Next for MindStep
Near-term (Year 1)
- 🏫 Pilot in 5 schools across Aktobe with teacher dashboards
- 📱 Progressive Web App for offline use in low-connectivity areas
- 🇰🇿 Partner with Kazakhstan Ministry of Education for credibility and distribution
- 📊 Launch B2B school accounts at \$99/month with class-level analytics
Medium-term (Year 2)
- 🌍 Expand to CIS countries — Kyrgyzstan, Uzbekistan, Tajikistan
- 🏥 White-label version for pediatric clinics
- 🤝 Apply for UNICEF and WHO non-dilutive grants
- 📈 Target SAM of \$18M in Kazakhstan school dyslexia screening
Long-term Vision
- 🌐 30+ language support — reaching underserved communities across Africa, South Asia, and Latin America
- 🔬 Longitudinal study tracking intervention outcomes for MindStep-detected children over 3 years
- 🤖 Personalized AI training modules — not just detection, but remediation
- 🎯 The ultimate goal:
$$\text{Dyslexia screening} = \text{Vision test}$$
Done at school. In 5 minutes. For free. For every child.
> "Every day without early detection is a day lost in the neuroplasticity window. MindStep closes that gap."
Built With
- catboost
- fastapi
- lightgbm
- python
- react
- typescript
- vercel
- webgazer.js
- xgboost