MindStep — AI Dyslexia Early Detection & Training
Inspiration
Walking the hallways of schools in Aktobe, Kazakhstan, we kept seeing the same thing: bright, curious children being labeled as "slow learners" — not because they lacked intelligence, but because no one had the tools to see what was really happening.
The statistics hit us hard:
$$\frac{1}{5} \text{ children worldwide} = 700\text{M people with dyslexia}$$
Yet the average diagnosis age is 9–10 years — precisely when the intervention window is already closing:
| Age | Treatment Success Rate |
|---|---|
| 5–7 | 90% |
| 8–9 | 70% |
| 10+ | 45% |
Professional testing costs \$500–\$2,000 per child. For families in Kazakhstan and across Central Asia, that number is simply unreachable. 75% of at-risk children miss the window entirely.
The turning point came when we interviewed Amina Bekmuhambetova, a Speech-Language Pathologist at the Psychological-Pedagogical Correction Center in Aktobe, with 10+ years of experience and 500+ cases. She told us something we couldn't forget:
> "Every month we wait, intervention gets twice as hard. Age 6 = rewire the brain. Age 10 = teach coping."
She also revealed that children as young as 5 begin to compensate — memorizing texts instead of reading — and that teachers are physically unable to detect micro eye-movements without equipment.
That was our moment. We didn't want to write a report about the problem. We wanted to solve it.
What It Does
MindStep is a free, 5-minute, AI-powered dyslexia early detection platform for children aged 5–10. It runs entirely in the browser — no app download, no specialist required, no cost.
The assessment has four stages fused into one risk score:
1. 👁️ Eye-Tracking Assessment (45% weight)
Using WebGazer.js, the child reads a short passage while the webcam tracks their gaze. Our ensemble model extracts 86 engineered features across four categories:
- Saccades (13): regression count, amplitude, velocity
- Raw gaze (26): speed, pupil diameter, entropy, efficiency
- Fixations (11): count, duration, spatial distribution
- Task metrics (36): dwell time, re-reading rate, efficiency
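As a rough sketch, saccade-style features of this kind can be derived from raw gaze samples. The function below is illustrative only; the names and the exact feature set are assumptions, not the production extractor:

```python
import numpy as np

def saccade_features(xs, ys, ts):
    """Illustrative saccade statistics from raw gaze samples.

    xs, ys: gaze coordinates (pixels); ts: timestamps (seconds).
    """
    dx, dy, dt = np.diff(xs), np.diff(ys), np.diff(ts)
    amplitude = np.hypot(dx, dy)            # movement size per sample
    velocity = amplitude / np.maximum(dt, 1e-6)
    regressions = int(np.sum(dx < 0))       # leftward jumps, i.e. re-reading
    return {
        "regression_count": regressions,
        "mean_amplitude": float(amplitude.mean()),
        "mean_velocity": float(velocity.mean()),
        "velocity_std": float(velocity.std()),
    }
```

Aggregates like these are what the ensemble consumes, rather than raw coordinate streams.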
The model achieves:
$$\text{AUC}_{\text{syllables}} = 0.918, \quad \text{AUC}_{\text{meaningful}} = 0.916, \quad \text{AUC}_{\text{pseudo}} = 0.870$$
$$\text{Validation Accuracy} = 92.9\%, \quad \text{F1-score} = 92.8\%$$
2. ✍️ Handwriting Analysis (25% weight)
The child draws a word on the screen using mouse or touchscreen. The image is sent to a clinically-prompted Gemini 2.5 Flash model that detects:
- Mirror writing ($b \rightarrow d$, $p \rightarrow q$)
- Letter spacing irregularities
- Letter formation tremors
- Orthographic errors
Results are age-calibrated using factors $\alpha_{\text{age}}$:
$$\text{score}_{\text{adjusted}} = \text{score}_{\text{raw}} \times \alpha_{\text{spacing}} \times \alpha_{\text{formation}} \times \alpha_{\text{orientation}}$$
where for age 5: $\alpha = (0.7,\ 0.8,\ 0.7)$ and for age 10: $\alpha = (1.0,\ 1.0,\ 1.0)$.
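A minimal sketch of how such a calibration table could be applied, assuming linear interpolation between the age-5 and age-10 anchors (the interpolation rule is our illustrative assumption, not the published calibration matrix):

```python
# Anchor calibration factors: (spacing, formation, orientation)
AGE_FACTORS = {5: (0.7, 0.8, 0.7), 10: (1.0, 1.0, 1.0)}

def calibration_factors(age):
    """Linearly interpolate between the age-5 and age-10 anchors
    (interpolation between anchors is an illustrative assumption)."""
    t = (min(max(age, 5), 10) - 5) / 5   # 0.0 at age 5, 1.0 at age 10
    lo, hi = AGE_FACTORS[5], AGE_FACTORS[10]
    return tuple(l + t * (h - l) for l, h in zip(lo, hi))

def adjusted_score(raw, age):
    """Apply all three factors to the raw handwriting score."""
    a_spacing, a_formation, a_orientation = calibration_factors(age)
    return raw * a_spacing * a_formation * a_orientation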
3. 🤖 Cognitive Chatbot (20% weight)
Five audio questions — delivered via Web Speech API — test:
$$\text{Memory Score} = (Q_1 \times 0.4 + Q_5 \times 0.6) \times 10$$
$$\text{Attention Score} = Q_3 \times 10$$
$$\text{Comprehension Score} = (Q_2 \times 0.5 + Q_4 \times 0.5) \times 10$$
Children answer by voice — no reading or typing required.
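The three score formulas above translate directly into code; this sketch assumes each answer $Q_1$ through $Q_5$ is normalized to $[0, 1]$ (the 0–1 scale is an assumption for illustration):

```python
def cognitive_scores(q):
    """q maps question number (1-5) to a normalized answer score in [0, 1].
    The 0-1 answer scale is an illustrative assumption."""
    return {
        "memory": (q[1] * 0.4 + q[5] * 0.6) * 10,
        "attention": q[3] * 10,
        "comprehension": (q[2] * 0.5 + q[4] * 0.5) * 10,
    }
```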
4. 🔀 Fusion Algorithm
All signals combine into a final risk score $R \in [0, 100]$:
$$R = 0.45 \cdot S_{\text{eye}} + 0.25 \cdot S_{\text{hand}} + 0.20 \cdot S_{\text{chat}} + 0.10 \cdot S_{\text{raw}}$$
Output: Low / Moderate / High risk classification with a downloadable PDF report, explainable indicators, referral guidance, and personalized exercises.
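The fusion step reduces to a weighted sum; here is a minimal sketch with hypothetical band cutoffs (the real Low/Moderate/High thresholds are not published here):

```python
# Fusion weights from the final risk formula
WEIGHTS = {"eye": 0.45, "hand": 0.25, "chat": 0.20, "raw": 0.10}

def fuse(scores):
    """scores: dict with per-modality risk scores in [0, 100]."""
    r = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    # Hypothetical band cutoffs, illustrative only
    band = "Low" if r < 35 else ("Moderate" if r < 65 else "High")
    return r, band
```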
How We Built It
Frontend — React + TypeScript
- Multi-step assessment flow with real-time progress tracking
- OpenDyslexic font throughout the interface
- Web Speech API for full voice narration and speech-to-text input
- WebGazer.js integration with 5-point gaze calibration
- Canvas-based handwriting input with undo/clear/submit controls
- Responsive design tested on tablets and school computers
- Deployed on Vercel
Backend — FastAPI (Python)
- Receives handwriting canvas images as base64
- Constructs clinical-grade prompts for Gemini 2.5 Flash
- Applies age-calibration matrix before returning scores
- Lightweight, stateless, deployable in minutes
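For illustration, the first backend step, turning the canvas payload back into image bytes before the Gemini prompt is built, might look like this (the function name and exact data-URL handling are assumptions):

```python
import base64
import re

def decode_canvas_image(data_url):
    """Extract raw image bytes from a canvas 'data:image/png;base64,...'
    payload before forwarding to the vision model."""
    m = re.match(r"data:image/\w+;base64,(.+)", data_url, re.S)
    if m is None:
        raise ValueError("expected a base64-encoded data URL")
    return base64.b64decode(m.group(1))
```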
AI/ML Pipeline
- Trained on ETDD70 dataset: 630,000+ gaze coordinates, 840 CSV files, 3 reading tasks
- Pipeline:
$$\text{Raw CSV} \rightarrow \text{Feature Extraction} \rightarrow \text{VarianceThreshold} + \text{StandardScaler} \rightarrow$$ $$\text{CatBoost} + \text{XGBoost} + \text{LightGBM} \rightarrow \text{Average Probabilities} \rightarrow \text{Classification}$$
- Benchmark: Sedmidubsky et al. (SISAP 2024) achieved ~86% on ETDD70 using DTW+1NN. Our ensemble achieves 92.9% with interpretable features and no time-series alignment required.
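The probability-averaging step of the pipeline can be sketched as follows, assuming each fitted model exposes the scikit-learn-style `predict_proba` interface that CatBoost, XGBoost, and LightGBM all provide:

```python
import numpy as np

def ensemble_predict(models, X, threshold=0.5):
    """Soft-voting ensemble: average class-1 probabilities across models,
    then threshold into a binary at-risk / not-at-risk label."""
    probs = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
    return probs, (probs >= threshold).astype(int)
```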
Real-World Testing
Tested at School No. 4, Aktobe with 35 children:
$$\text{Sensitivity} = \frac{11}{12} = 91.7\%, \quad \text{Specificity} = \frac{20}{23} \approx 87.0\%$$
$$\text{Overall Accuracy} = 92.86\%, \quad \text{AUC} = 97.96\%$$
Challenges We Ran Into
🎯 Webcam Eye-Tracking Without Hardware
Consumer webcams are noisy. WebGazer.js drifts. Children move. Getting stable gaze data from a 6-year-old sitting at a school computer was genuinely hard. We solved it through aggressive feature engineering — instead of relying on raw coordinates, we built statistical aggregates robust to noise:
$$\sigma_{\text{saccade}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(v_i - \bar{v})^2}$$
Variance, entropy, and efficiency metrics proved far more stable than raw positional data across different webcam qualities.
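A quick worked instance of the $\sigma_{\text{saccade}}$ formula above, using hypothetical velocity values:

```python
import numpy as np

# Hypothetical saccade velocities (px/s) from three consecutive saccades
v = np.array([120.0, 80.0, 100.0])

# Population standard deviation, exactly as in the formula:
# sqrt of the mean squared deviation from the mean velocity
sigma_saccade = np.sqrt(np.mean((v - v.mean()) ** 2))
# equivalent to np.std(v)
```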
🧒 Designing for Ages 5–10
A UI/UX that works for a 5-year-old and a 10-year-old simultaneously is genuinely difficult. Text instructions became voice instructions. Keyboard input became voice input. We had to throw away our first two interface designs entirely after watching real children use them and get confused within 30 seconds.
⚖️ Age Calibration
A 5-year-old's handwriting looks like dyslexia by adult standards. Defining age-appropriate thresholds required combining Gemini's vision analysis with a custom calibration matrix — and validating it against real samples from Amina's clinical experience.
🔗 Multi-Modal Fusion Weights
How much should eye-tracking matter versus handwriting? We ran dozens of weight combinations and validated each against our test cohort. The final weights $(0.45, 0.25, 0.20, 0.10)$ were not guessed — they were earned.
🌐 Language & Accessibility
Supporting English, Russian, and Kazakh in the Speech API required careful voice selection and pitch/rate tuning (Pitch $= 1.2$, Rate $= 0.85$) to remain clear and calming for young children across all three languages.
Accomplishments That We're Proud Of
- 🏆 92.86% accuracy on real children — beating published academic benchmarks
- 🏫 Tested on 35 real children at School No. 4 in Aktobe — not just a demo, a real deployment
- 👩‍⚕️ Clinical validation from a practicing SLP with 500+ cases who said MindStep catches children she would never see
- 💸 $0 cost to families — the most important feature we shipped
- 🌍 Multi-language support in English, Russian, and Kazakh
- ♿ Full accessibility — voice narration, speech-to-text, OpenDyslexic font, touch-friendly canvas
- 📄 Downloadable PDF reports parents can bring directly to a specialist
- 💡 Outperforming Sedmidubsky et al. (SISAP 2024) by 6.9 percentage points without time-series alignment
What We Learned
Technically, we learned that interpretable features beat complex architectures when data is limited. An ensemble of gradient boosting models on well-engineered features outperformed deep learning approaches that required far more data and compute.
We learned that prompt engineering for clinical tasks is a discipline in itself. Getting Gemini to think like a speech-language pathologist — not just describe what it sees — required careful framing, structured output schemas, and validation against expert judgment.
About users, we learned that the hardest interface problems aren't technical. A child who can't read cannot navigate a text-heavy onboarding screen, no matter how good your AI is. Accessibility isn't a feature you add at the end. It's the foundation you build on.
About systems, we learned that dyslexia isn't just a reading problem — it's a system failure. Schools lack tools. Specialists are expensive. Parents don't know the signs. No single fix works. You have to address the whole pipeline: detection, reporting, referral, and support.
And perhaps most importantly — we learned that talking to experts early saves months of building the wrong thing. Amina's three insights reshaped our entire technical architecture before we wrote a single line of production code.
What's Next for MindStep
Near-term (Year 1)
- 🏫 Pilot in 5 schools across Aktobe with teacher dashboards
- 📱 Progressive Web App for offline use in low-connectivity areas
- 🇰🇿 Partner with Kazakhstan Ministry of Education for credibility and distribution
- 📊 Launch B2B school accounts at \$99/month with class-level analytics
Medium-term (Year 2)
- 🌍 Expand to CIS countries — Kyrgyzstan, Uzbekistan, Tajikistan
- 🏥 White-label version for pediatric clinics
- 🤝 Apply for UNICEF and WHO non-dilutive grants
- 📈 Target SAM of \$18M in Kazakhstan school dyslexia screening
Long-term Vision
- 🌐 30+ language support — reaching underserved communities across Africa, South Asia, and Latin America
- 🔬 Longitudinal study tracking intervention outcomes for MindStep-detected children over 3 years
- 🤖 Personalized AI training modules — not just detection, but remediation
- 🎯 The ultimate goal:
$$\text{Dyslexia screening} = \text{Vision test}$$
Done at school. In 5 minutes. For free. For every child.
> "Every day without early detection is a day lost in the neuroplasticity window. MindStep closes that gap."
Built With
- catboost
- fastapi
- lightgbm
- python
- react
- typescript
- vercel
- webgazer.js
- xgboost