🎓 EduRisk AI: Early Student Outcome Prediction System
🚀 Inspiration
Student dropout is not just an academic metric — it is a life-changing event that impacts careers, financial stability, and mental health.
As an international student, I have personally seen how financial stress, academic pressure, and lack of support can influence a student’s trajectory.
This project was inspired by a simple question:
Can we identify at-risk students early enough to actually help them?
Instead of reacting after a student drops out, I wanted to build an AI-driven early warning system that predicts student outcomes and quantifies dropout risk after the first semester.
📊 Problem Statement
Using socio-economic data, enrollment information, and semester-level academic performance, the goal is to predict which of three outcomes a student reaches:
Dropout
Enrolled
Graduate
This is a multi-class classification problem.
Formally:
$$P(Y \mid X) = P(\text{Outcome} \mid \text{Student Features})$$
Where:
$Y \in \{\text{Dropout}, \text{Enrolled}, \text{Graduate}\}$
$X$ includes academic, financial, demographic, and macroeconomic indicators
🛠 How I Built It
1️⃣ Data Processing
The dataset contains 4,424 students and 35 features, including:
Academic performance (semester grades, approvals)
Financial status (tuition status, debtor)
Demographics (age, nationality, parental education)
Macroeconomic indicators (GDP, unemployment rate)
I standardized column names into snake_case to prepare the data for API deployment.
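The renaming step can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the helper name `to_snake_case` and the raw column labels are assumptions made for the example.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a raw column label to snake_case."""
    # Collapse every run of non-alphanumeric characters (spaces, slashes,
    # parentheses, apostrophes) into a single underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    # Insert an underscore at camelCase boundaries: "maritalStatus" -> "marital_Status".
    cleaned = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", cleaned)
    return cleaned.strip("_").lower()


# Hypothetical raw labels, for illustration only.
raw_columns = [
    "Marital status",
    "Tuition fees up to date",
    "Curricular units 1st sem (approved)",
]
clean_columns = [to_snake_case(c) for c in raw_columns]
```

With pandas, the same function can be applied to a whole table through `df.rename(columns=to_snake_case)`, so the training code and the API share one naming scheme.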
2️⃣ Feature Engineering
Raw semester data is powerful but not directly interpretable. I engineered meaningful performance indicators:
Approval Rate
$$\text{Approval Rate} = \frac{\text{approved}}{\text{enrolled}}$$
Success Ratio
$$\text{Success Ratio} = \frac{\text{approved}}{\text{evaluations}}$$
No Evaluation Rate
$$\text{No Evaluation Rate} = \frac{\text{without evaluations}}{\text{enrolled}}$$
These derived metrics significantly improved predictive power and interpretability.
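As a rough sketch, the three ratios can be computed for a single student record like this. The function and argument names are hypothetical, and returning 0.0 when a denominator is zero (for example, a student with no evaluations) is an assumption of this example, not something the project specifies.

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero (assumed fallback)."""
    return numerator / denominator if denominator else 0.0


def first_semester_features(enrolled: int, evaluations: int,
                            approved: int, without_evaluations: int) -> dict:
    """Derive the three performance indicators from first-semester unit counts."""
    return {
        "approval_rate": safe_ratio(approved, enrolled),
        "success_ratio": safe_ratio(approved, evaluations),
        "no_evaluation_rate": safe_ratio(without_evaluations, enrolled),
    }


# Made-up counts for one student: 6 units enrolled, 8 evaluations,
# 3 units approved, 1 unit without any evaluation.
example = first_semester_features(enrolled=6, evaluations=8,
                                  approved=3, without_evaluations=1)
```

For this made-up student the approval rate is 3/6 = 0.5 and the success ratio is 3/8 = 0.375.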
3️⃣ Modeling
I trained a multi-class XGBoost classifier using:
Tree-based ensemble learning
Probabilistic output (multi:softprob)
3-class softmax prediction
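With `multi:softprob`, the classifier returns one probability per class for each student, obtained by passing the per-class raw scores through a softmax. The sketch below shows only that final step in plain Python; the raw scores and the class order are made up for illustration.

```python
import math

CLASSES = ("Dropout", "Enrolled", "Graduate")  # assumed class order


def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing
    # and does not change the resulting probabilities.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for one student, in CLASSES order.
probs = softmax([1.2, -0.3, 0.4])
```

The class with the largest raw score always receives the largest probability, so the softmax changes the scale of the scores but not their ranking.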
The model outputs:
📌 Predicted class (Dropout / Enrolled / Graduate)
📌 Confidence score
📌 Dropout risk score (0–100%)
📌 Probability distribution across classes
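All four outputs can be derived from one student's class probabilities. The following is a sketch under stated assumptions: `summarize_prediction` is a hypothetical helper, the class order is assumed, and "confidence" is taken to mean the probability of the predicted class.

```python
CLASSES = ("Dropout", "Enrolled", "Graduate")  # assumed class order


def summarize_prediction(probabilities):
    """Build the four reported outputs from one student's class probabilities."""
    if len(probabilities) != len(CLASSES):
        raise ValueError("expected one probability per class")
    distribution = dict(zip(CLASSES, probabilities))
    # The predicted class is the one with the highest probability.
    predicted = max(distribution, key=distribution.get)
    return {
        "predicted_class": predicted,
        "confidence": distribution[predicted],  # assumed definition
        "dropout_risk_score": round(distribution["Dropout"] * 100, 1),
        "probabilities": distribution,
    }


# Made-up probabilities for one student.
result = summarize_prediction([0.62, 0.25, 0.13])
```

Note that the risk score depends only on the Dropout probability, so a student predicted as Enrolled can still carry a sizeable dropout risk.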
4️⃣ Risk Scoring System
Beyond prediction, I introduced a Dropout Risk Score:
$$\text{Risk Score} = P(\text{Dropout}) \times 100$$
This transforms probabilities into actionable categories:
🟢 Low Risk (0–40%)
🟠 Medium Risk (40–70%)
🔴 High Risk (70–100%)
This makes the system usable for real university interventions.
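A minimal sketch of the bucketing logic follows. The listed ranges share their endpoints, so this example assumes that a score of exactly 40 counts as Medium and exactly 70 counts as High; the function name is also hypothetical.

```python
def risk_category(risk_score: float) -> str:
    """Map a 0-100 dropout risk score to a risk band."""
    if not 0 <= risk_score <= 100:
        raise ValueError("risk score must be between 0 and 100")
    # Assumption: boundary values (40 and 70) fall into the higher band.
    if risk_score < 40:
        return "Low Risk"
    if risk_score < 70:
        return "Medium Risk"
    return "High Risk"


# Made-up scores for three students.
bands = [risk_category(score) for score in (12.5, 40.0, 83.0)]
```

Keeping the thresholds in one small function makes them easy to adjust if an institution prefers different cut-offs.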
🧠 What I Learned
Feature engineering often matters more than model complexity.
First-semester approval rate is one of the strongest predictors.
Financial indicators (tuition status, debtor flag) significantly influence dropout probability.
Clean architecture (renaming, pipeline modeling) makes deployment much easier.
I also learned how to:
Build reproducible sklearn pipelines
Structure models for backend deployment
Convert academic ML work into a production-style AI product
⚠ Challenges
- Multi-class imbalance
Graduates and dropouts are not evenly distributed.
- Feature leakage concerns
Semester-level features only exist once a semester has ended, so the framing matters: this is an early-warning system applied after the first semester, not at admission time.
- Deployment readiness
Column name normalization and consistent preprocessing were critical to avoid runtime errors in a web API environment.
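For the class-imbalance challenge, one common mitigation is to weight each class inversely to its frequency. The write-up does not say which technique the project used, so the sketch below is only an illustration of that heuristic, with a made-up label list rather than the real class distribution.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up labels, not the real dataset distribution.
labels = ["Graduate"] * 5 + ["Dropout"] * 3 + ["Enrolled"] * 2
weights = balanced_class_weights(labels)

# One weight per training row, so rarer classes count more during fitting.
sample_weights = [weights[label] for label in labels]
```

Per-row weights like `sample_weights` can typically be passed to a classifier's `fit` method through its `sample_weight` argument.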
🌍 Impact Vision
Universities could use this system to:
Identify high-risk students early
Trigger academic advising
Offer financial support interventions
Improve graduation rates
Reduce social inequality in higher education
Instead of saying:
“The student dropped out.”
We can say:
“We saw the risk early — and acted.”
🔮 Future Improvements
Add SHAP explainability dashboard
Fairness analysis across international/domestic students
Real-time API integration for institutional systems
Temporal modeling (semester-by-semester progression)
💡 Final Thought
EduRisk AI transforms raw academic data into actionable insight. It is not just a prediction model — it is a decision-support system.