🎓 EduRisk AI: Early Student Outcome Prediction System

🚀 Inspiration

Student dropout is not just an academic metric — it is a life-changing event that impacts careers, financial stability, and mental health.

As an international student, I have personally seen how financial stress, academic pressure, and lack of support can influence a student’s trajectory.

This project was inspired by a simple question:

Can we identify at-risk students early enough to actually help them?

Instead of reacting after a student drops out, I wanted to build an AI-driven early warning system that predicts student outcomes and quantifies dropout risk after the first semester.

📊 Problem Statement

Using socio-economic data, enrollment information, and semester-level academic performance, we aim to predict:

Dropout

Enrolled

Graduate

This is a multi-class classification problem.

Formally:

$$P(Y \mid X) = P(\text{Outcome} \mid \text{Student Features})$$

Where:

$$Y \in \{\text{Dropout}, \text{Enrolled}, \text{Graduate}\}$$

$X$ includes academic, financial, demographic, and macroeconomic indicators

🛠 How I Built It

1️⃣ Data Processing

The dataset contains 4,424 students and 35 features, including:

Academic performance (semester grades, approvals)

Financial status (tuition status, debtor)

Demographics (age, nationality, parental education)

Macroeconomic indicators (GDP, unemployment rate)

I standardized column names into snake_case to prepare the data for API deployment.
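The renaming step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper `to_snake_case` and the sample headers are assumptions chosen for the example.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a raw column header to snake_case (hypothetical helper)."""
    # Drop apostrophes so "Mother's" becomes "mothers" rather than "mother_s".
    cleaned = name.strip().replace("'", "").replace("\u2019", "")
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", cleaned)
    # Remove leading/trailing underscores left by punctuation, then lowercase.
    return cleaned.strip("_").lower()


# Illustrative headers only; the real dataset's headers may differ.
raw_columns = [
    "Curricular units 1st sem (approved)",
    "Tuition fees up to date",
    "Mother's qualification",
]
renamed = {col: to_snake_case(col) for col in raw_columns}
```

Keeping the mapping in one function means the same normalization can be applied to training data and to incoming API payloads.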

2️⃣ Feature Engineering

Raw semester data is powerful but not directly interpretable. I engineered meaningful performance indicators:

Approval Rate

$$\text{Approval Rate} = \frac{\text{approved}}{\text{enrolled}}$$

Success Ratio

$$\text{Success Ratio} = \frac{\text{approved}}{\text{evaluations}}$$

No Evaluation Rate

$$\text{No Evaluation Rate} = \frac{\text{without evaluations}}{\text{enrolled}}$$

These derived metrics significantly improved predictive power and interpretability.
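The three ratios above could be computed per student along these lines. The function names and the choice to return 0.0 when a denominator is zero (for example, a student enrolled in no units) are assumptions for this sketch; the original write-up does not say how that case was handled.

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    # Assumption: a zero denominator maps to 0.0 instead of raising an error.
    return numerator / denominator if denominator else 0.0


def first_semester_features(
    enrolled: int, evaluations: int, approved: int, without_evaluations: int
) -> dict:
    """Compute the three derived indicators from raw semester counts."""
    return {
        "approval_rate": safe_ratio(approved, enrolled),
        "success_ratio": safe_ratio(approved, evaluations),
        "no_evaluation_rate": safe_ratio(without_evaluations, enrolled),
    }


# Toy counts for one hypothetical student.
features = first_semester_features(
    enrolled=6, evaluations=8, approved=5, without_evaluations=1
)
```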

3️⃣ Modeling

I trained a multi-class XGBoost classifier using:

Tree-based ensemble learning

Probabilistic output (multi:softprob)

3-class softmax prediction

The model outputs:

📌 Predicted class (Dropout / Enrolled / Graduate)

📌 Confidence score

📌 Dropout risk score (0–100%)

📌 Probability distribution across classes
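Given one row of class probabilities from a softprob-style model, the four outputs listed above could be assembled as in this sketch. The class order, the function name `summarize_prediction`, and the definition of confidence as the largest class probability are assumptions, since the write-up does not spell them out.

```python
# Assumed label order; it must match the encoding used at training time.
CLASSES = ("Dropout", "Enrolled", "Graduate")


def summarize_prediction(probabilities: list) -> dict:
    """Turn a 3-class probability vector into the reported outputs."""
    if len(probabilities) != len(CLASSES):
        raise ValueError("expected one probability per class")
    # Index of the most likely class.
    best = max(range(len(CLASSES)), key=lambda i: probabilities[i])
    return {
        "predicted_class": CLASSES[best],
        # Assumption: confidence is the probability of the predicted class.
        "confidence": probabilities[best],
        # Dropout probability expressed on a 0-100 scale.
        "dropout_risk_score": round(probabilities[0] * 100, 1),
        "probabilities": dict(zip(CLASSES, probabilities)),
    }


# Toy probability vector, not real model output.
result = summarize_prediction([0.62, 0.25, 0.13])
```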

4️⃣ Risk Scoring System

Beyond prediction, I introduced a Dropout Risk Score:

$$\text{Risk Score} = P(\text{Dropout}) \times 100$$

This transforms probabilities into actionable categories:

🟢 Low Risk (0–40%)

🟠 Medium Risk (40–70%)

🔴 High Risk (70–100%)

This makes the system usable for real university interventions.
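A possible mapping from risk score to category is sketched below. The listed bands share their boundary values (40% and 70%), so this example assumes each boundary belongs to the higher band; that tie-breaking rule and the function name are assumptions, not something the write-up specifies.

```python
def risk_category(risk_score: float) -> str:
    """Map a 0-100 dropout risk score to a risk band."""
    if not 0.0 <= risk_score <= 100.0:
        raise ValueError("risk_score must be between 0 and 100")
    # Assumption: boundaries (40, 70) fall into the higher band.
    if risk_score < 40.0:
        return "Low Risk"
    if risk_score < 70.0:
        return "Medium Risk"
    return "High Risk"


# Toy scores for illustration.
bands = [risk_category(score) for score in (12.5, 40.0, 85.0)]
```

If an institution prefers to err on the side of fewer alerts, the comparisons can be flipped to `<=` so that boundary scores fall into the lower band.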

🧠 What I Learned

Feature engineering often matters more than model complexity.

First-semester approval rate is one of the strongest predictors.

Financial indicators (tuition status, debtor flag) significantly influence dropout probability.

Clean architecture (renaming, pipeline modeling) makes deployment much easier.

I also learned how to:

Build reproducible sklearn pipelines

Structure models for backend deployment

Convert academic ML work into a production-style AI product

⚠ Challenges

1. Multi-class imbalance

Graduates and dropouts are not evenly distributed.

2. Feature leakage concerns

Including semester-level features requires careful framing: this is an early-warning system for use after the first semester, not at admission time.

3. Deployment readiness

Column name normalization and consistent preprocessing were critical to avoid runtime errors in a web API environment.
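For the class-imbalance challenge listed above, one common mitigation is to weight each class inversely to its frequency. The write-up does not say which approach was used, so the following is only an illustrative sketch using the standard "balanced" formula, n_samples / (n_classes × class_count); the function name and the toy labels are assumptions.

```python
from collections import Counter


def balanced_class_weights(labels: list) -> dict:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy labels, not the real class distribution.
toy_labels = ["Graduate"] * 5 + ["Dropout"] * 3 + ["Enrolled"] * 2
weights = balanced_class_weights(toy_labels)
# One weight per training row, rarer classes weighted more heavily.
sample_weights = [weights[label] for label in toy_labels]
```

Per-row weights like `sample_weights` can typically be passed to a classifier's `fit` method through a `sample_weight` argument, so that errors on rarer classes count for more during training.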

🌍 Impact Vision

Universities could use this system to:

Identify high-risk students early

Trigger academic advising

Offer financial support interventions

Improve graduation rates

Reduce social inequality in higher education

Instead of saying:

“The student dropped out.”

We can say:

“We saw the risk early — and acted.”

🔮 Future Improvements

Add SHAP explainability dashboard

Fairness analysis across international/domestic students

Real-time API integration for institutional systems

Temporal modeling (semester-by-semester progression)

💡 Final Thought

EduRisk AI transforms raw academic data into actionable insight. It is not just a prediction model — it is a decision-support system.
