Inspiration
We wanted to explore how holistic factors, ranging from socioeconomic status to lifestyle habits, influence student success. Our goal was to pinpoint the main drivers of high exam scores so that resources can be better tailored to the UCI student body. We were particularly interested in whether "merit" factors (study time) or "background" factors (family income) carried more weight in the final grade.
What it does
Our project is a predictive analytics tool that uses Machine Learning to identify the "Golden Window" of student effort. It doesn't just predict a score; it identifies the thresholds where student effort (like studying and attendance) yields the highest return on investment and where those returns begin to plateau.
How we built it
We utilized a multi-model approach to ensure our findings weren't just a fluke of one algorithm:
- Exploratory Data Analysis (EDA): We used Omni to visualize initial correlations.
- Linear Benchmarking: We started with OLS Regression and Lasso/Ridge Regression to establish a baseline. Lasso was particularly helpful in "zeroing out" noise: it shrank the coefficients of weak features to zero and confirmed that Attendance and Study Hours were our most influential features.
- Ensemble Learning: To capture non-linear patterns, we implemented a Random Forest Regressor. This raised our R-squared from 0.459 (Single Decision Tree) to 0.637 (Random Forest).
- Interpretability: We used SHAP (SHapley Additive exPlanations) and Partial Dependence Plots (PDP) to "open the black box" and estimate how many hours of study it takes before a student hits diminishing returns.
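The model comparison above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not our actual notebook: the data is synthetic, and the feature names, coefficients, and hyperparameters (`alpha`, `n_estimators`) are assumptions chosen only to make the example run. The R-squared values it prints will not match the 0.459 and 0.637 we report.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor


def make_synthetic_students(n=1000, seed=0):
    """Hypothetical stand-in data: effort helps, but study hours stop paying off at 30."""
    rng = np.random.default_rng(seed)
    attendance = rng.uniform(60, 100, n)  # percent of classes attended
    hours = rng.uniform(0, 44, n)         # study hours per week
    income = rng.integers(0, 3, n)        # low / medium / high, encoded 0-2
    score = 45 + 0.3 * attendance + 0.5 * np.minimum(hours, 30) + 0.5 * income
    score = score + rng.normal(0, 3, n)   # unexplained variation
    X = np.column_stack([attendance, hours, income])
    return X, score


def compare_models(X, y, seed=0):
    """Fit each model on the same split and return its held-out R-squared."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "ols": LinearRegression(),
        # Penalized linear models are scale-sensitive, so standardize first.
        "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
        "decision_tree": DecisionTreeRegressor(random_state=seed),
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = r2_score(y_test, model.predict(X_test))
    return results


X, y = make_synthetic_students()
results = compare_models(X, y)
for name, r2 in results.items():
    print(f"{name:>14}: R^2 = {r2:.3f}")
```

Scoring every model on the same held-out split is what makes the numbers comparable; comparing training-set R-squared would flatter the single decision tree.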
Challenges we ran into
One of our biggest hurdles was poor model fit. Our initial Decision Tree explained only about 46% of the variance in scores, and we realized that a single tree was too "rigid" to capture the complexity of student behavior. Handling multicollinearity was also tricky. Features like "Tutoring" and "Previous Scores" often overlapped, making it hard to tell which one was truly driving the result until we used Lasso to simplify the feature set.
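One common way to quantify this kind of overlap is the variance inflation factor (VIF): regress each feature on the others and compute 1 / (1 - R-squared). The sketch below is a different diagnostic from the Lasso step we actually used, and the data is made up: the relationship between `tutoring` and `previous_scores`, and the unrelated `sleep_hours` column, are assumptions for illustration only.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column: 1 / (1 - R^2) from regressing that column on the others."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vifs = []
    for j in range(n_features):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_samples), others])  # add an intercept
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        r_squared = 1.0 - residuals.var() / target.var()
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


rng = np.random.default_rng(0)
n = 500
previous_scores = rng.normal(75, 10, n)
# Hypothetical overlap: students with lower previous scores book more tutoring.
tutoring = 8 - 0.08 * previous_scores + rng.normal(0, 0.3, n)
sleep_hours = rng.normal(7, 1, n)  # unrelated to the other two

names = ["previous_scores", "tutoring", "sleep_hours"]
X = np.column_stack([previous_scores, tutoring, sleep_hours])
vifs = variance_inflation_factors(X)
for name, vif in zip(names, vifs):
    print(f"{name:>16}: VIF = {vif:.2f}")
```

A VIF near 1 means a feature carries information the others do not. Values above roughly 5 to 10 are a common rule-of-thumb warning that two features are telling the model nearly the same thing.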
Accomplishments that we're proud of
We are proud of discovering the "30-Hour Plateau." While a simple linear model would suggest that studying more always equals a better grade, our Random Forest model indicated that beyond 30 hours per week, additional study time adds essentially nothing to a student's predicted score.
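A plateau like this can be read off a partial dependence curve: force the study-hours feature to each value on a grid for every student, average the predictions, and look for where the curve stops rising. The sketch below computes the curve by hand. The synthetic data has a plateau at 30 hours built in, and `find_plateau` with its `min_gain_per_unit` threshold is a hypothetical helper of ours, not a library function.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def partial_dependence_curve(model, X, feature_index, grid):
    """Average prediction when one feature is forced to each grid value for every row."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_index] = value
        curve.append(model.predict(X_mod).mean())
    return np.array(curve)


def find_plateau(grid, curve, min_gain_per_unit=0.1):
    """Return the first grid value from which every later step gains less than
    min_gain_per_unit; None if the curve is still rising at the end."""
    slopes = np.diff(curve) / np.diff(grid)
    for i in range(len(slopes)):
        if np.all(slopes[i:] < min_gain_per_unit):
            return grid[i]
    return None


# Hypothetical data with a built-in plateau at 30 study hours per week.
rng = np.random.default_rng(0)
n = 1000
attendance = rng.uniform(60, 100, n)
hours = rng.uniform(0, 44, n)
score = 45 + 0.3 * attendance + 0.5 * np.minimum(hours, 30) + rng.normal(0, 3, n)
X = np.column_stack([attendance, hours])

model = RandomForestRegressor(
    n_estimators=200, min_samples_leaf=5, random_state=0
).fit(X, score)

grid = np.arange(0, 45, 5, dtype=float)  # 0, 5, ..., 40 hours per week
curve = partial_dependence_curve(model, X, feature_index=1, grid=grid)
for h, s in zip(grid, curve):
    print(f"{h:4.0f} h/week -> mean predicted score {s:.1f}")
print("Estimated plateau starts at:", find_plateau(grid, curve))
```

A coarse grid keeps the slope estimates from reacting to the small step-like wiggles a forest produces; the threshold still needs tuning for real data.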
What we learned
We learned that the type of model dictates the type of truth you find.
- OLS taught us the general trend.
- Lasso taught us what was essential.
- Random Forest taught us the reality of thresholds and diminishing returns.
We also learned that in this specific dataset, active habits (Attendance) consistently outperformed passive backgrounds (Socioeconomic status) in predicting success.
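One model-agnostic way to compare a habit feature against a background feature is permutation importance: shuffle one column in the held-out data and measure how much the model's R-squared drops. This is a stand-in for the SHAP analysis we ran, not a reproduction of it. The data and feature names below are synthetic assumptions, and attendance is made more influential than income by construction, so the sketch shows the mechanics rather than evidence.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

feature_names = ["attendance", "study_hours", "family_income"]

# Hypothetical data: attendance and study hours matter more than income by design.
rng = np.random.default_rng(0)
n = 1000
attendance = rng.uniform(60, 100, n)
study_hours = rng.uniform(0, 44, n)
family_income = rng.integers(0, 3, n)  # low / medium / high, encoded 0-2
score = (
    45
    + 0.3 * attendance
    + 0.5 * np.minimum(study_hours, 30)
    + 0.5 * family_income
    + rng.normal(0, 3, n)
)
X = np.column_stack([attendance, study_hours, family_income])

X_train, X_test, y_train, y_test = train_test_split(
    X, score, test_size=0.25, random_state=0
)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Mean drop in held-out R^2 when each column is shuffled, over 10 repeats.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
importance = dict(zip(feature_names, result.importances_mean))
for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:>14}: {value:.3f}")
```

Measuring on held-out data matters here: importance computed on the training set can reward features the forest merely memorized.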
What's next for Student Performance
Incentivize attendance:
- Extra Credit for attending class
Provide resources for efficient studying:
- New study methods
- Guidance that 10-25 study hours/week is the range where scores grow consistently
Make school resources more accessible to students:
- Free Wi-Fi
- Peer mentor support
- Free tutoring sessions
Promote positive school culture!