Inspiration

Education is hard to evaluate because many factors contribute to a school's success. We wanted to help schools determine which factors matter most for the success of their students. We measure student success by the results of their national Math and ELA tests.

What it does

Our model compares multiple factors, such as gender and ethnicity, to see whether there is any trend in the mean grade. This may highlight issues affecting specific communities and help educators focus more on them. Next, we compared scores such as leadership, trust, and rigorous instruction from the 2015 and 2017 school surveys to see whether they would be good predictors of school success.
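As a rough illustration of the group comparison described above, here is a minimal sketch using only the Python standard library. The group labels, score values, and the function name `mean_score_by_group` are made up for the example and are not taken from our dataset or code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (group label, test score). Not real data.
records = [
    ("group_a", 2.5),
    ("group_b", 3.0),
    ("group_a", 3.5),
    ("group_b", 2.0),
]

def mean_score_by_group(rows):
    """Return {group: mean score} for an iterable of (group, score) pairs."""
    scores = defaultdict(list)
    for group, score in rows:
        scores[group].append(score)
    return {group: mean(values) for group, values in scores.items()}

group_means = mean_score_by_group(records)
```

Comparing the entries of `group_means` side by side is one simple way to spot a gap between groups before fitting any model.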

How we built it

Our first step was preprocessing: we merged all the datasets we had into a single dataset combining every feature that might affect the scores. We then ran our models on this feature matrix.
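The merge step can be sketched roughly as follows. This is a dependency-free illustration, not our actual pipeline: the key `school_id`, the column names, the values, and the helper `merge_on_key` are all assumptions, and the inner join (keeping only schools present in every table) is one possible choice among several.

```python
def merge_on_key(key, *tables):
    """Inner-join lists of dict rows on `key`.

    Only keys that appear in every table survive the merge.
    """
    merged = {}
    for i, table in enumerate(tables):
        indexed = {row[key]: row for row in table}
        if i == 0:
            merged = {k: dict(v) for k, v in indexed.items()}
        else:
            merged = {
                k: {**merged[k], **indexed[k]} for k in merged if k in indexed
            }
    return list(merged.values())

# Hypothetical inputs, one table per source dataset.
demographics = [
    {"school_id": "S1", "pct_female": 0.52},
    {"school_id": "S2", "pct_female": 0.47},
]
surveys = [
    {"school_id": "S1", "trust": 0.91},
    {"school_id": "S3", "trust": 0.80},
]
scores = [
    {"school_id": "S1", "mean_math": 2.9, "mean_ela": 2.7},
    {"school_id": "S2", "mean_math": 3.1, "mean_ela": 3.0},
]

feature_matrix = merge_on_key("school_id", demographics, surveys, scores)
```

In this toy example only `S1` appears in all three tables, so the resulting feature matrix has a single row. An inner join like this trades coverage for completeness, which matters when many schools are missing from one source.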

Challenges we ran into

One of the major challenges was preprocessing. Many safety values were missing, so we decided not to use that feature to avoid hindering the model's performance. Similarly, the survey scores were not consistent across years, so we couldn't use them as we initially wanted; even after dropping many of them, we still had enough to run a model. Another big challenge was that the distribution of the scores is very spread out, so linear models did not perform well. In the short time we had, we only used linear regression and XGBoost; with more time, we would like to explore other models.
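One way to decide which features to drop is to measure the fraction of missing values per column and discard columns above a cutoff. The sketch below assumes a 50% cutoff and made-up rows; both the threshold and the helper names are illustrative, not the values we actually used.

```python
def missing_fraction(rows, column):
    """Fraction of rows where `column` is absent or None."""
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)

def drop_sparse_columns(rows, max_missing=0.5):
    """Keep only columns whose missing fraction is at most `max_missing`."""
    columns = {name for row in rows for name in row}
    keep = sorted(c for c in columns if missing_fraction(rows, c) <= max_missing)
    return [{c: row.get(c) for c in keep} for row in rows]

# Hypothetical rows: "safety" is missing in 3 of 4 schools, "trust" in 1 of 4.
rows = [
    {"school_id": "S1", "trust": 0.90, "safety": None},
    {"school_id": "S2", "trust": 0.80, "safety": 0.70},
    {"school_id": "S3", "trust": None, "safety": None},
    {"school_id": "S4", "trust": 0.85},
]

cleaned = drop_sparse_columns(rows)
```

Here `safety` is removed because 75% of its values are missing, while `trust` is kept with one remaining gap that would still need to be handled before modeling.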

Accomplishments that we're proud of

Our feature selection produced a few features that make sense, which is promising for applying models to these parameters in the future. We also found a correlation between specific ethnicity/gender groups and their math and English scores.
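A correlation of this kind can be quantified with the Pearson coefficient. The sketch below is a plain-Python illustration on invented numbers; the variable names and values are assumptions and do not reproduce our results.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical per-school values: share of one group and mean math score.
group_share = [0.1, 0.2, 0.3, 0.4]
mean_math = [2.0, 2.5, 3.0, 3.5]

r = pearson(group_share, mean_math)
```

A value of `r` near 1 or -1 indicates a strong linear relationship, and a value near 0 a weak one. A high correlation alone does not show that group membership causes the score difference.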

What we learned

We learned that preprocessing the data has a huge impact on model performance; most of our time went into this part.

What's next for Educational DataSet

We still need to improve our models and try different regression methods to improve our predictions. Finally, we also want to use time series models to help schools see whether their performance is improving.
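As a very simple starting point for the trend question, one could fit a least-squares line to a school's yearly scores and look at the sign of the slope. This is only a sketch of that idea, not a full time series model, and the years and scores below are invented.

```python
def trend_slope(years, scores):
    """Least-squares slope of scores against years (score change per year)."""
    n = len(years)
    mean_year = sum(years) / n
    mean_score = sum(scores) / n
    numerator = sum(
        (y - mean_year) * (s - mean_score) for y, s in zip(years, scores)
    )
    denominator = sum((y - mean_year) ** 2 for y in years)
    return numerator / denominator

# Hypothetical yearly mean scores for one school.
years = [2015, 2016, 2017]
scores = [2.8, 2.9, 3.0]

slope = trend_slope(years, scores)
is_improving = slope > 0
```

With only a few years per school the slope is a noisy estimate, so it is better read as a rough indicator than as a forecast.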
