AI Model to predict student dropouts during the pandemic
Our study focuses on building an AI model to predict which students will discontinue their education. See https://data.mendeley.com/datasets/wt8g7dth8y/1 for details about the journal article and the dataset used. This project contains the code that uses sentiment analysis to convert free-text comments into a numeric score. After that, an IBM Watson Studio AutoAI experiment is used to develop the model.
In this study, we look at a dataset from a journal article titled "Impact of lockdown on university students’ learning process during the COVID-19 pandemic in Southern Central Bulgaria". The authors studied the impact of the COVID-19 pandemic on students in Bulgaria, who were forced to attend classes remotely during the lockdown. Some students indicated that they would discontinue their education because of the pandemic. We wondered whether AI could help lower this rate.
What it does
This AI model predicts which students are likely to discontinue their education. This helps administrators intervene early and try to keep these students from dropping out. The model could also be useful if we face another pandemic.
How we built it
First, we uploaded our dataset: the responses to the survey taken by university students in Bulgaria. When we looked at the data, we noticed a free-comment question, "Do you have some recommendations for improving the quality of distance learning?", where students could type in an answer. We decided to use sentiment analysis to convert these answers to a numeric value, and we chose vaderSentiment, an open-source Python library that scores text by its sentiment.
Next, I created a Jupyter Notebook, uploaded the CSV data file, and loaded it into a Pandas data frame. I located the comment question, used vaderSentiment to convert each answer into a number, and added those numbers back to the data frame as a new column. Then I downloaded the updated CSV file.
Finally, I created an AI model that predicts whether a student will drop out, using IBM Watson Studio for this part. I uploaded the updated data file to Watson Studio, added an AutoAI experiment, and asked it to predict whether each student will drop out based on the values of the other columns. The experiment tries several different algorithms and reports the error for each one; I chose the one with the smallest error, which was one of the Ridge algorithms.
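The sentiment step above can be sketched as follows. This is a minimal illustration, not the notebook we actually ran: it uses Python's standard-library `csv` module instead of a Pandas data frame so it runs without extra dependencies, and it assumes vaderSentiment's `SentimentIntensityAnalyzer.polarity_scores`, whose `"compound"` value is a float between -1 (most negative) and 1 (most positive). The default column name `comment_sentiment` and the choice to give blank answers a neutral 0.0 are hypothetical.

```python
import csv
import io
from functools import lru_cache
from typing import Callable


@lru_cache(maxsize=1)
def _vader_analyzer():
    # Imported lazily so the rest of this sketch works without vaderSentiment
    # installed (pip install vaderSentiment); the analyzer is built only once.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer()


def vader_compound(text: str) -> float:
    """Return VADER's compound sentiment score, a float in [-1.0, 1.0]."""
    return _vader_analyzer().polarity_scores(text)["compound"]


def add_sentiment_column(
    csv_text: str,
    comment_column: str,
    score_fn: Callable[[str], float],
    new_column: str = "comment_sentiment",  # hypothetical column name
) -> str:
    """Score one free-text column and return the CSV with a new numeric column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if comment_column not in fieldnames:
        raise KeyError(f"column {comment_column!r} not found in CSV header")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames + [new_column], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        comment = (row.get(comment_column) or "").strip()
        # Blank answers get a neutral 0.0 instead of being scored or dropped.
        row[new_column] = score_fn(comment) if comment else 0.0
        writer.writerow(row)
    return out.getvalue()
```

To score the survey file, read it into a string, call `add_sentiment_column(text, "<comment column>", vader_compound)` with the real header of the comment question, and write the returned text to a new CSV. Passing the scorer in as `score_fn` keeps the CSV handling independent of VADER, so the function can be tested with a stub. In Pandas the same idea is roughly `df["comment_sentiment"] = df[comment_column].fillna("").astype(str).map(vader_compound)`.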
Challenges we ran into
At first we weren't sure where to start with sentiment analysis, but after some trial and error we figured it out. A few algorithms also failed to run properly; luckily, we resolved that quickly once we realized we had accidentally fed in an invalid integer.
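One cheap way to catch an invalid integer like the one described above before training is to check every coded answer against its allowed range. The sketch below is hypothetical: the column names and ranges are placeholders, since the real survey's coding scheme would have to be taken from the dataset's documentation.

```python
from typing import Dict, Iterable, List, Tuple


def find_invalid_codes(
    rows: Iterable[Dict[str, str]],
    allowed_ranges: Dict[str, Tuple[int, int]],
) -> List[Tuple[int, str, str]]:
    """Return (row_index, column, raw_value) for every cell that is not an
    integer within its column's inclusive (low, high) range."""
    problems = []
    for index, row in enumerate(rows):
        for column, (low, high) in allowed_ranges.items():
            raw = (row.get(column) or "").strip()
            try:
                value = int(raw)
            except ValueError:
                # Not an integer at all (blank, text, "3.5", ...).
                problems.append((index, column, raw))
                continue
            if not low <= value <= high:
                # An integer, but outside the codes this column allows.
                problems.append((index, column, raw))
    return problems
```

Running it over the rows produced by `csv.DictReader` before uploading the file, and stopping whenever the returned list is non-empty, flags a bad value up front instead of leaving it to surface as a failed algorithm run.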
Accomplishments that we're proud of
Not having to start over as often as we expected.
What we learned
We learned how to apply sentiment analysis properly, and we deepened our knowledge of different algorithms, their errors, and how those errors may affect prediction accuracy.
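To make the comparison of algorithms and their errors concrete, here is a small pure-Python sketch of ridge regression. Ridge minimizes squared error plus a penalty `alpha` times the sum of squared weights. The sketch fits it in closed form for several `alpha` values and keeps the one with the lowest mean squared error on held-out data. It only illustrates the idea of picking the candidate with the smallest error; it is not how Watson Studio's AutoAI builds or ranks its pipelines, and the `alpha` values are hypothetical. With a dropout label coded as 0/1, the prediction is a score that would still need a threshold (for example 0.5) to become a yes/no answer.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ridge_fit(x: Sequence[Sequence[float]], y: Sequence[float], alpha: float) -> Tuple[List[float], float]:
    """Closed-form ridge regression with an unpenalized intercept.

    Centers the data, solves (Xc^T Xc + alpha * I) w = Xc^T yc, then
    recovers the intercept from the column means.
    """
    n, p = len(x), len(x[0])
    x_mean = [sum(row[j] for row in x) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in x]
    yc = [v - y_mean for v in y]
    gram = [[sum(r[i] * r[j] for r in xc) + (alpha if i == j else 0.0) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(xc, yc)) for i in range(p)]
    w = solve(gram, rhs)
    b = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, b


def predict(x: Sequence[Sequence[float]], w: List[float], b: float) -> List[float]:
    return [b + sum(wj * v for wj, v in zip(w, row)) for row in x]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pick_alpha(x_train, y_train, x_val, y_val, alphas) -> Tuple[float, List[Tuple[float, float]]]:
    """Fit one ridge model per alpha and keep the one with the lowest validation MSE."""
    results = []
    for alpha in alphas:
        w, b = ridge_fit(x_train, y_train, alpha)
        results.append((mse(y_val, predict(x_val, w, b)), alpha))
    _, best_alpha = min(results)
    return best_alpha, results
```

Larger `alpha` values shrink the weights toward zero, trading a little training accuracy for stability; the validation error decides how much shrinkage is worth it, which is the same "smallest error wins" rule we used when choosing among the AutoAI results.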
What's next for Student Dropout Predictor during COVID
We hope to improve this AI model further and possibly integrate it into a website or app to make it more user-friendly.