The Rice course evaluation platform is decades old and is no longer used the way it was intended. The registrar originally designed it primarily as a platform for students to evaluate instructors, and secondarily as a way for other students to glean information about courses. Instead, it became a student-to-student communication channel, and both goals were compromised.
Because the platform is not used as intended, its current fields are not helpful. We decided to solve this by creating a pipeline that empowers users to explore, analyze, and draw conclusions from over half a million course evaluations, so that students can learn about courses, instructors can learn about students, and the registrar can learn about professors.
What it does
We created three distinct tools to glean new information from course evaluations. The first uses a unigram-and-bigram multivariate naive Bayes model to perform sentiment analysis on course evaluations, extracting student happiness and teaching effectiveness for each class. From this analysis, we found that student happiness is not always correlated with the amount learned, so measures of instructor effectiveness should account for both. The second identifies helpful course evaluations and displays them for easy viewing. The third uses a novel way to represent course evaluations: word flows. In a word flow, words associated with negative sentiment appear on the left side of the spectrum, while words with positive sentiment appear on the right. Instructors can compare word flows across semesters to more easily see how changes to the course are reflected in student feedback.
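The write-up doesn't include the team's actual code, but the two core ideas (a multivariate naive Bayes classifier over unigram/bigram presence features, and ordering words by sentiment for a word flow) can be sketched as follows. Everything here is illustrative: the class names, the toy reviews, and the hardcoded "pos"/"neg" labels are assumptions, not the project's real data or API.

```python
import math
from collections import Counter

def ngrams(text, n):
    """Split on whitespace and return the n-grams as strings."""
    toks = text.lower().split()
    return [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]

def features(text):
    """Unigram + bigram *presence* features (the multivariate/Bernoulli view)."""
    return set(ngrams(text, 1)) | set(ngrams(text, 2))

class BernoulliNaiveBayes:
    """Multivariate (Bernoulli) naive Bayes over n-gram presence features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, docs, labels):
        self.vocab = set()
        self.doc_freq = {}          # label -> Counter: feature -> #docs containing it
        self.n_docs = Counter(labels)
        for doc, y in zip(docs, labels):
            feats = features(doc)
            self.vocab |= feats
            self.doc_freq.setdefault(y, Counter()).update(feats)
        return self

    def _feature_prob(self, y, f):
        # P(feature present | label), with add-alpha smoothing
        return (self.doc_freq[y][f] + self.alpha) / (self.n_docs[y] + 2 * self.alpha)

    def predict(self, doc):
        feats = features(doc) & self.vocab
        total = sum(self.n_docs.values())
        best_label, best_lp = None, float("-inf")
        for y, n in self.n_docs.items():
            lp = math.log(n / total)  # log prior
            for f in self.vocab:      # Bernoulli NB also scores *absent* features
                p = self._feature_prob(y, f)
                lp += math.log(p if f in feats else 1.0 - p)
            if lp > best_lp:
                best_label, best_lp = y, lp
        return best_label

def word_flow(clf):
    """Order unigrams by sentiment: negative-leaning words sort to the left,
    positive-leaning to the right (log-odds of presence under each label).
    Assumes the labels are "pos" and "neg"."""
    words = [f for f in clf.vocab if " " not in f]
    scored = [(w, math.log(clf._feature_prob("pos", w) / clf._feature_prob("neg", w)))
              for w in words]
    return sorted(scored, key=lambda pair: pair[1])

# Toy example (invented reviews, not real course evaluations):
docs = ["great professor clear lectures",
        "loved this class learned a lot",
        "boring lectures waste of time",
        "terrible grading too much work"]
labels = ["pos", "pos", "neg", "neg"]
clf = BernoulliNaiveBayes().fit(docs, labels)
print(clf.predict("clear lectures learned a lot"))   # pos on this toy data
flow = dict(word_flow(clf))
print(flow["boring"] < 0 < flow["great"])            # True: "boring" left, "great" right
```

On this toy corpus, "boring" gets a negative log-odds score and "great" a positive one, which is exactly the left-to-right ordering a word flow visualizes.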
Challenges we ran into
We encountered many challenges along the way, mostly to do with training a naive Bayes model, selecting the proper keywords to create a good decision boundary, and transferring the model to our unlabeled data.
After trying many different data sets and models, we finally settled on a Coursera data set of student evaluations and a unigram multivariate naive Bayes model, which we transferred onto our course evaluation data after it achieved over 70% accuracy on the test data. This accuracy was acceptable because the two sets of reviews are substantially different in nature, and the keywords were chosen exclusively from our own data set to improve accuracy on that front.
We learned how best to select a data set and model that facilitate reusability while prioritizing speed and accuracy, how to perform NLP preprocessing and cleaning, and how to successfully transfer a model between data sets.
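The write-up doesn't specify which preprocessing steps were used, but a minimal text-cleaning pass for evaluation data typically looks like the sketch below. The stopword list is an invented placeholder; a real pipeline would use a fuller list and possibly stemming or lemmatization.

```python
import re

# Tiny illustrative stopword list (a real pipeline would use a fuller one).
STOPWORDS = {"the", "a", "an", "is", "was", "were", "and", "or", "to", "of", "in", "it"}

def clean(text):
    """Lowercase, strip punctuation and digits, collapse whitespace, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

print(clean("The lectures were GREAT!! 10/10, would recommend."))
# lectures great would recommend
```

Running the same cleaning function over both the labeled source data and the unlabeled target data is what keeps the transferred model's feature space consistent across the two corpora.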