Deep Learning Final Project

NLP Course Review Prediction Model

Building a model that predicts course ratings based on Critical Review language data

Tags: NLP, Tensorflow, python

Poster: https://docs.google.com/presentation/d/1yBslfjAz3vv7DJ94CogwvxfABAfD7GNU8fyVaGPGPHQ/edit?usp=sharing

Final Reflection: https://docs.google.com/document/d/1E-heY-2DSKG8mOehzU5YepowNYiMriYEZ20QfBknnRo/edit?usp=sharing

Who We Are

  • Geireann Lindfield Roberts (glindfie)
  • Raymond Yeo (ryeo)

Project Introduction

We are building a classification model that estimates course ratings based on Critical Review feedback and summaries. We are interested in whether the way we write feedback and evaluations accurately reflects the scores we give the classes we take. We hope this tool will provide further insight into how courses have performed based on their written evaluations rather than just the multiple-choice responses. We hope this project acts as a starting point for something that can be extended, and that written evaluations, which often go unread, can be put to better use.

We were intrigued by the language processing portion of the class and wanted to explore it further.

Related Work

As far as we are aware, nothing has been done in the same regard with Critical Review's data. There is, however, some related NLP work on predicting performance from review text.

Data

We have been in touch with the Critical Review team to get our data. While we are still communicating with them and preparing our dataset, we will likely be utilizing two datasets in the following formats:

tally.csv

  • edition (e.g. 2021.2022.1)
  • department code (e.g. CSCI)
  • course code (e.g. 1951F)
  • average course rating (e.g. 3.5)
  • average professor rating (e.g. 3.5)

review.csv

  • edition (e.g. 2021.2022.1)
  • department code (e.g. CSCI)
  • course code (e.g. 1951F)
  • review text

Our preprocessing will involve tokenizing the input texts and accounting for any data integrity problems. We will also merge the two datasets on course code so that, given a review text, we can access the average course rating and average professor rating. For any additional data that Critical Review does not have readily accessible, we plan to ask for permission to web scrape it.
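A minimal sketch of the merge step with pandas, using hypothetical column names (the real Critical Review CSVs may differ); edition and department are included in the join keys here so that course codes stay unique across departments:

```python
import pandas as pd

# Hypothetical stand-ins for tally.csv and review.csv.
tally = pd.DataFrame({
    "edition": ["2021.2022.1"],
    "department_code": ["CSCI"],
    "course_code": ["1951F"],
    "course_avg": [3.5],
    "prof_avg": [3.5],
})
reviews = pd.DataFrame({
    "edition": ["2021.2022.1"],
    "department_code": ["CSCI"],
    "course_code": ["1951F"],
    "review_text": ["Great course, heavy workload."],
})

# Merge on the shared keys so each review row carries its course's ratings.
merged = reviews.merge(tally, on=["edition", "department_code", "course_code"])
print(merged[["review_text", "course_avg", "prof_avg"]])
```

After this merge, each row pairs a review text with the numeric labels we want to predict.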

Methodology

  1. Pre-process and prepare the data such that it is compiled into a single CSV file and sufficiently prepared to be inputted into the model.
  2. Utilize language modeling deep learning techniques to train our model. We will likely use an embedding matrix, coupled with techniques seen in LSTM and Word2Vec models.
  3. When testing, we will round our labels to integers. What we mean by this is that a course rating of 3.79 would be considered a 4. This will help gauge our model accuracy, and allow the model to constrain its possible outputs to just 5 integer values (1, 2, 3, 4, 5). Our training data will consist of Critical Review data from 2010 (when the Critical Review feedback format changed) until 2019.
  4. We will then test the model by using it on the feedback over the last two years.
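The rounding scheme from step 3 can be sketched as a small helper (the function name is ours; rounding to the nearest integer matches the 3.79 → 4 example):

```python
def bucket_rating(avg):
    """Round a continuous 1-5 rating to the nearest integer label, clamped to [1, 5]."""
    return max(1, min(5, round(avg)))

print(bucket_rating(3.79))  # -> 4
print(bucket_rating(1.2))   # -> 1
```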

Metrics

Model accuracy would be a good metric to gauge this model's performance.

Base Goal: Perform better than random guessing. This means our model accuracy would be greater than 20%.

Target Goal: Accuracy > 30%

Stretch Goal: Accuracy > 50%

We will reassess our goals after we begin training our model and gauging its performance.

Ethics

Deep Learning serves as a good approach to this problem, as course and professor ratings are dependent on a variety of factors.

We understand that an unintended result of this model is that we are implicitly evaluating the effectiveness of Critical Review summaries. Our goal here is not to see how "good" a course review is, but rather to evaluate to what extent a qualitative metric like a holistic course review can be used to predict quantitative metrics.

Critical Review and students and staff at Brown are the main stakeholders in the problem that we are trying to solve. We hope that our metrics will support Critical Review, and help them develop and create more useful quantifications of written feedback as well as numerical feedback. We have been in touch with Critical Review and have their support with regards to this matter.

If our model were inaccurate and gave incorrect results, this would negatively affect the reputation of Critical Review, and therefore would not help them provide feedback to course staff and students. For that reason it is important that we ensure the accuracy of our model.

Division of labor

Ray + Geireann = 100% of work = 100% sweet talkers. We have not decided on fixed work distributions yet.


Updates

Project Introduction

We are building a classification model that estimates course and professor ratings based on Critical Review feedback and summaries. We are interested in whether the way we write feedback and evaluations accurately reflects the scores we give the classes we take. We hope this tool will provide further insight into how courses have performed based on their written evaluations rather than just the multiple-choice responses. We hope this project acts as a starting point for something that can be extended, and that written evaluations, which often go unread, can be put to better use.

We were intrigued by the language processing portion of the class and wanted to explore it further.

Challenges

Our greatest challenge so far has been building our classification model. Language classification is not something that we directly tackled during class or lectures, so we've had to research methods online. We've tried our best to take a systematic approach to tackling this problem.

Insights

Data

We have been able to preprocess our data and split it into testing and training sets. There are 4230 total course reviews that we can work with.

Model

To start off, we decided to build our model as a binary (two-label) classification model. To do that we had to change how we preprocessed our data: instead of processing the average course rating as a continuous number between 1 and 5, we classified each review as either positive or negative, with the following rule:

def label_review(course_avg):
    # Binarize the average course rating at a 3.5 threshold.
    if course_avg > 3.5:
        return 1  # positive result
    return 0  # negative result

We then split our dataset 0.25 train, 0.75 test using sklearn and its train_test_split function.
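The split step described above, sketched with toy stand-ins for the tokenized reviews and binary labels (`train_size=0.25` reproduces the 0.25 train / 0.75 test proportions):

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for the preprocessed reviews and their binary labels.
texts = [f"review {i}" for i in range(100)]
labels = [i % 2 for i in range(100)]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, train_size=0.25, random_state=42
)
print(len(X_train), len(X_test))  # -> 25 75
```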

Figures 1-3: (training plots; images not preserved in this export)

Model accuracy: 78%

We were able to build a reasonably accurate model that classifies reviews as positive or negative, although Figure 3 suggests a substantial amount of overfitting after the 15th epoch. To remedy this we will try to build a model that generalizes better.
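One common remedy for overfitting past a certain epoch is early stopping. Keras provides this as `tf.keras.callbacks.EarlyStopping`, but the underlying check is simple enough to sketch as a hypothetical helper (not their code):

```python
def should_stop(val_losses, patience=3):
    """Return True once validation loss has not improved for `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before

# Validation loss plateaus after epoch 3, so training would halt here.
print(should_stop([1.0, 0.8, 0.6, 0.62, 0.63, 0.65]))  # -> True
```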

Plan

We believe we are on track for this project and have been allocating our time properly.

Our next step is to use an LSTM to make the simplified model explained in the Insights section more comprehensive. This will likely take up most of our efforts in the coming week.
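One plausible shape for that LSTM model is an embedding layer feeding an LSTM with a sigmoid output for the binary labels; the vocabulary size and sequence length below are assumptions, not values from our actual preprocessing:

```python
import tensorflow as tf

VOCAB_SIZE = 10_000  # assumed vocabulary size after tokenization
MAX_LEN = 200        # assumed padded review length

# A minimal embedding + LSTM binary classifier for review texts.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Sanity check on a dummy batch of two padded token sequences.
dummy = tf.zeros((2, MAX_LEN), dtype=tf.int32)
print(model(dummy).shape)  # one probability per review
```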

Due to time constraints, it does not look like we will be able to create a second model that classifies professor ratings. We would rather allocate our time toward improving our course rating model.
