Project Introduction
We are building a classification model that estimates course and professor ratings from Critical Review feedback and summaries. We are interested in finding out whether the way we write feedback and evaluations accurately reflects the scores we give the classes we are taking. We hope this tool will provide further insight into how courses have performed based on their written evaluations rather than just the multiple-choice responses. We also hope this project acts as a starting point for something that can be extended, and that written evaluations, which are often never read, can be put to better use.
We were intrigued by the language processing portion of the class and wanted to explore it further.
Challenges
Our greatest challenge so far has been building our classification model. Language classification is not something we tackled directly in class or lectures, so we have had to research methods online. We have tried to take a systematic approach to the problem.
Insights
Data
We have been able to preprocess our data and split it into testing and training sets. There are 4230 total course reviews that we can work with.
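As an illustration, here is a minimal sketch of one way to tokenize and pad the review text, assuming the Keras preprocessing utilities (the vocabulary size, sequence length, and sample strings are placeholder values, not our actual configuration):

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sample_reviews = ["Great lectures and very fair exams.",
                  "The workload was overwhelming and the grading unclear."]

tokenizer = Tokenizer(num_words=10000)           # keep the 10,000 most frequent words
tokenizer.fit_on_texts(sample_reviews)           # build the word index from the review text
sequences = tokenizer.texts_to_sequences(sample_reviews)
padded = pad_sequences(sequences, maxlen=200)    # pad/truncate each review to 200 tokens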
Model
To start off, we decided to build the model as a binary (two-label) classifier. To do that, we had to change how we preprocessed our data: instead of keeping the average course rating as a continuous number between 1 and 5, we labelled each review as either positive or negative using the following rule:
def label_review(course_avg):
    # Binarize the 1-5 average course rating around a 3.5 threshold
    if course_avg > 3.5:
        return 1  # positive review
    else:
        return 0  # negative review
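As a quick usage example (the CSV filename and column names here are hypothetical, just to show the rule applied across the dataset with pandas):

import pandas as pd

reviews = pd.read_csv("course_reviews.csv")                   # hypothetical filename
reviews["label"] = reviews["course_avg"].apply(label_review)  # 1 = positive, 0 = negative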
We then split our dataset 0.25 train / 0.75 test using sklearn's train_test_split function, as sketched below.
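A minimal sketch of that split, assuming the labelled DataFrame from the usage example above and a hypothetical review_text column (the 0.25 / 0.75 proportions match the split described here):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    reviews["review_text"], reviews["label"],
    train_size=0.25, random_state=42)   # 25% train / 75% test, as described above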
Figure 1:

Figure 2:

Figure 3:

Model accuracy: 78%
We were able to build a reasonably accurate model that classifies between positive and negative reviews, although Figure 3 suggests a substantial amount of overfitting after roughly the 15th epoch. To remedy this, we will try to build a model that generalizes better.
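One common remedy for this kind of overfitting (shown here only as an illustrative sketch, assuming a Keras model, rather than a fixed part of our plan) is early stopping on the validation loss:

from tensorflow.keras.callbacks import EarlyStopping

# Stop training once validation loss stops improving and keep the best weights
early_stop = EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.2, epochs=30, callbacks=[early_stop])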
Plan
We believe we are on track for this project and have been allocating our time properly.
Our next step is to use an LSTM to make the simplified model explained in the Insights section more comprehensive. This will likely take up most of our efforts in the coming week.
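A minimal sketch of what that LSTM classifier could look like in Keras (the vocabulary size, embedding dimension, and layer sizes are placeholder assumptions, not decisions we have made):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential([
    Embedding(input_dim=10000, output_dim=64),   # learned word embeddings over a 10k-word vocabulary
    LSTM(64),                                    # encode each review as a single vector
    Dense(1, activation="sigmoid"),              # probability that the review is positive
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])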
Due to time constraints, it does not look like we will be able to create a second model that classifies professor ratings. We would rather allocate our time toward improving our course rating model.
