We wanted to build a fully automated crawl-and-scrape application that goes through ratemyprofessor.com so that we could explore and visualize the relationship between Quality Score, Difficulty Level, and NLTK sentiment analysis of student comments.
What it does
We collected 351 reviews of 21 professors in the UMBC CSEE Department from ratemyprofessor.com.
We wrote a small Python script that takes a CSV file of comments and produces a new table with the corresponding sentiment analysis. The table records whether each comment is neutral or polar (positive or negative). Each comment has a positive and a negative polarity score, and the two scores add up to 1.
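The CSV-in, table-out flow described above can be sketched roughly as follows. This is a minimal illustration, not our actual script: the column name `comment`, the output column names, and the word-list `toy_scorer` are all assumptions made for the example. The toy scorer only stands in for the NLTK-based scoring so that the sketch runs without extra downloads; any function that returns a label, a positive score, and a negative score could be passed in its place.

```python
import csv
import io

# Hypothetical word lists used only by the stand-in scorer below.
POSITIVE_WORDS = {"great", "helpful", "clear", "easy", "best"}
NEGATIVE_WORDS = {"boring", "hard", "confusing", "worst", "unfair"}


def toy_scorer(comment):
    """Stand-in scorer: return (label, pos_score, neg_score), scores sum to 1."""
    words = [w.strip(".,!?;:").lower() for w in comment.split()]
    pos_hits = sum(w in POSITIVE_WORDS for w in words)
    neg_hits = sum(w in NEGATIVE_WORDS for w in words)
    if pos_hits == neg_hits:
        # No polar words, or an even split: treat the comment as neutral.
        return "neutral", 0.5, 0.5
    pos_score = pos_hits / (pos_hits + neg_hits)
    label = "positive" if pos_score > 0.5 else "negative"
    return label, pos_score, 1.0 - pos_score


def annotate_comments(in_file, out_file, scorer=toy_scorer, comment_column="comment"):
    """Copy every CSV row and append a sentiment label and both scores."""
    reader = csv.DictReader(in_file)
    fieldnames = list(reader.fieldnames or []) + [
        "sentiment_label", "positive_score", "negative_score",
    ]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        label, pos, neg = scorer(row[comment_column])
        row.update(
            sentiment_label=label,
            positive_score=f"{pos:.2f}",
            negative_score=f"{neg:.2f}",
        )
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Made-up rows, only to show the shape of the input and output.
    sample = io.StringIO(
        "professor,quality,difficulty,comment\n"
        "A,5,2,Great lectures and very helpful.\n"
        "B,1,5,Boring and confusing.\n"
    )
    result = io.StringIO()
    annotate_comments(sample, result)
    print(result.getvalue())
```

Keeping the scorer as a parameter means the table-building code does not need to change if the sentiment backend does.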
Our Graphs:
- Figure 1. Histogram of Quality Score given a comment was negative, positive, or neutral.
- Figure 2. Scatter plot of Quality Score vs Positive Sentiment Score of all student reviews.
- Figure 3. Scatter plot of Average Quality Score vs Average Difficulty Score. Each point represents a professor.
- Figure 4. Average Positive Sentiment vs Average Difficulty Score. Each point represents a professor.
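Figures 3 and 4 plot one point per professor, so the individual reviews first have to be averaged by professor. We did this step in Excel; the sketch below only shows what the same aggregation might look like in Python. The field names `professor`, `quality`, `difficulty`, and `positive_score` are assumed for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_by_professor(reviews, fields=("quality", "difficulty", "positive_score")):
    """Group review dicts by professor and average each numeric field."""
    grouped = defaultdict(list)
    for review in reviews:
        grouped[review["professor"]].append(review)
    return {
        professor: {
            field: mean(float(r[field]) for r in rows) for field in fields
        }
        for professor, rows in grouped.items()
    }


if __name__ == "__main__":
    # Made-up reviews, only to show the expected input shape.
    sample_reviews = [
        {"professor": "A", "quality": "5", "difficulty": "2", "positive_score": "0.9"},
        {"professor": "A", "quality": "3", "difficulty": "4", "positive_score": "0.5"},
        {"professor": "B", "quality": "1", "difficulty": "5", "positive_score": "0.2"},
    ]
    for name, averages in average_by_professor(sample_reviews).items():
        print(name, averages)
```

Each entry in the returned dictionary corresponds to one point in the per-professor scatter plots.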
How I built it
- We used the Python NLTK Sentiment Analysis API to analyze each student comment, obtain a sentiment score, and create a new table with a sentiment label and a positive sentiment score.
- We used Excel to create our graphs.
Challenges I ran into
- We tried import.io to crawl and import the website, but could not train it to retrieve all the necessary data. One of our team members had to record the data points manually.
Accomplishments that I'm proud of
- We wrote a script that integrates the NLTK API for Python to get the sentiment scores for a large data set of comments.
What I learned
- We gained some insight into what we do not know and what we could accomplish by closing our current skill gaps, specifically in crawling and scraping a website.
What's next for Exploring Data from ratemyprofessor.com
- Write a crawling and scraping script to get more data points automatically
- Learn how to produce data visualizations using Python
- Work with more data and learn statistics to get better insight from the data
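As one possible first step on the statistics side, the strength of a linear relationship between two columns (for example, Quality Score and Difficulty Score) could be summarized with a Pearson correlation coefficient. The sketch below is something we have not done yet; the function name `pearson_r` is our own, and the numbers in the usage example are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up per-professor averages, only to show the call.
    avg_quality = [4.5, 3.0, 2.0, 4.0]
    avg_difficulty = [2.0, 3.5, 4.5, 2.5]
    print(round(pearson_r(avg_quality, avg_difficulty), 3))
```

A value near -1 or 1 suggests a strong linear relationship and a value near 0 suggests a weak one, although with only 21 professors any such number should be read with caution.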