Figuring out the customer sentiment of a company is no easy task; from a lack of labeled data to ambiguity in language, there is much room for interpretation when it comes to sentiment analysis. We decided to take this challenge head-on, try to figure out what customers truly think about JetBlue, and learn a ton along the way.
What it does
Our project takes in Tweet data from Twitter relating to JetBlue (although theoretically it doesn't have to), cleans and filters it, and then runs sentiment analysis on all remaining Tweets. It then combines the sentiment for a given Tweet with its popularity and overall public appeal (retweets and likes), generating an overall weighted public sentiment score. After doing this for all the data, it takes a fraction of the most-positive Tweets and a fraction of the most-negative Tweets for comparison. Using unigram, bigram, and trigram models, it finds the words and phrases that recur in each group. From this data, conclusions can be drawn regarding what specific events or actions cause customers to have a positive sentiment for JetBlue, and what diminishes or removes this sentiment. We additionally present sentiment vs time comparisons for JetBlue over the last year, which give some more insight into the trends of JetBlue's public perception.
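To make the weighting idea concrete, here is a minimal sketch of one way a Tweet's sentiment could be combined with its retweets and likes. It is not our actual model: the `Tweet` class, the `engagement_weight` function, and the log-damped formula are all assumptions chosen purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    sentiment: float  # score in [-1, 1], e.g. a VADER compound score
    retweets: int
    likes: int


def engagement_weight(tweet: Tweet) -> float:
    # Hypothetical weighting (an assumption, not the project's formula):
    # log-damped so a single viral Tweet cannot dominate the total.
    return 1.0 + math.log1p(tweet.retweets + tweet.likes)


def weighted_sentiment(tweet: Tweet) -> float:
    # Sentiment scaled by how much public attention the Tweet received.
    return tweet.sentiment * engagement_weight(tweet)


def overall_sentiment(tweets: list[Tweet]) -> float:
    # Weighted average over all Tweets; stays within [-1, 1].
    total_weight = sum(engagement_weight(t) for t in tweets)
    if total_weight == 0:
        return 0.0  # no Tweets to average
    return sum(weighted_sentiment(t) for t in tweets) / total_weight
```

With this choice, a Tweet with no retweets or likes keeps its raw sentiment, while a widely shared Tweet pulls the overall score further in its direction.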
How we built it
We used Twint, an advanced Twitter scraper for Python, to collect a large number of Tweets relating to JetBlue in some fashion; by running this for hours upon hours, we managed to retrieve approximately 270,000 Tweets. From there, we had to clean our data, removing extraneous Tweets that didn't actually have to do with JetBlue or that came from sources not deemed reputable; we were left with around 66,000 Tweets. After this filtering, we ran VADER's sentiment analysis on each remaining Tweet, giving each one a value between -1 and 1 which represents how negative or positive, respectively, the Tweet was. We then used a model we developed, relating sentiment for a given Tweet with its number of retweets and likes, to estimate the overall public perception associated with that Tweet. From there, we split the data according to sentiment, only considering the 20% most positive and 20% most negative; the other Tweets lie too close to the middle ground to reliably retrieve information from. After separating the data, we were able to create unigram, bigram, and trigram models for both the positive and the negative data. By then analyzing the resulting models and comparing how they differed, we were able to determine what caused negative sentiment Tweets and what caused positive sentiment Tweets. This analysis helped us conclude what recommendations we have for JetBlue in terms of what they should do better, as well as what they seem to be consistently doing very well already. Additionally, we used Matplotlib to visualize multiple sentiment vs time graphs over the last year, which allowed us to see a general trend of how the public views JetBlue and what causes this view to specifically be positive or negative.
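The split-and-compare step described above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not our exact code: the helper names (`tokenize`, `ngrams`, `split_extremes`, `ngram_counts`) and the simple regex tokenizer are hypothetical, and the input is assumed to be a list of `(text, score)` pairs that have already been scored.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplistic tokenizer (an assumption): lowercase words, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    # Slide a window of length n over the tokens; empty if too few tokens.
    return list(zip(*(tokens[i:] for i in range(n))))


def split_extremes(scored: list[tuple[str, float]], fraction: float = 0.2):
    # Keep only the most positive and most negative `fraction` of Tweets.
    ranked = sorted(scored, key=lambda pair: pair[1])
    k = int(len(ranked) * fraction)
    if k == 0:
        return [], []
    negative = [text for text, _ in ranked[:k]]
    positive = [text for text, _ in ranked[-k:]]
    return positive, negative


def ngram_counts(texts: list[str], n: int) -> Counter:
    # Count every n-gram across a group of Tweets.
    counts: Counter = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), n))
    return counts
```

Calling `ngram_counts(positive, 2).most_common(10)` and the same for `negative` gives two ranked bigram lists to compare side by side; phrases that rank high in one list but not the other hint at what drives each kind of sentiment.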
Challenges we ran into
The biggest challenge we ran into was definitely the formulation of such an open-ended problem into a tangible, approachable task. We spent a lot of time and energy theorizing, talking about models and procedures, experimenting with the best way to implement things, etc. We were both quite new to this type of open-ended data-related problem, and that made for a unique challenge.
Accomplishments that we're proud of
We're extremely proud that we managed to take on such a large task and come to a pretty conclusive result backed by substantial evidence. We are also proud of our creativity in model development and selection; both tasks were pretty new to both of us, and we believe that our development process and problem formulation fit the task quite well.
What we learned
Given that neither of us has taken on a challenge like this before, we learned an absolute ton about data scraping, data analysis, model development, and natural language processing.
What's next for JetBlue Customer Sentiment
There's a surprising additional use for our project! While it was created with the idea of figuring out customer sentiment for JetBlue specifically, we discovered along the way that there is no limit to what it can analyze. If we had a large amount of data regarding another topic, whether it be a person, company, food, etc., we could similarly analyze the sentiment around it with some minor changes to our code. We hope that somebody can make use of this in the future!