We were inspired by the vast number of publicly accessible reviews on TripAdvisor and by the insight this data could provide for JetBlue. We also wanted to automate the process of extracting and analyzing the data so we could use the same techniques to compare JetBlue with competing airlines.

What it does

We scrape TripAdvisor for JetBlue reviews, starting with the most recent, and save them to a Firebase database. We then use three methods to analyze this data and see which words correlate with positive versus negative reviews: statistical analysis, machine learning, and natural language processing. After analyzing the data, we report which areas JetBlue most needs to work on and which areas it already does well, based on the words associated with positive and negative reviews.
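A minimal sketch of the statistical-analysis idea, assuming each review is available as a (text, star rating) pair. The function name, the whitespace tokenization, and the sample reviews are illustrative assumptions, not the project's actual code or data.

```python
from collections import defaultdict


def average_rating_per_word(reviews, min_count=1):
    """Map each word to the mean rating of the reviews that contain it.

    `reviews` is an iterable of (text, rating) pairs (assumed shape).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, rating in reviews:
        # Count each word once per review so long reviews do not dominate.
        for word in set(text.lower().split()):
            totals[word] += rating
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in counts if counts[w] >= min_count}


# Made-up sample reviews, for illustration only.
sample_reviews = [
    ("great legroom friendly crew", 5),
    ("flight delayed rude crew", 1),
    ("great snacks", 4),
]
averages = average_rating_per_word(sample_reviews)
```

Words with a high average (here "great") point to strengths, and words with a low average (here "delayed") point to problem areas. Raising `min_count` would filter out words that appear in too few reviews to be meaningful.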

How we built it

We used Selenium WebDriver to scrape TripAdvisor, since we needed JavaScript to run on the page. We pushed the scraped data into Google Firebase, which served as our backend database. We then pulled the data back out and analyzed it in three ways: calculating the average rating associated with each word, writing and using our own Averaged Perceptron algorithm based on the bag-of-words technique, and using and modifying NLTK's NLP functionality to label sentences and words as positive or negative.
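The perceptron step could look roughly like the sketch below. It is a generic averaged perceptron over bag-of-words counts, assuming reviews are labeled +1 (positive) or -1 (negative); the function names, the labeling scheme, and the sample data are assumptions for illustration, not the code written during the hackathon.

```python
from collections import Counter, defaultdict


def bag_of_words(text):
    """Count word occurrences, ignoring case and word order."""
    return Counter(text.lower().split())


def train_averaged_perceptron(examples, epochs=10):
    """Train on (text, label) pairs with label +1 or -1; return averaged weights."""
    weights = defaultdict(float)
    totals = defaultdict(float)  # running sum of the weights after every example
    steps = 0
    for _ in range(epochs):
        for text, label in examples:
            features = bag_of_words(text)
            score = sum(weights[w] * c for w, c in features.items())
            if label * score <= 0:  # misclassified (or undecided): update
                for w, c in features.items():
                    weights[w] += label * c
            for w, value in weights.items():
                totals[w] += value
            steps += 1
    # Averaging the weights over all steps smooths out late, noisy updates.
    return {w: total / steps for w, total in totals.items()}


def predict(weights, text):
    score = sum(weights.get(w, 0.0) * c for w, c in bag_of_words(text).items())
    return 1 if score > 0 else -1


# Made-up training data, for illustration only.
training = [
    ("great crew friendly", 1),
    ("delayed rude crew", -1),
    ("great legroom", 1),
    ("lost luggage delayed", -1),
]
model = train_averaged_perceptron(training)
```

After training, words with large positive weights are the ones the model associates with positive reviews, and words with large negative weights are associated with negative reviews, which is what makes the weights useful as keyword insight.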

Challenges we ran into

We had to scrape with Selenium rather than Scrapy because we needed JavaScript to load, but Scrapy only downloads the HTML. We had to connect to Google Firebase using the Firebase Admin Python SDK, which required setting up certificates. We also had to preprocess the data so we could apply the bag-of-words machine learning technique and run it through the NLP algorithm.
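As an illustration of the preprocessing step, the sketch below turns raw review text into fixed-length bag-of-words count vectors. The tokenization rule (lowercase runs of letters and apostrophes) and all names are assumptions chosen for the example; the project's actual cleaning steps are not described here.

```python
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(texts):
    """Assign each distinct token an index, in order of first appearance."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def vectorize(text, vocab):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for token in tokenize(text):
        index = vocab.get(token)
        if index is not None:
            vector[index] += 1
    return vector


# Made-up review snippets, for illustration only.
texts = ["Great crew, great snacks!", "Delayed flight."]
vocab = build_vocabulary(texts)
first_vector = vectorize(texts[0], vocab)
```

Stripping punctuation and case up front means "Great" and "great," count as the same word, so the same cleaned tokens can feed both the bag-of-words model and an NLP library.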

Accomplishments that we're proud of

Getting the machine learning model running properly and producing valid results was a big milestone. We originally wanted to build a website where clients (JetBlue personnel) could search for specific reviews, but we soon realized that TripAdvisor already provides this exact functionality. We therefore switched to building an algorithmic process that gives clients insight into the attributes and keywords that, according to customer feedback, their airline is getting right or wrong. Because customer feedback is hugely important for any business that interacts directly with consumers, especially one on the scale of JetBlue, we decided to take three approaches to the problem. The three approaches produced somewhat different results, but we ultimately saw clear similarities in certain keywords and in how negatively or positively customers used them, based on feedback from thousands of customers. In the end, we are all proud of our team’s resilience and drive to keep writing code to improve our algorithm and machine learning models, even though our early results were clearly wrong.

What we learned

From this project, we learned to be flexible and adaptable under time pressure. We were focused on the website idea for the first half of the hackathon, but soon realized that switching to our current machine learning and algorithmic processing idea would be more beneficial and have more real-world impact. We learned to work as a group on GitHub, resolve merge conflicts, and, most importantly, recognize that there may be several ways to tackle the same problem.

What's next for NeuralJet

A feature we want to add is the ability to see how customer reviews change over time periods such as a month or a year. We also want to implement scrapers for other sources, such as Facebook, and compare those results with the results from TripAdvisor.
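One possible starting point for the time-based view is to bucket ratings by calendar month, sketched below. It assumes each review carries a date and a numeric rating; the function name and the sample values are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from datetime import date


def average_rating_by_month(reviews):
    """Group (review_date, rating) pairs by (year, month) and average each group."""
    buckets = defaultdict(list)
    for review_date, rating in reviews:
        buckets[(review_date.year, review_date.month)].append(rating)
    # Sort the keys so the result reads in chronological order.
    return {month: sum(r) / len(r) for month, r in sorted(buckets.items())}


# Made-up dated ratings, for illustration only.
dated_reviews = [
    (date(2018, 1, 5), 4),
    (date(2018, 1, 20), 2),
    (date(2018, 2, 3), 5),
]
monthly = average_rating_by_month(dated_reviews)
```

Grouping by `review_date.year` alone would give the yearly view, and the same bucketing could be applied per keyword to see how sentiment around a topic shifts over time.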
