Inspiration

We set out to test a specific hypothesis about JetBlue's customer satisfaction. Having recently had flights canceled due to poor weather, we wanted to explore how an airline's delays and cancellations affect customer experience, and to quantify how much a delay hurts customer satisfaction compared with a cancellation. This metric could then feed into an airline's real-time decision support system to enable more effective decision making.

What it does

We scraped over 200k tweets and 175k Tripadvisor reviews and ran them through Google's sentiment analysis API to get a normalized, day-by-day metric of JetBlue's customer satisfaction over time. We then wrote custom flight statistics scraping software to get each carrier's daily cancellations and delays. Finally, we analyzed how cancellations and delays affected customer satisfaction and wrote a simple front end for querying JetBlue customer tweets by keyword.
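To make the analysis step concrete, here is a minimal sketch of one way it could be done. It is not our exact code. It assumes each post has already been given a sentiment score in [-1, 1], averages those scores per day, and then fits a simple linear model, sentiment = a + b*delays + c*cancellations, by ordinary least squares. The function names and the choice of a linear model are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def daily_mean_sentiment(scored_posts):
    """Average per-post sentiment scores into one value per calendar day.

    scored_posts: iterable of (date, score) pairs; score assumed in [-1, 1].
    """
    by_day = defaultdict(list)
    for day, score in scored_posts:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


def fit_delay_cancel_model(rows):
    """Least-squares fit of sentiment = a + b*delays + c*cancellations.

    rows: list of (delays, cancellations, sentiment) tuples, one per day.
    Returns (a, b, c), found by solving the 3x3 normal equations.
    """
    # Accumulate X^T X and X^T y for the columns [1, delays, cancellations].
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for delays, cancels, sentiment in rows:
        x = (1.0, float(delays), float(cancels))
        for i in range(3):
            xty[i] += x[i] * sentiment
            for j in range(3):
                xtx[i][j] += x[i] * x[j]

    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("delays and cancellations are collinear or constant")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return tuple(aug[i][3] / aug[i][i] for i in range(3))
```

In a model like this, the ratio c / b gives a rough answer to "how many delays does one cancellation cost in customer satisfaction", under the assumption that the relationship is roughly linear.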

How we built it

Our whole project is written in Python. We used Google Firestore to store all of our scraped data, and Selenium with PhantomJS for our custom flight statistics data acquisition. We ran our scraping on Google Cloud instances so that we were not bombarding YHack's Wi-Fi network. We used Google's natural language processing libraries to run sentiment analysis on the scraped text.

Challenges we ran into

The only aggregated flight statistics we could initially find were from the FAA, and they were aggregated monthly. That was not granular enough, so we wrote our own flight statistics scraper, which ran into a number of issues because the content was generated dynamically by JavaScript rather than hardcoded into the HTML.
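The usual way around dynamically generated content is to load the page in a real or headless browser and wait until the data has actually been rendered before reading it. The sketch below shows that polling idea in a library-independent form. It is an illustration rather than our scraper, and `wait_until` is a hypothetical helper name.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Poll predicate() until it returns a truthy value or the timeout elapses.

    Returns the truthy value; raises TimeoutError if it never appears.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)
```

With Selenium, the predicate could be something like `lambda: driver.find_elements(By.CSS_SELECTOR, "table tr")`, which returns an empty list until the JavaScript has filled in the rows. The selector here is a placeholder, not the one from the site we scraped. Selenium also ships its own explicit-wait helper, `WebDriverWait`, which serves the same purpose.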

Accomplishments that we are proud of

Our sheer volume of data is impressive. Our high-performance, low-latency Google Cloud instances allowed us to scrape far more data than we initially thought possible. Our flight delay and cancellation statistics are also a uniquely crafted data source.

What we learned

How to build high-performance data pipelines and scraping systems. How to build web apps with Flask in Python.

What's next for SSS — Sentiment Support System

As we collect more data, the statistical significance of our results and the strength of the observed trends improve. We can currently only get flight information for the past 60 days, but with a longer history we would be able to solidify our models even further. We would also love to do more fine-tuned analysis with keywords relating to specific parts of JetBlue's customer experience.
