There is no readily available dataset for aspect-based analysis of user reviews. We wanted to build something that gives companies a detailed analysis of their reviews with respect to each aspect of their service, so they can see where to improve.

What it does

Our algorithm works in two steps.

1) We represent each aspect as a vector in a three-dimensional space. Given a review, we convert it into a vector, project it into this aspect vector space, and find the aspect it is most closely associated with. For JetBlue airline reviews, we used "customer service", "punctuality", "cancellation", "comfort", and "miscellaneous" as the aspects.

2) After associating each review with an aspect, we run sentiment analysis on the review and label it "positive", "negative", or "neutral". Together, the aspect and sentiment labels show which aspects are doing well and which need improvement.
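The write-up does not say how a review is turned into a vector. One common approach, sketched here purely as an assumption, is to average word vectors: each aspect vector is the mean of a few seed words, and a review vector is the mean of its known words, so both live in the same space. The word vectors, seed words, and function names below are hypothetical, and only three of the five aspects are shown.

```python
import re

# Hypothetical 3-dimensional word vectors, for illustration only; a real
# system would take these from a trained embedding model.
WORD_VECTORS = {
    "staff":   [0.9, 0.1, 0.0],
    "helpful": [0.9, 0.2, 0.1],
    "rude":    [0.8, 0.0, 0.2],
    "delayed": [0.1, 0.9, 0.1],
    "late":    [0.0, 0.8, 0.2],
    "legroom": [0.1, 0.1, 0.9],
    "seat":    [0.0, 0.2, 0.8],
}

# Hypothetical seed words that define three of the aspects.
ASPECT_SEEDS = {
    "customer service": ["staff", "helpful"],
    "punctuality": ["delayed", "late"],
    "comfort": ["legroom", "seat"],
}


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def average_vector(words):
    """Average the vectors of the known words; return None if none are known."""
    known = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not known:
        return None
    # zip(*known) groups the values of each dimension together.
    return [sum(component) / len(known) for component in zip(*known)]


# Each aspect vector is the mean of its seed-word vectors.
ASPECT_VECTORS = {name: average_vector(seeds) for name, seeds in ASPECT_SEEDS.items()}

# "staff", "rude" and "late" are known words; the rest are ignored.
review_vector = average_vector(tokenize("The staff were rude and the flight was late"))
# roughly [0.57, 0.30, 0.13]
```

Words outside the vocabulary are simply skipped, so a review with no known words gets no vector and would need a fallback.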

How I built it

First, we scraped reviews from sources such as Twitter, Instagram, TripAdvisor, and Reddit. In total, we collected around 25,000 reviews.

Then we used Google's Natural Language API to run sentiment analysis on the reviews and label each one "positive", "negative", or "neutral".
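Google's Natural Language API reports a document-level sentiment score between -1.0 (negative) and 1.0 (positive), along with a magnitude. The write-up does not say how scores were mapped to the three labels, so this sketch assumes a simple symmetric cut-off; the `sentiment_label` function and its `neutral_band` value of 0.25 are hypothetical choices, not values from the project.

```python
def sentiment_label(score, neutral_band=0.25):
    """Map a sentiment score in [-1.0, 1.0] to a three-way label.

    neutral_band is a hypothetical cut-off: scores strictly between
    -neutral_band and +neutral_band are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= neutral_band:
        return "positive"
    if score <= -neutral_band:
        return "negative"
    return "neutral"


print(sentiment_label(0.8), sentiment_label(-0.6), sentiment_label(0.1))
# positive negative neutral
```

A wider band produces more "neutral" labels; the right value would have to be tuned against hand-checked reviews.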

For each review, we computed the cosine similarity between the review vector and every aspect vector, and associated the review with the closest aspect. That way we know whether a given review is about customer service, comfort, cancellation, punctuality, or something miscellaneous. New aspects are easy to add to the application, since each one is just another aspect vector to compare against. We then used the review's sentiment to learn about the user's experience with that aspect.
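The nearest-aspect step can be sketched as an argmax over cosine similarities. The aspect vectors below are hypothetical three-dimensional stand-ins, and `cosine_similarity` and `closest_aspect` are illustrative names rather than the project's code.

```python
import math

# Hypothetical 3-dimensional aspect vectors, for illustration only.
ASPECT_VECTORS = {
    "customer service": [0.9, 0.1, 0.1],
    "punctuality":      [0.1, 0.9, 0.1],
    "cancellation":     [0.1, 0.5, 0.8],
    "comfort":          [0.1, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_aspect(review_vector, aspect_vectors=ASPECT_VECTORS):
    """Return (aspect, similarity) for the aspect most similar to review_vector."""
    return max(
        ((name, cosine_similarity(review_vector, vec)) for name, vec in aspect_vectors.items()),
        key=lambda pair: pair[1],
    )


print(closest_aspect([0.8, 0.2, 0.1])[0])  # customer service
print(closest_aspect([0.0, 0.2, 1.0])[0])  # comfort
```

In this form, adding an aspect only means adding another entry to the dictionary; nothing else in the lookup changes.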

Finally, we built a front end in React to display the results of our algorithm.

Challenges I ran into

Because there is no readily available dataset of airline reviews, it was difficult to collect enough reviews to get reasonable performance. So our first challenge was gathering data.

To get accurate results, we needed aspect vectors that strongly represented the aspects we wanted to learn. We then experimented with various distance and similarity functions between vectors to see which gave the most reasonable results, and settled on cosine similarity.
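The write-up does not list the other functions that were tried. As an illustration only, not a reconstruction of the authors' experiment, this sketch compares cosine similarity with Euclidean distance on hypothetical vectors: scaling a review vector leaves its cosine-nearest aspect unchanged but can flip its Euclidean-nearest aspect, because Euclidean distance is sensitive to vector length. That is one common reason cosine similarity is preferred for comparing embeddings.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical aspect vectors with deliberately different lengths.
ASPECTS = {"customer service": [1.0, 0.0, 0.0], "punctuality": [2.0, 2.0, 0.0]}


def nearest_by_cosine(v):
    return max(ASPECTS, key=lambda name: cosine_similarity(v, ASPECTS[name]))


def nearest_by_euclidean(v):
    return min(ASPECTS, key=lambda name: euclidean_distance(v, ASPECTS[name]))


review = [1.0, 0.3, 0.0]
scaled = [3 * x for x in review]  # same direction, three times the length

for v in (review, scaled):
    print(nearest_by_cosine(v), "|", nearest_by_euclidean(v))
# customer service | customer service
# customer service | punctuality
```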

Combining the sentiment results from Google's Natural Language API with the output of our aspect association algorithm was also a challenge, as was building a front-end dashboard that visualized the results the way we wanted.
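One way to combine the two result sets, assuming both are keyed by a shared review id (an assumption; the write-up does not describe the data layout), is to join them and count sentiment labels per aspect, then serialize the summary as JSON for a front end to consume. The ids, labels, and function names here are hypothetical.

```python
import json
from collections import Counter, defaultdict


def combine(aspect_by_review, sentiment_by_review):
    """Join the two result sets on review id; skip reviews missing from either."""
    return [
        {"id": rid, "aspect": aspect, "sentiment": sentiment_by_review[rid]}
        for rid, aspect in aspect_by_review.items()
        if rid in sentiment_by_review
    ]


def summarize(records):
    """Count positive/negative/neutral labels for each aspect."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["aspect"]][record["sentiment"]] += 1
    return {aspect: dict(counter) for aspect, counter in counts.items()}


# Hypothetical results keyed by review id; r4 has no sentiment and is skipped.
aspects = {"r1": "punctuality", "r2": "punctuality", "r3": "comfort", "r4": "customer service"}
sentiments = {"r1": "negative", "r2": "negative", "r3": "positive"}

summary = summarize(combine(aspects, sentiments))
print(json.dumps(summary, sort_keys=True))
# {"comfort": {"positive": 1}, "punctuality": {"negative": 2}}
```

A per-aspect count like this is enough for a dashboard to highlight aspects with many negative reviews.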

Accomplishments that I'm proud of

Getting aspect predictions that work well in a fully unsupervised manner.

What I learned

How to think through a data science project end to end: from data collection and data cleaning to modelling and visualization of results.

What's next for DeepFly

This project can easily be extended to other aspects and to reviews of any kind of service.
