We were inspired by a challenge to determine how the general public and customers feel about the company JetBlue. We found a vast amount of information spread across social media and public websites (such as TripAdvisor, Kayak, Yelp, etc.), but no centralized place that aggregated all of that data. So we developed CrowdSink!

What it does

CrowdSink is a website that crawls popular airline review websites and social media platforms to determine customers' sentiments, general comments, and emotions toward an airline. It scrapes each site's HTML for keywords and pulls out reviews, tweets, comments, etc. Once we have the raw data, we run it through the IBM Watson Natural Language Understanding API, which uses natural language processing and text analysis to help us understand the data's concepts, entities, keywords, sentiment, and more. We focus primarily on the sentiment and emotion in the reviews and build an aggregate of that data. CrowdSink then presents the aggregate as a word cloud and a set of charts.
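To make the aggregation step concrete, here is a minimal sketch of how per-review results could be combined into an average sentiment, average emotions, and keyword counts for a word cloud. The record shape (`sentimentScore`, `emotions`, `keywords`) and the `aggregate` function are hypothetical and simplified for illustration; they are not the exact Watson response format or our actual code.

```javascript
// Hypothetical, simplified per-review analysis records. This flattened shape
// is an assumption for illustration, not the exact Watson NLU response format.
const analyses = [
  { sentimentScore: 0.8, emotions: { joy: 0.7, anger: 0.1 }, keywords: ["legroom", "crew"] },
  { sentimentScore: -0.6, emotions: { joy: 0.1, anger: 0.8 }, keywords: ["delay", "Crew"] },
  { sentimentScore: 0.1, emotions: { joy: 0.4, anger: 0.2 }, keywords: ["delay", "snacks"] },
];

// Combine per-review results into one summary: mean sentiment, mean score per
// emotion, and how often each keyword appears (the input to a word cloud).
function aggregate(records) {
  const summary = {
    count: records.length,
    meanSentiment: 0,
    meanEmotions: new Map(),
    keywordCounts: new Map(),
  };
  if (records.length === 0) return summary;

  let sentimentTotal = 0;
  const emotionTotals = new Map();
  for (const record of records) {
    sentimentTotal += record.sentimentScore;
    for (const [name, value] of Object.entries(record.emotions)) {
      emotionTotals.set(name, (emotionTotals.get(name) ?? 0) + value);
    }
    for (const keyword of record.keywords) {
      const key = keyword.toLowerCase(); // count "Crew" and "crew" together
      summary.keywordCounts.set(key, (summary.keywordCounts.get(key) ?? 0) + 1);
    }
  }

  summary.meanSentiment = sentimentTotal / records.length;
  // Assumes every record reports every emotion; a missing one counts as 0.
  for (const [name, total] of emotionTotals) {
    summary.meanEmotions.set(name, total / records.length);
  }
  return summary;
}

const summary = aggregate(analyses);
console.log(summary);
```

The keyword counts can be passed directly to a word cloud as weights, and the mean scores are the kind of values that can be charted.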

Challenges I ran into

Initially, our plan for "scraping" information from popular public sites and social media platforms was to use readily available (and easily accessible) APIs. However, we found that most of the APIs required us to apply for authorization before we could use them. Approval usually took days, and the access granted was often too limited in scope for what we actually needed. Instead, we took the "hack"-y approach of manually scraping raw HTML for the data we needed. None of us had any experience with this, so having it as our starting point proved challenging.
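As a rough illustration of what manually scraping raw HTML can look like, the sketch below pulls review text out of a page's markup with a regular expression. The sample markup, the `review-text` class name, and the `extractReviews` function are all hypothetical; every real site has its own structure, and regex-based extraction like this is fragile compared to a proper HTML parser.

```javascript
// Hypothetical markup: the "review-text" class name is an assumption for
// illustration; each real site uses its own structure.
const sampleHtml = `
  <div class="review"><p class="review-text">Great crew, <b>lots</b> of legroom.</p></div>
  <div class="review"><p class="review-text">Two-hour delay &amp; no updates.</p></div>
  <p class="ad">Book now!</p>
`;

// Return the plain text of every <p> whose class list includes "review-text".
function extractReviews(html) {
  // Capture the inner HTML of each matching paragraph (lazy, spans newlines).
  const pattern = /<p[^>]*class="[^"]*\breview-text\b[^"]*"[^>]*>([\s\S]*?)<\/p>/gi;
  const reviews = [];
  for (const match of html.matchAll(pattern)) {
    const text = match[1]
      .replace(/<[^>]+>/g, "") // drop nested tags such as <b>
      .replace(/&amp;/g, "&")  // decode the one entity used in the sample
      .replace(/\s+/g, " ")    // collapse runs of whitespace
      .trim();
    if (text) reviews.push(text);
  }
  return reviews;
}

const reviews = extractReviews(sampleHtml);
console.log(reviews);
```

The extracted strings are the kind of raw text that would then be sent on for sentiment analysis.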

Another challenge we ran into was putting the entire application together to build the finished product. Group members worked on different aspects and modules of the project, both as an opportunity to learn and as a way to save time and work more efficiently. However, this led to a lot of assumptions about how to handle the hand-off between modules, and the lack of communication further slowed development when it came time to put everything together. Since this was a web application, wiring together the front end (React), back end (Node.js), and our database (Firebase) proved even more challenging, as the team had limited experience in this domain. This made troubleshooting even more difficult.

Accomplishments that I'm proud of

When we started this project, the team had little to no experience with web scraping, and only a few members had some experience with web development in the Node.js and React world. Despite the challenges of learning a new stack under a tight deadline, we are proud of how much we learned and of gaining the knowledge and skills to jump-start deeper learning in the concepts and technologies we used throughout this project. I personally had never done any JavaScript-heavy programming and reached a comfortable level of knowledge by the end of the project.

The team was also eager to make use of some Agile practices we had learned about both in class and in internships. We adhered to Scrum practices throughout the development process, holding hourly "stand-ups" and running micro-sprints of about two to four hours (factoring in time for rest, sleep, and eating). We also had a "user story" system that helped us break down the tasks we needed to accomplish into small, workable chunks that drove our development process.

What I learned

As previously mentioned, I had little to no experience with programming in JavaScript or with the Node.js stack of web development, so being given the opportunity to lead our server-side development was truly a profound learning experience. I also learned how to scrape websites for raw data without the aid of APIs.

What's next for CrowdSink

Our next order of business will be to get the application production-ready and hosted on Google Cloud Services. We also have the tools and the means to expand the idea of scraping the web for customer sentiment beyond just airline companies.
