We noticed that airline companies like JetBlue were having trouble keeping up with the constant flow of social media posts about them, both positive and negative, and we wanted to make their lives easier.

What it does

We saw that there was no platform that aggregated posts from multiple social media platforms into a single concise view for exploring a company's actual public perception. We pull data from multiple social media platforms, including Twitter and Instagram, and simplify it into easy-to-understand graphs.

How we built it

We built this app using a combination of Python and various Google Cloud products. A Django server is automatically deployed to Google App Engine on every commit using Google Cloud Build. This web server keeps track of the various scrapers that we run to keep the Google Data Studio page updated, using Google BigQuery as the data store. As each scraper runs, its output is passed through the Google Cloud Natural Language API to analyze each post's or comment's sentiment and subject.
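
The step between a scraper and the data store can be sketched roughly as follows. This is a minimal illustration, not our actual code: the function name `build_row`, the field names, and the 0.25 label thresholds are all assumptions made for the example. The sentiment scorer is passed in as a plain callable so the sketch runs without cloud credentials; in a real deployment that callable would wrap a call to the Natural Language API, which reports sentiment as a score between -1.0 and 1.0, and the resulting rows would be written to a BigQuery table.

```python
from datetime import datetime, timezone


def build_row(platform, text, score_sentiment, scraped_at=None):
    """Shape one scraped post into a flat dict that could be stored as a table row.

    `score_sentiment` is any callable returning a float in [-1.0, 1.0].
    All field names here are hypothetical.
    """
    scraped_at = scraped_at or datetime.now(timezone.utc)
    score = score_sentiment(text)

    # Assumed thresholds for bucketing the score into a label.
    if score > 0.25:
        label = "positive"
    elif score < -0.25:
        label = "negative"
    else:
        label = "neutral"

    return {
        "platform": platform,
        "text": text,
        "sentiment_score": score,
        "sentiment_label": label,
        "scraped_at": scraped_at.isoformat(),
    }


def keyword_scorer(text):
    """Stand-in scorer for local testing only; not a real sentiment model."""
    lowered = text.lower()
    if "love" in lowered:
        return 0.8
    if "delayed" in lowered:
        return -0.6
    return 0.0


if __name__ == "__main__":
    print(build_row("twitter", "Love the extra legroom", keyword_scorer))
    print(build_row("instagram", "Flight delayed again", keyword_scorer))
```

Keeping the scorer injectable means the row-shaping logic can be tested on its own, and every scraper produces rows with the same fields regardless of which platform they came from.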

Challenges we ran into

Getting the various scrapers to output consistent, usable data was a real challenge, since people often do not use the proper hashtags or spelling on social media. Google Cloud Build had no easy examples to follow and would often recursively trigger new builds for some reason. We were initially going to use Google Cloud Bigtable as a data source, but it is really not suited for small projects, and BigQuery works just as well. Google's Machine Vision API was also a challenge to set up.
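
One way to handle the inconsistent hashtags is fuzzy matching against a list of known tags. The sketch below is an illustration rather than what we shipped: it uses `difflib.get_close_matches` from the Python standard library, and the tag list, the function name `normalize_hashtags`, and the 0.8 similarity cutoff are assumptions chosen for the example.

```python
import difflib
import re

# Hypothetical list of canonical tags the scrapers care about.
CANONICAL_TAGS = ["jetblue", "delayed", "customerservice"]

HASHTAG_RE = re.compile(r"#(\w+)")


def normalize_hashtags(text, canonical=CANONICAL_TAGS, cutoff=0.8):
    """Return the canonical tags that the hashtags in `text` most closely match.

    Hashtags with no sufficiently close canonical tag are dropped.
    """
    matched = []
    for raw in HASHTAG_RE.findall(text):
        candidate = raw.lower()
        # Best single match at or above the similarity cutoff, if any.
        close = difflib.get_close_matches(candidate, canonical, n=1, cutoff=cutoff)
        if close and close[0] not in matched:
            matched.append(close[0])
    return matched


if __name__ == "__main__":
    print(normalize_hashtags("Stuck at the gate again #JetBlu #delayd"))
    print(normalize_hashtags("Great sandwich #lunch"))
```

A lower cutoff catches more misspellings but also raises the chance of mapping an unrelated hashtag onto a tracked one, so the value would need tuning against real scraped data.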

Accomplishments that we're proud of

Connecting the data from the parser to Google Data Studio and having it look great. Actually using multiple Google Cloud products and having them all work together.

What we learned

We all learned a lot about various methods of scraping and proper data storage techniques. We also learned how to use Google Cloud to automate a large portion of our data and build workflow, and to help build our visualization platform.

What's next for Scrape It All

Continue to scrape even more platforms and correlate the results with other datasets (customer service, weather, financial, etc.) to better analyze customer behavior and how it is reflected on social media. Build a real-time tracker across all well-known platforms so that upset customers can be addressed instantly.

Built With

  • flask
  • google-cloud-app-engine
  • google-cloud-big-query
  • google-cloud-build
  • google-cloud-data-studio
  • instagram-scraper
  • python
  • twitter