Inspiration

Weather impacts everyone, and can change our daily plans at the drop of a hat. It's clear how weather can affect our outdoor activities (e.g. bring an umbrella). However, it's less clear how it can affect our collective emotions. This project investigates this topic through the use of social media content (e.g. tweets), location data, weather data, and tone data from IBM Watson's Tone Analyzer.

What it does

This project has two main parts. The first part gathers the data needed to explore the topic. Tweets from the following cities were collected into CSV files: Boston, DC, NYC, Seattle, and SF. Each tweet was processed to gather data on emotional tone: Openness, Conscientiousness, and Agreeableness. The timestamp on each tweet was then matched against historical records to get weather data: Cloud Cover, Precipitation, and Temperature. The second part is the analysis. The data gets loaded into IBM Spark, where various tools (e.g. pandas, matplotlib) can be used to further process, visualize, and understand patterns in the data.
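To make the timestamp-matching step concrete, here is a minimal pandas sketch. It is not the project's actual code: the column names (`created_at`, `observed_at`, `temp_f`, `precip_mm`, `cloud_cover_pct`, `agreeableness`), the sample rows, and the 90-minute tolerance are all assumptions made up for illustration. It pairs each tweet with the nearest weather observation for the same city, then compares average tone on rainy versus dry observations.

```python
import pandas as pd

# Hypothetical sample rows; the real CSV layout is not described in the write-up.
tweets = pd.DataFrame({
    "city": ["Boston", "Boston", "Seattle"],
    "created_at": pd.to_datetime(
        ["2016-03-01 09:20", "2016-03-01 14:45", "2016-03-01 10:05"]
    ),
    "agreeableness": [0.62, 0.35, 0.48],
})

weather = pd.DataFrame({
    "city": ["Boston", "Boston", "Seattle"],
    "observed_at": pd.to_datetime(
        ["2016-03-01 09:00", "2016-03-01 15:00", "2016-03-01 10:00"]
    ),
    "temp_f": [41.0, 47.0, 52.0],
    "precip_mm": [0.0, 1.2, 3.4],
    "cloud_cover_pct": [20, 90, 100],
})


def attach_weather(tweets, weather, tolerance="90min"):
    """Pair each tweet with the nearest weather record for the same city."""
    # merge_asof requires both frames to be sorted by their time keys.
    left = tweets.sort_values("created_at")
    right = weather.sort_values("observed_at")
    return pd.merge_asof(
        left,
        right,
        left_on="created_at",
        right_on="observed_at",
        by="city",                          # only match records within a city
        direction="nearest",                # closest observation, before or after
        tolerance=pd.Timedelta(tolerance),  # no match if the gap is too large
    )


joined = attach_weather(tweets, weather)

# Flag rainy observations and compare the average tone score per group.
joined["rainy"] = joined["precip_mm"] > 0
tone_by_rain = joined.groupby("rainy")["agreeableness"].mean()
print(tone_by_rain)
```

Tweets with no weather record inside the tolerance window end up with missing weather values, so they can be dropped or inspected separately before plotting with matplotlib.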

How I built it

Python was the main programming language I used. I used it to connect to the various APIs for data gathering, and I also chose Python (instead of Scala) for the analysis in IBM Spark.

Challenges I ran into

Historical weather data can be difficult to access. At the time of writing, IBM's Weather Insights feature only provided the last 24 hours of weather data. I found an alternative source of historical weather data at http://us.worldweatheronline.com/api/docs/historical-weather-api.aspx to fill this need.

It was time-consuming to get familiar with the various APIs and tools. For example, it took a while to work out how to collect hundreds of tweets from different cities based on longitude/latitude data, and how to set up IBM Spark so the analysis could be run quickly and simply. However, once I got familiar with the frameworks, the actual processing happened quickly.
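As a rough illustration of location-based collection, the sketch below builds per-city query parameters. The write-up does not say which Twitter endpoint or client library was used, so this is an assumption: the `geocode` value follows the `latitude,longitude,radius` format of Twitter's standard search API, and the helper names, the 10-mile radius, and the approximate city-center coordinates are hypothetical choices for the example. No network request is made.

```python
# Approximate city-center coordinates (latitude, longitude); illustrative only.
CITY_CENTERS = {
    "Boston": (42.3601, -71.0589),
    "DC": (38.9072, -77.0369),
    "NYC": (40.7128, -74.006),
    "Seattle": (47.6062, -122.3321),
    "SF": (37.7749, -122.4194),
}


def geocode_param(city, radius_miles=10):
    """Return a 'lat,long,radius' string for the given city (hypothetical helper)."""
    lat, lon = CITY_CENTERS[city]
    return f"{lat},{lon},{radius_miles}mi"


def search_params(city, until, count=100):
    """Build query parameters for one city and cutoff date (hypothetical helper).

    `until` is a YYYY-MM-DD string; `count` is the number of tweets per page.
    """
    return {"geocode": geocode_param(city), "until": until, "count": count}


# One parameter set per city, ready to hand to whatever HTTP client is used.
all_params = {city: search_params(city, "2016-03-02") for city in CITY_CENTERS}
print(all_params["Boston"])
```

Keeping the city list in one dictionary is what makes this kind of tool easy to extend: adding a location is a one-line change.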

Accomplishments that I'm proud of

Mainly I'm proud of overcoming a bunch of hurdles and unknowns to submit this project. The data gathering tool I created ended up being very flexible, so it's easy to gather data from various geolocations and date ranges. By the way, I did a lot of research on rainy days while in SF. :)

What I learned

Different tools have their own quirks, drawbacks, and advantages. For example, I originally started with a JavaScript/Node script for data gathering, but then switched to Python. I also learned that there's a TON of data out there that could be analyzed to try to discover interesting patterns (on the flip side, this can make it hard to prioritize which hypotheses to test).

What's next for City Weather Tones

I hope that others can also learn something from this project submission. I had never tried Spark before this project, nor had I used the IBM Watson Tone Analyzer and other tools. These tools will come in handy for future project iterations!
