The Covid-19 pandemic shook the world to its foundations, causing unimagined turmoil and uncertainty in many aspects of our daily lives. Knowing how society feels at a given moment is of great interest to many: from policy makers who want to make the 'right' decisions, to social entrepreneurs who want to improve the welfare of society, to reporters who want to cover the topics of greatest public interest.
The big challenge is therefore: how can we measure the current emotional well-being of society across all demographics?
What it does
We combine several data sources from public APIs to describe as accurately as possible how society as a whole thinks and feels. On the one hand, textual data from social media explicitly reveals opinions, doubts and reactions. On the other hand, music, in both its lyrics and its audio, captures states of mind and emotions along several dimensions that can be quantified and serve as powerful predictors of overall sentiment and preferences.
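How the individual sources are weighted into one overall picture is not spelled out here. As a purely illustrative sketch, assuming each source has already been reduced to a score in [-1, 1] and that the per-source weights (the hypothetical `weights` mapping below) are chosen by hand, a combined index could be a renormalized weighted mean that skips sources without data:

```python
from typing import Mapping, Optional


def combined_sentiment(
    scores: Mapping[str, Optional[float]],
    weights: Mapping[str, float],
) -> Optional[float]:
    """Weighted mean of per-source sentiment scores in [-1, 1].

    Hypothetical helper: sources whose score is missing (None) are skipped
    and the remaining weights are renormalized, so one failing scraper does
    not break the overall index. Returns None if no source has data.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for source, score in scores.items():
        if score is None:
            continue  # no data from this source today
        weight = weights.get(source, 0.0)
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0.0:
        return None
    return weighted_sum / total_weight
```

For example, with equal weights for Twitter and Spotify and no YouTube data yet, scores of 0.2 and 0.6 combine to 0.4. Skipping missing sources keeps the index usable while further sources are still being added.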
An interactive map of Switzerland plots the most recent (past 7 days) geo-tagged English and German Tweets, classified by their sentiment (red: negative sentiment, sadness; green: positive sentiment, happiness), and shows how this sentiment evolves over time. A free-text search lets users filter for Tweets related to Covid-19.
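A minimal sketch of the data preparation behind such a map, assuming each Tweet already carries a sentiment score in [-1, 1] (such as VADER's compound score) and a UTC timestamp. The `Tweet` class, the linear red-to-green color scale and the case-insensitive substring search are illustrative assumptions, not necessarily what the map actually does:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple


@dataclass
class Tweet:
    text: str
    created_at: datetime  # timezone-aware (UTC) creation time
    compound: float  # sentiment score in [-1, 1], e.g. VADER's compound score


def sentiment_to_hex(compound: float) -> str:
    """Map a score linearly from red (-1, negative) to green (+1, positive)."""
    t = (max(-1.0, min(1.0, compound)) + 1.0) / 2.0  # clamp, then rescale to [0, 1]
    red = round(255 * (1.0 - t))
    green = round(255 * t)
    return f"#{red:02x}{green:02x}00"


def tweets_for_map(
    tweets: Iterable[Tweet],
    query: str = "",
    now: Optional[datetime] = None,
    days: int = 7,
) -> List[Tuple[Tweet, str]]:
    """Keep tweets from the last `days` days containing `query` (case-insensitive)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    needle = query.casefold()
    return [
        (tweet, sentiment_to_hex(tweet.compound))
        for tweet in tweets
        if tweet.created_at >= cutoff and needle in tweet.text.casefold()
    ]
```

An empty query keeps every recent Tweet, while a query such as "covid" narrows the map to pandemic-related posts.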
Thanks to its modular, extensible architecture, the application can be expanded, and its self-maintained database and web scrapers make it a valuable tool for users such as journalists.
How we built it
We currently use data from Spotify and Twitter, and we have validated the feasibility of using data from YouTube and Swisscom.
Music streaming has overtaken other forms of music distribution in recent years. 55% of Spotify's user base are millennials from all social classes (Source). Spotify publishes country-level song streaming charts on spotifycharts.com. Academic research shows a clear correlation between a person's mood and the type of music they listen to (Source). In addition, we used the Musixmatch API to retrieve the lyrics of the songs in the daily charts for Switzerland.
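Spotify's charts have historically been offered as a CSV download. The sketch below assumes a header row `Position,Track Name,Artist,Streams,URL`, possibly preceded by a note line, and track URLs of the form `https://open.spotify.com/track/<id>`; this format and the helper `parse_chart_csv` are assumptions for illustration. It extracts the track IDs and stream counts needed to look up audio features and lyrics:

```python
import csv
import io
from typing import List, Tuple


def parse_chart_csv(text: str) -> List[Tuple[str, int]]:
    """Return (track_id, streams) pairs from a daily chart CSV export.

    Assumes a header row "Position,Track Name,Artist,Streams,URL";
    any note lines before the header are skipped.
    """
    rows = list(csv.reader(io.StringIO(text)))
    # Locate the header row instead of hard-coding its position.
    start = next((i for i, row in enumerate(rows) if row and row[0] == "Position"), None)
    if start is None:
        raise ValueError("no 'Position' header row found")
    header = rows[start]
    streams_col = header.index("Streams")
    url_col = header.index("URL")
    chart = []
    for row in rows[start + 1:]:
        if len(row) <= url_col:
            continue  # skip blank or truncated lines
        # The track ID is the last path segment of the track URL.
        track_id = row[url_col].rstrip("/").rsplit("/", 1)[-1]
        chart.append((track_id, int(row[streams_col])))
    return chart
```

The resulting IDs can then be passed to spotipy, e.g. `sp.audio_features(ids)` on a `spotipy.Spotify` client in batches of up to 100, which returns one dictionary per track including its `valence`.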
Twitter is one of the best-known and largest social networks, not least because of the enthusiasm with which US president Donald J. Trump uses it. It has a long history of being used for trend recognition, in both academic (example) and media settings. Used alongside other indicators, Twitter has proven to be a stable source of valuable sentiment insights.
Future work: Swisscom
Swisscom is the largest telecommunications provider in Switzerland and holds the majority of the country's mobile communication market. Developers can use Swisscom's API to access anonymized data on the number of people present in given 100 m × 100 m grid cells throughout the day. The data does not provide sentiment information directly, but it should be possible to infer it in a post-processing step using mathematical measures such as entropy. The big advantage of Swisscom's data is that, unlike the other data sources we identified, it covers every mobile phone user on its network, regardless of which apps they use, and therefore all demographics and most age groups.
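We have not worked out how entropy would translate into sentiment. As a hypothetical starting point, the sketch below only computes the Shannon entropy H = -Σ p_i log2 p_i of how people are distributed over the 100 m × 100 m cells in one time slot, where p_i is the share of all counted people in cell i. Interpreting changes in this value (for example, people staying concentrated at home versus spreading out across a city) as a mood signal is an assumption:

```python
import math
from typing import Iterable


def spatial_entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (bits) of how people are spread over grid cells.

    counts: number of people observed in each grid cell for one time slot.
    High entropy means people are spread evenly; low entropy means they are
    concentrated in few cells.
    """
    occupied = [c for c in counts if c > 0]  # empty cells contribute nothing
    total = sum(occupied)
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in occupied)
    return max(0.0, entropy)  # turns -0.0 (all people in one cell) into 0.0
```

Four equally populated cells give 2 bits, while everyone in a single cell gives 0 bits; comparing such values across days would be one way to detect shifts in behaviour.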
Future work: YouTube
YouTube can also be regarded as a social network, and in the US it is more widely used than any other social network. Thanks to the YouTube API and other tools, its data can be used for sentiment analysis as well. However, this requires extensive preprocessing, as the sentiment of each video must be analyzed, which is not a trivial task.
The architecture consists of three parts: data mining, backend and frontend.
Data mining is done in Python. The Spotify charts are scraped from https://spotifycharts.com/regional, and the Spotify API is accessed with the Python library 'spotipy'. The sentiment of a song is derived from its 'valence' audio feature. Song lyrics are obtained with the Musixmatch API. VADER, a well-established sentiment analysis tool from academic research, is used to determine whether a song's lyrics or a Tweet is positive or negative in tone. The prepared data is stored in databases on Microsoft Azure.
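A minimal sketch of how the song-level signals could be combined, assuming the lyrics score is VADER's compound value (obtained with `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from the `vaderSentiment` package). The ±0.05 thresholds follow the VADER documentation; the rescaling of valence, the 50/50 `lyrics_weight` and the stream-weighted daily `chart_index` are hypothetical choices for illustration:

```python
from typing import Optional, Sequence, Tuple

# Thresholds recommended in the VADER documentation for the compound score.
POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def label_compound(compound: float) -> str:
    """Classify a VADER compound score as positive, negative or neutral."""
    if compound >= POSITIVE_THRESHOLD:
        return "positive"
    if compound <= NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"


def song_score(valence: float, lyrics_compound: Optional[float], lyrics_weight: float = 0.5) -> float:
    """Combine Spotify valence (0..1) and a lyrics compound score (-1..1) into -1..1."""
    audio = 2.0 * valence - 1.0  # rescale valence to [-1, 1]
    if lyrics_compound is None:  # instrumental track or lyrics unavailable
        return audio
    return (1.0 - lyrics_weight) * audio + lyrics_weight * lyrics_compound


def chart_index(songs: Sequence[Tuple[int, float]]) -> float:
    """Stream-weighted mean of per-song scores, given (streams, score) pairs."""
    total = sum(streams for streams, _ in songs)
    if total == 0:
        raise ValueError("chart has no streams")
    return sum(streams * score for streams, score in songs) / total
```

Weighting by streams means a song at the top of the charts influences the daily index more than one near the bottom.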
The backend server uses Flask.
The interactive map on the frontend was built using d3.js.
Challenges we ran into
Acquiring high-quality, relevant data proved difficult. The large dataset provided by EPFL and SRF, with a total of 2.6 million Corona-related tweets, was not specific to Switzerland. We therefore had to pivot to our own tweet extraction pipeline, which was unfortunately limited by the constraints Twitter imposes. Still, we handled everything life threw at us by taking a deep breath and remembering the body movement exercise from Migros at the opening ceremony.
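Restricting tweets to Switzerland requires a geographic filter. A minimal sketch, assuming Twitter API v1.1 tweet objects, whose `coordinates` field holds a GeoJSON point in longitude, latitude order; the bounding box values are approximate and the helper names are hypothetical. A box also covers border areas of neighbouring countries, so a precise filter would need the actual border polygon:

```python
from typing import Optional, Tuple

# Approximate bounding box of Switzerland in WGS84 degrees: (west, south, east, north).
SWISS_BBOX = (5.95, 45.81, 10.50, 47.81)


def tweet_point(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return (lon, lat) of a geo-tagged tweet, or None if it has no point."""
    coords = tweet.get("coordinates")
    if not coords or coords.get("type") != "Point":
        return None
    lon, lat = coords["coordinates"]  # GeoJSON order: longitude first
    return lon, lat


def in_switzerland_bbox(tweet: dict, bbox: Tuple[float, float, float, float] = SWISS_BBOX) -> bool:
    """True if the tweet's point lies inside the (approximate) bounding box."""
    point = tweet_point(tweet)
    if point is None:
        return False  # tweets without exact coordinates are dropped
    west, south, east, north = bbox
    lon, lat = point
    return west <= lon <= east and south <= lat <= north
```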
Accomplishments that we're proud of
We propose using different data sources for trend and sentiment analysis in times of social and economic uncertainty such as the Covid-19 crisis, and we demonstrate the effectiveness of this approach for this purpose.
What's next for CHemotion?
Add more data sources to the pipeline, for example Swisscom and YouTube as discussed above.