Due to Covid19, various social distancing measures such as (i) travel bans and (ii) Work-From-Home (WFH) policies were adopted. The forced quarantines moved people out of public spaces, and a lot of conversation moved online to social networks like Twitter, Facebook groups, Reddit channels and messaging platforms. We used this global opportunity to visualize this discourse.
What it does
Thanks to Twitter Developer Labs and the tweet IDs shared by the JMIR team (please check the Resources slide), we had access to millions of tweets from the Covid19 streaming endpoint. These tweet IDs were hydrated and analyzed to visualize the changes in social networks and to identify influential tweets.
How we built it
The tweet IDs were hydrated using open source tools and passed to an AWS Kinesis data stream. From there, using Spark Streaming, we did basic exploratory analysis of more than 35 million tweets, sampled from the months of January and May. Why these two months? January was the month when the epidemic was gathering steam. By May it had spread worldwide.
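As a rough sketch of the batching involved in a pipeline like this, the snippet below groups tweet IDs into lookup-sized batches and shapes hydrated tweets into entries suitable for a Kinesis `PutRecords` call. The batch sizes follow the documented limits (100 IDs per tweet lookup request, 500 records per `PutRecords` call). The function names, the use of `id_str` as partition key, and the placeholder tweets are assumptions for illustration, not the team's actual code, and no network call is made.

```python
import json
from itertools import islice

LOOKUP_BATCH = 100   # tweet lookup endpoints accept up to 100 IDs per request
KINESIS_BATCH = 500  # Kinesis PutRecords accepts up to 500 records per call


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def to_kinesis_records(tweets):
    """Shape hydrated tweets (dicts) into PutRecords-style entries.

    Assumption: each tweet dict carries an `id_str` field, used here as
    the partition key so that records spread across shards.
    """
    return [
        {"Data": json.dumps(t).encode("utf-8"), "PartitionKey": t["id_str"]}
        for t in tweets
    ]


# Hypothetical input: 250 tweet IDs read from a shared ID file.
tweet_ids = [str(1_000_000 + i) for i in range(250)]
id_batches = list(chunked(tweet_ids, LOOKUP_BATCH))

# Stand-in for hydration: a real run would call the Twitter API per batch.
hydrated = [{"id_str": i, "text": "placeholder"} for i in tweet_ids]
record_batches = [to_kinesis_records(b) for b in chunked(hydrated, KINESIS_BATCH)]
```

In a real run, each entry of `record_batches` could be passed as the `Records` argument of a Kinesis client's `put_records` call, together with the stream name.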
Challenges we ran into
- Analyzing large scale unstructured datasets (given limited resources)
- Anonymization of tweets
- Modeling challenges due to the limited availability of suitable labeled data (for tackling mis-information)
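To make the anonymization challenge concrete, here is a minimal sketch of one possible approach: replacing user IDs with keyed-hash pseudonyms and masking @mentions and URLs in the text. This is an assumption about how such a step could look, not the method the team used. The field names `user_id` and `text`, the placeholder tokens, and the key are hypothetical. Note that this is pseudonymization only: the remaining tweet text can still be searched, so it does not guarantee anonymity.

```python
import hashlib
import hmac
import re

MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+")


def pseudonymize_user(user_id, secret_key):
    """Map a user ID to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def scrub_text(text):
    """Replace URLs and @mentions with neutral placeholders."""
    text = URL_RE.sub("URL", text)
    return MENTION_RE.sub("@USER", text)


def anonymize_tweet(tweet, secret_key):
    """Return a copy of the tweet with a pseudonymous user and scrubbed text."""
    return {
        "user": pseudonymize_user(tweet["user_id"], secret_key),
        "text": scrub_text(tweet["text"]),
    }


# Hypothetical key and tweet, for illustration only.
key = b"replace-with-a-private-key"
sample = {"user_id": "12345", "text": "Thanks @who for the update https://example.org/x"}
anon = anonymize_tweet(sample, key)
```

Using a keyed hash rather than a plain hash means the same user always maps to the same pseudonym (so network structure is preserved), while someone without the key cannot rebuild the mapping by hashing known user IDs.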
Accomplishments that we're proud of
We were able to identify the change in topics over time and to visualize the changes in human psyche as fear gripped the general populace. Please check our network analysis slides in the shared presentation.
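One simple way to get a first impression of topic change over time is to compare the most frequent hashtags per month. The sketch below does exactly that. It is a crude stand-in and not the topic or network analysis shown in the slides. The `month` and `text` fields and the sample tweets are made up for illustration.

```python
import re
from collections import Counter, defaultdict

HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags_by_month(tweets, k=2):
    """Return the k most frequent lowercase hashtags for each month."""
    counts = defaultdict(Counter)
    for t in tweets:
        tags = (h.lower() for h in HASHTAG_RE.findall(t["text"]))
        counts[t["month"]].update(tags)
    return {m: [tag for tag, _ in c.most_common(k)] for m, c in counts.items()}


# Made-up tweets for illustration only; not drawn from the dataset.
tweets = [
    {"month": "2020-01", "text": "Reports from #Wuhan about a new #coronavirus"},
    {"month": "2020-01", "text": "#coronavirus cases confirmed"},
    {"month": "2020-05", "text": "Week 8 of #lockdown and #WFH"},
    {"month": "2020-05", "text": "#Lockdown extended again"},
]
top = top_hashtags_by_month(tweets, k=1)
```

Comparing the per-month lists side by side gives a quick, if coarse, view of how the conversation shifted between the two sampled months.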
What we learned
There was a large jump in the number of tweets as the epidemic became a global phenomenon. The spread of the epidemic also saw a rise in the amount of mis-information being spread through various channels.
What's next for Tackling Mis-information during the Covid19 Pandemic
We want to continue analyzing how mis-information was being spread. We also want to work further on creating a framework for anonymizing tweets and other social media posts.
Note on related teams
We are part of CoronaWhy.org, a global community of volunteers from diverse backgrounds who have come together to find solutions to problems raised by the pandemic. Five members of our community volunteered for this project, and we split ourselves into two teams. The team members are:
Team 1: Data extraction, pipelines & network visualization
- Aakash Gupta
- Nithin Krishna KS
- Ali Haider Bangash
Team 2: Basic EDA & classification of tweets spreading mis-information
- Pranjalya Tiwari
- Li Xueqi