Inspiration
Everyone is fighting Corona in some way, so I looked into the major issues we are facing and came up with this idea to help the community.
Finding solutions to Corona-related problems:
- Too much data is coming in through the Twitter stream
- The data needs to be categorized:
  - Corona Positive Tweets (recovery cases, new policies to avoid Corona, successful trials) vs. Negative Tweets (new Corona case reported, more patients dead, spread to a new area)
  - Corona Supply Request (we need to identify supply shortages in an area)
  - Corona Notice (major announcements)
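The categories listed above could be assigned with something as simple as keyword matching. The sketch below is only an illustration of that idea, not the classifier the platform actually uses: the keyword lists, the category names, and the `categorize_tweet` function are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical keyword lists, chosen only for illustration.
# Categories are checked in this order, so the first match wins.
CATEGORY_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "supply_request": ("shortage", "need masks", "need ventilators", "out of stock"),
    "notice": ("announcement", "lockdown", "official notice"),
    "positive": ("recovered", "recovery", "trial success", "new policy"),
    "negative": ("new case", "dead", "death", "spread"),
}


def categorize_tweet(text: str) -> str:
    """Return the first category whose keywords appear in the tweet text."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "uncategorized"


if __name__ == "__main__":
    for tweet in (
        "Hospital reports shortage of masks in Milan",
        "12 patients recovered today",
        "New case reported in Lyon",
    ):
        print(categorize_tweet(tweet), "<-", tweet)
```

Substring matching like this is easy to run on a stream but will misclassify many tweets; a trained text classifier would be the natural next step.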
Country-based categorization:
- Each country will be drilled down further into:
  - Death
  - Recovery
  - New case
  - Recurrence
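As a rough sketch of the country drill-down, the snippet below counts events per country for the four kinds listed above. It assumes each tweet has already been reduced to a `(country, kind)` pair; that input shape, the `DRILL_DOWN` labels, and the `aggregate_by_country` function are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# The four drill-down kinds from the list above (label spellings are assumed).
DRILL_DOWN = ("death", "recovery", "new_case", "recurrence")


def aggregate_by_country(events: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count drill-down events per country, ignoring unknown kinds."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for country, kind in events:
        if kind in DRILL_DOWN:
            counts[country][kind] += 1
    # Emit every drill-down kind for each country, with 0 where nothing was seen.
    return {
        country: {kind: counter[kind] for kind in DRILL_DOWN}
        for country, counter in counts.items()
    }


if __name__ == "__main__":
    sample = [("IT", "death"), ("IT", "recovery"), ("IT", "death"), ("IN", "new_case")]
    print(aggregate_by_country(sample))
```

Each per-country dictionary has a fixed set of keys, which would make it straightforward to store as one document per country.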
What it does
Currently, our platform processes Twitter data, aggregates it into several categories, and shows the results to the whole world through a web platform.
How we built it
- Started by building a PySpark Streaming app that categorizes the tweets
- All processed data is saved to MongoDB
- The data in MongoDB is visualized through a web platform
Web platform technologies considered:
- Django-based API implementation
- React-based front end
Challenges we ran into
Hosting it on AWS and connecting everything together.
Accomplishments that we're proud of
- We took the platform from idea to a working deployment on an AWS server by the fifth day
- We are processing data in near real time
What we learned
- We need to account for concurrent user requests
- Caching must be implemented
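One possible shape for the caching mentioned above is a small time-to-live (TTL) cache in front of the aggregated results, so repeated requests within a short window do not hit the database again. This is a minimal sketch under assumptions: the `TTLCache` class, its `get_or_compute` method, and the 30-second TTL in the usage example are hypothetical and not part of the current platform.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Keep computed values for ttl_seconds, then recompute on the next request."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: serve the cached value
        value = compute()  # missing or expired: recompute and store
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=30)
    # The lambda stands in for an expensive database query.
    print(cache.get_or_compute("summary:IT", lambda: {"death": 2, "recovery": 1}))
```

This version is not thread-safe; serving concurrent requests would need a lock around the lookup or a shared cache service.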
What's next for Corona Data Analysis Platform
- Implement the user request handling part
- Build visualizations of the data we have collected so far
- Manage the large volume of data we're collecting
- Country-based categorization