Everyone is fighting Corona in some way, so I looked into the major issues we are facing and came up with this idea to help the community.

Finding Solutions to Fight Corona Problems:

  • Too much data is coming in through the Twitter stream
  • Categorizing the data:
    • Corona positive tweets (recovery cases, new policies to avoid Corona, trial successes) vs. negative tweets (new Corona cases reported, more patient deaths, spread to a new area)
    • Corona supply requests (we need to identify supply shortages by area)
    • Corona notices (major announcements)
  • Country-based categorization:
    • Will be drilled down further into the following:
      • Deaths
      • Recoveries
      • New cases
      • Recurrences
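
As a rough illustration of the categories above, here is a minimal keyword-matching sketch in Python. The keyword lists, the `CATEGORY_KEYWORDS` table, and the `categorize_tweet` function are hypothetical names made up for this example; the write-up does not say how the platform actually decides a tweet's category, so treat this as one possible approach rather than the project's implementation.

```python
from typing import Dict, List

# Hypothetical keyword lists (an assumption for illustration only).
# Matching is plain substring search, so "dead" would also match "deadline".
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "positive": ["recovered", "recovery", "trial success", "new policy"],
    "negative": ["new case", "dead", "death", "spread"],
    "supply_request": ["shortage", "we need", "out of stock"],
    "notice": ["announcement", "lockdown", "advisory"],
}


def categorize_tweet(text: str) -> List[str]:
    """Return every category whose keywords appear in the tweet text."""
    lowered = text.lower()
    matched = [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    # Tweets that match nothing get an explicit fallback label.
    return matched or ["uncategorized"]


if __name__ == "__main__":
    print(categorize_tweet("Hospital reports shortage of masks, we need supplies"))
```

A tweet can land in more than one category here, which seems reasonable for announcements that mention both new cases and recoveries.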

What it does

Currently, our platform processes Twitter data, aggregates it into several categories, and uses a web platform to show the data to the entire world.

How we built it

  • Started building a PySpark Streaming app that categorizes tweets
  • All processed data is saved to MongoDB
  • The data in MongoDB is visualized on a web platform
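
The aggregation step in this pipeline can be sketched in plain Python, leaving out the PySpark and MongoDB wiring. The `aggregate_batch` function and the document fields (`country`, `category`, `count`, `batch_time`) are assumptions made for this example, not the project's actual schema. Each returned dictionary is shaped like a document that could be written to a MongoDB collection.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


def aggregate_batch(
    labelled: Iterable[Tuple[str, str]], batch_time: datetime
) -> List[Dict]:
    """Count (country, category) pairs from one batch of categorized tweets.

    Returns one dictionary per distinct pair, sorted by country then category.
    """
    counts = Counter(labelled)
    return [
        {
            "country": country,
            "category": category,
            "count": n,
            "batch_time": batch_time.isoformat(),
        }
        for (country, category), n in sorted(counts.items())
    ]


if __name__ == "__main__":
    batch = [("IN", "new_case"), ("IN", "new_case"), ("US", "recovery")]
    for doc in aggregate_batch(batch, datetime.now(timezone.utc)):
        print(doc)
```

Storing pre-aggregated counts per batch, rather than every raw tweet, is one way to keep the web platform's queries small.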

Web Platform Technologies considered:

  • Django-based API implementation
  • React-based front end

Challenges we ran into

Hosting it on AWS and connecting everything together.

Accomplishments that we're proud of

  • We took our platform from idea to a working deployment on an AWS server by the 5th day
  • We are processing data in near real time

What we learned

  • We need to account for concurrent user requests
  • Caching must be implemented
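
To make the caching point concrete, here is a minimal sketch of a time-based (TTL) cache guarded by a lock so that concurrent requests can share it. The `TTLCache` class and its `get_or_compute` method are hypothetical names for this example; the write-up does not say which caching approach the platform will use, and a production setup would more likely rely on an existing cache backend.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Small in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}
        self._lock = threading.Lock()

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, recomputing it if missing or expired."""
        # Holding the lock while computing keeps the sketch simple: only one
        # thread recomputes at a time, at the cost of blocking other callers.
        with self._lock:
            now = self._clock()
            entry = self._store.get(key)
            if entry is not None and now - entry[0] < self._ttl:
                return entry[1]
            value = compute()
            self._store[key] = (now, value)
            return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=30)
    print(cache.get_or_compute("counts:IN", lambda: {"new_case": 2}))
```

With something like this in front of the database, repeated requests for the same country within the TTL window would be served from memory instead of triggering a new query each time.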

What's next for Corona Data Analysis Platform

  • Implement user request handling
  • Build visualizations on the data we have collected so far
  • Manage the large volume of data we are collecting
  • Country-based categorization
