COVID-19 has caused widespread panic, and hospitals have struggled to treat the large number of people seeking care. Many people who notice early symptoms go voluntarily to get a COVID test, and the resulting volume of visitors exceeds what hospitals can accommodate. To streamline the delivery of healthcare, we propose a method that estimates a person's risk of having COVID-19 based on their demographics.
What it does
We developed a machine learning model that predicts a person's level of risk from their age, race, gender, date of exposure, and pre-existing illness. The prediction tells them whether they should see a doctor for a checkup, and helps them avoid panicking if they show any primary symptoms.
Model details
- Model: C5.0 machine learning model
- Validation: 2-fold cross-validation
- Training data: COVID-19_Case_Surveillance_Public_Use_Data
- Accuracy: 90.60%
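
As a rough illustration of how a 2-fold cross-validated accuracy figure can be obtained, here is a minimal Python sketch. The project itself was built in R with C5.0; the function names (two_fold_accuracy, fit_majority, predict_majority) and the majority-class stand-in classifier are assumptions made for this example, not the project's code.

```python
import random
from collections import Counter


def two_fold_accuracy(rows, labels, fit, predict, seed=0):
    """Estimate accuracy with 2-fold cross-validation.

    fit(rows, labels) returns a model; predict(model, row) returns a label.
    Both are supplied by the caller (hypothetical interface for this sketch).
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    half = len(indices) // 2
    folds = [indices[:half], indices[half:]]

    correct = 0
    for held_out in (0, 1):
        train_idx = folds[1 - held_out]  # train on the other fold
        test_idx = folds[held_out]       # score on the held-out fold
        model = fit([rows[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct += sum(predict(model, rows[i]) == labels[i] for i in test_idx)

    # Every row is scored exactly once, so this is the pooled accuracy.
    return correct / len(rows)


# Stand-in classifier (NOT C5.0): always predict the most common training label.
def fit_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model
```

Any fit/predict pair can be passed in place of the stand-in. Each row is scored by a model that never saw it during training, which is what makes the resulting accuracy an out-of-sample estimate.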
How I built it
- Checks the risk factor for a user with a specific age, gender, race, and medical condition
- The risk factor has 4 levels:
  - 0 - No risk (shouldn't visit a doctor)
  - 1 - Minimal risk
  - 2 - Moderate risk (should plan on visiting a doctor within a week if the illness persists)
  - 3 - High risk (should visit a doctor immediately)
- Based on the risk factor, the user can either visit the doctor or stay at home to avoid infection. We also identified the attributes that were used most in making a prediction. In other words, the risk of having COVID-19 depends primarily on the following factors:
Attribute Usage for making prediction
- 100.00% medcond_yn
- 100.00% age_group80+ Years
- 94.78% age_group70 - 79 Years
- 12.73% cdc_report_dt
- 9.75% age_group60 - 69 Years
- 6.52% age_group50 - 59 Years
- 4.72% Race.and.ethnicity..combined.Asian, Non-Hispanic
- 4.47% age_group40 - 49 Years
- 3.49% Race.and.ethnicity..combined.Unknown
- 2.99% sexMale
- 1.95% Race.and.ethnicity..combined.Multiple/Other, Non-Hispanic
- 1.31% Race.and.ethnicity..combined.White, Non-Hispanic
- 0.77% Race.and.ethnicity..combined.Black, Non-Hispanic
- 0.64% age_group20 - 29 Years
- 0.30% age_group10 - 19 Years
- 0.27% age_group30 - 39 Years
- 0.12% Race.and.ethnicity..combined.Hispanic/Latino
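
To make the usage percentages above easier to read: as we understand C5.0's attribute usage, it is the percentage of training cases for which an attribute's value is consulted when the tree classifies them. The Python sketch below computes that figure on a hypothetical two-level tree. The tree, the "other" catch-all branch, and the function names are made up for illustration and are not the fitted model.

```python
from collections import Counter

# Hypothetical toy tree (NOT the fitted C5.0 model). Internal nodes split on
# one attribute; leaves carry a risk level.
TREE = {
    "attr": "medcond_yn",
    "branches": {
        "Yes": {"leaf": 3},
        "No": {
            "attr": "age_group",
            "branches": {
                "80+ Years": {"leaf": 2},
                "other": {"leaf": 0},  # catch-all for every other age group
            },
        },
    },
}


def attributes_consulted(tree, row):
    """Return the set of attributes read while classifying one row."""
    used = set()
    node = tree
    while "leaf" not in node:
        attr = node["attr"]
        used.add(attr)
        branches = node["branches"]
        # Follow the matching branch, falling back to the catch-all.
        node = branches[row[attr]] if row[attr] in branches else branches["other"]
    return used


def attribute_usage(tree, rows):
    """Percentage of rows for which each attribute is consulted."""
    counts = Counter()
    for row in rows:
        counts.update(attributes_consulted(tree, row))
    return {attr: 100.0 * n / len(rows) for attr, n in counts.items()}
```

With one row that has a medical condition and three that do not, medcond_yn is consulted for all four rows (100%) while age_group is consulted only for the three that reach the second split (75%). Read this way, a figure of 100.00% would mean the attribute is checked for every training case.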
Challenges I ran into
Developing the UI in an R script was a big challenge, but with our mentor's help we learned that a Shiny app is an effective way to create the UI, and we were then able to build it very easily.
Accomplishments that I'm proud of
Achieving high prediction accuracy with our current C5.0 model, which was fitted to over 400,000 user records obtained from https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf
What I learned
How to develop machine learning models, and how tree-based machine learning models are structured.
What's next for Coris
Adding a model ensemble that includes another tree-based classification model fitted to users' location data, so that location can also be used to make more accurate risk predictions. We also plan to improve the look of the UI.