Reliable, anonymous auto-alert system for COVID19 exposure

Inspiration

Let's beat this pandemic together, and prevent future pandemics from happening!

There is a need for a tracking system that is anonymous and reliable. Widespread adoption is critical to providing relevant information, so users must be confident that the application they are using does not compromise their identities. Furthermore, the data source will be as direct as possible: the system aims to obtain this information from the hospital/testing center itself, without adding a heavy burden on its staff.

For the current pandemic, this system can help prevent a second wave of infections in many countries, especially now that some areas are considering reopening communities and businesses in the coming weeks.

What it does

This proof-of-concept system will alert people if they may have been exposed to a person confirmed to be infected.

To make the system more reliable and prevent data tampering, the system will have 2 applications, or a single application with 2 modes. One will reside on the user’s personal device, while the other will be at the hospital or testing center.
- For simplicity and ease of installation, the device used at the hospital or testing center should preferably be a mobile device with a camera. The camera will be used to scan a QR Code. A printer may also be needed to print the QR Code, unless there is another way to map/relate the user’s generated UUID to the testing kit.
- To avoid overloading medical personnel with additional tasks, the UUID will be scanned as a QR Code instead of being entered manually. This also eliminates the chance of input errors.
- After a user is confirmed positive for the infection, the application at the hospital/testing center will use the infected user’s QR Code as the identifier to send the encrypted and anonymous historical geolocation data (consisting of date/time and location only) to a cluster server.
- Web services on the cluster server will find the intersection of the infected user’s trail with the trails of all other users in the system. This search can be optimized, for example by filtering on location and date/time first, so that it does not take too long even with potentially millions or billions of users.
- Users who were in the same area as the infected person will be alerted on their personal devices. A publish-subscribe method can be used so that devices do not need to poll the server repeatedly. This also means the server never needs to know a user’s phone number, since phone numbers are not the channel used to inform each user.
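The publish-subscribe step in the last bullet could look roughly like the sketch below. It is only an illustration under stated assumptions: `AlertBroker` is a hypothetical in-memory stand-in for a real message broker, each device subscribes to a topic named after its own randomly generated UUID, and the alert text is made up. QR encoding, encryption and networking are left out.

```python
import uuid
from collections import defaultdict
from typing import Callable, DefaultDict, List


class AlertBroker:
    """Toy in-memory publish-subscribe broker (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # topic name -> callbacks registered by subscribed devices
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every subscriber of the topic; return how many were notified."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


# Each device generates its own random UUID; no name or phone number is involved.
user_a = str(uuid.uuid4())
user_b = str(uuid.uuid4())

broker = AlertBroker()
inbox_a: List[str] = []
inbox_b: List[str] = []
broker.subscribe(user_a, inbox_a.append)  # device A listens on its own topic
broker.subscribe(user_b, inbox_b.append)  # device B listens on its own topic

# Assume the server found that only user A crossed paths with a confirmed case.
delivered = broker.publish(user_a, "Possible exposure: please consider getting tested.")
```

Because the server only addresses topics by UUID, it pushes the alert without polling from the devices and without storing any contact details.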
Aside from maintaining a record of anonymous infected persons, this system may also be designed to be extensible, so that statistics can be gathered on the number of persons who recovered from the disease, the number who were reinfected and when, PUIs (Persons Under Investigation) for COVID19, and so on. This can be done by adding a field to the database that indicates the type of person (e.g. Infected, PUI, Recovered, Reinfected).
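As a rough illustration of the extra "type of person" field, the sketch below models one record and a simple count per type. The names `PersonType`, `StatusRecord` and `count_by_type` are hypothetical and not part of any existing design; a real system would store this in the database rather than in memory.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Dict, Iterable


class PersonType(Enum):
    INFECTED = "infected"
    PUI = "pui"  # Person Under Investigation
    RECOVERED = "recovered"
    REINFECTED = "reinfected"


@dataclass(frozen=True)
class StatusRecord:
    uuid: str                # anonymous identifier only, never a name
    person_type: PersonType  # the extra field described above
    recorded_at: datetime    # when this status was recorded


def count_by_type(records: Iterable[StatusRecord]) -> Dict[PersonType, int]:
    """Count how many records exist for each person type."""
    counts = {person_type: 0 for person_type in PersonType}
    for record in records:
        counts[record.person_type] += 1
    return counts


records = [
    StatusRecord("uuid-1", PersonType.INFECTED, datetime(2020, 4, 1)),
    StatusRecord("uuid-2", PersonType.PUI, datetime(2020, 4, 2)),
    StatusRecord("uuid-1", PersonType.RECOVERED, datetime(2020, 4, 20)),
]
counts = count_by_type(records)
```

Keeping one record per status change, instead of overwriting a single status, is one way to retain the "when" needed for reinfection statistics.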
This system can be used for other types of data analytics and mapping, but only with limited information, since name and exact place of residence MUST NEVER be included or used as input, by design. To clarify, only the exact date, time range and location that a confirmed infected person has visited will be sent for data analytics/mapping. To prevent anyone from inferring where an infected person lives, and to avoid potential harassment, the data available to the wider public should abstract this information away. For example, here are some use cases:

a. to create a map that displays a trail of infections, only date, time range and location will be returned by the API. API keys may be used so that only reputable/approved analytics projects can be given access to the data.

b. to create a map that displays statistics on the number of currently infected people (and recovered cases), only location and type (Infected or Recovered) will be returned by the API. API keys may be used so that only reputable/approved analytics projects can be given access to the data.

c. to alert users that they may have been exposed to the disease, the infected person's entire trail will not be shared; only the date, time (can be further refined, please refer to the presentation) and place that the infected person and user A have in common will be given to user A. If the infected person and user A share many points of intersection, the data can be protected further by randomly selecting only one common point (date/time) and displaying only that to user A.
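The three use cases above can be sketched as three small "views" over the same data, each returning only the fields it is allowed to expose. This is a minimal sketch, assuming visits are already reduced to a coarse location label, a date and a time range; `Visit`, `trail_view`, `stats_view` and `alert_view` are hypothetical names, and API keys are not shown.

```python
import random
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Visit:
    location: str    # coarse area label, never a home address
    date: str        # e.g. "2020-04-10"
    time_range: str  # e.g. "14:00-15:00"


def trail_view(visits: Iterable[Visit]) -> List[Dict[str, str]]:
    """Use case a: only date, time range and location are returned."""
    return [{"date": v.date, "time_range": v.time_range, "location": v.location} for v in visits]


def stats_view(cases: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Use case b: only location and type (Infected or Recovered) are returned."""
    return [{"location": location, "type": person_type} for location, person_type in cases]


def alert_view(infected: Iterable[Visit], user: Iterable[Visit],
               rng: Optional[random.Random] = None) -> Optional[Visit]:
    """Use case c: reveal at most one randomly chosen point shared with the user."""
    common = sorted(set(infected) & set(user), key=lambda v: (v.date, v.time_range, v.location))
    if not common:
        return None
    chooser = rng if rng is not None else random
    return chooser.choice(common)


infected_trail = [
    Visit("Mall A", "2020-04-10", "14:00-15:00"),
    Visit("Station B", "2020-04-11", "08:00-09:00"),
    Visit("Cafe C", "2020-04-12", "10:00-11:00"),
]
user_a_trail = [
    Visit("Mall A", "2020-04-10", "14:00-15:00"),
    Visit("Station B", "2020-04-11", "08:00-09:00"),
    Visit("Park D", "2020-04-12", "16:00-17:00"),
]
shown_to_user_a = alert_view(infected_trail, user_a_trail)
```

Here user A shares two points with the infected person but is shown only one of them, so the rest of the trail stays hidden.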

How I built it

This is still a proof of concept that I have been thinking about for the past few weeks.

Challenges I ran into

I came across this hackathon just a few hours ago, so there was not much time to prepare, but I have been thinking about the idea for a few weeks now. I live in a densely packed city where testing is currently limited (though it has been increasing in the past few days) and there is no lockdown in spite of the increase in cases, so people can feel very insecure at times.
This system cannot work without the explicit consent of users. They must be assured that the data will only be used for alerting, and that only anonymous and encrypted data will be sent.
This system can only be as accurate as the API used for geolocation. Battery drain may also pose a challenge, so updating geolocation data for users may be limited to 1-2 times per day. This may be further refined so that once a user is confirmed infected, their geolocation data is sent immediately.
This system can only be effective in urban areas where Internet/GPS services are common. It may be extended to accept manually entered data and/or make use of offline data, but this will make the system less reliable for those users.

Possible challenges for the developers

This system will make use of data encryption, and needs an algorithm that efficiently finds the "intersection" an infected person has with potentially millions of users. With efficient filtering by location and date/time, the number of comparisons can be minimized. Choosing how many points (e.g. X=location, Y=date/time) to store and compare per user can also be a challenge with millions of users. Additionally, UUIDs in databases can pose some performance issues, so extra care is needed in the database design. Other cryptographic methods may be used instead of the UUID-QR Code combination, but this is out of the scope of this hackathon.
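One possible way to filter by location and date/time before comparing is to group every point into a grid cell and a time bucket, then look up only the buckets the infected person visited. The sketch below is an assumption, not a chosen design: the cell size, the one-hour bucket and all function names are hypothetical, and encryption is omitted.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Set, Tuple

CELL_DEGREES = 0.001   # assumed grid size, roughly 100 m of latitude
BUCKET_SECONDS = 3600  # assumed time bucket of one hour

Point = Tuple[float, float, datetime]  # (latitude, longitude, timestamp)
Key = Tuple[int, int, int]             # (lat cell, lon cell, time bucket)


def bucket_key(lat: float, lon: float, when: datetime) -> Key:
    """Map a point to its grid cell and time bucket."""
    return (
        math.floor(lat / CELL_DEGREES),
        math.floor(lon / CELL_DEGREES),
        int(when.timestamp() // BUCKET_SECONDS),
    )


def build_index(trails: Dict[str, Iterable[Point]]) -> Dict[Key, Set[str]]:
    """Index every user's points once, so lookups avoid scanning all users."""
    index: Dict[Key, Set[str]] = defaultdict(set)
    for user_id, points in trails.items():
        for lat, lon, when in points:
            index[bucket_key(lat, lon, when)].add(user_id)
    return index


def exposed_users(index: Dict[Key, Set[str]], infected_trail: Iterable[Point]) -> Set[str]:
    """Return users who share at least one cell and time bucket with the infected trail."""
    exposed: Set[str] = set()
    for lat, lon, when in infected_trail:
        exposed |= index.get(bucket_key(lat, lon, when), set())
    return exposed


def utc(hour: int, minute: int) -> datetime:
    return datetime(2020, 4, 10, hour, minute, tzinfo=timezone.utc)


trails = {
    "user-a": [(10.0005, 20.0005, utc(14, 5))],   # same place, same hour
    "user-b": [(10.0505, 20.0505, utc(14, 10))],  # same hour, different place
    "user-c": [(10.0005, 20.0005, utc(18, 30))],  # same place, different hour
}
index = build_index(trails)
result = exposed_users(index, [(10.00051, 20.00051, utc(14, 40))])
```

Each lookup touches only the buckets on the infected trail, so the cost depends on the length of that trail rather than on the total number of users. A known limitation of this sketch is that two people standing on opposite sides of a cell or hour boundary fall into different buckets; checking neighboring buckets as well would be one way to address that.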

What I learned

In times of crisis, good and brave people always step up and volunteer. Knowing this, I wish to be able to create a crowd-sourced type of system that is optional and gives transparent information to people, yet does not compromise data privacy. The alert will be delivered directly to users, eliminating potential sources of delay, and making it more confidential for the users.