COVID-19 Self-reporting with Privacy
Contact-tracing is an important tool in fighting the spread of viruses. While many contact-tracing applications exist, they face two problems:
- Data Silos
SafeTrace is an API which connects to a privacy-preserving storage and computation service.
This means many different applications can submit encrypted data via the SafeTrace API and receive their results without ever revealing plaintext data to anyone (including the SafeTrace server operator). This relies on Trusted Execution Environments (TEE), a technology for preserving data privacy while data is in use.
What we built
- a Python script (thanks to @shenrene) that cleans raw Google Takeout data for our API endpoints
- defined an API for the private compute service (storage and compute)
- successfully executed Rust code within an enclave, and started on the specific compute tasks for SafeTrace
- built a complete wireframe UI for the prototype client app
- defined the architecture and requirements for a user ID management server
- scaffolding for a prototype client web app that users can use to add their Location History via Google Takeout (thanks to @nutan -- can be viewed and demoed here: https://github.com/nutanp/SafeTrace/tree/master/UIcode )
Working with the TEE enclave is difficult and takes time, but we are making progress.
This service has moving parts, and we weren't able to get to an end-to-end MVP during the course of this hackathon. That is our next goal, however, and we are tracking progress towards it here:
We're looking for help. Feel free to ping us if you want to join our team! Below is a list of areas that we need help with and our open questions:
- full-stack web app development
- Rust programmers, developers and engineers with Intel SGX experience
- back-end engineers who can work on notification and API endpoints
Overview & Motivation
Social contact tracing based on mobile phone data has been used to track and mitigate the spread of COVID-19. However, this is a significant privacy risk, and sharing these data may disproportionately affect at-risk populations, who could be subject to discrimination and targeting. In certain countries, obtaining this data en masse is not legally viable.
We propose a privacy-preserving, voluntary self-reporting system for sharing detailed location data amongst individuals and organizations. Users will be able to encrypt and share complete location history, and their current status (positive, negative, unknown). Users will be able to update their status if it changes. This system will compute on shared, aggregate data and return location-based social contact analytics.
This system relies on 3 core services:
Location History data from Google Location Services via Google Takeout
Any user who has Location Services active with Google is able to obtain a JSON format file of their location history. They are also able to edit this file manually to remove any unwanted or sensitive locations (e.g., a home address). A user who does not use Location Services can manually add a history via Google.
Note: This service could be swapped/replaced by a mobile application at some point
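As a rough illustration of what cleaning a Takeout export could involve, the sketch below reads a location history file and keeps only a timestamp and coordinates per point. It assumes the legacy Takeout layout (a top-level `locations` array whose entries carry `timestampMs`, `latitudeE7`, `longitudeE7`, and `accuracy`); the function name `clean_takeout`, the output field names, and the accuracy cutoff are hypothetical and are not taken from the project's actual script.

```python
import json


def clean_takeout(raw_json, max_accuracy_m=100):
    """Return a time-sorted list of {"time", "lat", "lon"} dicts.

    Assumes the legacy Google Takeout location history layout.
    "time" is Unix time in whole seconds; "lat"/"lon" are decimal degrees.
    """
    data = json.loads(raw_json)
    required = ("timestampMs", "latitudeE7", "longitudeE7")
    cleaned = []
    for point in data.get("locations", []):
        # Skip entries that lack a field needed downstream.
        if not all(key in point for key in required):
            continue
        # "accuracy" is an error radius in meters, so larger means worse.
        if point.get("accuracy", 0) > max_accuracy_m:
            continue
        cleaned.append({
            "time": int(point["timestampMs"]) // 1000,
            "lat": point["latitudeE7"] / 1e7,  # E7 = degrees * 10^7
            "lon": point["longitudeE7"] / 1e7,
        })
    cleaned.sort(key=lambda p: p["time"])
    return cleaned
```

Dropping every field except time and coordinates keeps the uploaded payload small and avoids sending data the compute service does not need.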
A Privacy-preserving Computation service
Private computation is a term for performing tasks on data that is never viewed in plaintext. Our system will use private computation to generate individual and global analytics. In this scenario, private computation techniques could be employed to:
- Identify users who have been in close proximity with individuals who have tested positive
- Add noise to user locations, and then output that data to a map without revealing the original data to anyone, including application developers or server owners
- Analyze and create clusters from user data, and output those results to a map without revealing original data to anyone
- TBD (we welcome suggestions for computational analysis that provides privacy guarantees as well as useful, high-fidelity output data)
Initially, we propose using an Intel-SGX based service that uses Trusted Execution Environments (TEE). Alternative private compute techniques include homomorphic encryption, multiparty computation, and differential privacy.
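The noise-adding task listed above can be sketched in plain Python as follows. This shows only the plaintext logic that would run inside the enclave; it is not the project's enclave code. The function name `add_noise`, the Gaussian noise model, and the 50 m default are assumptions chosen for illustration, and this simple jitter does not by itself provide a formal differential privacy guarantee.

```python
import math
import random

# Approximate length of one degree of latitude in meters.
METERS_PER_DEGREE_LAT = 111_320.0


def add_noise(lat, lon, sigma_m=50.0, rng=random):
    """Return (lat, lon) displaced by Gaussian noise.

    sigma_m is the standard deviation, in meters, applied independently
    to the north-south and east-west axes. rng may be the random module
    or a random.Random instance (useful for reproducible tests).
    """
    north_m = rng.gauss(0.0, sigma_m)
    east_m = rng.gauss(0.0, sigma_m)
    new_lat = lat + north_m / METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with cos(latitude); this approximation
    # breaks down very close to the poles.
    meters_per_degree_lon = METERS_PER_DEGREE_LAT * math.cos(math.radians(lat))
    new_lon = lon + east_m / meters_per_degree_lon
    return new_lat, new_lon
```

For example, `add_noise(40.7128, -74.0060, rng=random.Random(42))` returns a point displaced from the original, typically by a few tens of meters.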
Visualization and notification services
Our working assumption is to:
- Inform individuals who have been in close proximity of individuals who have tested positive via a notification system. This section is TBD based on requirements defined by experts
- Create a visualization service for users (individuals and social organizations) to track the current status of the virus outbreak at a granular level.
These diagrams provide an overview of how these services connect and how data is accessed and controlled throughout. Note: data is encrypted on the client side, remains encrypted in transit, and is protected by TEE security and privacy guarantees during compute.
- User creates an account (email and password)
- User views instructions for retrieving location data from Google Location services.
- User reviews Google Maps timeline, and optionally removes any sensitive activity (e.g., home address, work address, others)
- User exports her data via Google Takeout service
- User returns to app UI and uploads JSON file from Google Takeout for the previous month or two
- User indicates her current testing status (positive, negative, untested) and the date of the test (today's date if untested)
- User submits data to compute service (data is encrypted locally by the app prior to sending)
- User can now view "matches", where her data overlaps in time and proximity to a user reporting a positive test result
- User can opt in to receive emails when new matches occur, and to be prompted periodically to update her data and infection status.
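The submission steps in the flow above (location history, testing status, test date) could be validated on the client before encryption along these lines. This is a minimal sketch: the function name `build_submission` and the payload field names are hypothetical, the real client is a web app rather than Python, and the encryption step itself is not shown.

```python
from datetime import date

VALID_STATUSES = {"positive", "negative", "untested"}


def build_submission(locations, status, test_date=None, today=None):
    """Validate user input and return a plaintext payload dict.

    locations: list of {"time", "lat", "lon"} dicts.
    test_date: datetime.date; defaults to today when status is "untested".
    The caller would encrypt the returned payload before sending it.
    """
    today = today or date.today()
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if test_date is None:
        if status != "untested":
            raise ValueError("a test date is required for tested users")
        test_date = today  # the flow uses today's date for untested users
    if test_date > today:
        raise ValueError("test date cannot be in the future")
    if not locations:
        raise ValueError("location history is empty")
    for point in locations:
        if not (-90 <= point["lat"] <= 90 and -180 <= point["lon"] <= 180):
            raise ValueError(f"coordinates out of range: {point}")
    return {
        "status": status,
        "test_date": test_date.isoformat(),
        "locations": locations,
    }
```

Rejecting malformed input before encryption matters here because the server cannot inspect plaintext, so errors caught late are harder to report back to the user.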
The system is made up of the following components:
- contains the self-reporting UI
- displays the individual proximity match report from post-compute results
- displays a heat map view of positively tested participants (global results) from post-compute results
Login / Unique identifier DB
Private Compute Service
- contains code
- maintains an encrypted DB of submissions
Data self-reporting UI
- Clearly communicates to users the goals and possible risks of the service
- Walks users through obtaining and sanitizing Google Takeout location data
- Provides HTTPS-like assurances that the UI is in communication with a successfully attested enclave
- Enables users to create a persistent email/password log-in
- Enables users to submit, and update:
- 1-2 months of location history in Google Takeout JSON format
- Current infection status (positive, negative, untested)
- Date test was administered
- Runs data formatting and simple data validation in the browser
- Proves what code is being executed over the data
- Proves integrity via Intel Attestation Service (IAS)
Input: Encrypted user location histories in Google Takeout JSON format
- Positive matches between users who have had positive test results and users who overlapped with them on time and proximity for individual reporting
- Clustering algorithm to run on location history of users who have had positive test results (with time-dependent weights) for global view
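To make the matching on time and proximity concrete, here is a minimal plaintext sketch of the comparison that would run inside the enclave. The function names, the 20 m distance threshold, and the 15-minute time window are illustrative assumptions, not values defined by the project. Points are assumed to be dicts with `time` (Unix seconds), `lat`, and `lon`.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def find_matches(user_points, positive_points,
                 max_distance_m=20.0, max_gap_s=900):
    """Return (user_point, positive_point) pairs close in time and space."""
    matches = []
    for u in user_points:
        for p in positive_points:
            if abs(u["time"] - p["time"]) > max_gap_s:
                continue  # cheap time check before the distance math
            d = haversine_m(u["lat"], u["lon"], p["lat"], p["lon"])
            if d <= max_distance_m:
                matches.append((u, p))
    return matches
```

This brute-force comparison is quadratic in the number of points. A real service would likely need time bucketing or a spatial index to scale.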
Current thinking is to have two services result from the computation:
- A notification service for users who are untested/negative that tells them if they have overlapped in time/proximity with positive test cases [Link to detailed description]
- An aggregate heatmap of locations where individuals with positive tests have been [Link to detailed description]
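One simple way to produce such an aggregate view is to bin positive-case locations into grid cells and weight older points less, in the spirit of the time-dependent weights mentioned above. The sketch below is an assumption-laden illustration: it uses fixed-size grid binning rather than a true clustering algorithm, and the name `heatmap_cells`, the 0.01 degree cell size, and the 7-day half-life are all hypothetical.

```python
import math
from collections import defaultdict


def heatmap_cells(points, now_s, cell_deg=0.01, half_life_days=7.0):
    """Aggregate points into grid cells with exponentially decaying weights.

    points: list of {"time", "lat", "lon"} dicts from positive-case users,
            with "time" in Unix seconds.
    Returns {(lat_index, lon_index): weight}; a point's weight halves
    every half_life_days of age.
    """
    half_life_s = half_life_days * 86_400
    cells = defaultdict(float)
    for p in points:
        age_s = max(0, now_s - p["time"])
        weight = 0.5 ** (age_s / half_life_s)
        # Grid cell indices; floor keeps negative coordinates consistent.
        key = (math.floor(p["lat"] / cell_deg),
               math.floor(p["lon"] / cell_deg))
        cells[key] += weight
    return dict(cells)
```

Only the per-cell totals would leave the enclave, so individual location histories stay hidden. Cells with very few contributors could still be revealing, which is one of the open questions for the output design.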
The code in this repository is released under the MIT License.