Blockchain Health Data Sharing

Inspiration

What it does

For a long time, humanity has been fighting pandemics. The latest one, COVID-19, is still ongoing. What can we do to prevent them? Humanity has the tools to fight the spread of viruses. It is no coincidence that we use the word humanity: only if we think as one humanity, and not as individual nations, can we hope to prevent pandemics. We believe that knowing in advance whether an outbreak of symptoms related to a viral infection is underway in some area of the planet can act as an alarm bell and keep citizens safe.

We have designed a platform common to the whole world. This blockchain-based platform will collect health data, possibly from each individual, and write it anonymously to a database, so that privacy is not infringed. These data will form the individual's medical record. In particular, the platform will collect disease symptoms in a very large database and then analyze them with AI tools. The platform works on several levels:

Doctor-patient level: a doctor will be able to access the platform to fill in a form every time he or she visits a patient. The doctor will enter every symptom the patient reports, together with the medical prescription.
Patient level: the patient also accesses the platform via smartphone, can view his or her medical record and prescriptions, and will be able to book medical examinations and take advantage of many other services that technology and smart contracts will enable. The patient can activate smart contracts that allow other medical users to manage his or her clinical data, even temporarily, for the duration of a visit only.
Healthcare institution level: a local, national or supranational healthcare organization will be able to access the big data in order to see symptoms in real time. The AI will be able to predict anomalous situations and warn the health authorities, who will then have to take all measures needed to avoid the spread of the epidemic. These big data will also be useful for basic research, improving drug therapies, finding new diseases and improving the doctor-patient relationship.
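
As a rough illustration of the alarm-bell idea at the healthcare institution level, the sketch below flags a day whose symptom count is far above the recent baseline. It uses a plain z-score instead of the AI tools the platform would rely on; the function name `is_anomalous`, the threshold and the sample counts are all assumptions made for this example.

```python
from statistics import mean, stdev


def is_anomalous(history, today, threshold=3.0):
    """Return True when today's count is more than `threshold` standard
    deviations above the mean of the historical counts."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Flat history: any increase over the baseline is flagged.
        return today > baseline
    return (today - baseline) / spread > threshold


# Hypothetical daily counts of reported fever cases in one area.
fever_reports = [12, 15, 11, 14, 13, 12, 16]
normal_day = is_anomalous(fever_reports, 14)
surge_day = is_anomalous(fever_reports, 40)
```

A real deployment would need seasonality, reporting delays and population size taken into account; the point here is only that the check runs on anonymous aggregated counts.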

This platform will become the world standard for fighting new diseases.

How I built it

Platform

The proposed scheme represents a platform that, in a completely interoperable and privacy-aware-by-design way, can ingest heterogeneous, varied and voluminous data coming from different subjects operating in the territory, and can return them, appropriately structured, to the various stakeholders according to their needs and visibility rights. The focus of the project is on the platform and not on the application silos, although the silos are extremely important for the project: they will provide innovative services with high added value for the citizen and will serve as proofs of concept for piloting, in order to demonstrate the consistency of the platform itself. Another value the platform provides is the ability to federate the information it contains with other platforms, while keeping intact the confidentiality and inviolability of the data.

(/Platform Scheme.png)

The platform bases the integration of the application components on a hub (message bus), which decouples the application responsibilities and still provides an aggregation point with respect to the persistence of the information. In this way it is also possible to orchestrate services in a macro-territorial logic. Alongside the coordination platform are the IoT and BI platforms, which communicate with the message bus in order to use its authentication and authorization rules, while communicating with data storage in a form "non-mediated" by the BSE for reasons of efficiency, in order to avoid bottlenecks. Access to the cluster that contains sensitive information must go through an application layer that assesses the conditions and access rights each time.

An important aspect in terms of innovation in data management is the idea of introducing a blockchain cluster for the management of sensitive data. The methods envisaged for the use of this technology are illustrated in the next chapter. As for the purposes, a distributed framework for the digital identities of patients, which uses private and public keys to protect them cryptographically, creates an innovative and secure method to protect the patient's identity. In addition to identity, it is conceivable, during the project, to use the blockchain to secure the entire supply chain of services provided to the patient, recording the steps through a natively tamper-proof technology, in order to provide maximum transparency on the work of the public and private health services involved in the territorial platform.
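
To make the decoupling role of the hub more concrete, here is a minimal in-memory sketch of a message bus that applies an authorization rule before delivering messages. The class `MessageBus`, the topic name and the application name are hypothetical; the real platform would use a proper messaging product.

```python
from collections import defaultdict


class MessageBus:
    """Toy hub: publishers and subscribers only know topic names."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._allowed = set()  # (publisher, topic) pairs allowed to publish

    def allow(self, publisher, topic):
        self._allowed.add((publisher, topic))

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, publisher, topic, message):
        # The bus, not the individual application, enforces the rule.
        if (publisher, topic) not in self._allowed:
            raise PermissionError(f"{publisher} may not publish to {topic}")
        for handler in self._subscribers[topic]:
            handler(message)
        return len(self._subscribers[topic])


bus = MessageBus()
received = []
bus.subscribe("booking.created", received.append)
bus.allow("booking-app", "booking.created")
delivered = bus.publish("booking-app", "booking.created", {"exam": "ECG"})
```

The booking application never references the subscriber directly, which is the decoupling the hub is meant to provide.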

An important aspect concerns the fact that the platform is designed to be naturally extendable to other services and related information. This quality is fundamental to guarantee a subsequent industrial exploitation of the project on the regional territory and its further possible portability in other regional contexts through reuse.

Organization of data

The data processed by the platform are divided into various types, and are characterized by two attributes:

Sensitive / non-sensitive
Structured / Unstructured

Sensitive data are first of all the patient's personal data, extended with any data that could allow identification (e.g. complete address). Non-sensitive data are in fact sensitive data (e.g. clinical parameters, diagnostic reports, ...) that have been anonymized by hiding the patient's identity (and, as above, all data that could identify the patient or assign the patient to a restricted group), which is replaced by an "opaque pointer" to the patient registry. The term "opaque pointer" means an ID assigned by the system, which makes it possible to search aggregated data while safeguarding patient privacy.

One of the research areas of the project concerns ensuring the confidentiality of information by experimenting with blockchain-type technologies, or more precisely with permissioned blockchains, for sensitive data. Permissioned blockchains differ from traditional blockchains in that the computational nodes that validate transactions are certified and therefore, even if scattered throughout the network, are managed by trusted and guaranteed partners. The use of a blockchain can bring several advantages over traditional encrypted data storage. In particular, a distributed and federated database prevents a single organization from controlling the data and using it improperly. A distributed chain that records all operations carried out on the data (and in some cases prevents them) helps to prevent fraudulent use. For example, deliberate data alterations are of course still possible, but the blockchain keeps track of the change and, since it is not controlled by any individual person or organization, nobody can prevent the change from being recorded by the other components of the system.

As for the type of data, we can divide it into structured and unstructured. Structured data have the classic relational format and will therefore be stored in one or more RDBMS instances. Here we can expect a large amount of initial data followed by less frequent changes, for which traditional RDBMS technology appears adequate. This type includes personal data (of doctors, patients, clinics, ...), the results of diagnostic checks, screenings and various analyses, records of treatments carried out (hospitalizations, surgical operations, ...) and health files.

Unstructured data, on the other hand, are those collected mainly by sensors, generally repeatedly and continuously over time, such as heart rate. These data are recorded by the IoT platform and transmitted to an unstructured storage system, which could be a NoSQL database or, for even higher volumes, a Big Data platform. Each sensor uploads its data anonymously (sensor id + patient id + medical data). The data can be traced back to the individual patient by authorized users, for example the family doctor (who is authorized through the blockchain platform to perform the sensor-patient pairing), or used for aggregated searches by personnel authorized to use BI.

The platform will also contain business intelligence data, typically in the form of a star schema, that is, a denormalized database obtained by joining the data contained in the other databases. In the case of BI, queries are defined on a case-by-case basis, so authorizations can be granted on demand for individual statistics. The same protocol can be used as a basis for implementing other features based on identity sharing, first of all Single Sign-On (SSO) among the various applications.
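
The "opaque pointer" idea can be sketched as follows. This is only an illustration under stated assumptions: `PatientRegistry`, `anonymize`, the field names and the use of a random hex string as the system-assigned ID are all hypothetical, and the registry here is a plain in-memory object rather than the permissioned blockchain the project envisages.

```python
import secrets

# Fields assumed (for this sketch) to identify the patient.
IDENTIFYING_FIELDS = {"patient_key", "name", "address"}


class PatientRegistry:
    """Stand-in for the access-controlled patient registry."""

    def __init__(self):
        self._pointer_by_patient = {}
        self._personal_by_pointer = {}

    def pointer_for(self, patient_key, personal_data):
        # Assign a random ID once per patient; it reveals nothing by itself.
        if patient_key not in self._pointer_by_patient:
            pointer = secrets.token_hex(16)
            self._pointer_by_patient[patient_key] = pointer
            self._personal_by_pointer[pointer] = personal_data
        return self._pointer_by_patient[patient_key]

    def resolve(self, pointer, authorized):
        if not authorized:
            raise PermissionError("not allowed to re-identify the patient")
        return self._personal_by_pointer[pointer]


def anonymize(record, registry):
    """Split a record: personal data go to the registry, the rest keeps
    only an opaque reference to the patient."""
    personal = {k: v for k, v in record.items() if k in IDENTIFYING_FIELDS}
    clinical = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    clinical["patient_ref"] = registry.pointer_for(record["patient_key"], personal)
    return clinical


registry = PatientRegistry()
raw = {"patient_key": "P-001", "name": "Maria Rossi",
       "address": "Via Roma 1", "heart_rate": 72}
stored = anonymize(raw, registry)
```

Because the same patient always maps to the same pointer, aggregated searches can still group records per patient without exposing who the patient is.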

Security and privacy

Authentication

The identity of the users will be guaranteed by an authentication gateway, which will maintain a uniform interface towards a plurality of authentication tools that can include, in a non-exhaustive way:

SPID
An LDAP database owned by the organization
A proprietary database

Successful authentication can provide, in addition to proof of identity, a series of tokens that prove the role of the user with respect to the data stored in the system (e.g. family doctor / ASL terminal operator / specialist / ASL official / employer). The individual applications will then be able to perform internal user mappings to carry out private business operations (i.e. local operations, not obtained by combining data and functionality from multiple applications) or to maintain compatibility with legacy databases that are difficult to adapt to the distributed context.

Authorization

The authorizations will be based mainly on the tokens owned by the user, for example:

A doctor will be able to see the data of his or her patients
A Clinical terminal operator will be able to view patient records and booked exams but not clinical data or exam results
A Health Authorities operator will be able to see aggregated, anonymous data (e.g. the overall result of a screening)
A specialist consulted by a patient may request access to the data concerning him for the time necessary to perform the service
An employer will be able to verify the existence of a disease certification for an employee and its duration, but will not have access to the diagnosis

We can consider two main types of authorizations: role-based and delegation-based. Role-based authorizations are always present in the system and are equivalent to universal rules that are difficult to modify (e.g. a family doctor can always access the clinical data of his or her patients). Authorizations based on delegation are instead additions to the basic rules: they have a transitory duration and require an explicit authorization (delegation) by the owner of the information accessed. An example would be a clinic that accesses some patient data to perform a service: access must be authorized by the patient and limited to the data relevant to the care and to the time necessary to perform it.

Role-based authorizations will be managed on the basis of tokens obtained at the time of authentication, or obtained through a subsequent authentication with the system. In this way it can reasonably be assumed that most data accesses can be authorized locally, without excessive computational costs for the platform. As examples of this type of authorization we can indicate:

A family doctor accesses the clinical data of his own patient
A terminal operator, to enter a reservation, queries the patient registry of the system, but will naturally not be able to see the outcome of the examination carried out.
A Health Authorities operator requests statistics on the provision of certain services, but only anonymously and in aggregate form (he or she cannot know, for example, who received them)
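
The three role-based examples above can be sketched as a local check on the token obtained at authentication. The `Token` fields, the role names and the `PERMISSIONS` table are assumptions introduced for illustration, not part of the project's specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    user_id: str
    role: str
    patients: frozenset = frozenset()  # patients assigned to a family doctor


# Hypothetical mapping from role to the resource types it may read.
PERMISSIONS = {
    "family_doctor": {"registry", "bookings", "clinical_data"},
    "terminal_operator": {"registry", "bookings"},
    "health_authority": {"aggregated_statistics"},
}


def can_access(token, resource, patient_id=None):
    """Decide locally, from the token alone, whether access is allowed."""
    if resource not in PERMISSIONS.get(token.role, set()):
        return False
    if token.role == "family_doctor":
        # A family doctor only sees his or her own patients.
        return patient_id in token.patients
    return True


doctor = Token("dr-bianchi", "family_doctor", frozenset({"P-001"}))
operator = Token("op-17", "terminal_operator")
official = Token("asl-03", "health_authority")
```

No call to a central service is needed for these decisions, which is why such checks can stay cheap for the platform.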

In this case, the authorization rules will be contained in the running (sub)program, which authorizes on the basis of the identity of the applicant and the tokens held.

Authorizations based on delegation will instead be conveyed through the OAuth2 protocol or some derivative thereof. In this type of authorization, a user temporarily acts on behalf of another (the one who materially "owns" the data), and it is important that the credentials of the impersonated user remain hidden from whoever obtains the delegation. The OAuth2 protocol serves precisely to allow temporary access to resources controlled by another user, after obtaining that user's authorization but without having to reveal his or her password. At this point, a brief digression on the protocol is appropriate.

The OAuth2 protocol

An OAuth2 transaction involves four distinct actors:

The user who owns the resource to be accessed
The client, or the application that must access the resource impersonating the user
The authorization server, which grants authorization after examining user credentials, issuing an authorization token
The server that contains the protected resource, which allows access after validating the token

Let's see the sequence diagram of the protocol:

(/OaUTH2 scheme.png)

The user makes a request to the client, which needs to access the resource.
The client responds with a redirect to the authorization server.
The authorization server presents the user with a login page.
The user logs in and decides which data to make available to the client (granularity can be set as desired).
The user confirms this consent to the authorization server.
The authorization server identifies the user.
The authorization server sends the user a redirect to the client, containing an authorization code generated for the occasion.
A client callback is called.
The client, using the authorization code, calls the authorization server, which returns a resource access token.
The client calls the resource server and presents the access token.
The resource server checks the token. This phase is not shown in the diagram because it depends on the implementation.
The resource server provides the requested resource.
The client receives the final page.

The significant point is that in the end the client accesses a resource owned by the user, receiving authorization but without ever seeing the user's credentials.

Examples of use of this protocol could be:

A patient who authorizes access to his or her data (or part of it) for a limited period, for example in case of hospitalization.
A family doctor who authorizes a colleague to access the data of his or her patients for the limited period of a replacement (for holidays or illness), restricted to patients who have particular urgencies and cannot wait.

In both cases, the protocol allows you to authorize access to the requested data without revealing the user's credentials and without granting access to unsolicited data.
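
The delegation flow described in this section can be simulated in memory as below. This is a simplified sketch, not an OAuth2 implementation: there are no HTTP redirects, no client registration or client secret, and the class and method names (`AuthorizationServer`, `ResourceServer`, `authorize`, `exchange`, `introspect`, `read`) are invented for the example. What it does show is that the client only ever handles a code and a token, never the password, and that access is limited in scope and time.

```python
import secrets
import time


class AuthorizationServer:
    def __init__(self, passwords):
        self._passwords = passwords  # user -> password (sketch only)
        self._codes = {}             # code -> (user, scopes)
        self._tokens = {}            # token -> (user, scopes, expiry)

    def authorize(self, user, password, scopes):
        """User logs in and consents; returns a one-time authorization code."""
        if self._passwords.get(user) != password:
            raise PermissionError("login failed")
        code = secrets.token_urlsafe(16)
        self._codes[code] = (user, frozenset(scopes))
        return code

    def exchange(self, code, lifetime_s=3600):
        """Client swaps the code for a time-limited access token."""
        entry = self._codes.pop(code, None)
        if entry is None:
            raise PermissionError("unknown or already used code")
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (*entry, time.time() + lifetime_s)
        return token

    def introspect(self, token):
        """Return (user, scopes) for a valid token, else None."""
        entry = self._tokens.get(token)
        if entry is None or entry[2] < time.time():
            return None
        return entry[0], entry[1]


class ResourceServer:
    def __init__(self, auth_server, records):
        self._auth = auth_server
        self._records = records  # user -> {scope: data}

    def read(self, token, scope):
        info = self._auth.introspect(token)
        if info is None:
            raise PermissionError("invalid or expired token")
        user, scopes = info
        if scope not in scopes:
            raise PermissionError("scope not granted by the user")
        return self._records[user][scope]


auth = AuthorizationServer({"patient-1": "s3cret"})
resources = ResourceServer(auth, {"patient-1": {
    "prescriptions": ["drug A"], "diagnoses": ["condition B"]}})

# The patient consents to sharing prescriptions only.
code = auth.authorize("patient-1", "s3cret", ["prescriptions"])
token = auth.exchange(code)  # performed by the client (e.g. a clinic app)
shared = resources.read(token, "prescriptions")
```

The hospitalization and replacement-doctor cases map to choosing the scopes passed to `authorize` and the `lifetime_s` passed to `exchange`.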

Privacy and data retention

Privacy management is of primary importance for the system. Application-level privacy can be guaranteed by mechanisms built directly into the program (user X can see patient Y's data), but data can also be accessed directly, without going through the application. For example, the administrator of the hardware and software system, or the database administrator, may have, by choice or through "forgetfulness", rather broad access permissions that allow the data to be queried directly. Considering the domain involved, it is also possible that someone attempts unauthorized access to the data, or tries to modify it without authorization. Hence the need to clearly separate the application data from the users database, so that the confidential data meet the following requirements:

They are accessible without too many limitations to carry out aggregate and anonymous Business Intelligence studies
They are accessible for monitoring and Complex Event Processing activities (for example, to monitor the data sent by a sensor in real time and anonymously and report any critical situations by identifying the patient involved at this point)
They are accessible to authorized personnel, complete with personal data
If consulted by unauthorized personnel (for software errors or for hacking the database or devices) they are viewed only anonymously, without knowing the name of the patient to whom they belong
They cannot be modified in a fraudulent manner (or at least a non-erasable trace of the modification remains)

Traditional technology cannot satisfy all these requirements at the same time, because privileged access to the centralized database makes it possible to bypass the specifications just described. The first step is to make sensitive data "opaque". These will then be stored in the formats described above (RDBMS or NoSQL DB, depending on their nature), but the connection with the patient (or in general with the owner of the data) will be deleted and replaced by a reference to a central archive whose access will be controlled. This second archive will store the data whose disclosure would make the processed data sensitive, for example the patient registry, address, ...

It is clear that if this latter archive were a "normal" RDBMS, the same problems of trust towards particular users and of non-traceability of changes would remain. Hence the need to introduce a distributed technology for accessing the users registry: a technology in which the concept of a single data custodian does not exist, in which data is strongly protected cryptographically, in which it is not possible to modify the data without the consent of all the instances of the database, and in which the instances are kept by certified and reliable entities. This technology can be identified with that of permissioned blockchains.

This particular type of blockchain requires first of all the definition of trusted nodes, which are responsible for participating in the consensus mechanism of the blockchain network and therefore for validating the transactions. The consensus mechanism is one of the most important aspects in ensuring that a distributed system is fault-tolerant. Permissioned blockchains, unlike permissionless ones, tend to use protocols that do not require great computing power. This type of algorithm avoids excessive load (and therefore cost) on the validating nodes as the number of operations increases, and also allows a greater number of transactions per unit of time. Permissioned blockchains also make it possible to define access rules for read and write operations, allowing the fine-grained management needed for sensitive data.
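
The requirement that a non-erasable trace of every modification remains can be illustrated with a hash chain, the basic data structure underneath a blockchain. The sketch below is a single-process toy, with no consensus, no replication across trusted nodes and no access rules; the class name `AuditChain` and the payload fields are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


class AuditChain:
    """Append-only log in which each block commits to the previous one."""

    def __init__(self):
        self.blocks = []

    @staticmethod
    def _digest(index, prev_hash, payload):
        body = json.dumps(
            {"index": index, "prev": prev_hash, "payload": payload},
            sort_keys=True,
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def append(self, payload):
        index = len(self.blocks)
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS
        block = {
            "index": index,
            "prev": prev_hash,
            "payload": payload,
            "hash": self._digest(index, prev_hash, payload),
        }
        self.blocks.append(block)
        return block

    def verify(self):
        """Recompute every hash; any edited block breaks the chain."""
        prev_hash = GENESIS
        for i, block in enumerate(self.blocks):
            if block["index"] != i or block["prev"] != prev_hash:
                return False
            if block["hash"] != self._digest(i, prev_hash, block["payload"]):
                return False
            prev_hash = block["hash"]
        return True


chain = AuditChain()
chain.append({"actor": "dr-bianchi", "action": "update", "record": "R-42"})
chain.append({"actor": "op-17", "action": "read", "record": "R-42"})
intact = chain.verify()
```

In a permissioned blockchain, copies of such a chain are held by several certified nodes, so an administrator who rewrites a local copy cannot make the other copies agree with it.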

IoT Platform

The system provides an IoT platform, to which dedicated sensors will be connected. These can be mainly of two types:

Remote health control, or RPM (Remote Patient Monitoring). These sensors check the patient's health status outside the dedicated structures, typically within the home. This group includes sensors for monitoring parameters related to chronic diseases, such as blood pressure, blood sugar and heart rate, as well as quality-of-sleep monitoring, night apneas and the like.
Remote notification systems, which detect dangerous symptoms immediately through tools that analyze the series sent by the sensors and recognize patterns of potential danger, such as a heart attack or falls in the home of an elderly person.

In both cases the sensors will interface with an IoT platform. By this we mean a server that exposes an interface, typically HTTP, to the sensors. Its purpose is to collect any data sent, possibly submit it to Complex Event Processing procedures for the discovery of significant patterns, and store it. Considering the amount of data sent on average by the sensors, it is advisable to provide storage separate from that of traditional RDBMS tools. A NoSQL or Big Data platform will therefore be set up (depending on the volume of data sent), with which Business Intelligence systems and application pillars will interface. If needed, only the remote notification systems will communicate directly with the system bus, since their data are more critical. The general structure of the platform will be as follows:

(/Gateway scheme.png)

A heterogeneous set of sensors will communicate with a Gateway (the IoT platform), which will channel the data towards the actual network. The network will store the data and make it usable through a variety of devices and systems, ranging from the smartphone (to allow patients to manage their sensors and examine their warnings) to BI systems and family doctors, who can always check the health status of their own patients.
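
A minimal sketch of the gateway's two jobs, storing every reading and running a simple Complex Event Processing rule, might look like this. The class `Gateway`, the rule (several consecutive heart-rate readings above a limit), the window size and the threshold are illustrative assumptions, not clinical guidance or the project's actual design.

```python
from collections import defaultdict, deque


class Gateway:
    def __init__(self, window=3, max_heart_rate=120):
        self.window = window
        self.max_heart_rate = max_heart_rate
        self.storage = []  # stand-in for the NoSQL / Big Data store
        # Last `window` readings for each (sensor, patient reference) pair.
        self._recent = defaultdict(lambda: deque(maxlen=window))

    def ingest(self, sensor_id, patient_ref, heart_rate):
        """Store the reading and return an alert dict if the rule fires."""
        self.storage.append({
            "sensor_id": sensor_id,
            "patient_ref": patient_ref,  # opaque ID, not the patient's name
            "heart_rate": heart_rate,
        })
        recent = self._recent[(sensor_id, patient_ref)]
        recent.append(heart_rate)
        sustained = (len(recent) == self.window
                     and all(v > self.max_heart_rate for v in recent))
        if sustained:
            return {"alert": "sustained_high_heart_rate",
                    "patient_ref": patient_ref}
        return None


gateway = Gateway()
results = [gateway.ingest("hr-7", "a1b2c3", bpm) for bpm in (80, 130, 135, 140)]
```

Only the last reading triggers an alert, because only then are three consecutive values above the limit; the alert carries the opaque reference, so identifying the patient remains a separate, authorized step.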