Inspiration

The medical sector is growing rapidly, and so is the volume of data it produces. Yet the management of this valuable data has been neglected from the start. Overcoming this hurdle is the main motive of this project.

Abstract

This project aims to model and develop a Data Lake (a repository that holds structured, semi-structured, and unstructured data) for patient data in the healthcare system. The Data Lake combines relational databases, unstructured data (such as the images produced by imaging exams), and semi-structured data (such as XML files exported by medical equipment), which together make up the patient's history. The plan is for the Data Lake to be accessed through a mobile application and the Web. It will be fed by doctors, laboratories, and the patients themselves. Doctors will be able to access the patient's entire medical history in a clean, safe, and fast way, so that if the patient changes doctors, all of their information remains available in an integrated manner. Doctors will also be able to exchange vital patient information with each other through the system. For example, a patient who finds out he has diabetes needs his nutritionist to reevaluate his diet; within minutes, the patient would receive a new diet from the nutritionist. In an emergency, the data of a patient, even an unconscious one, could be released through the doctor's ID, with the doctor taking responsibility for this exceptional access. If the patient so wishes, he could also obtain prognoses about diseases he is likely to develop, based on an analysis of his health history using Artificial Intelligence techniques. Beyond the clear advantages for the patient, the government would have a much more reliable and secure body of data for the analyses behind its public health policies. In emergencies such as pandemics, the government could quickly and safely identify the risk groups that need protection and their current situation.

Introduction

Nowadays, when patients schedule a medical appointment and need to submit their exams, they must carry a large volume of printed material. Besides the inconvenience of carrying this documentation, a lot of information can be lost. Countless patients lose their vaccination cards, exam results, and records of other procedures performed throughout their lives to weather damage, rain, or simple misplacement. When patients move from the private to the public health network, or vice versa, their health history may become inaccessible. Moreover, most health information is not digitized. All of these circumstances cause important information to be lost over the patient's life. With advances in technology, storing patient data has become a necessary and fundamental task for healthcare professionals. In recent years, data has proved to be one of the most valuable commodities in industry, in academia, and in healthcare. With the growth in data production and the rapid development of Big Data technologies, cloud computing, and machine learning, the value of data has been recognized in several domains, especially in healthcare. However, there are still barriers to sharing scientific data among researchers, such as proprietary and heterogeneous formats and access permissions. This isolation creates data silos, and non-integrated data storage can cause healthcare professionals to miss research and development opportunities. This information must also comply with the new personal data protection legislation, which will soon take effect. Patient data, however, is heterogeneous, voluminous, and confidential. Several technologies can support this sharing, e.g., Data Warehouses (DW), which are multidimensional databases used for data integration. 
However, DWs store only structured data (whereas much scientific data is semi-structured or unstructured), and their modeling can be complex and time-consuming. More recently, the Data Lake concept has emerged as an approach to integrating structured, semi-structured, and unstructured data. A Data Lake consists of a data repository coupled with an engine for processing queries and data. Its great advantage is that it requires no prior modeling and can store data in its raw format (not necessarily structured), preserving the principle of immutability. However, to store and query data in different formats, a Data Lake must maintain a rich set of metadata that makes it possible to locate the data and analyze it later. There are several Data Lake solutions on the market, with Hadoop stack solutions being the most widely used (https://www.searchtechnologies.com/blog/search-data-lake-with-bigdata). However, these solutions have several limitations when dealing with health data. Many healthcare professionals lack computing expertise, and working with technologies such as HDFS, Hadoop, and Spark may not be trivial for them. In addition, provenance (source) data, meaning the history of creation, alteration, and deletion of data, must be captured, and data privacy must be addressed. Healthcare professionals must be able to query and view the data intuitively, and to generate new data for the Data Lake after applying some processing (e.g., a machine learning algorithm). Therefore, this project proposes the development of a system that supports the management of Data Lakes for health data. 
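To make the metadata and immutability requirements more concrete, here is a minimal in-memory sketch in Python, assuming a content-addressed design: each raw object is stored byte-for-byte under its SHA-256 digest, and every ingestion appends a frozen catalog entry recording the patient, format, source, and timestamp. The names `DataLakeCatalog`, `CatalogEntry`, `ingest`, and `find` are hypothetical and do not come from any existing Data Lake product.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata for one ingestion event (hypothetical schema); frozen, so it cannot be edited."""
    digest: str        # SHA-256 of the raw bytes, used as the storage key
    patient_id: str
    data_format: str   # e.g. "relational", "xml", "image"
    source: str        # who or what produced the data: a lab, a device, the patient
    ingested_at: str   # UTC timestamp, part of the provenance trail


class DataLakeCatalog:
    """In-memory stand-in for raw object storage plus a metadata catalog."""

    def __init__(self) -> None:
        self._raw: Dict[str, bytes] = {}        # digest -> raw bytes, written once
        self._entries: List[CatalogEntry] = []  # append-only catalog

    def ingest(self, raw: bytes, patient_id: str, data_format: str, source: str) -> str:
        """Store `raw` unchanged (no schema on write) and record its metadata."""
        digest = hashlib.sha256(raw).hexdigest()
        # Identical bytes map to the same key, so stored objects are never overwritten.
        self._raw.setdefault(digest, raw)
        self._entries.append(CatalogEntry(
            digest, patient_id, data_format, source,
            datetime.now(timezone.utc).isoformat(),
        ))
        return digest

    def read(self, digest: str) -> bytes:
        return self._raw[digest]

    def find(self, patient_id: str, data_format: Optional[str] = None) -> List[CatalogEntry]:
        """Locate data through metadata alone, without parsing the raw bytes."""
        return [
            e for e in self._entries
            if e.patient_id == patient_id
            and (data_format is None or e.data_format == data_format)
        ]
```

In a real deployment the two in-memory structures would be replaced by object storage (for example, HDFS) and a catalog database; the sketch only illustrates that data is written raw and located through metadata.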
With the system, doctors can: (i) import data from multiple sources into the Data Lake through an intuitive interface, (ii) associate provenance data with imported data (e.g., the source of a particular exam), (iii) anonymize the imported data, (iv) query data in different formats, and (v) feed the Data Lake with data produced by processing within the Data Lake environment itself. In this way, the government would have more control over its population's health data and could establish health policies with less uncertainty, better control the health system, plan and invest resources more effectively, and better monitor patients.
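For item (iii), a first step could look like the sketch below. It assumes records arrive as Python dictionaries and that `name`, `address`, and `phone` are the direct identifiers (both assumptions, not part of the proposal). Those fields are dropped and the patient ID is replaced with a keyed hash (HMAC-SHA-256), so records belonging to the same patient remain linkable. Strictly speaking, this is pseudonymization rather than full anonymization: whoever holds the key can re-identify patients, and quasi-identifiers such as birth date would need further treatment.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would live in a key vault, never in code.
SECRET_KEY = b"replace-with-a-secret-key"

# Fields assumed to identify a person directly (illustrative list only).
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonymize(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Return a copy of `record` with direct identifiers removed and the
    patient ID replaced by a stable keyed hash; the input is not modified."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    token = hmac.new(key, record["patient_id"].encode("utf-8"), hashlib.sha256)
    out["patient_id"] = token.hexdigest()
    return out
```

Because the hash is keyed and deterministic, the same patient always receives the same token under one key, which keeps longitudinal analyses possible without exposing the original ID.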

Proposal Relevance

From the government's point of view, this project aims to meet the public health needs of the most vulnerable population, in order to maximize hospital care and support decisions that better target public health policies. The government will be able to better allocate medical specialties according to the population's demands; for example, by identifying that a community whose population is mostly elderly has no geriatrician available. Logistics can be optimized by sending supplies where they are needed, such as more flu vaccines to places with a higher incidence of the disease. Health studies may use Artificial Intelligence models to discover where certain diseases are more likely, given the health situation of a region. In the event of an emergency such as an outbreak or epidemic, the government could quickly learn of it and take the necessary measures for the vulnerable population. In telehealth, reliable and up-to-date data are essential for a remote doctor to know the patient's health history and provide safe care, allowing the government to offer patients remote specialists at a low cost. Another problem is the repetition of the same procedure by two or more doctors within a short period. The system could minimize this: if a doctor places a repeated order, the system will flag it and notify the doctor, who can then confirm the request if necessary or review the most recent instances of the same procedure. Having this body of data up to date and secure would be fundamental to the government's public policy. 
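The repeated-order check described above could work roughly as in this sketch. It assumes each procedure in the patient's history carries a procedure code and a date, and that "a short period" means a configurable window (30 days by default, an arbitrary assumption). `Procedure` and `recent_duplicates` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class Procedure:
    """One performed procedure in a patient's history (hypothetical record shape)."""
    patient_id: str
    code: str            # procedure or exam code
    performed_on: date


def recent_duplicates(history: Iterable[Procedure], patient_id: str, code: str,
                      order_date: date, window_days: int = 30) -> List[Procedure]:
    """Return the patient's earlier procedures with the same code performed within
    `window_days` before `order_date`, most recent first.
    An empty list means the new order raises no warning."""
    start = order_date - timedelta(days=window_days)
    matches = [
        p for p in history
        if p.patient_id == patient_id
        and p.code == code
        and start <= p.performed_on <= order_date
    ]
    return sorted(matches, key=lambda p: p.performed_on, reverse=True)
```

A non-empty result would trigger the notification; the doctor could then confirm the order anyway or inspect the returned procedures, which are sorted most recent first.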
From the patient's point of view, he will be seen more quickly at health units, since his data will be available to the reception desk and to the doctor, who, even before the consultation, will have access to his updated health history for a better analysis of the case. The health system's monitoring of the patient's life and needs will be safer and more reliable. The patient and the doctor will be able to access the results of procedures on their phones or computers without having to contact the places where the procedures were performed. This would give the patient a faster medical response in an emergency. Doctors could also form a group at a given moment to exchange information about the patient when necessary. For example, a patient discovers that he has diabetes, and his nutritionist immediately receives the exam sent by his doctor and updates his diet within minutes. If the patient arrives for emergency care unconscious, the attending doctor will be able to access his data using a standard emergency password linked to the doctor's ID. After the appointment, the patient or his family will confirm or dispute the state of emergency. The patient will also be able to run Artificial Intelligence models on his data to generate possible diagnoses of diseases that may develop during his life. From a medical point of view, patient care will be safer and faster. In emergencies, decisions can be based on more reliable and secure data. All procedures can be accessed at any time, making service faster, safer, and more reliable. It would also increase the number of patients seen, thereby shortening waiting lines. In e-Health, the doctor could treat a patient safely and reliably even at a distance, with more accurate and up-to-date data. 
Finally, all these tools would make it possible to streamline the public administration and make it more flexible and dynamic. Every public health policy would be guided by much more accurate and current data. The government would have the public health situation in real time and could even estimate future trends. This could be exploited as Big Data by combining the population's health data with other data, such as education and public safety. With this, the government will be able to access the most important public information and generate statistics and projections for its population.

Objective to be executed

Computer systems based on patient information need precision, which is not the current reality. Where storage services exist in the health area, they are in most cases unreliable. The big problem with storing medical files electronically is that each healthcare professional uses a different standard. In view of these problems, we propose a way to store patient information quickly and safely, without modifying the format of the original data. The project aims to create a system based on the Data Lake concept that can be accessed through a mobile application and web-based systems. The Data Lake would be fed by the patient, doctors, and the public health system. A patient with limitations could delegate this task to someone they trust. When the patient feeds the data, he enters each procedure through his phone or, if he prefers, a computer, simply by filling in the procedure information on a form. If the procedure record is on paper or a similar medium, the patient can photograph it and store the image digitally for later consultation with the doctor; on a computer, the record can be scanned instead. The doctor may enter pertinent information with the patient's approval, but it will be identified as information added by the doctor under the doctor's ID. The system will be based on a Data Lake holding the patient's data, the history of all procedures he has undergone, and the results of his exams. The doctor, or anyone else who needs the patient's data, can access it only with the patient's authorization. The system will generate an access password that the patient can cancel or that expires after a set time. If the patient is unable to authorize access, the doctor will use a standard password on an emergency basis. To close the service, the patient or a family member must confirm the emergency care provided by the doctor; until this confirmation is given, the system remains locked. 
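The authorization flow described above can be sketched as follows, under several assumptions: access is granted per doctor through a random code with a time-to-live, the patient can revoke the code at any time, and "the system remains locked" is read as the emergency access staying pending (with the record flagged) until the patient or a family member confirms it. The class `AccessManager` and its methods are hypothetical; the emergency password check is omitted, and a production system would also need authentication, persistence, and audit logging.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple


class AccessManager:
    """Hypothetical sketch: patient-issued access codes and emergency access."""

    def __init__(self) -> None:
        # access code -> (patient_id, doctor_id, expiry time)
        self._grants: Dict[str, Tuple[str, str, datetime]] = {}
        # patient_id -> doctor_id of an emergency access awaiting confirmation
        self._pending: Dict[str, str] = {}

    @staticmethod
    def _now(now: Optional[datetime]) -> datetime:
        return now if now is not None else datetime.now(timezone.utc)

    def grant(self, patient_id: str, doctor_id: str, ttl: timedelta,
              now: Optional[datetime] = None) -> str:
        """Patient authorizes one doctor; the random code expires after `ttl`."""
        code = secrets.token_urlsafe(16)
        self._grants[code] = (patient_id, doctor_id, self._now(now) + ttl)
        return code

    def revoke(self, code: str) -> None:
        """Patient cancels a code before it expires."""
        self._grants.pop(code, None)

    def can_read(self, code: str, doctor_id: str, patient_id: str,
                 now: Optional[datetime] = None) -> bool:
        grant = self._grants.get(code)
        if grant is None:
            return False
        g_patient, g_doctor, expires_at = grant
        return (g_patient == patient_id and g_doctor == doctor_id
                and self._now(now) < expires_at)

    def emergency_access(self, patient_id: str, doctor_id: str) -> None:
        """Doctor opens the record without a code; the access is tied to the doctor's ID."""
        self._pending[patient_id] = doctor_id

    def is_locked(self, patient_id: str) -> bool:
        """True while an emergency access still awaits confirmation."""
        return patient_id in self._pending

    def confirm_emergency(self, patient_id: str) -> Optional[str]:
        """Patient or family confirms the emergency; returns the responsible doctor's ID."""
        return self._pending.pop(patient_id, None)
```

Passing `now` explicitly keeps expiry logic testable; in normal use it defaults to the current UTC time.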
This project will reach people who use the public health system, which already uses something similar, as well as people coming from the private health system. Patients migrating from the private to the public health system can download their procedure records from the private network and store them in their health history before their contract ends. The database is the core of this project: it aims to store the patient's health-related data for their entire life in a simple but reliable and safe way. The best way to store and standardize the data will be studied. Consideration will be given to how information is entered, to better suit patients, including the elderly and people in vulnerable situations. A quick, centralized help guide for answering questions will also help people make better use of the application. As a consequence, we expect better public health, doctors able to practice with higher quality, and a government that can identify and solve public health policy problems more effectively. Hospitals will be able to serve more patients, and serve them better, relieving the bottleneck in the system and shortening the wait for care. Public money can be directed more precisely to public health units and to the population of the most vulnerable regions.

Objectives and Scope

The main objective of the project is to integrate public health data through a Data Lake so that the government can make decisions based on accurate and reliable information. A management system for this Data Lake will be developed that allows e-Health to be used safely and reliably, bringing the best available specialists to patients. This Data Lake will make it possible to build disease prediction models using Artificial Intelligence techniques and mathematical and statistical models, allowing the government to make more reliable and secure decisions even in emergencies such as outbreaks and epidemics. Other goals include minimizing waiting times, maximizing the quality of patient care, providing tools for safe and reliable storage of patient data throughout their lives, minimizing the loss of records of procedures patients have undergone, and complying with the new personal data protection legislation. Doctors will be able to issue prognoses and diagnoses based on secure information, stay in contact with their patients, and obtain updated patient data in emergencies to support their decisions; as a result, their work will be more efficient and of higher quality. The project also aims to meet governments' demand for better-quality information for public health policy decisions. The data available today is neither up to date nor reliable, and it is held in national databases. Structured data based on government actions in public health may be a more useful alternative for both the clinical team and managers.

Goals

With a team of 8 people (a coordinator, a vice coordinator, a researcher, a specialized technician, and four students), it is possible to carry out this project in 2 years. The first step will be to organize the data that will make up the Data Lake. 
It is important to discover the sources of the data, their formats, and the ways they can be integrated. This requirements survey must be aligned with the government's needs and will include visits to health units to verify on site how they operate. The proposed goal is to visit at least 30% of health units in the first 6 months. We also propose to hold at least 4 meetings with the government team to evaluate the data and the Data Lake models as a whole. Once the system needs are diagnosed, further meetings with the government team (at least 3) will map all Data Lake functionalities not identified in this proposal. Identifying these functionalities will allow the preparation of the Project Specification Document, which describes in detail everything the system will do, complemented by a Use Case Diagram that serves as a logical guide for the developers. Entity-Relationship modeling will also be carried out for the structured data, while semi-structured data will be integrated through wrappers. Tests will be carried out with end users in evaluation sessions scheduled with the government: government representatives will report errors, and the developers will then correct them. This step provides for at least four rounds of interaction between the developers and government staff. The initial test version is expected to be installed in 12 months and an approval version in 14 months. Documentation: at least 2 manuals will be prepared: a technical document describing the code, so that any developer can make changes, updates, or extensions in the future, and installation and user manuals to guide users. After the model has been tested and confirmed as fit for use, we will start building the web system and the mobile application. 
Finally, it is proposed to hold 6 seminars to present the system and train users.
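The wrappers planned for semi-structured data could look like the following Python sketch, which uses only the standard-library `xml.etree.ElementTree` parser. The XML layout, the target `exam_result` table, and the function name `wrap_exam_xml` are all hypothetical, since the proposal does not specify any equipment format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical export layout; real devices use vendor-specific formats.
SAMPLE_EXPORT = """<exam patient="123" date="2021-03-15" device="analyzer-01">
  <result name="glucose" unit="mg/dL">126</result>
  <result name="hba1c" unit="%">6.8</result>
</exam>"""


def wrap_exam_xml(xml_text: str) -> List[Dict[str, object]]:
    """Wrapper: turn one equipment XML export into flat rows shaped like a
    hypothetical relational `exam_result` table (one row per measured result)."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("result"):
        rows.append({
            "patient_id": root.get("patient"),
            "exam_date": root.get("date"),
            "device": root.get("device"),
            "analyte": result.get("name"),
            "unit": result.get("unit"),
            "value": float(result.text),
        })
    return rows
```

Each equipment format would get its own wrapper producing the same row shape, so downstream queries see a single schema while the original XML file remains unchanged in the lake.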

Methodology

The methodology is divided into the stages described below.
Analysis of the Informational Environment: The steps in this phase build on concepts discussed in the theoretical framework and aim to identify two important issues: the requirements and the types of data that will be stored in the Data Lake. The requirements will form the basis for defining the Information Architecture. Requirements analysis provides a horizontal, cross-functional view that helps to understand the procedures and define what is essential for carrying them out. In general, process modeling provides a macro view from which one can understand the objectives, evaluate possible solutions to problems, and take corrective measures where the processes deviate from the ideal situation. Through process modeling, it is possible to: make explicit the vision of those responsible for the processes; obtain an integrated and complete view of the processes; make rules explicit; simplify and optimize processes; explore new concepts; and plan the information, systems, and infrastructure required. Three major information flows must be identified: information collected externally, information produced internally, and information produced for the public.
Diagnostic Stage: This stage analyzes the organizational and functional structure of all public health structures through four activities. Analysis of the Organizational and Functional Structure: the objective is to identify the organization chart and how the procedures function. Analysis of the Business Environment: the objective is to identify and conceptualize how all public health procedures work. Strategy Analysis: this activity surveys the objectives, strategic points, and other strategic considerations raised by the government. 
As a secondary objective of this activity, an analysis must identify possible problems in the procedures and the requirements for resolving them, opportunities for improvement, and reengineering solutions, always with a focus on the project. Analysis of Information Systems: this activity surveys the existing systems in the public health area, both those developed and maintained by the government's IT department and those built directly by users with spreadsheets, text editors, or other tools; that is, every form of data storage under the government's responsibility.
Process Analysis: This step analyzes the processes in use through four activities. Definition of Processes: identifying the macro-processes aims to reveal their strengths and weaknesses and to find a competitive differential, so it is essential to survey the main processes and their activities within the structure. Identifying these processes gives greater clarity about the internal environment and its relationships with the external environment. Analysis of Processes: correlating processes with healthcare units is a useful starting point for process reengineering, as it reveals the relationship between healthcare facilities and processes; for each process, the corresponding health units are identified. Analysis of Processes against Information Systems: for each business process, identify the existing information systems that provide and create information related to the process and its sub-processes. This correlation is the starting point for identifying processes that are poorly served by information systems, the degree of independence of the user areas, and the level of information integration. 
Reengineering: Objectives and Priorities: this activity complements the previous ones, with the objective of modeling an adequate information environment for designing the executive information system.
Analysis of Information Needs: Creating an integrated information environment, internal and external, that allows results and strategies to be evaluated and makes information easy to identify and access requires an information architecture. The design of an Information Architecture is based on two major blocks of study: the processes, and the information matters these processes need. This stage analyzes the existing organizational processes, functional areas, and information systems, supported by the instruments produced in the previous stages. Three activities are scheduled. Analysis of Information Necessary to Organizational Processes: the focus is on the information needed to guarantee the efficiency of each process and to outline an appropriate strategy for the project's objectives; the result of this activity is a list of the information needed by each organizational process. Definition of Information Matters: this involves analyzing the government's existing information flows, which fall into three main groups: information collected externally and used by it, information produced internally and used, and information produced and intended for the public. Definition of the General Information Architecture: defining an information architecture compatible with the organizational model involves a technological update and also contributes to integrating the operational and financial systems and to developing an Interconnected Information System and Organizational Knowledge Management. 
The information architecture for the executive information system is defined by analyzing the macro-processes against the identified information matters. This definition involves four stages: correlating processes and information matters; grouping macro-processes into logical systems; identifying information links between processes; and producing an overview of process integration.
Management Assessment: This step analyzes the organizational and functional structure of all health units that make up the public health system through two activities: Validation of the Reengineering of Organizational Processes and Validation of the General Information Architecture.
Executive Information Environment Project, Decision Support Analysis: This step aims to fulfill the requirements for identifying the information that will make up the information system, through four activities. Analysis of the Result Areas: this activity uses as support material the instruments and products generated in the previous stages. Each person in charge receives support material describing the project's Mission, Vision, and Strategies. The concept used here is the Logical View, that is, the view that models the functional characteristics the system provides to end users. Result Indicators: performance indicators are of fundamental importance in improving processes to achieve the defined objectives. For this activity, the methodology uses an instrument that identifies Information Needs by Vision, Result Area, and Indicators. Each person in charge fills in this instrument, which is then examined and validated by the project team and government representatives. 
Analysis of Information Needs: At the end of this step, each person in charge once again fills in the methodological instrument (Information Needs by Vision, Result Area, and Indicators), following the previous activities that defined the visions and, within each view, the respective result areas and their indicators. Management Assessment: with all teams assembled, the products of this phase are presented and validated, then prepared for the final report by the project team and government representatives.

Results

Studies indicate a clear change in the operational practice of health work after computerization. Replacing written paper records with typed electronic ones has the potential to optimize various work processes, making information sharing wider and easier. It can be said that there is a consensus that information technology can speed up service and make information easier to use in the pursuit of effective health outcomes. Various information flows, such as the scheduling of specialized procedures, should become safer, faster, and more effective. Shorter consultations result from the system identifying the patient quickly and safely and making the patient's history available to the health professional. After computerization, requests for exams and complementary evaluations will be controlled through system procedures, performed safely and with a lower error rate. The recording and retrieval of data on individuals and on the health system itself will comply with the new personal data protection legislation. The basic health network can be integrated with government data by connecting the local system to the national database, which may favor data integration across locations. Professionals working with a computerized system agree that retrieving patient data electronically and reducing handwriting have made the job much easier and greatly reduced the error rate. Professionals in settings where computerized systems have been adopted find it easier to record diagnoses coded by the International Classification of Diseases. Regarding data standardization, professionals in the field noticed faster access to patient data. 
Vaccination records also change: with the system, all vaccination records can be accessed even if the patient's paper record is lost or a professional fails to record data. The expected benefits of computerization include changes in work processes and quick visibility of all recorded data. In a computerized setting, collective statistical data can be processed and individual data retrieved without the need for a paper service statement. Ultimately, the use of information is tied to interdisciplinary care and responsible professional practice, which makes the quality of the service visible and increases confidence in the use of information for epidemiology and action planning. Professionals can better evaluate which tests are really needed. The system allows doctors to track all of their patients' information. It also favors the exchange of information across levels of health care, which may favor greater integration of actions within health care networks. Sharing data between basic health units and hospitals is another possible contribution of the system: everything about a patient, from the first consultation to surgery, is recorded. Another possible contribution is a reduction in the documents to be stored, since exams and many other records will be digital. Although it is not reasonable to assume that computerization reduces the need for personnel, it is reasonable to assume that health professionals can optimize their activities and devote more time to their core work. At the operational level, computerization changes activities: it adds typing alongside writing, creates a new logic for the flow of users through the service according to their demand, and broadens the scope of the professional's work, making it multipurpose. 
With computerization, the time the doctor needs to perform the work decreases, since the doctor no longer has to wait for test results to be delivered before proceeding, and the need for manual case documentation and frequent searches for information is reduced. Technological developments bring advantages such as readability, accessibility, and automatic data retrieval. Producing information in the complex world of health requires aggregating concepts and standards to reduce repeated and redundant data capture and to enable an integrated information system that meets the needs of the user, the manager, and the healthcare professional. Program interfaces and the standardization of health information are decisive for achieving systems integration. Information will be used as importantly and routinely as the technology embedded in the system reflects the reality of the health professional's work. Professionals recognize this use most when the results of their work make it easier to retrieve patient histories and when available information helps in managing the case at hand. The advance exists because the electronic record reduces errors and standardizes concepts that can be grouped into a data set, giving visibility to actions that support diagnosis and management, the monitoring of individuals' care pathways, and health planning and decision-making. Information has strategic value for professional practice through its use in planning, programming, and evaluating services, and in improving individual care and collective health. Data quality adds value and objectivity to knowledge of the health situation and the epidemiological picture, which serve as strategic input for interventions in a population. 
Technology has special importance in the health process: it transforms and rearranges work, bringing innovations to how professionals approach care and creating a virtuous circle of innovation. The use of health information technology increases professionals' accountability and improves the notification of diseases under surveillance, shortening the time to investigation and intervention. Automatic report extraction frees managers to focus on management, control, evaluation, and regulation of activities and procedures, and to reflect on the social particularities of the area served.

How I Got The Concept & Information

Gathering this kind of information is very hard but also interesting. I mostly referred to research papers by MIT and to abpi.com. I first learned how blockchain works and what decentralization really is. Later, I read research papers from MIT to understand how blockchain technology can be used to tackle problems related to COVID-19. As an extra, I also referred to articles on abpi.com.

Challenges I Ran Into

The hardest challenge I ran into was making all of this work. I have little to no experience in coding, and even less in blockchain, so building it was beyond my means. For now it is just an idea, but one that can certainly be executed. The next challenge was gathering information. I couldn't just go to a random website and copy-paste its material without knowing what it actually says, so I read the sources and wrote my own.

Accomplishments That I'm Proud Of

My biggest accomplishment is how far I got from a standing start: I did not even know what blockchain and Data Lakes were, let alone how to use them for such a sensitive purpose. On top of that, I am a beginner in coding, so I first had to learn enough to understand the terminology before starting my research.

What I Learned

I learned that even if you are totally new to a subject or field, you can still shine like a star in it. I also learned about blockchain and many other topics, such as Data Lakes, telecom connectivity, HTML, CSS, data management, and a lot more.

What's next for Covid Health Data Management

There is one and only future plan for this: Make it a reality. Maybe not in a day or two, but surely sometime soon.

The complete research paper can be read here: GitHub
