Impatient Records

About

For LA Hacks 2020, Office Ally challenged participants to devise an algorithm or system that intelligently combines medical records. The records come from multiple sources and contain many typos and inconsistencies, which makes consolidating them difficult.

To solve this problem, I used Grakn.AI, a hyper-graph database and inference engine, along with Graql, its specialized schema and query language. Storing the data in a semantic graph lets me define "inference rules", an additional layer of implied relationships that are derived at query time, which helps simplify the matching algorithm.
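As a rough analogy for what an inference rule does, the sketch below derives an implied relationship from stored facts on demand instead of storing it. It is plain Python, not Graql and not Grakn.AI's rule engine; the `shares-address` idea, the `infer_shares_address` function, and the sample facts are all hypothetical and are not taken from the project's schema.

```python
from itertools import combinations

# Hypothetical stored facts: (person, address) pairs,
# analogous to lives-at-address relations in the graph.
LIVES_AT = [
    ("person-1", "12 Main St"),
    ("person-2", "12 Main St"),
    ("person-3", "9 Oak Ave"),
]


def infer_shares_address(lives_at):
    """Derive implied (person, person) pairs on demand; nothing is stored."""
    by_address = {}
    for person, address in lives_at:
        by_address.setdefault(address, set()).add(person)

    inferred = set()
    for people in by_address.values():
        # Every pair of people at the same address is an implied relationship.
        for a, b in combinations(sorted(people), 2):
            inferred.add((a, b))
    return inferred


implied = infer_shares_address(LIVES_AT)
```

The matching code can then query the implied pairs directly rather than recomputing the join itself, which is the kind of simplification the inference layer is meant to provide.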

Within my knowledge graph, I modeled several entities (nodes) such as person, address, and account, as well as relevant relations (edges) such as holds-account and lives-at-address. I further leveraged Grakn.AI's hyper-graph capabilities to create higher-order relations (hyper-edges): the patient-record relation keeps track of the original CSV/SQL rows, and the patient-record-group relation groups records that the algorithm determines to belong to the same person.
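To make the role of patient-record-group concrete, here is a minimal in-memory sketch of grouping records once pairwise matches are known. It uses a union-find structure in plain Python as a stand-in; it is not how the relation is stored in Grakn.AI, and the `group_records` function, record IDs, and matched pairs are hypothetical.

```python
from collections import defaultdict


def group_records(record_ids, matched_pairs):
    """Group record IDs into connected components of pairwise matches."""
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for rid in record_ids:
        groups[find(rid)].append(rid)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical input: records 1-3 match transitively, as do 4 and 5.
groups = group_records([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```

Each resulting group plays the part of one patient-record-group: records 1 and 3 end up together even though they were never compared directly, because both matched record 2.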

I also created a set of Python scripts to import records from a CSV file into the knowledge base, clean and standardize the data, and facilitate the group matching algorithm.
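The cleaning and matching steps might look roughly like the sketch below. It is not the code from the repository: the column names, the 0.85 threshold, and the choice of `difflib.SequenceMatcher` for fuzzy comparison are assumptions made for illustration, and the sample rows are invented.

```python
import csv
import io
from difflib import SequenceMatcher

# Hypothetical column names and threshold; the real CSV layout may differ.
FIELDS = ["first_name", "last_name", "zip"]
THRESHOLD = 0.85


def clean_value(value):
    """Trim, collapse internal whitespace, and upper-case a raw field."""
    return " ".join(value.split()).upper()


def clean_row(row):
    """Standardize every field of a csv.DictReader row."""
    return {key.strip(): clean_value(value or "") for key, value in row.items()}


def similarity(a, b):
    """Return a 0..1 similarity ratio between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def is_probable_match(row_a, row_b, fields=FIELDS, threshold=THRESHOLD):
    """Average per-field similarity, skipping fields empty in either row."""
    scores = [
        similarity(row_a[f], row_b[f])
        for f in fields
        if row_a[f] and row_b[f]
    ]
    return bool(scores) and sum(scores) / len(scores) >= threshold


# Invented sample data standing in for an imported CSV file.
SAMPLE_CSV = """first_name,last_name,zip
 john ,Smith,90001
Jon,SMITH ,90001
Maria,Garcia,94110
"""

rows = [clean_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
john_vs_jon = is_probable_match(rows[0], rows[1])
john_vs_maria = is_probable_match(rows[0], rows[2])
```

Standardizing case and whitespace before comparing means the fuzzy comparison only has to absorb genuine typos such as "John" versus "Jon", not formatting noise.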

Getting Started

Clone the repository at https://github.com/darrylyeo/LAHack-2020.

Run install.sh (or the equivalent commands for your computing environment) to install the project dependencies, including Java 1.8, Grakn.AI Core, and the Grakn.AI Python client library.

Next, run run.sh. This will instantiate a local Grakn.AI database using the .gql schema found in the database/ directory, then automatically import the data found under the sample-data/ directory.