For LA Hacks 2020, Office Ally posed a challenge: design an algorithm or system that intelligently combines medical records. The records come from multiple sources and contain many typos and inconsistencies, which makes consolidating them difficult.
To solve this problem, I used Grakn.AI, a hyper-graph database and inference engine, along with GRAQL, its specialized, intuitive schema and query language. Storing the data in a semantic graph let me define automatic "inference rules" - an additional layer of implied relationships resolved at query time - which simplify the matching algorithm.
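To illustrate what "implied relationships determined at query-time" means, here is a minimal plain-Python sketch. It is not Graql and not the project's actual rule: the stored facts stand in for lives-at-address edges, and the derived shares-address pairing is a hypothetical relation invented for this example.

```python
from itertools import combinations

# Stored facts: (record id, address id) pairs standing in for lives-at-address edges.
# The ids are made up for illustration.
LIVES_AT = {("r1", "addr-1"), ("r2", "addr-1"), ("r3", "addr-2")}


def shares_address():
    """Derive hypothetical 'shares-address' pairs on demand; nothing is stored."""
    by_address = {}
    for record, address in sorted(LIVES_AT):
        by_address.setdefault(address, []).append(record)
    # Every pair of records at the same address is an implied relationship.
    for records in by_address.values():
        yield from combinations(records, 2)


# The implied pairs only exist when the "query" is evaluated.
implied = list(shares_address())
print(implied)
```

Because the pairs are computed when asked for rather than written back to the store, adding or correcting an address changes the derived relationships automatically, which is the property that makes such rules useful for matching.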
Within my knowledge graph, I modeled several entities (nodes) such as account, as well as relevant relations (edges) such as lives-at-address. I further leveraged Grakn.AI's hyper-graph capabilities to create higher-order relations (hyper-edges); the patient-record relation keeps track of the original CSV/SQL rows, and the patient-record-group relation groups records that are determined by the algorithm to be of the same person.
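The shape of this model can be sketched with plain Python dataclasses. This is only an analogy for the graph structure, not the project's .gql schema: the attribute names (number, street, row_id) and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Account:
    """Entity (node). The 'number' attribute is hypothetical."""
    number: str


@dataclass(frozen=True)
class Address:
    """Entity (node) on the far end of a lives-at-address edge."""
    street: str


@dataclass(frozen=True)
class PatientRecord:
    """Higher-order relation: ties one original CSV/SQL row to its entities."""
    row_id: int
    account: Account
    address: Address


@dataclass
class PatientRecordGroup:
    """Higher-order relation: records judged to describe the same person."""
    records: list = field(default_factory=list)


a = PatientRecord(1, Account("A-100"), Address("12 Main St"))
b = PatientRecord(2, Account("A-100"), Address("12 Main Street"))
group = PatientRecordGroup([a, b])
print(len(group.records))
```

Keeping each original row as its own patient-record means the source data is never overwritten; grouping is a separate layer that can be recomputed as the matching logic changes.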
I also created a set of Python scripts to import records from a CSV file into the knowledge base, clean and standardize the data, and facilitate the group matching algorithm.
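The import, clean, and match steps could look roughly like the sketch below. It is a self-contained stand-in, not the repository's scripts: it skips Grakn entirely, the column names (first_name, last_name, dob) and sample rows are hypothetical, and the fuzzy comparison uses the standard library's difflib with an arbitrary 0.8 threshold.

```python
import csv
import io
from difflib import SequenceMatcher

# Hypothetical sample data; a real run would open a CSV file instead.
SAMPLE_CSV = """first_name,last_name,dob
 John ,Smith,1980-01-02
jon,SMITH,1980-01-02
Mary,Jones,1975-05-05
"""


def clean(row):
    """Standardize a row: trim whitespace and lowercase every field."""
    return {key: value.strip().lower() for key, value in row.items()}


def similar(a, b, threshold=0.8):
    """Treat two strings as the same if their similarity ratio is high enough."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def same_person(a, b):
    """Assumed matching rule: identical birth date plus similar names."""
    return (a["dob"] == b["dob"]
            and similar(a["last_name"], b["last_name"])
            and similar(a["first_name"], b["first_name"]))


def group_records(rows):
    """Greedily place each row in the first group whose first record matches."""
    groups = []
    for row in rows:
        for group in groups:
            if same_person(group[0], row):
                group.append(row)
                break
        else:
            groups.append([row])
    return groups


rows = [clean(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
groups = group_records(rows)
print([len(g) for g in groups])
```

Here the two John Smith rows land in one group despite the typo and the inconsistent casing, while the Mary Jones row stays separate. Comparing only against the first record of each group keeps the sketch short; a real matcher would likely compare against every member or use inferred relations in the graph.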
Clone the repository at https://github.com/darrylyeo/LAHack-2020.
Run install.sh (or the equivalent commands for your computing environment) to install the project dependencies, including Java 1.8, Grakn.AI Core, and the Grakn.AI Python client library.
Run run.sh. This will instantiate a local Grakn.AI database using the .gql schema found in the database/ directory, then automatically import the data found under the