Patient Match

Inspiration

Records for the same patient are often inconsistent across databases because of typos, misspellings, and missing information. This algorithm could merge many data sets into one while avoiding redundant records.

What it does

Groups together the records that it thinks belong to the same person.

How we built it

We used Levenshtein distance and Double Metaphone to measure how close two records were. We handled various edge cases such as switched values, potential misspellings, abbreviations, shortened versions of names and places, and nicknames. After completing the main algorithm, we chose the column weights based on how likely someone is to enter that piece of data incorrectly. We tested various confidence thresholds to determine the ideal number of groups for the given test data set.
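As a rough illustration of the approach described above, here is a minimal Python sketch of a weighted, Levenshtein-based confidence score followed by threshold grouping. It is not our actual code: the column names, the weights in WEIGHTS, and the union-find grouping step are assumptions made for the example, and the Double Metaphone phonetic comparison is left out because it would need a third-party library.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to 0..1; two empty strings count as identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical columns and weights, chosen only for illustration.
WEIGHTS = {"first_name": 0.3, "last_name": 0.4, "dob": 0.3}


def confidence(rec_a: dict, rec_b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of per-column similarities."""
    total = sum(weights.values())
    score = sum(
        w * similarity(rec_a.get(col, "").lower(), rec_b.get(col, "").lower())
        for col, w in weights.items()
    )
    return score / total


def group_records(records: list, threshold: float) -> list:
    """Merge records whose pairwise confidence meets the threshold (union-find)."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if confidence(records[i], records[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


records = [
    {"first_name": "Jon", "last_name": "Smith", "dob": "1990-01-01"},
    {"first_name": "John", "last_name": "Smith", "dob": "1990-01-01"},
    {"first_name": "Mary", "last_name": "Jones", "dob": "1985-05-05"},
]
print(group_records(records, threshold=0.85))
```

Raising the threshold splits records into more, smaller groups, and lowering it merges more aggressively, which is why the number of groups can be tuned against a test data set. Comparing every pair is quadratic in the number of records, so this sketch only suits small inputs.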

Challenges we ran into

Dealing with unnormalized data and the numerous ways someone could enter a record incorrectly. Deciding how much weight each column should have in the overall confidence score.
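To give a sense of what normalization can look like before any comparison happens, here is a small Python sketch. The NICKNAMES and ABBREVIATIONS tables and the function names are hypothetical and far from complete; a real table would need many more entries and likely locale-specific rules.

```python
import re

# Hypothetical lookup tables; real ones would be much larger.
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize_text(value: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", value.lower())
    return " ".join(cleaned.split())


def normalize_name(value: str) -> str:
    """Expand known nicknames token by token."""
    tokens = normalize_text(value).split()
    return " ".join(NICKNAMES.get(t, t) for t in tokens)


def normalize_address(value: str) -> str:
    """Expand known street abbreviations token by token."""
    tokens = normalize_text(value).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


print(normalize_name("  Bill  O'Brien "))
print(normalize_address("123 Main St."))
```

Normalizing first means the distance measure only has to absorb genuine typos rather than differences in case, punctuation, or spelling conventions.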