Family physicians are in short supply across Canada. The Kitchener-Waterloo region has a relatively small population of 535,154, yet approximately 25,000 residents still do not have a family doctor. For context, a typical family physician has 1,000 to 2,000 registered patients, so the shortfall is severe. Despite this shortage, many family physicians must manually update patient records and the associated paperwork. This includes regularly reading through a patient's history and ordering the correct tests based on the symptoms present and their timeline. Physicians often lack the time to read the entire medical history thoroughly, so record-keeping duties are often postponed. This raises the chance of forgetting to order necessary tests, and accumulated neglect has led to many cases of license suspension.
What it does
Medissist consists of an Artificial Neural Network (ANN) model that reads through patient records and automatically suggests relevant tests based on past symptoms and medical history. An interactive web app accepts the patient's name and medical history (multiple paragraphs, as strings separated by newlines), then prompts the physician to approve the relevant tests before they are ordered. This reduces the process to only a few seconds and should increase the rate of record-keeping while still adhering to ethical standards by limiting AI decision-making to indirect suggestions.
How I built it
The ANN model
Training data consists of classified text, which is processed as follows:
- irrelevant words such as "and", "the", and "a" are ignored
- all other words are stemmed to their root before being stored, so "fever" and "feverish" would both be stored as "fever" and treated as equal
- each keyword in the list is given the value 1 if it exists in the input string and 0 otherwise; the resulting lists are referred to as bags of words
- classification is based on how categories correlate with word patterns; the model is generated from the training data
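The first three steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: `STOP_WORDS`, `toy_stem`, `tokenize`, `build_vocabulary`, and `bag_of_words` are hypothetical names, the sample sentences are made up, and the suffix stripper is a crude stand-in for a real stemmer such as the ones NLTK provides.

```python
import re

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"and", "the", "a", "of", "has", "is"}

# Suffixes removed by the toy stemmer (assumption, for illustration only).
SUFFIXES = ("ish", "ing", "es", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping a root of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase the text, drop stop words, and stem the remaining words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(w) for w in words if w not in STOP_WORDS]

def build_vocabulary(documents):
    """Sorted list of every stem seen in the training documents."""
    return sorted({stem for doc in documents for stem in tokenize(doc)})

def bag_of_words(text, vocabulary):
    """1 if the vocabulary stem occurs in the text, 0 otherwise."""
    stems = set(tokenize(text))
    return [1 if stem in stems else 0 for stem in vocabulary]

training_docs = [
    "Patient has a fever and a cough",
    "Feverish, coughing and night sweats",
]
vocabulary = build_vocabulary(training_docs)
vector = bag_of_words("The patient is feverish", vocabulary)
```

With this toy stemmer, "fever" and "feverish" map to the same vocabulary entry, so either word sets the same position in the vector to 1.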
Python's Natural Language Toolkit (NLTK) library was used; it contains built-in functions for stemming words.
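For the classification step, the idea of learning how categories correlate with word patterns can be illustrated with the simplest possible network: a single softmax layer trained by gradient descent. This is a sketch under assumptions, not the architecture used in Medissist (which is not specified here); the function names, the four-word vocabulary, and the two placeholder categories are all hypothetical.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, biases, vector):
    """One weighted sum per class, followed by softmax."""
    scores = [sum(w * x for w, x in zip(row, vector)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)

def train(vectors, labels, n_classes, epochs=500, lr=0.5):
    """Fit weights with per-sample gradient descent on cross-entropy loss."""
    n_features = len(vectors[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for vector, label in zip(vectors, labels):
            probs = predict_proba(weights, biases, vector)
            for c in range(n_classes):
                # Gradient of the loss with respect to the score of class c.
                error = probs[c] - (1.0 if c == label else 0.0)
                for j, x in enumerate(vector):
                    weights[c][j] -= lr * error * x
                biases[c] -= lr * error
    return weights, biases

def classify(weights, biases, vector):
    """Index of the most probable category."""
    probs = predict_proba(weights, biases, vector)
    return max(range(len(probs)), key=probs.__getitem__)

# Placeholder bag-of-words vectors over a hypothetical 4-word vocabulary.
train_vectors = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
train_labels = [0, 0, 1, 1]  # indices into placeholder categories
weights, biases = train(train_vectors, train_labels, n_classes=2)
predictions = [classify(weights, biases, v) for v in train_vectors]
```

A real model would likely add one or more hidden layers and use far more training data, but the input (a 0/1 vector) and the output (a category) have the same shape as in this sketch.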
Anvil was used to build the web component since it has many useful features that are directly compatible with Jupyter Notebook.
Challenges I ran into
- Some tools had previously been installed under Python 2, but Python 3 was required for this functionality, so I had to change the path dependencies for all relevant libraries and tools.
- Anvil is a relatively new platform, and many issues arose when sending data to and retrieving data from Jupyter Notebook.
- Time was limited, and it was difficult to generate enough data for the model to train on.
Accomplishments that I'm proud of
- implementing text classification
- setting up a user-friendly front end that fits its purpose
What I learned
- how to ANN :D
- how to connect the model's input/output to the front end
What's next for Medissist
Accuracy & Coverage
Because of the hackathon's limited time, training data was scarce, so classification was limited to three conditions: syphilis, tuberculosis, and HIV. We plan to add more sample patient records to train a more accurate model; in addition, a wider range of data would allow us to classify many other conditions beyond these three.
Use of Charts & Tables
The input is currently restricted to text. Medical records often contain charts and tables; we hope to be able to read that data and use it as part of the classification process.
Security
Medical data is highly sensitive. We were unable to address this within the scope of the hackathon, but we hope to implement security measures such as encryption to protect patient privacy.