Electronic health records provide valuable data that can be used to improve the quality of patient care. However, much of that data consists of free-form text notes written by doctors, which are difficult to analyze with traditional tools. We set out to change that by developing a prediction tool that applies natural language processing (NLP) techniques built on modern neural network architectures.
What it does
Our tool predicts the likelihood that a patient will be assigned each of the top-10 diagnostic codes, based only on free-form text notes from the patient's visit.
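Because a single visit can carry several diagnoses at once, this is a multi-label problem: each of the ten codes gets its own independent probability (via a sigmoid) rather than a softmax shared across codes. A minimal sketch of that final scoring step, with made-up codes and logits for illustration:

```python
import math

# Hypothetical top-10 codes and raw model scores (logits) for one note.
TOP_CODES = ["4019", "4280", "42731", "41401", "5849",
             "25000", "2724", "51881", "5990", "53081"]
logits = [2.1, -0.3, 0.0, 1.4, -1.7, 0.6, -0.9, 0.2, -2.2, 0.8]

def sigmoid(x):
    """Map a logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Each code gets its own probability; they need not sum to 1.
probs = {code: sigmoid(z) for code, z in zip(TOP_CODES, logits)}

# Codes the model considers more likely than not, highest first.
likely = [c for c, p in sorted(probs.items(), key=lambda kv: -kv[1]) if p > 0.5]
print(likely)
```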
How we built it
We collected data from MIMIC-III, a dataset that contains rich, anonymized patient records documenting over a million hospital visits. Using a transformer-based NLP system built with PyTorch, we trained several models overnight on Google Cloud and selected the best one. We then created an API and a Flask-based web server to allow easy visualization and live, dynamically updating predictions as free-form text is typed.
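One preprocessing step implied above is reducing each visit's full list of ICD codes to multi-hot targets over only the most frequent codes. A rough sketch of that reduction; the toy records below are invented (real labels would come from MIMIC-III's DIAGNOSES_ICD table), and the cutoff is 2 instead of 10 so the example stays readable:

```python
from collections import Counter

# Toy stand-in for (visit_id, icd9_code) rows from a diagnoses table.
rows = [
    ("v1", "4019"), ("v1", "4280"), ("v2", "4019"),
    ("v2", "25000"), ("v3", "4280"), ("v3", "4019"),
]

# Keep the N most frequent codes across all visits (N = 10 in the project).
N = 2
top_codes = [c for c, _ in Counter(code for _, code in rows).most_common(N)]

def multi_hot(visit_id):
    """One 0/1 target per top code for a single visit."""
    codes = {c for v, c in rows if v == visit_id}
    return [1 if c in codes else 0 for c in top_codes]

print(top_codes)        # most frequent codes first
print(multi_hot("v2"))
```

The resulting 0/1 vectors are the training targets for a per-code sigmoid output layer.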
Challenges we ran into
- The model was very slow to train
- The data was not well formatted
- Live visualization was harder than expected
- Finding effective collaboration tools
- Handling large files
- Catastrophic regex backtracking
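The catastrophic-backtracking issue deserves a note: regexes with nested quantifiers can take exponential time on input that almost matches. The pattern and input below are illustrative, not our actual code, but show the shape of the problem and a linear-time rewrite:

```python
import re

# Nested quantifiers: (\w+\s*)+ can retry exponentially many ways of
# splitting the input when the overall match fails, because \s* may
# match empty. Even ~25 characters can stall Python's re engine.
risky = re.compile(r"^(\w+\s*)+$")

# Same intent, but each character can only be consumed one way,
# so matching time stays linear in the input length.
safe = re.compile(r"^\w[\w\s]*$")

text = "clean note text"
bad_input = "a" * 25 + "!"   # non-matching; would stall `risky`

assert safe.match(text)
assert not safe.match(bad_input)   # fails fast instead of hanging
```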
Accomplishments that we're proud of
- Working transformer-based NLP model
- Dynamically updating charts!
- Promising results given training time available
What we learned
- Training is slow; start early
- Visual Studio is slow, but VS Code Live Share is great
- Many advanced deep learning architectures are quite accessible
What's next for Code Overdose
- Hyperparameter tuning
- Experiments with different word embeddings/sentence2vec
- Predicting the top-50 codes
- Increase prediction stability
- Result validation and error analysis