Electronic health records provide valuable data that can be used to improve the quality of patient care. However, most of that data is free-form text written by doctors, which is difficult to analyze with traditional tools. We wanted to change that by developing a prediction tool that uses natural language processing (NLP) techniques built on modern neural network architectures.

What it does

Our tool predicts the likelihood that a patient will be assigned each of the top-10 diagnostic codes, based only on free-form text notes from the patient's visit.
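
One common way to frame this kind of prediction is as a multi-label problem: each code gets its own independent probability, so several codes can be likely at once. The snippet below is a minimal sketch of that idea in plain Python, assuming the model emits one raw score (logit) per code. The code names and scores are hypothetical placeholders, not output from our model.

```python
import math
from typing import Dict


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def code_probabilities(logits: Dict[str, float]) -> Dict[str, float]:
    """Map each code's logit to an independent probability in (0, 1)."""
    return {code: sigmoid(score) for code, score in logits.items()}


# Hypothetical logits for three placeholder codes (assumed values)
example_logits = {"CODE_A": 2.0, "CODE_B": 0.0, "CODE_C": -3.0}
probs = code_probabilities(example_logits)

# Print codes from most to least likely
for code, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{code}: {p:.3f}")
```

Unlike a softmax, the per-code probabilities here are not forced to sum to 1, which suits visits that receive more than one diagnostic code.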

How I built it

We collected data from MIMIC-III, a dataset of rich, anonymized patient records covering tens of thousands of hospital admissions. Using a transformer-based NLP system built in PyTorch, we trained several models overnight on Google Cloud and selected the best one. We then created an API and a Flask-based web server that visualizes the predictions and updates them live as free-form text is typed.
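
To illustrate the kind of request handling that sits behind a live prediction endpoint, here is a rough, framework-agnostic sketch. It is not our actual server code: `stub_model`, `build_response`, the placeholder code names, and the JSON layout are all assumptions made for illustration. In a Flask app, a route would pass the typed text to a function like this and return the resulting JSON.

```python
import json
from typing import Callable, Dict


def stub_model(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for the trained model.

    Maps note text to one probability per placeholder code using a toy
    word count; the real tool would call the transformer here.
    """
    words = text.lower().split()
    return {
        "CODE_A": min(1.0, 0.1 + 0.2 * words.count("chest")),
        "CODE_B": min(1.0, 0.1 + 0.2 * words.count("fever")),
    }


def build_response(text: str, model: Callable[[str], Dict[str, float]]) -> str:
    """Return a JSON string of codes ranked by predicted probability."""
    if not text.strip():
        # Nothing typed yet: return an empty prediction list
        return json.dumps({"predictions": []})
    scores = model(text)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return json.dumps(
        {"predictions": [{"code": c, "probability": round(p, 4)} for c, p in ranked]}
    )


print(build_response("patient reports chest pain and chest tightness", stub_model))
```

Calling the function again on every change to the text box is one simple way to get charts that update as the user types.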

Challenges I ran into

  • Model was very slow to train
  • Data was not well formatted
  • Live visualization was more difficult than expected
  • Collaboration tools
  • Large files
  • Catastrophic backtracking in regular expressions

Accomplishments that I'm proud of

  • Working transformer-based NLP model
  • Dynamically updating charts!
  • Promising results given training time available

What I learned

  • Training is slow, start early
  • Visual Studio is slow, but VS Code Live Share is great
  • Lots of advanced deep learning architectures are quite accessible

What's next for Code Overdose

  • Hyperparameter tuning
  • Experiments with different word embeddings/sentence2vec
  • Predict top 50 codes
  • Increase prediction stability
  • Result validation and error analysis
