In the United States alone, nearly 400,000 people are killed annually by preventable medical errors. With over 12 million adults in the U.S. misdiagnosed each year, inaccurate diagnoses are the leading cause of medical errors in hospitals. A misdiagnosis is an inaccurate assessment of a patient's condition by a healthcare provider; it often leads to inappropriate or excessive treatment, and sometimes to no treatment at all. In a case study conducted at the University of Maryland Medical Center, researchers found that of 177 patients, over 90 percent received at least one unnecessary treatment, highlighting how unreliable clinical decision-making in hospital settings can be. Misdiagnoses do more than undermine clinicians' effectiveness: over one-third of these cases of inaccurate treatment result in death, injury, or permanent disability.

Seeing how rarely advanced ML techniques are used to improve clinical organization and efficiency, we decided to create MedicAI: a multi-faceted platform that uses natural language processing (NLP) to classify medical records.

What it does

MedicAI lets users upload medical record files, which are classified into one of four categories: Heart, Brain, Reproductive, or Digestive system. This lets doctors classify unstructured electronic health records (EHRs) without reading through each one or losing track of individual reports.

When a doctor uploads a file, its classification is saved to our database and can be viewed on the view records page.

How I built it

Our website was built with HTML, CSS, and JS on the front end and Flask on the back end. We used Firebase for authentication and as the database for storing records. For the models, we used a variety of machine learning libraries, including scikit-learn, scikit-survival, TensorFlow, Keras, NumPy, and pandas.

We tested four machine learning models for classifying a medical record into one of the four body systems: logistic regression, random forest, LSTM, and CNN-LSTM. The CNN-LSTM achieved the highest accuracy, at 97.9%.
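Logistic regression is the simplest of the four models. To illustrate what that baseline does, here is a minimal pure-Python sketch of multinomial logistic regression over bag-of-words features, scored with accuracy (the metric reported above). This is not the code behind the 97.9% result; the toy documents, the vocabulary, and the hyperparameters `epochs` and `lr` are hypothetical, chosen only so the example runs on its own.

```python
import math
from collections import Counter

# The four target classes described above.
LABELS = ["Heart", "Brain", "Reproductive", "Digestive"]


def featurize(text, vocab):
    """Bag-of-words counts over a fixed vocabulary; unknown words are ignored."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def logits(x, W, b):
    return [sum(w * xi for w, xi in zip(row, x)) + bc for row, bc in zip(W, b)]


def train(docs, labels, vocab, epochs=200, lr=0.5):
    """Multinomial logistic regression fit with full-batch gradient descent."""
    k, d = len(LABELS), len(vocab)
    W = [[0.0] * d for _ in range(k)]
    b = [0.0] * k
    X = [featurize(doc, vocab) for doc in docs]
    Y = [LABELS.index(y) for y in labels]
    n = len(X)
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(k)]
        gb = [0.0] * k
        for x, y in zip(X, Y):
            p = softmax(logits(x, W, b))
            for c in range(k):
                # Cross-entropy gradient w.r.t. logit c is p_c - 1[c == y]
                err = p[c] - (1.0 if c == y else 0.0)
                gb[c] += err
                for j in range(d):
                    gW[c][j] += err * x[j]
        for c in range(k):
            b[c] -= lr * gb[c] / n
            for j in range(d):
                W[c][j] -= lr * gW[c][j] / n
    return W, b


def predict(text, W, b, vocab):
    scores = logits(featurize(text, vocab), W, b)
    return LABELS[scores.index(max(scores))]


def accuracy(predicted, actual):
    """Fraction of records whose predicted system matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data: one short "record" per system.
docs = ["chest pain arrhythmia ecg", "seizure mri headache",
        "pregnancy ovarian ultrasound", "gastric ulcer endoscopy"]
vocab = sorted(set(" ".join(docs).split()))
W, b = train(docs, LABELS, vocab)
print(predict("patient reports chest pain", W, b, vocab))  # Heart
```

The deep models (LSTM, CNN-LSTM) replace the bag-of-words vector with an ordered token sequence, which lets them use word order; the accuracy computation stays the same.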

When a report is uploaded, we use Google's Tesseract OCR engine to extract text directly from the file and feed it into the model.
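OCR returns raw text, which has to be normalized and turned into fixed-length integer sequences before a sequence model such as an LSTM or CNN-LSTM can consume it. The sketch below assumes a simple lowercase-and-strip cleanup, a vocabulary with reserved padding and out-of-vocabulary ids, and a fixed `max_len`. The helper names and the padding scheme are hypothetical illustrations, not our exact pipeline; in practice the raw string could come from a Tesseract wrapper such as `pytesseract.image_to_string`.

```python
import re
from collections import Counter

PAD_ID, OOV_ID = 0, 1  # reserved ids for padding and out-of-vocabulary tokens


def clean_ocr_text(raw):
    """Lowercase OCR output, replace non-alphanumerics with spaces, and tokenize."""
    text = re.sub(r"[^a-z0-9\s]", " ", raw.lower())
    return text.split()  # split() with no argument also collapses repeated whitespace


def build_vocab(token_lists, min_count=1):
    """Map each token seen at least min_count times to an integer id (starting at 2)."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    vocab = {"<pad>": PAD_ID, "<oov>": OOV_ID}
    for tok, count in sorted(counts.items()):
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


# Example: text as it might come back from OCR (invented for illustration).
raw = "Pt. c/o CHEST pain;\n  ECG: sinus rhythm"
tokens = clean_ocr_text(raw)
vocab = build_vocab([tokens])
print(tokens)  # ['pt', 'c', 'o', 'chest', 'pain', 'ecg', 'sinus', 'rhythm']
print(encode(clean_ocr_text("chest pain, unknownword"), vocab, max_len=5))  # [3, 6, 1, 0, 0]
```

Right-padding is an arbitrary choice here; Keras's `pad_sequences` pads at the front by default, and either works as long as training and inference use the same scheme.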

The classification feature is available on the add report page. When a doctor creates a report, the results are sent to the back end and stored in Firebase. All records are then visible on the view reports page, which shows both the doctor's inputs and the resulting classifications.
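The add/view flow amounts to storing one record per report and listing a doctor's records. A minimal sketch of such a record is below, assuming hypothetical field names, validation of the predicted label against the four systems, and a newest-first listing per doctor. Writing the `to_dict()` output to Firebase (for example with the Firebase Admin SDK) is omitted, and this schema is an illustration rather than the app's actual one.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

SYSTEMS = ("Heart", "Brain", "Reproductive", "Digestive")


def _now_iso():
    # Fixed microsecond precision keeps ISO-8601 strings chronologically sortable.
    return datetime.now(timezone.utc).isoformat(timespec="microseconds")


@dataclass
class Report:
    """One saved report: the doctor's inputs plus the model's classification.

    All field names are hypothetical.
    """
    doctor_id: str         # e.g. the authenticated user's id
    title: str             # doctor-entered input
    text: str              # report text (typed or extracted by OCR)
    predicted_system: str  # model output
    created_at: str = field(default_factory=_now_iso)

    def __post_init__(self):
        if self.predicted_system not in SYSTEMS:
            raise ValueError(f"unknown system: {self.predicted_system!r}")

    def to_dict(self):
        """Plain dict suitable for writing to a document database."""
        return asdict(self)


def reports_for_doctor(reports, doctor_id):
    """Return one doctor's reports, newest first."""
    mine = [r for r in reports if r.doctor_id == doctor_id]
    return sorted(mine, key=lambda r: r.created_at, reverse=True)
```

Validating the label when the record is created means the view page never has to handle a classification outside the four supported systems.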

Paper (Depth)

Along with our web app, we wrote a full research paper on our topic and machine learning models, covering related work in the field, the value of our work, and a detailed explanation of our experiments and results. We aimed to show the depth of our project by explaining our methods and testing a variety of models.

The paper can be viewed here.

What's next for MedicAI

In the future, we hope to classify more than our current four categories by scraping a larger dataset, and to test more deep learning models, such as transformer-based ones.
