MediClear

MediClear - Arnav Sastry, Kevin Pei, Nicholai Kudriashov, Maggie Lu

According to a 2024 survey by Flywire, 75% of American patients believe that medical bills are too complicated. With healthcare costs rising, patients want to understand exactly what they are paying for. However, medical bills are often unintuitive to read, and their line items are written in medical jargon that the average person finds difficult to understand. Moreover, even a patient who can decipher a medical bill can’t always rely on the information being accurate. According to Becker’s Hospital Review, a shocking 80% of medical bills contain errors or inaccuracies of some form. Oftentimes, these errors result in patients being overcharged or double-charged, forcing them to pay thousands out of pocket for fees they don’t actually owe.

Our solution, MediClear, aims to solve this problem. Using OCR technology, MediClear scans images of hospital bills for critical information, such as the line items and their prices. A chatbot then displays the information from the bill and provides a short, simple explanation of each line item. This explanation helps patients understand exactly what they are being charged for without getting lost in complex jargon. It also makes errors easier to notice, such as procedures that shouldn’t have been charged to them, so that they can take action to remedy them.

Once the bill has been displayed, the price of each procedure is cross-checked against a data set of average prices and price ranges for that procedure. Any procedure whose price falls outside its normal range is flagged, showing the user that they may have been overcharged.

Our front end is written in JavaScript, using React Native and Node.js. Our backend, namely the OCR scanning and chatbot integration, is built in Python.
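As a rough illustration of the OCR step, here is a minimal Python sketch of how text might be pulled from a bill image and split into line items. It is not our exact code: the `LineItem` type, the `LINE_ITEM_RE` pattern, and the helper names are assumptions made for this example, and real bills would need a more forgiving parser. The only library calls used are `Image.open` from Pillow and `pytesseract.image_to_string`.

```python
import re
from typing import List, NamedTuple


class LineItem(NamedTuple):
    description: str
    price: float


# Hypothetical pattern: a description followed by a dollar amount at the end
# of the line, e.g. "CBC W/ DIFF   $45.00" or "ER VISIT LEVEL 3 1,250.00".
LINE_ITEM_RE = re.compile(
    r"^(?P<desc>.+?)\s+\$?(?P<price>\d{1,3}(?:,\d{3})*\.\d{2}|\d+\.\d{2})\s*$"
)


def ocr_image(path: str) -> str:
    """Run Tesseract OCR on an image file and return the raw text."""
    # Imported lazily so parse_line_items works without Tesseract installed.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path))


def parse_line_items(text: str) -> List[LineItem]:
    """Keep only the lines that look like '<description> <price>'."""
    items = []
    for raw in text.splitlines():
        match = LINE_ITEM_RE.match(raw.strip())
        if match:
            price = float(match.group("price").replace(",", ""))
            items.append(LineItem(match.group("desc").strip(), price))
    return items
```

In this sketch, `parse_line_items(ocr_image("bill.jpg"))` would return the candidate line items for a hypothetical image file. Note that a "total due" line would also match the pattern, so a real version would need to filter those out.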
First, we use the Cloudinary API to take and store images. Our OCR step uses the Pytesseract and Pillow libraries to scan the text in those images and extract it. The extracted text is stored in a text file and passed to the OpenAI API for analysis. The OpenAI API is used to display each line item along with an explanation, and then to detect errors (repeated line items and line items whose prices fall outside the given price range). Finally, we used the Flask framework to help deploy our application.

We ran into quite a few development challenges during this project. First, we wanted to support PDF scanning alongside image scanning, since we anticipated that many medical bills would arrive as PDFs in addition to physical documents that people photograph. We tried libraries such as pdfplumber and pdf2image, as well as the PDF functionality within Pytesseract. However, we were unable to convert the data from PDFs into text files for our use, so we were forced to stick with images.

Another major challenge was finding the data needed for our error checks. Initially, we wanted to use the Turquoise API to get average prices for various medical procedures. However, the API turned out to be restricted, and we were unable to obtain an API key. In the end, we found a data set with normal price ranges and average prices for various medical procedures, and we used it in the error checks.
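To make the two error checks concrete (repeated line items and prices outside the normal range), here is a small Python sketch. In MediClear itself these checks are done through the OpenAI API, so this is a plain deterministic stand-in for illustration, not our actual method. The `PRICE_RANGES` table, its numbers, and the function names are all hypothetical placeholders and do not come from the real data set.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical reference data: procedure name -> (low, high) price in USD.
# Placeholder numbers for illustration only, not values from the real data set.
PRICE_RANGES: Dict[str, Tuple[float, float]] = {
    "cbc w/ diff": (20.0, 60.0),
    "chest x-ray": (100.0, 400.0),
}


def find_duplicates(items: List[Tuple[str, float]]) -> List[str]:
    """Return normalized names of line items that appear more than once."""
    counts = Counter(name.strip().lower() for name, _ in items)
    return sorted(name for name, n in counts.items() if n > 1)


def find_out_of_range(
    items: List[Tuple[str, float]],
    ranges: Dict[str, Tuple[float, float]],
) -> List[Tuple[str, float, float, float]]:
    """Return (name, price, low, high) for prices outside the known range."""
    flagged = []
    for name, price in items:
        bounds = ranges.get(name.strip().lower())
        if bounds is None:
            continue  # no reference data for this item, so it cannot be judged
        low, high = bounds
        if not (low <= price <= high):
            flagged.append((name, price, low, high))
    return flagged
```

For a bill containing two identical "CBC W/ DIFF" entries and a chest X-ray priced well above the placeholder range, the first function would report the repeated item and the second would report the X-ray along with the range it was compared against. Items with no reference data are skipped rather than flagged, which avoids false alarms when the data set has no entry for a procedure.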