As college athletes, we have gone through numerous physical examinations that we needed to pass in order to be cleared to play. In each of those examinations, the doctor had to complete a lot of paperwork, adding time and hassle. The doctors often filled out such forms by hand, forcing either them or us to scan the form later. When we saw that we had an opportunity to utilize Microsoft’s Bing Speech API as part of our Hacktech project, the three of us knew that we had to implement the Automated Physician Assistant (APA). At that point, we realized that the problem was bigger than we could have imagined. Studies have shown that doctors spend two-thirds of their time on paperwork. Doctors are annoyed, and patients suffer the consequences of higher medical costs and longer wait times. It’s finally time for the tedious, mind-numbing task of filling out paperwork to be automated.
What it does
APA automatically fills out medical paperwork by listening to the conversation between the doctor or nurse and the patient during a medical examination.
How we built it
We wrote a Python script to parse the transcribed audio from the Bing Speech API and programmatically create an RTF (Rich Text Format) document that presents the patient's answers from the conversation in a form-like format.
Challenges we ran into
To start, we needed to narrow down the modes of the Bing Speech API to pick the specific options suited to a doctor-patient conversation. We also ran into an issue where the Bing Speech API was splitting statements at speaker pauses, whereas we wanted a simple start/stop interface. To fix this, we locally stored the text from all of the previous speaking events, only outputting a file when the user officially ends the conversation by pressing the stop button.
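The buffering approach described above can be sketched roughly as follows. This is an illustrative reconstruction, not the project's actual code; the class and method names are hypothetical:

```python
class ConversationBuffer:
    """Accumulates transcribed segments until the user presses stop."""

    def __init__(self):
        self.segments = []

    def on_speech_event(self, text):
        # Called each time the speech API emits a transcribed segment
        # (e.g. after a speaker pause). We store it instead of writing
        # a file immediately.
        self.segments.append(text)

    def stop(self, path):
        # Only when the user officially ends the conversation do we
        # join everything and write a single transcript file.
        transcript = " ".join(self.segments)
        with open(path, "w") as f:
            f.write(transcript)
        self.segments = []
        return transcript
```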
We had some issues parsing the text from the conversation. The text does not have a particularly well defined structure, and we had to devise a way to extract the relevant information. We decided to scan the text for question-keywords, and then scan forward to find the expected relevant information. For example, “glasses” might be a question-keyword for the interaction “Do you wear contacts or glasses? No, I do not.” In this case, we would search forward to see if “No” or “Yes” appears first, to see which was the patient’s response. This is complicated by different types of questions requiring different responses. Some require yes/no, some require numbers, and some, such as blood pressure, require two numbers. We had to account for such differences and parse accordingly.
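A minimal sketch of this keyword scan-forward strategy is below. The keyword table and answer categories are illustrative assumptions, not the project's actual mapping:

```python
import re

# Hypothetical keyword table: maps a question-keyword to the kind of
# answer we expect to find after it in the transcript.
KEYWORDS = {
    "glasses": "yes_no",
    "weight": "number",
    "blood pressure": "number_pair",
}

def extract_answer(transcript, keyword, kind, window=10):
    """Find the keyword, then scan the next few words for an answer
    of the expected type (yes/no, one number, or a number pair)."""
    text = transcript.lower()
    idx = text.find(keyword)
    if idx == -1:
        return None
    following = text[idx + len(keyword):].split()[:window]
    words = [w.strip(".,?!") for w in following]
    if kind == "yes_no":
        # Return whichever of "yes"/"no" appears first.
        for w in words:
            if w in ("yes", "no"):
                return w
    elif kind == "number":
        for w in words:
            if re.fullmatch(r"\d+(\.\d+)?", w):
                return w
    elif kind == "number_pair":
        # e.g. blood pressure needs two numbers ("120 over 80").
        nums = [w for w in words if re.fullmatch(r"\d+", w)]
        if len(nums) >= 2:
            return (nums[0], nums[1])
    return None
```

For the example interaction above, scanning forward from "glasses" finds "no" before any "yes", so the patient's response is recorded as negative.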
Another challenge was familiarizing ourselves with RTF enough to generate the form from the information gathered by parsing the text; the format's control words and escaping rules proved tricky to deal with.
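To give a flavor of the RTF generation, here is a minimal sketch of building a form-like RTF document from parsed answers. The field names are made up for illustration and the layout is far simpler than a real physical form:

```python
def make_rtf_form(answers):
    """Build a tiny RTF document with one bolded label and its answer
    per line. \\b ... \\b0 toggles bold; \\par ends a paragraph."""
    body = ""
    for label, value in answers.items():
        body += r"{\b %s:} %s\par " % (label, value)
    # \rtf1\ansi opens a basic ANSI RTF document.
    return r"{\rtf1\ansi " + body + "}"

rtf = make_rtf_form({"Wears glasses": "No", "Blood pressure": "120/80"})
with open("physical_form.rtf", "w") as f:
    f.write(rtf)
```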
Accomplishments that we're proud of
We created a fully functional prototype. Our software can listen live to a conversation and/or take an audio file and convert it to a written conversation between physician and patient. From this point, we have all the data we need to accurately complete a generic physical form based on what was said.
In addition, we learned some new technologies during this hack, most notably how to integrate Microsoft Cognitive Services with a novel idea to create a complete product.
What we learned
What's next for Automated Physician Assistant (APA)
We look forward to expanding our form recognition technologies. With optical character recognition (OCR) technology, we plan on developing a robust mechanism that can take in any form and automatically adapt our form-filling software to that specific form. In addition, we intend to incorporate machine learning methods for sentiment analysis on text and speech to improve our language processing. Long term, we can adapt this technology for additional applications such as employee forms and jury duty recording.