Clinic Sense

Inspiration

For feature 1 (ADE identification and reporting)

While exploring ideas for the hackathon, we came across many issues that the healthcare industry is facing.
As we read through articles and research papers, we found that the under-reporting of adverse drug events (ADEs) is a major problem.
According to one PubMed article, over 200,000 adverse events occur in the US alone, but only 5% of them are reported. This is a serious statistic, because roughly 50% of the unreported events resulted in fatal outcomes.
The scale of the issue and the impact we could make were a clear inspiration for choosing this problem to solve using AI.

For feature 2 (clinical keyword disambiguation)

As a non-medical professional, I have had difficulty understanding the sense and meaning of abbreviations used in health records.
I later realised that many people like me face the same problem.
Hence we set out to decode the senses and meanings of abbreviations so that both medical and non-medical people can understand them, thereby improving the doctor-patient relationship.

What it does

End-to-end automatic identification of ADEs and form filling in order to report them to the FDA
Disambiguates abbreviations so that people can understand the sense and meaning in which they are used

How we built it

Please read below for the end-to-end building process

End-to-end building process

Researched use cases by reading research papers, blogs, etc.
Identified which use cases have the maximum impact on human life and can improve the patient-doctor relationship
Feature 1 (adverse event classification and reporting)
Background on feature 1
Adverse drug events (ADEs) are harmful and unintended consequences of medication use, and a leading cause of unplanned hospital admissions and deaths
Their detection, documentation and reporting are fundamental to pharmacovigilance, the science of assessing and monitoring the risk/benefit profiles of medications throughout their lifecycle
In clinical practice, fewer than 5% of ADEs are reported, even in jurisdictions where reporting is mandatory
Building Process
Searched for an ADE dataset and found it at https://huggingface.co/datasets/ade_corpus_v2/viewer/Ade_corpus_v2_classification/train
Searched for a medical transcript dataset and found it at https://www.kaggle.com/datasets/tboyle10/medicaltranscriptions
Augmented the training set of the imbalanced dataset found above using T5-base
Wrote a script to train a custom algorithm using our own Docker container on SageMaker
Split the data into train, test and validation sets
Train!
Learned how to use the Amazon Comprehend Medical service to extract entities
Found a framework that can fill a PDF in Python
Found a framework that can identify the different sections of a medical transcript
Used all the frameworks, services and models above to build an end-to-end pipeline that identifies adverse events and automatically fills the FDA reporting form PDF, which is downloadable and ready to be mailed to the FDA
Deployed the above pipeline on AWS Lambda using the AWS SAM CLI, with API Gateway as the trigger
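The entity extraction step in the process above can be sketched roughly as follows. This is a minimal sketch, assuming boto3's `comprehendmedical` client and its `detect_entities_v2` call; the helper names `extract_entities` and `group_by_category`, the 0.5 score threshold and the sample sentence are illustrative assumptions, not the project's actual code.

```python
from collections import defaultdict


def extract_entities(text, client=None, min_score=0.5):
    """Return (category, text) pairs found by Amazon Comprehend Medical.

    `client` is any object with a `detect_entities_v2(Text=...)` method.
    When omitted, a real boto3 client is created (needs AWS credentials).
    `min_score` is an illustrative threshold, not a tuned value.
    """
    if client is None:
        import boto3  # imported lazily so the helper can be exercised offline
        client = boto3.client("comprehendmedical")
    response = client.detect_entities_v2(Text=text)
    return [
        (entity["Category"], entity["Text"])
        for entity in response.get("Entities", [])
        if entity.get("Score", 0.0) >= min_score
    ]


def group_by_category(pairs):
    """Group entity texts under their category, dropping repeated values."""
    grouped = defaultdict(list)
    for category, value in pairs:
        if value not in grouped[category]:
            grouped[category].append(value)
    return dict(grouped)
```

Passing the client in as a parameter keeps the AWS call swappable, so the grouping logic can be checked with a stub before any credentials are configured.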
Feature 2
Background on Feature 2
Abbreviations are considered an essential part of the clinical narrative.
They are used not only to save time and space but also to hide serious or incurable illnesses.
An abbreviation can be used in several different senses, which adds to the confusion.
Simple example: "She had an AB" (here AB means abortion); "She is AB +ve" (here AB refers to the blood group).
A PubMed abstract (https://pubmed.ncbi.nlm.nih.gov/21459778/) concluded: "The majority of healthcare professionals have a very poor knowledge of commonly used abbreviations. Use of unambiguous and approved list of abbreviations is suggested in order to ensure good communication in patient care."
Building Process
Searched for a dataset of abbreviations by reading research papers and found it at https://conservancy.umn.edu/handle/11299/137703
The dataset had only positive samples
Generated negative samples through hard negative mining
Removed samples with a count below 10
Split the data into train, test and validation sets
Wrote a script to train a custom algorithm using our own Docker container on SageMaker
Hand-labelled the meaning of 186 senses so that users can understand what each sense means!
Deployed the above pipeline on the same AWS Lambda using the AWS SAM CLI, triggered by the same API Gateway
Set up a CloudWatch Events rule to keep the Lambda warm
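The negative-sample generation and count filter from the process above can be sketched as follows. This is one plausible construction rather than the project's exact method: it assumes each sample is an (abbreviation, sense, sentence) triple, that a hard negative pairs a sentence with a different sense of the same abbreviation, and that the count filter applies per (abbreviation, sense) pair. The function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def drop_rare_senses(samples, min_count=10):
    """Keep samples whose (abbreviation, sense) pair occurs >= min_count times."""
    counts = Counter((abbr, sense) for abbr, sense, _ in samples)
    return [s for s in samples if counts[(s[0], s[1])] >= min_count]


def add_hard_negatives(samples):
    """Label each original sample 1 and add label-0 pairs for wrong senses.

    A negative reuses the same sentence and abbreviation but swaps in another
    sense of that abbreviation, which is harder to reject than a sense
    belonging to an unrelated abbreviation.
    """
    senses = defaultdict(set)
    for abbr, sense, _ in samples:
        senses[abbr].add(sense)

    labelled = []
    for abbr, sense, sentence in samples:
        labelled.append((sentence, abbr, sense, 1))
        for other in sorted(senses[abbr] - {sense}):
            labelled.append((sentence, abbr, other, 0))
    return labelled


if __name__ == "__main__":
    toy = [
        ("AB", "abortion", "She had an AB."),
        ("AB", "blood group", "She is AB +ve."),
    ]
    for row in add_hard_negatives(toy):
        print(row)
```

With this framing the model sees a sentence together with a candidate sense and learns a binary match/no-match decision, which is why a dataset with only positive samples needs generated negatives.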

Tech used

Amazon Comprehend Medical - to identify entities
AWS SageMaker - for training custom algorithms using our own Docker container
AWS Lambda - to host the app
AWS ECR - for storing the container
AWS CloudWatch Events - to keep the Lambda warm so users don't experience cold starts (the free tier doesn't have provisioned concurrency)
AWS CloudWatch Logs - for logs
AWS S3 - for storing models and files
AWS SAM CLI - for serverless app deployment
AWS API Gateway - as a trigger for the Lambda
Docker - for running the app in a container
Transformers - BioClinicalBERT from Hugging Face Transformers (chosen because it is relevant to the domain)
PyPDF2 - for filling the PDF on the fly
T5-base - for augmenting the training set
Streamlit Cloud - for frontend deployment
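As an illustration of the PyPDF2 entry in the list above, filling a form could look roughly like this. It is a sketch that assumes the PyPDF2 2.x/3.x API (`PdfReader`, `PdfWriter`, `update_page_form_field_values`); the field names in the usage note are made up, and the real names on the FDA form would have to be read from the template, for example with `PdfReader(...).get_fields()`.

```python
def build_field_values(entities, field_map):
    """Map extracted entity categories onto PDF form field names.

    `entities` maps a category to a list of strings and `field_map` maps a
    category to a form field name. Both sets of keys are illustrative.
    """
    return {
        field_map[category]: ", ".join(values)
        for category, values in entities.items()
        if category in field_map and values
    }


def fill_form(template_path, output_path, field_values):
    """Write a copy of a fillable PDF with the given field values set."""
    from PyPDF2 import PdfReader, PdfWriter  # lazy import, needs PyPDF2 >= 2.x

    reader = PdfReader(template_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    for page in writer.pages:
        # Only pages that carry form annotations can hold fields to update.
        if "/Annots" in page:
            writer.update_page_form_field_values(page, field_values)
    with open(output_path, "wb") as handle:
        writer.write(handle)
```

A call might look like `fill_form("template.pdf", "filled.pdf", build_field_values(entities, {"MEDICATION": "suspect_product"}))`, where both file names and the `suspect_product` field are placeholders.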

Challenges we ran into

Deciding which features to build by going through numerous blogs, research papers and websites of major healthcare bodies
Finding datasets
Imbalanced dataset for the first feature (ADE identification and reporting)
GPU instances weren't available as part of the AWS free tier, so we had to raise a customer ticket for them
AWS Lambda doesn't provide more than 3 GB of RAM as part of the free tier; still, we managed to host 3 BERT models in our app in a single big fat Lambda
Finding a way to keep the Lambda warm, since provisioned concurrency isn't available for free-tier Lambda
Only positive samples were available for clinical word disambiguation, so negative samples had to be generated
PDF rendering on Streamlit Cloud only works properly in Firefox (tested with Chrome, Firefox and Chromium)

Accomplishments that we're proud of

Hosting 3 BERT models on AWS Lambda with only 3 GB of RAM
An end-to-end product with not 1 but 2 features that have the potential to make a big impact on human life
We used various AWS services to support our features
An almost production-ready feature
Hardly anyone in the market provides such an end-to-end solution for ADE identification and reporting
Last but not least, WE ARE PROUD OF ALL OUR STRUGGLES DURING THIS PROCESS AND THE LEARNINGS WE PICKED UP ALONG THE WAY :)

What we learned

End-to-end deployment of ML features on AWS Lambda
Keeping Lambda warm without provisioned concurrency by using CloudWatch Events rules
How to use SageMaker with custom algorithms
Learned in good depth about all the AWS services used in building the product
Learned about the challenges faced by the healthcare industry
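The keep-warm idea from the list above can be sketched as a handler that recognises the scheduled ping and returns early. This is a minimal sketch, not the project's handler: it assumes the scheduled rule sends the standard event with `"source": "aws.events"` and `"detail-type": "Scheduled Event"`, that real requests arrive as API Gateway proxy events with a JSON `body`, and `predict` is a hypothetical stand-in for the model pipeline.

```python
import json


def is_warmup_event(event):
    """True for the scheduled ping sent by a CloudWatch Events rule."""
    return (
        isinstance(event, dict)
        and event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def handler(event, context, predict=None):
    """Lambda entry point: answer warm-up pings early, else run inference.

    `predict` is a hypothetical callable that takes the parsed request body
    and returns a JSON-serialisable result.
    """
    if is_warmup_event(event):
        # Returning immediately keeps the container alive at minimal cost.
        return {"statusCode": 200, "body": json.dumps({"warm": True})}

    payload = json.loads(event.get("body") or "{}")
    result = predict(payload) if predict else {}
    return {"statusCode": 200, "body": json.dumps(result)}
```

Skipping inference on pings matters here because the models are loaded once per container, so the ping only needs to keep that container alive.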