Inspiration

For feature 1 (ADE identification and reporting)

  • While exploring ideas for the hackathon, we came across many issues that the healthcare industry is facing.
  • As we read through articles and research papers, we found that under-reporting of adverse drug events (ADEs) is a very large problem.
  • According to one PubMed article, over 200,000 adverse events occur in the US alone, but only 5% of them are reported. This is a serious statistic, because roughly 50% of the unreported events resulted in fatal outcomes.
  • The scale of the issue and the impact we could make were a clear inspiration for choosing this problem to solve using AI.

For feature 2 (clinical keyword disambiguation)

  • As a non-medical professional, I have had difficulty understanding the sense and meaning of abbreviations used in health records
  • I later realised that a lot of people like me face the same problem
  • So we set out to decode the senses and meanings of abbreviations, making them easy for both medical and non-medical people to understand and thereby improving the doctor-patient relationship

What it does

  • Identifies ADEs end to end and automatically fills in the form needed to report them to the FDA
  • Disambiguates abbreviations so that people can understand the sense and meaning in which they are used

How we built it

End-to-end building process

  • Researched use cases by reading research papers, blogs, etc.
  • Identified which use cases have the greatest impact on human life and can improve the patient-doctor relationship
  • Feature 1 (adverse event classification and reporting)
      • Background on Feature 1
          • Adverse drug events (ADEs) are harmful and unintended consequences of medication use, and a leading cause of unplanned hospital admissions and deaths
          • Their detection, documentation and reporting are fundamental to pharmacovigilance, the science of assessing and monitoring the risk/benefit profiles of medications throughout their lifecycle
          • In clinical practice, fewer than 5% of ADEs are reported, even in jurisdictions where reporting is mandatory
      • Building process
          • Searched for an ADE dataset and found it at https://huggingface.co/datasets/ade_corpus_v2/viewer/Ade_corpus_v2_classification/train
          • Searched for a medical transcript dataset and found it at https://www.kaggle.com/datasets/tboyle10/medicaltranscriptions
          • Augmented the training set of the imbalanced dataset above using T5-base
          • Wrote a script to train a custom algorithm in our own Docker container on SageMaker
          • Split the data into train, validation and test sets
          • Trained the model
          • Learned how to use Amazon Comprehend Medical to extract entities
          • Found a Python framework that can fill in PDF forms
          • Found a framework that can identify the different sections of a medical transcript
          • Combined the frameworks, services and model above into an end-to-end pipeline that identifies adverse events and automatically fills in the FDA reporting form as a downloadable PDF, ready to be mailed to the FDA
          • Deployed the pipeline on AWS Lambda using the AWS SAM CLI, with API Gateway as the trigger
  • Feature 2 (clinical keyword disambiguation)
      • Background on Feature 2
          • Abbreviations are considered an essential part of the clinical narrative.
          • They are used not only to save time and space but also to hide serious or incurable illnesses.
          • An abbreviation can be used in several different senses, which adds to the confusion.
          • Simple example: in "She had an AB", AB means abortion; in "She is AB +ve", AB refers to the blood group.
          • The PubMed abstract at https://pubmed.ncbi.nlm.nih.gov/21459778/ concluded: "The majority of healthcare professionals have a very poor knowledge of commonly used abbreviations. Use of unambiguous and approved list of abbreviations is suggested in order to ensure good communication in patient care."
      • Building process
          • Searched for an abbreviation dataset and, by reading research papers, found it at https://conservancy.umn.edu/handle/11299/137703
          • The dataset had only positive samples
          • Generated negative samples through hard negative mining
          • Removed samples with a count below 10
          • Split the data into train, validation and test sets
          • Wrote a script to train a custom algorithm in our own Docker container on SageMaker
          • Hand-labeled the meanings of 186 senses so that users can understand what each sense means
          • Deployed this pipeline on the same AWS Lambda function using the AWS SAM CLI, triggered by the same API Gateway
          • Set up a CloudWatch Events rule to keep the Lambda warm
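
The negative-sample step for Feature 2 can be illustrated with a minimal sketch. It is not the code we shipped: the record layout (`abbr`, `text`, `sense`), the function names, and the choice to apply the count threshold per sense are all assumptions made for illustration. The idea is that each sentence is paired with every other sense of the same abbreviation, and those wrong pairings become the negatives.

```python
from collections import Counter, defaultdict

MIN_COUNT = 10  # threshold from the write-up; applying it per sense is an assumption


def filter_rare_senses(samples, min_count=MIN_COUNT):
    """Drop samples whose (abbreviation, sense) pair occurs fewer than min_count times."""
    counts = Counter((s["abbr"], s["sense"]) for s in samples)
    return [s for s in samples if counts[(s["abbr"], s["sense"])] >= min_count]


def add_hard_negatives(samples):
    """Pair each sentence with every sense of its abbreviation.

    The true sense gets label 1. Competing senses of the same abbreviation
    get label 0; they are "hard" because they share the same surface form.
    """
    senses_by_abbr = defaultdict(set)
    for s in samples:
        senses_by_abbr[s["abbr"]].add(s["sense"])

    labelled = []
    for s in samples:
        for sense in sorted(senses_by_abbr[s["abbr"]]):
            labelled.append({
                "abbr": s["abbr"],
                "text": s["text"],
                "sense": sense,
                "label": 1 if sense == s["sense"] else 0,
            })
    return labelled


if __name__ == "__main__":
    # Toy records based on the AB example above (hypothetical layout).
    toy = [
        {"abbr": "AB", "text": "She had an AB", "sense": "abortion"},
        {"abbr": "AB", "text": "She is AB +ve", "sense": "blood group"},
    ]
    for row in add_hard_negatives(toy):
        print(row)
```

With the two toy sentences, each one yields one positive pair and one negative pair, so a binary classifier can be trained to score a (sentence, candidate sense) pair.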

Tech used

  • Amazon Comprehend Medical - to identify entities
  • Amazon SageMaker - for training custom algorithms in our own Docker container
  • AWS Lambda - to host the app
  • Amazon ECR - for storing the container image
  • Amazon CloudWatch Events - to keep the Lambda warm so users don't experience cold starts (the free tier doesn't include provisioned concurrency)
  • Amazon CloudWatch Logs - for logs
  • Amazon S3 - for storing models and files
  • AWS SAM CLI - for serverless app deployment
  • Amazon API Gateway - as the trigger for the Lambda
  • Docker - for running the app in a container
  • Transformers - Bio_ClinicalBERT from Hugging Face Transformers (chosen because it is relevant to the clinical domain)
  • PyPDF2 - for filling in the PDF on the fly
  • T5-base - for augmenting the training set
  • Streamlit Cloud - for frontend deployment
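
The keep-warm setup can be sketched roughly as follows. This is an illustrative outline, not our deployed handler: `_load_models`, the response shapes, and the `text` field of the request body are hypothetical. It relies on the fact that scheduled CloudWatch Events rules deliver an event whose `source` is `aws.events`, so the handler can answer those pings early while the models stay cached in the warm container.

```python
import json

_MODELS = None  # cached across invocations while the container stays warm


def _load_models():
    """Hypothetical stand-in for loading the BERT models from disk or S3."""
    return {"loaded": True}


def _get_models():
    """Load the models once per container and reuse them afterwards."""
    global _MODELS
    if _MODELS is None:
        _MODELS = _load_models()
    return _MODELS


def is_warmup_event(event):
    """Scheduled CloudWatch Events rules send events with source 'aws.events'."""
    return isinstance(event, dict) and event.get("source") == "aws.events"


def handler(event, context):
    models = _get_models()  # a warm-up ping also triggers the model load

    if is_warmup_event(event):
        # Nothing to compute: the ping only keeps the container alive.
        return {"statusCode": 200, "body": json.dumps({"warm": True})}

    # Otherwise assume an API Gateway proxy event with a JSON body.
    payload = json.loads(event.get("body") or "{}")
    text = payload.get("text", "")
    # Placeholder for real inference with the cached models.
    result = {"text_length": len(text), "models_loaded": models["loaded"]}
    return {"statusCode": 200, "body": json.dumps(result)}
```

Loading the models outside the request path means a user request that lands on a warm container skips the expensive load.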

Challenges we ran into

  • Deciding which features to build, which meant going through a large number of blogs, research papers and websites of major healthcare bodies
  • Finding datasets
  • Dealing with an imbalanced dataset for the first feature (ADE identification and reporting)
  • GPU instances weren't available on the AWS free tier, so we had to raise a support ticket to get access
  • AWS Lambda doesn't provide more than 3 GB of RAM on the free tier; we still managed to host 3 BERT models for our app in a single large Lambda function
  • Finding a way to keep the Lambda warm, since provisioned concurrency isn't available on the free tier
  • Only positive samples were available for clinical keyword disambiguation, so negative samples had to be generated
  • PDF rendering on Streamlit Cloud only works properly in Firefox (tested with Chrome, Firefox and Chromium)
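
For the imbalanced ADE dataset, one common precaution is to split each class separately so that the rare class shows up in every split. The sketch below assumes this stratified approach; the write-up does not say how our split was done, and the 80/10/10 fractions, the `label` key and the function name are illustrative.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="label", fractions=(0.8, 0.1, 0.1), seed=0):
    """Split rows into train/val/test while preserving the per-class ratio."""
    rng = random.Random(seed)

    # Group rows by class label.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_train = int(len(group) * fractions[0])
        n_val = int(len(group) * fractions[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    # 90 negative and 10 positive toy rows.
    rows = [{"label": 0}] * 90 + [{"label": 1}] * 10
    train, val, test = stratified_split(rows)
    print(len(train), len(val), len(test))
```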

Accomplishments that we're proud of

  • Hosting 3 BERT models on AWS Lambda with only 3 GB of RAM
  • An end-to-end product with not one but two features that have the potential for a large impact on human life
  • Using a range of AWS services to support our features
  • An almost production-ready product
  • Hardly anyone in the market offers such an end-to-end solution for ADE identification and reporting
  • Last but not least, we are proud of all our struggles during this process and everything we learned from them :)

What we learned

  • End-to-end deployment of ML features on AWS Lambda
  • Keeping a Lambda warm without provisioned concurrency by using CloudWatch Events rules
  • How to use SageMaker with custom algorithms
  • In-depth knowledge of all the AWS services used to build the product
  • The challenges faced in the healthcare industry

What's next for Clinic Sense

  • Refine the solution to support more types of entities that can be identified
  • Make the solution more robust
  • Add batch processing of events
  • Release it as a proper product

Built With

  • api-gateway
  • aws-medical-comprehend
  • bioclinicalbert
  • cloudwatch
  • cloudwatch-events
  • docker
  • ecr
  • huggingface
  • lambda
  • medspacy
  • pypdf2
  • python
  • s3
  • sagemaker
  • streamlit
  • t5base
  • transformers