Inspiration

Adrienne and Abhi met at a healthcare conference and got to talking about some of the greatest challenges in healthcare.

Adrienne is a mental health researcher and practitioner affiliated with the VA and Stanford. She described some of the specific challenges faced by veterans and others living with PTSD, noting that just getting to a diagnosis can take months. For example, measurement-based care is the gold standard, but its implementation is limited by time constraints, administrative burden, and lack of convenience.

The conversation eventually veered to using LLMs.

Abhi, a health tech entrepreneur, had been actively working with generative AI models and was experimenting with the Gemini Pro models at the time. He thought it would be a great idea to join the Google AI hackathon and use the platform to build an extremely low-code assistant that could administer mental health assessments, transcribe and summarize results, score responses, highlight risks, and make life simpler for patients as well as clinicians.

Thus started this adventure! BTW, here's a video of the team explaining the idea in detail: link

What it does

The assistant can currently administer the following validated assessments:

  1. CAPS-5 Monthly (Primary PTSD assessment)
  2. Insomnia Severity Index (Insomnia)
  3. Neurobehavioral Symptom Inventory (Concussion)
  4. AUDIT Assessment (Alcohol abuse)
  5. TAPS Assessment (Substance abuse)
  6. Depression Assessment (Depression)

The assistant administers each assessment sequentially, working through the questions one at a time. Where the instrument calls for skipping a question or asking it differently based on a prior answer, the model handles that branching as well. It can respond empathetically to difficult answers, although more training on conversations is needed to improve this.

The assistant also handles refusals to answer particular questions: it notes the refusal, highlights it for the clinician, and moves on to the remaining questions.
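
The branching and refusal behavior above comes entirely from prompt rules rather than code. A condensed, hypothetical sketch of the kind of rules involved (our actual wording differs):

```python
# Condensed, illustrative version of the behavioral rules layered into the
# prompt; the exact wording and rule set in our build differs.
ASSESSMENT_RULES = """
You are administering a validated clinical assessment.
1. Ask exactly one question per turn, in the instrument's published order.
2. Apply skip logic: if an item is conditional on a prior answer
   (e.g., "If yes, ..."), ask it only when the condition is met.
3. If the patient declines to answer, record the item as REFUSED,
   acknowledge their choice without pressure, and continue.
4. Never invent, reorder, or reword questions beyond light clarification.
"""
```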

The model also performs sentiment analysis and watches for certain “red flags,” such as evidence of suicidal ideation or abuse. When a red flag is detected, the specific examples are highlighted for the clinician and crisis resources are provided to the user.
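
As a rough sketch of how such flags could be consumed downstream (the JSON shape and field names here are illustrative assumptions, not the exact format we used):

```python
import json

# The assistant is asked to end each summary with a machine-readable block;
# this sketch surfaces crisis resources whenever any red flag is present.
CRISIS_RESOURCES = (
    "If you are in crisis, call or text 988 (Suicide & Crisis Lifeline), "
    "or dial 988 and press 1 for the Veterans Crisis Line."
)

def check_red_flags(summary_json: str) -> str | None:
    # e.g., {"red_flags": [{"type": "suicidal_ideation", "evidence": "..."}]}
    flags = json.loads(summary_json).get("red_flags", [])
    return CRISIS_RESOURCES if flags else None
```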

We can also have the model play both assessor and patient, generating a full transcript and assessment from a persona. This could potentially be extended to scenarios where a broader transcript of patient conversations exists that the model could draw on to run an assessment.

How we built it

This assistant was built entirely in Google AI Studio using the Gemini 1.5 Pro model.

The model was first trained on each individual assessment. Then we ran a battery of assessments of each type, checking for the correctness of:

  1. Sequence of questions
  2. Accuracy of questions
  3. Tone of questions
  4. Empathy of responses (especially for the PTSD assessment)
  5. Accuracy of response recording (e.g., if the patient answered with a word that wasn't in the list of options, did the assistant still score it correctly, as a human would?)
  6. Transcript generation
  7. Summarization
  8. Recommendations
  9. Red flagging
  10. Agentic conversation (the model running an assessment back and forth with itself)

With Gemini 1.5 Pro's 1M-token context window, there was no need for a RAG implementation, which was just awesome. As a result, no coding was needed, and we were able to accomplish most of our objectives through n-shot prompting. A handful of universal instructions added to the system instructions corrected recurring issues.
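
Everything lived in AI Studio, but the same setup can be sketched against the Gemini API for anyone who wants to reproduce it in code. This is a minimal sketch; the file names are illustrative placeholders:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Universal rules go in the system instruction; the full example assessments
# (the n-shot examples) simply ride along in the 1M-token context, so no
# retrieval layer is needed.
system_rules = open("system_instructions.txt").read()
examples = [open(p).read() for p in ("caps5_example.txt", "isi_example.txt")]

model = genai.GenerativeModel(
    model_name="gemini-1.5-pro",
    system_instruction=system_rules,
)
chat = model.start_chat(history=[
    {"role": "user", "parts": ["\n\n".join(examples)]},
    {"role": "model", "parts": ["Understood. Ready to begin an assessment."]},
])
print(chat.send_message("Please begin the Insomnia Severity Index.").text)
```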

All these assessments and training consumed only about 5% of the total context capacity (roughly 50K of the 1M tokens, or about 8K tokens per assessment), so we can easily expand to 100+ assessments without any issues.

Challenges we ran into

Due to the sensitive nature of mental health questions and answers, every model except Gemini 1.5 Pro flagged the conversations and refused to return results, even with safety settings at their minimum. That left 1.5 Pro as our only option, which meant that we couldn't generate code for the model until very recently.
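
For reference, "minimal safety settings" in the Gemini API look roughly like this; whether such settings are appropriate outside a controlled prototype is a separate and serious question:

```python
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold

# The least restrictive thresholds the SDK exposes; even with these, models
# other than 1.5 Pro still refused our mental health conversations.
safety_settings = {
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
}
model = genai.GenerativeModel("gemini-1.5-pro", safety_settings=safety_settings)
```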

Also, given the heavily conversational nature of this application, we kept hitting daily rate limits. That's when we created an agentic mode, in which the model administers an assessment to itself, allowing us to quickly test several corner cases.
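
A minimal sketch of that agentic mode as two chat sessions trading turns; the personas, stop phrase, and turn cap are illustrative assumptions:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

assessor = genai.GenerativeModel(
    "gemini-1.5-pro",
    system_instruction="Administer the Insomnia Severity Index one question "
                       "per turn. Say ASSESSMENT COMPLETE when finished.",
).start_chat()
patient = genai.GenerativeModel(
    "gemini-1.5-pro",
    system_instruction="Role-play a veteran with moderate insomnia. "
                       "Answer each assessment question in character.",
).start_chat()

turn = "Please begin the assessment."
for _ in range(40):  # hard cap so a test run always terminates
    question = assessor.send_message(turn).text
    if "ASSESSMENT COMPLETE" in question:
        break
    turn = patient.send_message(question).text
```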

The model sometimes made math errors in score calculations and occasionally skipped steps in an assessment for no apparent reason. We used chain-of-thought prompting to fix that.
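
The fix amounted to an instruction along these lines (paraphrased, not our exact prompt):

```python
# Paraphrased scoring rule: forcing a per-item breakdown before the total
# largely eliminated the arithmetic slips.
SCORING_INSTRUCTION = """
When scoring, first list each item with its recorded answer and the points
it earns, one per line. Then add the points step by step and state the
total. Never report a total without showing the per-item breakdown.
"""
```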

We did see evidence of gender bias: the model assumed that a name commonly used by both men and women belonged to a man. Any bias inherent in these standard instruments can also be expected to creep in.

Accomplishments that we're proud of

  1. We now have a basic model for administering assessments of PTSD and related mental health conditions ready for beta testing
  2. We were able to do this with minimal coding
  3. The model can calculate scores accurately
  4. We can easily summarize, look for red flags, and show recommendations for clinicians
  5. The assessments can be provided in any common language and scoring reported back in English
  6. Built an initial version of a potentially awesome tool for individuals with post-traumatic stress and for clinicians
  7. Did it all with zero code

What we learned

  1. Collaboration between clinicians and product makers can generate something phenomenal in healthcare
  2. Generative AI can have several general capabilities that can significantly improve cumbersome problems in healthcare and amplify the impact of clinicians
  3. Large context windows can be a potentially better solution than RAG in cases such as these
  4. Ethical, safe, and responsible use; cultural and racial bias and responsiveness; and user privacy are paramount considerations

What's next for PTSD Assistant

We are planning to integrate the core model into a user interface and put it in front of some users for initial testing and feedback. We then plan to bring it to clinicians for feedback and perhaps selective use in a more clinical environment. This, of course, would happen only once HIPAA compliance, security, and other considerations are satisfied.

In addition to administering and scoring assessments, we can create a treatment decision aid so that users can go beyond a label and use their data to determine how THEY want to heal. There is no one-size-fits-all treatment; patient-centered decision making gives patients control over their path to healing, and some basic frameworks for this already exist.

Our eventual hope is to get it into the hands of consumers so that their journey toward wellness may start early and perhaps be shorter.

Built With

Google AI Studio, Gemini 1.5 Pro
