Inspiration

The motivation behind Doc-Assist comes from my work as a fourth-year Biomedical Engineering undergraduate. My thesis aims to automate the patient-doctor interaction pipeline; its core is a computer-vision system that reconstructs a patient's 3D model from a single image and extracts anthropometric data. I had also previously collaborated with Augmedix, a company that collects doctor-patient conversations to train specialized models, and I realized that existing technology could be folded into my thesis for end-to-end automation. The hackathon was the perfect opportunity to try.

Google does offer Bangla transcription, but setting it up requires a credit card, which I don't have. I therefore fell back on an open-source transcription model, whose performance was noticeably weaker. To compensate, I used Gemini to interpret the noisy transcripts and produce diagnostic suggestions from the patient conversation. The tool is designed to assist doctors, not general users. Remarkably, I managed to build the speech-to-diagnosis component in just two days.

What it does

Doc-Assist transcribes and interprets patient dialogues to aid doctors in diagnosis. It combines an open-source transcription model with the Gemini API to turn noisy Bangla conversations into structured diagnostic suggestions.
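The two-stage pipeline described above can be sketched roughly as follows. The specific choices here are my assumptions for illustration, not necessarily what the submission ships: `openai-whisper` stands in for "an open-source transcription model", and the Gemini side uses the `google-generativeai` package with the `gemini-pro` model. The end-to-end function needs those packages plus an API key, so it is shown but not executed.

```python
def build_diagnosis_prompt(transcript: str) -> str:
    """Wrap a (possibly noisy) Bangla transcript in a doctor-facing
    prompt asking Gemini for diagnostic suggestions."""
    return (
        "You are assisting a licensed physician. The following is an "
        "imperfect automatic transcript of a Bangla patient-doctor "
        "conversation. Summarize the patient's symptoms and suggest "
        "possible diagnoses for the doctor to consider.\n\n"
        f"Transcript:\n{transcript}"
    )


def speech_to_diagnosis(audio_path: str) -> str:
    """End-to-end sketch: open-source ASR, then Gemini interpretation.
    Package and model names are assumptions; requires an API key."""
    import whisper                      # assumed open-source ASR (openai-whisper)
    import google.generativeai as genai  # assumed Gemini SDK

    # Transcribe the recording in Bangla ("bn") with a small Whisper model.
    model = whisper.load_model("small")
    transcript = model.transcribe(audio_path, language="bn")["text"]

    # Hand the noisy transcript to Gemini for diagnostic suggestions.
    gemini = genai.GenerativeModel("gemini-pro")
    return gemini.generate_content(build_diagnosis_prompt(transcript)).text
```

Keeping the prompt construction in its own function makes it easy to iterate on wording without touching the transcription or API plumbing.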

How I built it

In two days, with Gemini and a lot of coffee. Haha!

Challenges I ran into

Limited access to paid transcription services was the biggest challenge: in third-world countries, most students don't have a credit card. If anyone at Google Cloud is reading this, please offer students a different sign-up path! Working around the performance issues of open-source tools took extensive refinement. In hindsight, I should have had each patient read a short template first, so the system could profile their voice and Gemini could interpret the transcript better, and I should have segmented the audio into chunks before transcribing. But I joined the hackathon late.

Accomplishments that I'm proud of

Successfully integrating open-source transcription and the Gemini API into a single doctor-assistive tool, and working around paywalled services with free alternatives.

What we learned

The importance of resourcefulness in leveraging existing technology for healthcare innovation. Enhanced understanding of AI-driven transcription and diagnostic assistance.

What's next for DOC-ASSIST

Segment the audio into smaller chunks before transcription. Add a short template for each patient to read before using the app, so the system can adapt to their voice. Collect more Bangla medical audio to improve transcription quality.
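The chunking step can be as simple as splitting the recording into fixed-length windows with a small overlap, so words that straddle a boundary appear in at least one chunk. A minimal sketch; the 30-second window and 2-second overlap are my own placeholder values, not tuned numbers from the project:

```python
def chunk_spans(total_s: float, chunk_s: float = 30.0, overlap_s: float = 2.0):
    """Return (start, end) times in seconds covering the whole recording.

    Consecutive chunks overlap by overlap_s so words spoken at a chunk
    boundary are not cut in half for the transcription model.
    """
    if chunk_s <= overlap_s:
        raise ValueError("chunk length must exceed the overlap")
    spans, start = [], 0.0
    step = chunk_s - overlap_s
    while start < total_s:
        spans.append((start, min(start + chunk_s, total_s)))
        start += step
    return spans


# A 70-second recording yields three overlapping chunks:
# chunk_spans(70.0) -> [(0.0, 30.0), (28.0, 58.0), (56.0, 70.0)]
```

Each span can then be cut out of the audio file and transcribed independently, and the per-chunk transcripts concatenated before being handed to Gemini.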
