Inspiration

The project was inspired by the challenges patients face in understanding and adhering to their medication regimens, often due to unclear handwritten prescriptions and complex medical terms. Millions of people struggle with medication adherence, which can lead to health complications, increased hospital visits, and rising healthcare costs. Our goal was to address these issues by building an AI-powered system that simplifies the process of understanding and managing prescriptions.

How the Project Was Built

We developed a system that transcribes handwritten prescriptions, extracts structured medication information, and presents it in an engaging and accessible way through talking avatars. Key components include:

  • Prescription Decoding and Transcription: The system uses the Qwen2-VL vision-language model for Optical Character Recognition (OCR) to transcribe handwritten prescriptions, so that complex medical terms can then be explained in plain language. This addresses a core barrier to medication adherence by making the prescription itself readable.

  • Structured Data Extraction: The transcribed data is converted into structured information using Google’s Gemini API, ensuring that users receive detailed dosage instructions, potential side effects, and medication interactions. This enables patients and caregivers to manage medications more effectively and reduce potential errors.

  • Avatar Generation: A unique aspect of the project was the integration of talking avatars powered by the SadTalker model and Google Text-to-Speech (gTTS). These avatars provide a more interactive and engaging way to communicate prescription information, especially useful for patients who might struggle with reading or understanding written instructions.

  • Web Interface: A Streamlit-based user interface allows users to upload their prescription images, view structured data, and interact with the avatars. The intuitive design makes it easy for patients to manage their prescriptions and access the information they need.
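
The structured data extraction step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Medication` fields, the prompt wording, and the `ask_model` callable (a stand-in for whatever function sends text to the Gemini API and returns its reply) are all assumptions. The sketch only shows how a JSON reply could be parsed and validated before it reaches the interface or the avatar script.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List

FENCE = "`" * 3  # Markdown code fence that model replies are often wrapped in

# Hypothetical prompt; the project's real prompt is not shown in this write-up.
PROMPT_TEMPLATE = (
    "Extract every medication from the prescription text below. "
    "Reply with a JSON list of objects with the keys "
    '"name", "dosage", "frequency", and "side_effects" (a list of strings).\n\n'
    "Prescription:\n{text}"
)


@dataclass
class Medication:
    """One prescribed item (field names are illustrative assumptions)."""
    name: str
    dosage: str
    frequency: str
    side_effects: List[str] = field(default_factory=list)


def strip_code_fence(reply: str) -> str:
    """Remove a surrounding Markdown fence from the reply, if there is one."""
    text = reply.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()
        # Drop the opening fence line, and the closing one when present.
        lines = lines[1:-1] if lines[-1].strip() == FENCE else lines[1:]
        text = "\n".join(lines)
    return text


def parse_medications(reply: str) -> List[Medication]:
    """Parse a JSON reply into Medication records, skipping unnamed entries."""
    items = json.loads(strip_code_fence(reply))
    if not isinstance(items, list):
        raise ValueError("expected a JSON list of medications")
    meds = []
    for item in items:
        if not isinstance(item, dict):
            continue
        name = str(item.get("name", "")).strip()
        if not name:
            continue
        meds.append(Medication(
            name=name,
            dosage=str(item.get("dosage", "")).strip(),
            frequency=str(item.get("frequency", "")).strip(),
            side_effects=[str(s) for s in (item.get("side_effects") or [])],
        ))
    return meds


def extract(transcription: str, ask_model: Callable[[str], str]) -> List[Medication]:
    """Send the transcription to the model and parse its reply."""
    return parse_medications(ask_model(PROMPT_TEMPLATE.format(text=transcription)))
```

Passing the model call in as a plain callable keeps the parsing logic testable without network access or an API key.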

Key Features

The system incorporates several user-centric elements:

  • Prescription Explanation: The system decodes handwritten prescriptions, providing simplified explanations of complex medical terms, dosage instructions, and potential side effects.

  • Medication Reminders: While not yet fully implemented, we plan to incorporate personalized reminders to ensure users take their medications on time.

  • Caregiver Support: The system can assist caregivers in managing the prescriptions of elderly patients, ensuring medication adherence and reducing the risk of errors.
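
Because reminders are not yet implemented, the following is only a sketch of how daily reminder times could be derived from an extracted dosing frequency. The function name, the default 08:00 to 20:00 window, and the even-spacing rule are illustrative assumptions, not clinical guidance and not the project's design.

```python
from datetime import time
from typing import List


def reminder_times(doses_per_day: int,
                   first: time = time(8, 0),
                   last: time = time(20, 0)) -> List[time]:
    """Spread doses evenly between the first and last reminder of the day."""
    if doses_per_day < 1:
        raise ValueError("doses_per_day must be at least 1")
    start = first.hour * 60 + first.minute
    end = last.hour * 60 + last.minute
    if end < start:
        raise ValueError("last reminder must not be earlier than the first")
    if doses_per_day == 1:
        return [first]
    # Equal gaps, in minutes, between consecutive reminders.
    step = (end - start) / (doses_per_day - 1)
    minutes = [round(start + i * step) for i in range(doses_per_day)]
    return [time(m // 60, m % 60) for m in minutes]
```

With the default window, `reminder_times(3)` yields 08:00, 14:00, and 20:00.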

Challenges We Faced

The project encountered several technical challenges, including:

  • Model Compatibility: Integrating models like Qwen2-VL and SadTalker required careful coordination of dependencies between PyTorch and Torchvision. Mismatches between versions caused issues in the avatar generation pipeline, requiring adjustments to environment setups.

  • NVIDIA AI Workbench Issues: Initially, we aimed to leverage the NVIDIA AI Workbench for GPU acceleration but encountered several problems:

    • Dependency Conflicts: The specific CUDA and cuDNN versions needed for NVIDIA AI Workbench conflicted with our project’s dependencies, particularly for the SadTalker model. This led to errors during model execution.
    • Limited Debugging Tools: Debugging in the Workbench environment proved challenging due to the lack of robust tools for identifying and resolving runtime issues.

  • Transition to Docker: We ultimately switched to Docker Desktop and the Docker daemon, which gave us greater flexibility in managing dependencies and a smoother development experience.

  • Computational Requirements: SadTalker and the Qwen2-VL (2B) model place high demands on the GPU, so we had to test in environments equipped with NVIDIA GPUs to get acceptable performance.
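
One way to catch the PyTorch/Torchvision mismatches described above before they surface in the avatar pipeline is a small version check at startup. The sketch below is an assumption about how such a check might look, not something the project is known to include. The table is deliberately partial and follows the pairings published in the torchvision compatibility matrix; the helper names are hypothetical.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Partial table of matching minor releases (torch -> torchvision).
# Extend it from the torchvision compatibility matrix as needed.
COMPATIBLE = {
    (1, 12): (0, 13),
    (1, 13): (0, 14),
    (2, 0): (0, 15),
    (2, 1): (0, 16),
    (2, 2): (0, 17),
    (2, 3): (0, 18),
    (2, 4): (0, 19),
    (2, 5): (0, 20),
}


def major_minor(version: str) -> Tuple[int, int]:
    """Reduce a version string such as '2.1.0+cu118' to (2, 1)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_match(torch_version: str, torchvision_version: str) -> Optional[bool]:
    """True or False for torch releases in the table, None for unknown ones."""
    expected = COMPATIBLE.get(major_minor(torch_version))
    if expected is None:
        return None
    return major_minor(torchvision_version) == expected


def check_installed() -> str:
    """Describe the torch/torchvision pair installed in this environment."""
    try:
        torch_v = metadata.version("torch")
        vision_v = metadata.version("torchvision")
    except metadata.PackageNotFoundError:
        return "torch or torchvision is not installed"
    result = versions_match(torch_v, vision_v)
    if result is None:
        return f"torch {torch_v} is not in the table; check the matrix manually"
    status = "match" if result else "MISMATCH"
    return f"{status}: torch {torch_v} with torchvision {vision_v}"
```

Running `check_installed()` inside the Docker image at build time would flag a mismatched pair early, before any model weights are loaded.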

What We Learned

This project enhanced our understanding of integrating AI models into healthcare applications. We gained insights into:

  • Simplifying Healthcare Information: The system aimed to make complex prescription information more accessible to patients, helping them understand and adhere to their medication regimens more effectively.

  • Optimizing AI Workflows: Navigating between different development environments (NVIDIA AI Workbench, Docker) taught us the importance of choosing the right tools for AI development, particularly when working with high-performance models.

  • Building User-Centric Features: We focused on building a user-friendly interface that supports both patients and caregivers in managing prescriptions.

Future Work

In the future, we plan to expand the system’s capabilities by:

  • Introducing E-Commerce Integration: We aim to integrate with e-commerce platforms, allowing users to order medications directly from the app.

  • Adding Medication Reminders and Tracking: To improve adherence, we will incorporate personalized reminder features and a system to track medication usage over time.

  • Gamification for Learning: We plan to add educational resources and quizzes to help patients better understand their medications.

Conclusion

Our team developed a system that transcribes handwritten prescriptions, extracts structured medication data, and delivers it through talking avatars. The project targets three goals: improving medication adherence, reducing medication errors, and strengthening patient education, which makes it useful to both patients and caregivers.

Built With

  • docker
  • gemini
  • google-cloud-vision
  • google-text-to-speech
  • huggingface
  • ocr
  • pil
  • python
  • pytorch
  • qwen2-vl-2b-instruct
  • sadtalker
  • streamlit
  • transformers