MediScan

Overview
Augmented Reality Document Explanations
Medication Reminders

Inspiration

Our project was inspired by the background of one of our teammates, Jensen:

“Did you remember to take your medicine?” I ask my dad every night. Sometimes, he triumphantly says yes. Other times, he says yes but attempts to sneak to the pill cabinet to covertly take his medicine, unwilling to admit he forgot. Regardless, I remind him every night as my dad needs 8 pills a day to survive.

Caring for my disabled father has shaped my life. It has shown me the daily challenges that come with memory loss and the immense responsibility caregivers bear. We aspire to make it easier for individuals like my dad—and their caregivers—to track essential medications, reducing stress and improving health outcomes.

Beyond medication management, we also want to empower patients to understand their own medical data and the medication they're prescribed. Lab results and prescribed medication are often indecipherable to those without medical training, making it difficult for patients to advocate for themselves. I experienced this firsthand when a test revealed heightened creatinine levels. One doctor assured me I was fine—my results were within normal limits. Fortunately, a second doctor saw the bigger picture: while still in the normal range, my levels were much higher than usual. This led to further testing that uncovered a serious issue—one of my kidneys was failing.

I was lucky. But no one should have to rely on luck to understand their own health. Our mission is to bridge this gap, ensuring that patients and caregivers have the tools they need to navigate medical information with confidence and clarity.

What It Does

Our iOS app is an all-in-one solution for tracking medications seamlessly and understanding medical results and prescriptions.

Primary Features:

Augmented Reality Document Understanding:
Our app transforms complex medical documents into clear, accessible explanations for patients. Using Apple’s Vision Framework, we accurately detect text on lab reports, prescriptions, and other medical documents. This text is processed with generative AI, which provides easy-to-understand summaries of key medical information from different parts of the document. The summarized information is displayed in augmented reality (AR), showing users exactly where each piece of information comes from, improving transparency and comprehension. When the whole document is visible, a summarize button is displayed, and users can get an easy-to-understand summary of the entire document.
Medication Tracking:
Users can log their medication plans and track their intake using an intuitive calendar interface on the Home Page. For example, a user can create a plan to take Aspirin daily for 30 days and set an Apple Reminder to take the pill in the morning after breakfast. Users can log their intake throughout the day.
Alert System:
If at any point the user logs a greater intake than specified in their original plan, an alert will pop up on the Alerts Page. Similarly, if a user does not meet their required intake for a given day, an alert will indicate underdosage. For each alert, users can click on it to view information about the severity of the over/underdosage. These severity explanations are generated using OpenAI’s GPT-4 API.
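The over/underdosage rule described above can be sketched in a few lines of Swift. This is a minimal illustration, not MediScan's actual implementation: the names `MedicationPlan`, `IntakeAlert`, and `checkIntake` are hypothetical, and the sketch assumes a plan is just a number of doses per day. It treats an overdose as reportable at any moment, while an underdose is only reported once the day is over.

```swift
import Foundation

// Hypothetical model of one plan entry; not MediScan's real schema.
struct MedicationPlan {
    let name: String
    let dosesPerDay: Int
}

// The two alert kinds described in the text, with the size of the deviation.
enum IntakeAlert: Equatable {
    case overdose(extra: Int)
    case underdose(missing: Int)
}

/// Compares the doses logged so far today against the plan.
/// - An overdose is flagged as soon as the logged count exceeds the plan.
/// - An underdose is flagged only when `dayIsOver` is true, because a
///   shortfall earlier in the day may still be made up.
/// Returns nil when no alert applies.
func checkIntake(plan: MedicationPlan, loggedDoses: Int, dayIsOver: Bool) -> IntakeAlert? {
    if loggedDoses > plan.dosesPerDay {
        return .overdose(extra: loggedDoses - plan.dosesPerDay)
    }
    if dayIsOver && loggedDoses < plan.dosesPerDay {
        return .underdose(missing: plan.dosesPerDay - loggedDoses)
    }
    return nil
}
```

With a one-a-day Aspirin plan, logging a second pill would yield an overdose alert immediately, while logging nothing would only yield an underdose alert after the day ends. The size of the deviation could then be passed to the severity explanation step.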

Our app also includes functionality to translate all generated information into Spanish, enhancing accessibility.

By combining medication tracking, OCR-powered text detection, generative AI summarization, and AR visualization, MediScan bridges the gap between medical complexity and patient understanding. It helps users take control of their healthcare and improves patient safety.

How We Built It

Frontend:

Swift for UI
Xcode for the development environment

Backend:

Firebase for login authentication
Node.js and Express.js to run our server
MongoDB to store user and pill data

Terraform:

Terraform was an integral part of our tech stack, especially during rapid prototyping. Instead of manually setting up, updating, and hosting our code repeatedly, we set up Terraform actions integrated with platforms like AWS to handle CI/CD. The codified nature of our deployments allowed us to replicate and scale distributions easily, enabling us to run experimental branches concurrently and gather rapid user feedback for iteration.

Augmented Reality:

Optical Character Recognition (OCR) with Apple’s Vision Framework to detect and extract text from medical documents
Image preprocessing (rescaling, contrast enhancement, etc.) before passing each image to VNImageRequestHandler for OCR, which improved accuracy on lab reports, prescriptions, and medical records
Real-time text detection supporting multiple languages using Apple’s multilingual OCR model. Bounding boxes are extracted to map text locations for AR placement
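The OCR step above can be sketched as follows. This is an illustrative sketch rather than MediScan's code: the `Box` type and the function names are hypothetical, and the Vision portion assumes an SDK (iOS 15 or later) in which `VNRecognizeTextRequest.results` is typed as recognized-text observations. Vision reports each bounding box normalized to 0...1 with the origin at the bottom-left, so the sketch also shows the flip needed for a top-left-origin view. Placing labels in 3D AR space would need further steps that are not shown here.

```swift
import Foundation

/// A rectangle in plain Doubles so the conversion runs without CoreGraphics.
struct Box: Equatable {
    var x: Double, y: Double, width: Double, height: Double
}

/// Converts a normalized, bottom-left-origin box (as Vision reports it)
/// into a top-left-origin box measured in view points.
func viewRect(fromNormalized box: Box, viewWidth: Double, viewHeight: Double) -> Box {
    Box(x: box.x * viewWidth,
        y: (1 - box.y - box.height) * viewHeight,  // flip the vertical axis
        width: box.width * viewWidth,
        height: box.height * viewHeight)
}

#if canImport(Vision)
import Vision
import CoreGraphics

/// Runs text recognition on one image and returns each recognized line
/// with its normalized bounding box.
func recognizeLines(in image: CGImage) throws -> [(text: String, box: Box)] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = true

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    return (request.results ?? []).compactMap { observation -> (text: String, box: Box)? in
        // Keep only the most confident transcription of each line.
        guard let candidate = observation.topCandidates(1).first else { return nil }
        let b = observation.boundingBox
        return (text: candidate.string,
                box: Box(x: Double(b.origin.x), y: Double(b.origin.y),
                         width: Double(b.width), height: Double(b.height)))
    }
}
#endif
```

Keeping the coordinate conversion separate from the Vision call means the mapping logic can be checked on its own, without a device or camera feed.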

Generative AI:

After extracting text with OCR, Generative AI models provide concise, patient-friendly explanations of complex medical terminology
Real-time translations using OpenAI’s GPT-4 API enhance linguistic accessibility, breaking language barriers in healthcare
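As a rough sketch of the hand-off from OCR to generative AI, the code below builds a JSON body in the shape OpenAI's Chat Completions endpoint expects (a `model` plus a list of `role`/`content` messages). The prompt wording, the type names, and the `makeExplanationRequest` helper are assumptions made for illustration. The actual prompts and request code used in MediScan are not described in this writeup, and sending the request (a POST with an API key) is left out.

```swift
import Foundation

// One chat message in the Chat Completions format.
struct ChatMessage: Codable, Equatable {
    let role: String
    let content: String
}

// The minimal request body: a model name and the conversation so far.
struct ChatRequest: Codable, Equatable {
    let model: String
    let messages: [ChatMessage]
}

/// Builds a request asking for a patient-friendly explanation of OCR text.
/// `language` selects the output language, which is one way translation
/// could be handled in the same call. The prompt text is hypothetical.
func makeExplanationRequest(ocrText: String, language: String = "English") -> ChatRequest {
    let system = "You explain medical documents to patients in plain \(language). "
        + "Keep the explanation short and do not give a diagnosis."
    return ChatRequest(model: "gpt-4", messages: [
        ChatMessage(role: "system", content: system),
        ChatMessage(role: "user", content: ocrText)
    ])
}

/// Serializes the request to JSON, ready to be used as an HTTP body.
func encodeBody(_ request: ChatRequest) throws -> Data {
    try JSONEncoder().encode(request)
}
```

Passing the target language in the system message keeps explanation and translation in one round trip. Whether MediScan does this or uses a separate translation call is not stated.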

Challenges Faced

Learning Swift: Swift was a relatively new programming language for our team. This made it difficult to scope certain features accurately and required extensive time reading documentation. However, we embraced this as part of the learning experience, recognizing it was the ideal stack for our project.
Xcode Development Issues: We ran into code-signing issues that prevented us from running the app on our individual devices. Since AR required real-time device testing, we built individual components separately using Playgrounds and other Xcode projects before integrating them into a unified build.
Integration Complexity: Ensuring seamless integration of our individually developed components was challenging but manageable through strong communication, Notion templates, and written documentation.

Accomplishments That We’re Proud Of

Although we entered the hackathon with a clear focus on interfaces, we aimed to create something not only novel and innovative but also practical, emphasizing usability for those who need it most. We’re particularly proud of overcoming key challenges, including:

Determining the exact position of targets in real-time 3D space and preventing drift:
We developed a solution that locks onto a specific “landmark” of the document, maintaining spatial awareness while in motion and significantly reducing data drop-offs.
Seamlessly integrating generative AI, language translation, embeddings, computer vision, and AR:
We combined several technologies into one cohesive experience, with much of the work done semi-asynchronously. Clear documentation and effective communication were vital to this success.

What We Learned

The hackathon was a transformative experience for our team. From the first hour, we were stepping out of our comfort zones, meeting at the team formation event without much to go on. By keeping an open mind, we discovered complementary skill sets and built a strong collaborative dynamic.

A major takeaway for us was learning to handle collaboration on a highly interconnected project. We designed systems to facilitate effective teamwork using Notion templates, documentation, and active communication, despite our differing backgrounds.

We’re proud of how we made it through the project with limited Swift knowledge, learned the necessary frameworks on the go, and embraced the hackathon’s challenge to push our limits and grow.

What’s Next for MediScan

Allergy-aware medication planning:
Automatically checking medications against a user’s allergy profile and medical history to prevent potential allergic reactions or adverse effects.
Medically verified alerts and explanations system:
Leveraging MongoDB’s vector database to store and retrieve verified information about medications, symptoms, and medical documentation. Generative AI will draw on this data at inference time to provide medically accurate explanations, both for alerts related to overdoses, underdoses, and allergy conflicts and for key details in medical documents, test results, graphs, and charts.
Enhanced understanding of visual documentation:
Expanding capabilities to interpret and analyze graphs and charts from medical documents.
Support for more languages:
Broadening accessibility through additional language translations.
More complex information pipelines:
Improving data flow to better analyze and respond to user inputs with enhanced accuracy and efficiency.