What Inspired Me
Medication non-adherence is a critical healthcare problem affecting millions worldwide. The complexity of managing multiple prescriptions—each with different schedules, dosages, and instructions—creates a significant burden for patients and caregivers.
I was inspired to build MedSnap after witnessing family members struggle with medication management. The traditional approach of manually entering medication details into tracking apps is tedious and error-prone. I envisioned a solution that would eliminate this friction entirely: simply point your phone at a medication label, and let AI do the rest.
What I Learned
Building MedSnap taught me the power of combining multimodal AI capabilities with native iOS frameworks. I discovered that Gemini 3 Flash excels at structured data extraction from unstructured text, making it perfect for parsing complex medication instructions that vary widely in format and terminology.
Key learnings:
Gemini 3's JSON Schema Mode: I leveraged Gemini 3's structured output capabilities with strict JSON schema validation to ensure reliable extraction of medication fields (name, dosage, frequency, instructions, etc.). This eliminated the need for complex regex parsing or brittle text processing.
Multimodal Pipeline Design: I learned to architect a two-stage pipeline: Apple Vision Framework for OCR (running locally on-device for privacy) followed by Gemini 3 for intelligent parsing. This separation of concerns allowed me to optimize each stage independently.
Natural Language Understanding: Gemini 3's ability to understand context and infer meaning was crucial. It correctly interprets varied instructions like "twice daily with meals," "every 8 hours," "take 1 tablet in the morning and 1 at bedtime," and converts them into structured schedules.
iOS Integration Patterns: I mastered SwiftData for local persistence, UserNotifications for reminders, and SwiftUI's declarative patterns for building a responsive, modern interface.
How I Built It
MedSnap is built as a native iOS application using a clean, modular architecture:
1. Vision Layer (On-Device OCR)
- Apple Vision Framework (
VNRecognizeTextRequest) extracts text from medication label photos - Runs entirely on-device for privacy and speed
- Handles various image qualities and lighting conditions
2. AI Extraction Layer (Gemini 3)
- OCR text is sent to Gemini 3 Flash model
- Uses JSON Schema mode with strict validation for reliable structured output
- Extracts: medication name, dosage, frequency, instructions, times per day, quantity, duration
- Intelligently calculates treatment duration based on quantity and frequency
3. Data Layer
- SwiftData models for
MedicationandReminderItem - All data stored locally—no cloud sync, maximum privacy
- Relationships between medications and reminders with cascade deletion
4. Reminder System
ReminderManagerautomatically generates reminder schedules based on extracted frequency- Supports daily, twice-daily, three-times-daily, and custom schedules
NotificationServicehandles iOS notifications with action buttons (Taken/Skip)- Visual calendar view with color-coded medication schedules
5. User Interface
- SwiftUI-based tabbed interface (Medications, Calendar, Reminders, Settings)
- Camera integration for label capture
- Medication detail views with drug information links to Drugs.com for educational purposes
- Intuitive reminder management with status tracking
Gemini 3 Integration Details:
- Model:
google/gemini-3-flash-preview - Feature: Structured JSON output with schema validation
- Temperature: 0.3 (for consistent, reliable extraction)
- Prompt engineering: System prompt defines the assistant role, user prompt includes OCR text and extraction rules
- Response format: Strict JSON schema ensuring all required fields are present
Challenges I Faced
1. Parsing Complex Medication Instructions Medication labels use highly variable language: "twice daily," "BID," "every 12 hours," "with breakfast and dinner," etc. I solved this by crafting detailed prompts that guide Gemini 3 to recognize patterns and normalize them into structured fields. The JSON schema ensures I always get consistent output.
2. Structured Output Reliability Early versions sometimes returned malformed JSON or missing fields. I implemented strict JSON schema validation in Gemini 3's response format, which guarantees valid, complete responses. I also added fallback handling for edge cases.
3. Privacy and Security I wanted to keep user data private while using cloud-based AI. My solution: OCR runs on-device (Apple Vision), only text (not images) is sent to Gemini 3, and all medication data stays local. API keys are obfuscated in the binary.
4. Reminder Scheduling Logic
Converting natural language frequencies ("twice daily with meals") into precise reminder times required careful logic. I built ReminderManager to handle various frequency patterns and allow users to customize times while preserving the extracted schedule.
5. iOS Notification Handling Ensuring notifications work reliably on physical devices required careful attention to notification delegate setup and lifecycle management. I implemented proper app delegate integration and badge management.
Impact and Future Vision
MedSnap addresses a real-world problem affecting millions of people. By eliminating the friction of manual medication entry, I make it easier for patients to adhere to their treatment plans. The app is particularly valuable for:
- Elderly patients managing multiple prescriptions
- Caregivers tracking family members' medications
- Anyone with complex medication schedules
Future enhancements could include medication interaction warnings, refill reminders, Health app integration, and adherence visualization.
Log in or sign up for Devpost to join the conversation.