Inspiration
Access to timely medical expertise remains a critical challenge worldwide. Emergency departments are overwhelmed, rural areas lack specialists, and patients often struggle to articulate complex symptoms. We envisioned a tool that could bridge this gap by combining multiple data sources - medical images, voice descriptions, and patient history - into comprehensive preliminary assessments.
MediSense AI was born from the vision of democratizing medical triage, making sophisticated diagnostic reasoning accessible to healthcare providers in underserved areas and empowering patients with preliminary insights before seeking specialist care.
What it does
MediSense AI is a comprehensive multimodal medical diagnostic assistant that leverages Gemini 3 Pro's advanced reasoning capabilities to provide evidence-based preliminary health assessments.
Core Features:
1. Multimodal Input Processing
- Upload medical images (X-rays, ultrasounds, CT scans, lab reports)
- Capture live photos using device camera
- Record voice descriptions of symptoms and medical history
- Manual patient data entry with comprehensive forms
2. AI-Powered Analysis
- Gemini 3 Pro analyzes all inputs simultaneously
- Generates detailed clinical reasoning reports
- Identifies potential conditions with differential diagnosis
- Provides urgency-level triage (Low, Medium, High, Emergency)
- Extracts structured data from voice recordings
3. Comprehensive Reporting
- Professional medical reports with clear visual hierarchy
- Evidence-based recommendations
- Red flag identification for critical symptoms
- Grounded in clinical literature and guidelines
- Exportable and shareable reports
4. Secure Cloud Storage
- Firebase-powered user authentication
- Encrypted report storage
- Complete medical history tracking
- Configurable data retention policies
- HIPAA-compliant design principles
5. Modern User Experience
- Dark/Light theme support
- Responsive design for all devices
- Real-time auto-save
- Offline-capable PWA architecture
- Accessibility-first design
How we built it
Technology Stack:
Frontend:
- React 18 with TypeScript for type-safe development
- Vite for lightning-fast builds and HMR
- TailwindCSS for responsive, utility-first styling
- Framer Motion for smooth animations
- React Router for client-side navigation
AI & ML:
- Google Gemini 3 Pro API for multimodal analysis
- Custom prompt engineering for medical reasoning
- Grounding with Google Search for evidence-based results
- Audio transcription and structured data extraction
Backend & Infrastructure:
- Firebase Authentication (Google OAuth)
- Cloud Firestore for document storage
- Firebase Storage for media files
- Real-time data synchronization
Key Libraries:
- react-markdown + remark-gfm for report rendering
- html2canvas + jspdf for PDF generation
- qrcode.react for report sharing
- lucide-react for consistent iconography
Architecture Highlights:
- Modular Component Design: Each feature is isolated into reusable components
- Context API: Global state management for auth and theming
- Progressive Web App: Offline-first with service workers
- Responsive Design: Mobile-first approach with breakpoint optimization
- Type Safety: Full TypeScript coverage for runtime reliability
Challenges we ran into
1. Multimodal Data Coordination
Challenge: Synchronizing image uploads, voice transcription, and form data while maintaining real-time feedback.
Solution: Implemented a queue-based processing system with optimistic UI updates and comprehensive error handling.
2. Medical Report Formatting
Challenge: Gemini's markdown output lacked visual hierarchy, making clinical reports hard to scan.
Solution: Developed custom ReactMarkdown components with intelligent status badge detection, table styling, and section highlighting.
3. Voice-to-Structured Data
Challenge: Extracting precise patient data (age, gender, symptoms) from natural speech.
Solution: Engineered specific prompts requesting JSON-formatted responses and implemented robust parsing with fallback mechanisms.
4. Dark Mode Consistency
Challenge: Ensuring readability across light/dark themes, especially in medical reports with critical information.
Solution: Created a comprehensive color system with proper contrast ratios, separate styling for report sections, and theme-aware status badges.
5. Real-Time Camera Integration
Challenge: Supporting both mobile and desktop cameras with different resolutions and orientations.
Solution: Implemented adaptive MediaStream handling with orientation detection and quality optimization.
6. Firebase Rate Limiting
Challenge: Firestore read/write limits during high-frequency auto-save operations.
Solution: Debounced save operations and implemented local state caching with periodic synchronization.
Accomplishments that we're proud of
✅ Seamless Multimodal Integration - Successfully combined image, voice, and text analysis into a unified diagnostic workflow
✅ Clinical-Grade Reports - Achieved professional medical documentation quality with clear visual hierarchy and evidence-based reasoning
✅ Voice Intelligence - Implemented accurate speech-to-structured-data conversion, extracting precise medical information from natural language
✅ Real-Time Processing - Sub-10-second analysis times for complex multimodal inputs
✅ Accessibility First - WCAG 2.1 AA compliant with keyboard navigation, screen reader support, and color-blind friendly design
✅ Production-Ready Security - Implemented proper authentication, data encryption, and privacy controls
✅ Polished UX - Created an intuitive, visually appealing interface that makes advanced AI feel approachable
What we learned
Technical Insights:
- Prompt engineering for medical contexts requires extreme precision and explicit formatting instructions
- Multimodal AI models excel when inputs complement each other (voice context + visual evidence)
- Type safety in medical applications prevents critical bugs at runtime
- Proper error boundaries are essential for production healthcare software
Product Insights:
- Medical professionals value scannable reports over comprehensive paragraphs
- Urgency-level triage is the most critical feature for real-world adoption
- Patients need disclaimers and clear guidance on when to seek in-person care
- Dark mode is essential for late-night clinical reviews
AI Model Insights:
- Gemini 3's grounding feature significantly improves clinical accuracy
- Structured output requests dramatically improve data extraction reliability
- Context window management is crucial for comprehensive medical histories
- Vision + text combinations outperform single-modality analysis
What's next for MediSense AI
Immediate Roadmap (Q1 2026):
Enhanced AI Capabilities:
- Multi-image comparative analysis (e.g., tracking tumor growth over time)
- Integration with medical databases (ICD-10, SNOMED CT)
- Specialized models for radiology, pathology, and dermatology
- Support for additional languages (Spanish, Hindi, Mandarin)
Clinical Features:
- Appointment scheduling integration
- Electronic Health Record (EHR) system compatibility
- Physician collaboration tools (second opinions, annotations)
- Treatment plan generation and medication suggestions
Enterprise Features:
- Multi-tenant architecture for hospitals and clinics
- Role-based access control (doctors, nurses, patients)
- Analytics dashboard for clinical insights
- Batch processing for research applications
Long-Term Vision:
Medical Device Certification:
- FDA 510(k) clearance pathway
- CE marking for European markets
- Clinical validation studies
Partnerships:
- Integration with telemedicine platforms
- Collaboration with medical universities for training datasets
- Partnerships with NGOs for deployment in underserved regions
Research Applications:
- Anonymized data for medical AI research
- Open-source model fine-tuning datasets
- Academic collaboration platform
Global Impact:
- Multi-language support for 50+ languages
- Low-bandwidth optimization for developing regions
- Offline-first mobile apps for areas with limited connectivity
- Free tier for humanitarian organizations
Built With
- cloud-firestore
- firebase
- framer-motion
- gemini-3-pro
- google-cloud
- markdown
- pwa
- react
- react-router
- responsive-design
- tailwindcss
- typescript
- vite
Log in or sign up for Devpost to join the conversation.