Inspiration
The inspiration for HealthVision AI Assistant came from observing how language barriers and limited access to medical information affect healthcare outcomes around the world. During the COVID-19 pandemic, we saw many people struggle to understand medical advice given in languages they were not fluent in. We wanted to build a tool that provides reliable, multilingual health guidance using modern AI, so that medical information is accessible to everyone regardless of their language or location.
As participants in the Gemini 3 Hackathon, we were excited to use Google's latest multimodal AI capabilities to build something that could make a real difference in people's lives, especially in underserved communities where language barriers often block access to quality healthcare information.
What it does
HealthVision AI Assistant is a healthcare companion that provides:
- Multilingual Symptom Analysis: Users describe symptoms in one of five languages (English, Spanish, French, Arabic, Hindi) and receive an AI-generated analysis of possible conditions, a severity assessment, and recommendations.
- Visual Symptom Analysis: Users upload images of rashes, injuries, or other visible symptoms for analysis with Gemini's vision capabilities.
- Drug Interaction Checker: A safety feature that checks for potential interactions between multiple medications, taking the user's conditions and allergies into account.
- Voice I/O Support: Voice input and text-to-speech output in multiple languages, for users with visual impairments or those who prefer to speak rather than type.
- History Tracking & PDF Reports: Users can save their symptom history and generate PDF reports to share with healthcare providers.
The system uses Gemini 2.5 Flash for analysis and frames its guidance with clear disclaimers recommending consultation with healthcare professionals.
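The write-up does not say how the drug interaction check is assembled. A minimal sketch, assuming it sends Gemini one prompt that lists every pair of the user's medications together with their conditions and allergies; `medicationPairs` and `buildInteractionPrompt` are hypothetical names, and the prompt wording and JSON shape are illustrative, not the project's.

```javascript
// Hypothetical helper: every unordered pair of distinct medications, checked once.
function medicationPairs(medications) {
  // Normalize case and whitespace so "Ibuprofen" and " ibuprofen" count as one drug.
  const unique = [...new Set(medications.map((m) => m.trim().toLowerCase()).filter(Boolean))];
  const pairs = [];
  for (let i = 0; i < unique.length; i++) {
    for (let j = i + 1; j < unique.length; j++) {
      pairs.push([unique[i], unique[j]]);
    }
  }
  return pairs;
}

// Hypothetical prompt builder: one request listing each pair plus the patient context.
function buildInteractionPrompt({ medications, conditions = [], allergies = [] }) {
  const pairLines = medicationPairs(medications).map(([a, b]) => `- ${a} + ${b}`);
  return [
    "Check each medication pair below for known interactions.",
    ...(pairLines.length ? pairLines : ["- (fewer than two medications supplied)"]),
    `Patient conditions: ${conditions.join(", ") || "none reported"}`,
    `Patient allergies: ${allergies.join(", ") || "none reported"}`,
    'Reply as JSON: {"interactions": [{"pair": ["a", "b"], "severity": "none|minor|moderate|major", "note": "..."}]}',
    "Remind the user to confirm with a pharmacist or physician.",
  ].join("\n");
}

const prompt = buildInteractionPrompt({
  medications: ["Warfarin", "Ibuprofen", "Omeprazole", "ibuprofen"],
  conditions: ["hypertension"],
  allergies: ["penicillin"],
});
console.log(prompt);
```

With n distinct medications this yields n(n-1)/2 pairs; the duplicate "ibuprofen" above collapses, so three medications give three pairs.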
How we built it
Frontend:
- HTML5/CSS3/JavaScript: responsive web interface with CSS gradients and animations
- Font Awesome: medical and UI icons
- Web Speech API (SpeechSynthesis and SpeechRecognition): voice input and output
Backend:
- Node.js & Express: server framework for the API endpoints
- Google Generative AI SDK: integration with Gemini 2.5 Flash and Gemini 2.0 Flash Exp Vision
- Multer: image upload handling
- CORS: control of cross-origin requests
Key features:
- Language processing: language-specific prompts and response parsing
- Image analysis: Base64 encoding and multimodal AI analysis
- Voice optimization: text preprocessing for better speech synthesis
- Error handling: fallback systems for API failures
- Session management: user history tracking with UUID generation
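Image analysis relies on Base64 encoding and language-specific prompts. A minimal sketch of how one vision request could be assembled, assuming Multer's memory storage supplies the upload as a Buffer; `buildVisionParts`, the prompt text, the per-language instructions, the accepted MIME types, and the 4 MB limit are all assumptions, not the project's values. The returned array uses the `inlineData` part shape that `generateContent` accepts in the `@google/generative-ai` Node SDK.

```javascript
// Illustrative per-language instructions (not the project's actual prompts).
const LANGUAGE_INSTRUCTIONS = new Map([
  ["en", "Respond in English."],
  ["es", "Responde en español."],
  ["fr", "Réponds en français."],
  ["ar", "أجب باللغة العربية."],
  ["hi", "हिंदी में उत्तर दें।"],
]);

// Assumed upload constraints for this sketch.
const SUPPORTED_IMAGE_TYPES = new Set(["image/jpeg", "image/png", "image/webp"]);
const MAX_IMAGE_BYTES = 4 * 1024 * 1024;

// Builds the parts array for a vision request: a text prompt plus the image
// encoded as Base64 inside an inlineData part.
function buildVisionParts(imageBuffer, mimeType, language = "en") {
  if (!SUPPORTED_IMAGE_TYPES.has(mimeType)) {
    throw new Error(`Unsupported image type: ${mimeType}`);
  }
  if (imageBuffer.length > MAX_IMAGE_BYTES) {
    throw new Error("Image too large; please upload a smaller image");
  }
  const languageLine = LANGUAGE_INSTRUCTIONS.get(language) ?? LANGUAGE_INSTRUCTIONS.get("en");
  const prompt = [
    "Describe the visible features in this image of a skin condition or injury and list possible causes.",
    "Do not give a definitive diagnosis; advise the user to consult a healthcare professional.",
    languageLine,
  ].join("\n");
  return [prompt, { inlineData: { data: imageBuffer.toString("base64"), mimeType } }];
}

// In an Express route using multer({ storage: multer.memoryStorage() }), a call could be
// buildVisionParts(req.file.buffer, req.file.mimetype, language), where `language` comes
// from a hypothetical request field; the result would go to model.generateContent(parts).
const parts = buildVisionParts(Buffer.from("fake-image-bytes"), "image/png", "es");
console.log(parts[0]);
console.log(parts[1].inlineData.mimeType, parts[1].inlineData.data.length);
```

Rejecting unsupported types and oversized files before the API call is one way to provide the "graceful fallback" for image requirements: the user gets a clear message instead of an opaque model error.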
Challenges we ran into
- Multilingual Voice Synthesis: Getting browser speech synthesis to work consistently across languages and browsers was difficult. We implemented fallback mechanisms and language-specific settings.
- JSON Response Parsing: Gemini sometimes returns malformed JSON or wraps it in markdown formatting. We built a parsing system with multiple cleanup layers and fallback extraction methods.
- Image Analysis Limitations: Gemini 2.0 Flash Exp Vision is powerful, but it has specific requirements for image format and size. We added preprocessing and graceful fallbacks.
- Real-time Voice Recognition: Making voice recognition reliable across accents and background noise levels required careful configuration of the Web Speech API.
- Medical Accuracy & Safety: Keeping the AI helpful without giving misleading medical advice required careful prompt engineering and multiple safety disclaimers.
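The layered JSON cleanup can be sketched as follows. This is an illustration under assumptions, not the project's parser: `parseModelJson` is a hypothetical name, and the layers (raw parse, markdown code fence extraction, outermost-brace extraction, trailing-comma removal) are one plausible ordering.

```javascript
// Hypothetical layered parser for model replies that should contain JSON.
function parseModelJson(text, fallback = null) {
  if (typeof text !== "string") return fallback;

  const candidates = [text.trim()]; // layer 1: the reply as-is

  // Layer 2: contents of a markdown code fence (three backticks, optional "json" tag).
  const fenced = text.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/i);
  if (fenced) candidates.push(fenced[1].trim());

  // Layer 3: everything from the first "{" to the last "}".
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start !== -1 && end > start) candidates.push(text.slice(start, end + 1));

  for (const candidate of candidates) {
    // Try each candidate as-is, then with trailing commas removed (a naive cleanup).
    for (const variant of [candidate, candidate.replace(/,\s*([}\]])/g, "$1")]) {
      try {
        return JSON.parse(variant);
      } catch {
        // Not valid JSON yet; move on to the next cleanup.
      }
    }
  }
  return fallback;
}

// Example reply: prose, a fenced block, and trailing commas.
const FENCE = "`".repeat(3);
const reply = `Here is the analysis:\n${FENCE}json\n{"severity": "mild", "conditions": ["contact dermatitis",],}\n${FENCE}`;
console.log(parseModelJson(reply).severity); // prints: mild
console.log(parseModelJson("Sorry, I cannot help.", { error: "unparseable" }));
```

The trailing-comma fix is a simple regular expression and could also alter a string value that contains a comma followed by a bracket. Returning a caller-supplied fallback lets the interface show a safe generic message instead of failing.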
Accomplishments that we're proud of
- Complete Multilingual Support: A fully functional system that works across all five languages for both text and voice.
- Integrated Multimodal Approach: A single interface that handles text, voice, and image inputs.
- Robust Error Handling: A system that gracefully handles API failures, network issues, and unexpected inputs.
- Hackathon-Ready Solution: A complete, deployable application, built within the hackathon timeframe, that addresses real-world healthcare accessibility challenges.
- User-Friendly Design: An intuitive interface that makes AI medical analysis accessible to non-technical users.
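Graceful handling of API failures can be sketched as an ordered fallback chain. A minimal sketch, assuming each attempt is an async function (in the app, plausibly a wrapper around a Gemini `generateContent` call) and that a static response carrying a disclaimer is acceptable when every attempt fails; `generateWithFallback`, `withTimeout`, and the 15-second default are hypothetical.

```javascript
// Rejects if the promise does not settle within ms; always clears the timer.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Tries each attempt in order and returns the first success; if all fail,
// returns the caller's safe response together with the collected errors.
async function generateWithFallback(attempts, { timeoutMs = 15000, safeResponse = null } = {}) {
  const errors = [];
  for (const { name, run } of attempts) {
    try {
      const result = await withTimeout(run(), timeoutMs);
      return { source: name, result, errors };
    } catch (err) {
      errors.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return { source: "fallback", result: safeResponse, errors };
}

// Demo with stub attempts; in the app each run() would wrap a model call.
const attempts = [
  { name: "primary", run: () => Promise.reject(new Error("503 Service Unavailable")) },
  { name: "secondary", run: async () => ({ severity: "mild" }) },
];

generateWithFallback(attempts, {
  timeoutMs: 1000,
  safeResponse: { severity: "unknown", advice: "Please consult a healthcare professional." },
}).then(({ source, errors }) => console.log(source, errors));
```

Recording each failure in `errors` keeps the fallback visible in server logs while the user still receives a usable answer.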
What we learned
- Gemini API Capabilities: A deep understanding of Gemini 2.5 Flash's capabilities and limitations, especially for medical applications.
- Multimodal AI Integration: How to combine text, image, and voice processing in a single application.
- Internationalization Best Practices: A truly multilingual application needs more than text translation; it needs cultural and linguistic adaptation.
- Medical AI Ethics: The importance of responsible AI development in healthcare, including clear disclaimers and encouraging professional consultation.
- Real-time Audio Processing: Practical experience with browser-based speech recognition and synthesis across different platforms.
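Cross-platform speech synthesis largely comes down to choosing a voice that matches the language and cleaning text before it is spoken. A minimal sketch, assuming the BCP 47 locales below for the five languages and a simple markdown-stripping step; `SPEECH_LOCALES`, `localeFor`, `prepareForSpeech`, `pickVoice`, `speak`, and the 0.95 rate are hypothetical choices, while `speechSynthesis`, `SpeechSynthesisUtterance`, and their `lang`, `voice`, and `rate` properties are standard Web Speech API names.

```javascript
// Assumed BCP 47 locales for the five supported languages.
const SPEECH_LOCALES = new Map([
  ["en", "en-US"], ["es", "es-ES"], ["fr", "fr-FR"], ["ar", "ar-SA"], ["hi", "hi-IN"],
]);
const localeFor = (language) => SPEECH_LOCALES.get(language) ?? "en-US";

// Strips markdown and expands units so the synthesizer is less likely to read symbols aloud.
function prepareForSpeech(text) {
  return text
    .replace(/^\s*[-*•]\s+/gm, "")              // bullet markers at line starts
    .replace(/[*_#`>]+/g, "")                   // emphasis, headings, code, quotes
    .replace(/\bmg\b/g, "milligrams")           // read the unit aloud
    .replace(/([^.!?؟।\s])\s*\n+\s*/g, "$1. ")  // unpunctuated line break -> sentence pause
    .replace(/\s+/g, " ")                       // collapse remaining whitespace
    .trim();
}

// Exact locale match first, then any voice for the same base language; null otherwise.
function pickVoice(voices, language) {
  const locale = localeFor(language).toLowerCase();
  const base = locale.split("-")[0];
  const norm = (tag) => tag.replace("_", "-").toLowerCase(); // tolerate "es_MX"-style tags
  return (
    voices.find((v) => norm(v.lang) === locale) ??
    voices.find((v) => norm(v.lang) === base || norm(v.lang).startsWith(`${base}-`)) ??
    null
  );
}

// Browser-only: speak the text with the best available voice for the language.
function speak(text, language) {
  const utterance = new SpeechSynthesisUtterance(prepareForSpeech(text));
  utterance.lang = localeFor(language);
  const voice = pickVoice(speechSynthesis.getVoices(), language);
  if (voice) utterance.voice = voice;
  utterance.rate = 0.95; // assumed: slightly slower for medical terms
  speechSynthesis.cancel(); // stop anything still being spoken
  speechSynthesis.speak(utterance);
}

console.log(prepareForSpeech("**Rest** and drink fluids\n- Take 400 mg ibuprofen"));
// prints: Rest and drink fluids. Take 400 milligrams ibuprofen
```

In some browsers `speechSynthesis.getVoices()` returns an empty list until the `voiceschanged` event fires, so a caller may need to wait for that event before calling `speak`. When `pickVoice` returns `null`, setting only `utterance.lang` leaves voice selection to the browser.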
What's next for healthvision-ai-assistant
- Expand Language Support: Add support for 20+ languages, including Mandarin, Russian, Portuguese, and more regional languages.
- Symptom Trend Analysis: Implement AI-powered analysis of symptom patterns over time to detect chronic conditions.
- Integration with Health APIs: Connect with wearable devices and health tracking apps for more comprehensive analysis.
- Doctor Connect Feature: Create a platform for connecting users with healthcare professionals for teleconsultation.
- Mobile App Development: Develop native iOS and Android applications for a better mobile experience.
- Offline Mode: Implement local AI models for basic symptom analysis without internet connectivity.
- Clinical Validation: Partner with medical institutions to validate and improve the AI's diagnostic accuracy.
- Personal Health Records: Secure encrypted storage for personal medical history and integration with electronic health records.
Built With
- CORS (cross-origin resource sharing)
- CSS3
- Express.js
- Gemini 2.0 Flash Exp Vision
- Gemini 2.5 Flash
- Git/GitHub (version control)
- Google Generative AI SDK
- HTML5
- JavaScript (ES6+)
- Multer (file uploads)
- Node.js
- npm (package management)
- Replit (development environment)
- UUID (session management)
- Web Speech API (SpeechSynthesis, SpeechRecognition)