Inspiration
Three years ago, I lost someone I deeply loved. Not because the right treatment didn't exist. Not because doctors weren't available. But because we couldn't navigate the fragmented healthcare system fast enough to reach the right specialist in time.
That moment changed everything for me. I realized that the barrier to healthcare isn't always medical - it's technological. My grandmother can't navigate hospital websites. My neighbor, who never learned to read, struggles to book appointments. An elderly friend in New York gave up trying to see a cardiologist because every clinic had a different portal, a different process, a different language.
Healthcare should be a human right, not a digital literacy test.
When we saw the Gemini Live Agent Challenge, we knew this was our moment. Google's multimodal AI could finally bridge the gap between people and the care they desperately need. We set out to build Appoint - not just an app, but a lifeline for millions who've been left behind by the digital healthcare revolution.
What it does
Appoint is your personal healthcare assistant that speaks your language, sees what you see, and removes every barrier between you and medical care.
Here's how it works:
1. Multimodal Symptom Understanding
- Speak naturally in your language (Bangla, English, or others)
- Show physical symptoms through your camera - rashes, swelling, injuries
- Appoint asks intelligent follow-up questions to understand your condition completely
- Powered by the Gemini Live API, it processes voice, vision, and text simultaneously in real time
2. Intelligent Doctor & Clinic Recommendations
- Analyzes your symptoms using Gemini's advanced reasoning
- Suggests the right type of specialist (ENT, cardiologist, dermatologist, etc.)
- Shows nearby clinics with real trade-offs: distance, availability, equipment, ratings
- No more guessing which hospital has the right facilities
3. Automated Appointment Booking
- You choose your preferred doctor and time
- Appoint navigates hospital portals, fills forms, and sends emails automatically
- Handles different interfaces seamlessly - no more juggling multiple systems
- Confirms your appointment and stores everything for future reference
4. Emergency Detection
- Recognizes critical symptoms like chest pain, stroke signs, or severe bleeding
- Immediately alerts you and shows the nearest emergency facility
- When seconds matter, Appoint acts instantly
5. Appointment Management
- Dashboard showing all past and upcoming appointments
- Reminders sent before each appointment
- Maintains conversation threads for follow-ups (e.g., "revisit in 3 months")
- Context-aware: remembers your medical history to make future bookings effortless
6. Multilingual Voice Support
- Seamless switching between languages mid-conversation
- Makes healthcare accessible to non-English speakers and to people who cannot read
- Natural, interruptible conversations - just like talking to a human assistant
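As a rough illustration of how the clinic trade-offs in step 2 (distance, availability, equipment, ratings) could be folded into a single ranking, here is a minimal Python sketch. The `Clinic` fields, the weights, and the 20 km and 14 day cut-offs are assumptions invented for this example; they are not Appoint's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Clinic:
    name: str
    distance_km: float     # travel distance from the user
    days_until_slot: int   # days until the next free appointment
    has_equipment: bool    # whether the needed equipment is on site
    rating: float          # average rating on a 0-5 scale


def score(clinic: Clinic, max_km: float = 20.0, max_days: int = 14) -> float:
    """Return a score in [0, 1]; higher is better. Weights are illustrative."""
    closeness = max(0.0, 1.0 - clinic.distance_km / max_km)
    soonness = max(0.0, 1.0 - clinic.days_until_slot / max_days)
    equipment = 1.0 if clinic.has_equipment else 0.0
    quality = clinic.rating / 5.0
    return 0.3 * closeness + 0.3 * soonness + 0.2 * equipment + 0.2 * quality


def rank(clinics: list[Clinic]) -> list[Clinic]:
    """Sort clinics from best to worst score."""
    return sorted(clinics, key=score, reverse=True)


clinics = [
    Clinic("Near", distance_km=2, days_until_slot=10, has_equipment=False, rating=4.0),
    Clinic("Far", distance_km=18, days_until_slot=1, has_equipment=True, rating=4.5),
]
best = rank(clinics)[0]
```

With these made-up weights, the farther clinic wins because it has the equipment and an earlier slot. Showing the individual components alongside the total would keep the trade-off visible to the user rather than hiding it behind one number.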
How we built it
We built Appoint as a fully integrated system leveraging Google's cutting-edge AI and cloud infrastructure:
AI & Multimodal Intelligence (Member 1)
- Gemini Live API for real-time bidirectional voice streaming with interruption handling
- Gemini 2.0 Flash for multimodal understanding - processing voice, camera input, and text simultaneously
- Gemini Vision API for analyzing visual symptoms (skin conditions, injuries, swelling)
- Custom symptom analyzer that maps user descriptions to medical specialties
- Multilingual support using Gemini's native language capabilities (Bangla, English)
- Emergency detection system that identifies high-risk symptoms and triggers immediate alerts
- Intelligent automation agent using Playwright to navigate hospital portals, fill forms, and send booking emails
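To make the symptom analyzer and emergency check above more concrete, here is a deliberately simple Python sketch. It uses hand-written keyword tables as a stand-in for the Gemini-based reasoning described in this section; the phrase lists, the `Triage` type, and the `triage` function are all hypothetical and would need clinical review before any real use.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical phrase list; checked first so emergencies always take priority.
EMERGENCY_PHRASES = ("chest pain", "severe bleeding", "slurred speech", "face drooping")

# Hypothetical keyword sets per specialty (illustrative, far from complete).
SPECIALTY_KEYWORDS = {
    "ENT": {"throat", "ear", "ears", "sinus"},
    "cardiologist": {"palpitations", "heart"},
    "dermatologist": {"rash", "rashes", "itchy", "skin"},
}


@dataclass
class Triage:
    emergency: bool
    specialty: Optional[str]


def triage(description: str) -> Triage:
    """Flag emergencies first, then suggest a specialty if a keyword matches."""
    text = description.lower()
    if any(phrase in text for phrase in EMERGENCY_PHRASES):
        return Triage(emergency=True, specialty=None)
    words = set(re.findall(r"[a-z]+", text))  # whole words only, so "early" != "ear"
    for specialty, keywords in SPECIALTY_KEYWORDS.items():
        if words & keywords:
            return Triage(emergency=False, specialty=specialty)
    return Triage(emergency=False, specialty=None)
```

For example, `triage("my throat hurts")` suggests an ENT specialist, while any description containing "chest pain" is flagged as an emergency before specialty matching runs at all.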
Frontend & User Experience (Member 2)
- Built with React and TypeScript for a responsive, accessible interface
- Real-time voice interface with visual feedback during conversations
- Camera integration for symptom visualization
- Interactive map showing clinic locations with filtering options
- Appointment dashboard with calendar view and notification system
- Rapid prototyping and testing with Google AI Studio for prompt optimization
- Designed for accessibility - large buttons, clear typography, voice-first interaction
- Mobile-responsive design ensuring usability on any device
Backend & Cloud Infrastructure (Member 3)
- Django backend for high-performance API endpoints
- Google Cloud Run for serverless deployment with automatic scaling
- Firestore for real-time data synchronization and appointment storage
- Google Cloud Storage for secure storage of user data and medical images
- RESTful API architecture connecting frontend, AI services, and automation agents
- Authentication and session management for secure user data
- Webhook integrations for appointment confirmations and reminders
- Orchestration layer coordinating between Gemini API, automation agents, and frontend
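The reminder and follow-up behaviour mentioned above (reminders before each appointment, "revisit in 3 months") comes down to a little date arithmetic. The sketch below uses only the Python standard library; the function names and the default offsets of one day and two hours are assumptions for illustration, not values taken from the project.

```python
import calendar
from datetime import date, datetime, timedelta


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the target month."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def reminder_times(appointment: datetime,
                   offsets=(timedelta(days=1), timedelta(hours=2))) -> list:
    """Return the moments at which reminders should be sent."""
    return [appointment - offset for offset in offsets]


follow_up = add_months(date(2024, 11, 30), 3)          # clamps to end of February
reminders = reminder_times(datetime(2025, 3, 10, 9, 0))
```

Clamping matters for dates near the end of a month: a visit on 30 November followed by "revisit in 3 months" has no 30 February to land on.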
Architecture Flow: [diagram not included]
Google Services Integration:
- Gemini Live API (real-time voice interaction)
- Gemini 2.0 Flash (multimodal AI reasoning)
- Gemini Vision API (visual symptom analysis)
- Google Cloud Run (serverless deployment)
- Firestore (database)
- Google Cloud Storage (file storage)
- Vertex AI (model deployment and management)
Challenges we ran into
Breaking the Text Box Paradigm
The hardest challenge wasn't technical - it was philosophical. We had to completely rethink how humans interact with healthcare systems. Traditional chatbots feel robotic and frustrating. We needed Appoint to feel like a caring human assistant. Implementing the Gemini Live API's bidirectional streaming with natural interruptions took multiple iterations. We had to handle edge cases where users would interrupt mid-sentence, change topics, or express emotions like fear or confusion.
Multimodal Coordination
Synchronizing voice, vision, and text inputs in real time was incredibly complex. When a user speaks while showing a symptom via camera, Gemini needs to process both simultaneously and generate coherent responses. We spent days refining our prompts to ensure the AI understood context from multiple modalities without getting confused or hallucinating.
Automation Across Diverse Portals
Every hospital has a different booking system. Some use modern web portals, others require email, and some still use phone-based systems. Building an automation agent that could intelligently navigate these varied interfaces using Playwright was like teaching a robot to adapt to chaos. We had to implement computer vision techniques where Gemini analyzes screenshots to understand UI elements dynamically.
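One piece of that portal automation can be sketched in isolation: once a vision model has reported which form fields it sees, those labels still have to be matched to the patient's data. The Python sketch below assumes the model returns a list of `{"label", "selector"}` dictionaries; that shape, the `FIELD_SYNONYMS` table, and the `plan_fill` helper are hypothetical and not taken from the project.

```python
import re

# Hypothetical synonym table mapping profile keys to labels seen on portals.
FIELD_SYNONYMS = {
    "name": {"name", "full name", "patient name"},
    "phone": {"phone", "mobile", "phone number", "contact number"},
    "date": {"date", "appointment date", "preferred date"},
}


def normalize(label: str) -> str:
    """Lowercase a label and drop punctuation such as '*' or ':'."""
    return re.sub(r"[^a-z ]", "", label.lower()).strip()


def plan_fill(detected_fields, patient):
    """Map detected form fields to patient values.

    Returns (plan, unmatched): plan maps a CSS selector to the value to type,
    unmatched lists labels the agent could not fill and should ask about.
    """
    plan, unmatched = {}, []
    for field in detected_fields:
        label = normalize(field["label"])
        key = next((k for k, names in FIELD_SYNONYMS.items() if label in names), None)
        if key and key in patient:
            plan[field["selector"]] = patient[key]
        else:
            unmatched.append(field["label"])
    return plan, unmatched


fields = [
    {"label": "Full Name *", "selector": "#n"},
    {"label": "Mobile:", "selector": "#m"},
    {"label": "Insurance ID", "selector": "#i"},
]
plan, unmatched = plan_fill(fields, {"name": "A. Rahman", "phone": "01700000000"})
```

Each entry in `plan` could then be applied with Playwright's `page.fill(selector, value)`, while anything in `unmatched` is a cue to ask the user instead of guessing.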
Handling Medical Sensitivity
We're not building a diagnostic tool - we're building a navigation assistant. Drawing that line clearly was crucial. We implemented strict guardrails to ensure Appoint never claims to diagnose or replace doctors. It recommends specialists and facilitates access, but always emphasizes the importance of professional medical advice.
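A guardrail of this kind can be approximated with a last-line check on outgoing replies. The Python sketch below is an assumption about what such a check might look like, not the project's implementation: the patterns and the fallback wording are invented for illustration, and a real system would pair them with prompt-level instructions.

```python
import re

# Hypothetical patterns that suggest the assistant is diagnosing or
# discouraging a doctor's visit; deliberately narrow to limit false positives.
DIAGNOSTIC_PATTERNS = [
    r"\bi diagnose\b",
    r"\bmy diagnosis is\b",
    r"\byou (definitely|certainly|probably|likely) have\b",
    r"\bno need to see a doctor\b",
]

SAFE_FALLBACK = (
    "I can't diagnose conditions, but I can help you find the right specialist. "
    "Please consult a doctor for medical advice."
)


def apply_guardrail(reply: str) -> str:
    """Replace a reply that reads like a diagnosis with a safe fallback."""
    lowered = reply.lower()
    if any(re.search(pattern, lowered) for pattern in DIAGNOSTIC_PATTERNS):
        return SAFE_FALLBACK
    return reply
```

A reply such as "You probably have strep throat." would be swapped for the fallback, while "An ENT specialist would be a good fit." passes through unchanged.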
Real-Time Performance on Cloud
Deploying on Google Cloud Run while maintaining sub-second response times for voice interactions required careful optimization. We had to balance cold start times, memory allocation, and API rate limits. Implementing efficient caching strategies and connection pooling made the difference between a sluggish experience and a seamless one.
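The caching idea can be shown with a small time-to-live cache. This Python sketch is a generic pattern under assumed names (`TTLCache`, `get_or_compute`); the write-up does not say which caching strategy or library was actually used.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}     # key -> (stored_at, value)

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return a fresh cached value, or call compute() and cache its result."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = compute()
        self._store[key] = (now, value)
        return value


# Usage with a fake clock: the slow lookup runs once until the entry expires.
fake_now = [0.0]
calls = []


def load_clinics():
    calls.append("lookup")  # stands in for a slow database or API call
    return ["Clinic A", "Clinic B"]


cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
cache.get_or_compute("clinics:dhaka", load_clinics)
cache.get_or_compute("clinics:dhaka", load_clinics)  # served from cache
fake_now[0] = 61.0
cache.get_or_compute("clinics:dhaka", load_clinics)  # expired, recomputed
```

On Cloud Run an in-process cache like this lives only as long as the container instance, so it suits data that is cheap to refetch after a cold start.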
Multilingual Voice Recognition
Getting Gemini to seamlessly switch between Bangla and English mid-conversation, while maintaining context and understanding medical terminology in both languages, required extensive testing and prompt refinement. We discovered that certain medical terms don't translate directly, so we built a custom terminology mapper.
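A terminology mapper like the one described could start as a normalized lookup table. The Python sketch below is a guess at the general shape, with a handful of illustrative Bangla entries; the `TERM_MAP` contents and the `map_terms` function are assumptions, not the team's actual mapper.

```python
import unicodedata

# Illustrative entries only; a real table would be reviewed by clinicians
# and cover regional and colloquial variants.
TERM_MAP = {
    "জ্বর": "fever",
    "কাশি": "cough",
    "মাথাব্যথা": "headache",
    "বুকে ব্যথা": "chest pain",
}


def normalize(text: str) -> str:
    """Apply Unicode NFC so visually identical spellings compare equal."""
    return unicodedata.normalize("NFC", text).strip()


def map_terms(utterance: str) -> list:
    """Return canonical English terms for Bangla medical terms in the utterance."""
    text = normalize(utterance)
    return [english for bangla, english in TERM_MAP.items()
            if normalize(bangla) in text]


terms = map_terms("আমার জ্বর এবং কাশি")
```

Mapping to canonical English terms gives later steps, such as specialty matching, one vocabulary to work with regardless of which language the user spoke.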
Accomplishments that we're proud of
We built something that actually works. This isn't a concept - it's a functional system that can genuinely help people access healthcare today.
True multimodal interaction: Appoint doesn't just accept multiple input types - it understands them together. A user can say "my throat hurts" while showing their throat, and Appoint processes both to give better recommendations. This is the future of human-AI interaction.
End-to-end automation: We didn't stop at recommendations. Appoint actually books the appointment for you. Watching our automation agent successfully navigate diverse hospital portals and confirm real appointments felt like magic.
Accessibility first: We built for the people who need it most. Our grandmother test - could someone's grandmother use this without help? - guided every design decision. The result is an interface that feels invisible because it just works.
Sub-second voice responses: Achieving real-time, natural conversation with Gemini Live API on Google Cloud Run was a technical triumph. The latency is so low that interruptions feel natural, just like talking to a human.
Emergency detection that could save lives: Our system recognizes critical symptoms and responds with urgency. The first time we tested the chest pain scenario and saw Appoint immediately prioritize emergency care, we knew we'd built something meaningful.
Deployed on Google Cloud with zero downtime: Our backend runs smoothly on Cloud Run, scales automatically, and has handled every test we've thrown at it. The infrastructure is production-ready.
Multilingual support that actually understands context: Switching between Bangla and English mid-conversation while maintaining medical context isn't trivial. We made it seamless.
What we learned
Multimodal AI changes everything. Before this project, we thought of AI as text-in, text-out. Gemini Live API showed us that when AI can see, hear, and speak simultaneously, entirely new categories of problems become solvable. Healthcare navigation was impossible with traditional chatbots - but with multimodal interaction, it's natural.
Real-time matters more than we expected. The difference between a 2-second delay and a 200ms delay isn't just speed - it's the difference between feeling like you're talking to a machine versus talking to someone who cares. Optimizing for real-time taught us that user experience lives in the milliseconds.
Google Cloud Run is a game-changer for AI applications. Serverless deployment with automatic scaling meant we could focus on building features instead of managing infrastructure. The cold start optimizations we implemented taught us deep lessons about cloud architecture.
Automation is harder than it looks. Building an agent that can navigate arbitrary web interfaces required combining Gemini's vision capabilities with Playwright's automation. We learned that the future of UI interaction isn't about APIs - it's about AI that can see and understand interfaces like humans do.
Accessibility isn't a feature - it's a foundation. Designing for elderly users and people who cannot read forced us to question every assumption. We learned that the best interfaces are the ones you don't notice because they adapt to you, not the other way around.
Prompt engineering is an art. Getting Gemini to maintain context across voice, vision, and text while staying within medical guardrails required hundreds of iterations. We learned that the quality of AI output depends entirely on how well you communicate with it.
Healthcare is personal. Every test user shared stories about their struggles with the medical system. We learned that we're not just building software - we're building trust, dignity, and hope for people who've been failed by technology.
What's next for Appoint
Nationwide Hospital Network Integration
Expand partnerships with healthcare providers for direct system integration, enabling more reliable automation and seamless coordination across diverse healthcare networks.
Electronic Health Records Integration
Connect with EHR systems (with user permission) so Appoint can consider medical history, allergies, and current medications when making recommendations, providing truly personalized healthcare navigation.
Family Account Management
Enable users to manage appointments for elderly parents, children, or relatives who need assistance. One account, complete family healthcare coordination.
Global Expansion & More Languages
Extend support to healthcare systems worldwide with additional languages, making Appoint a universal solution for healthcare accessibility barriers across countries and cultures.
The ultimate goal: Make healthcare so accessible that technology becomes invisible. When someone feels sick, they shouldn't think about apps, portals, or forms. They should just talk to Appoint, and everything else happens automatically.
Three years ago, I couldn't help someone I loved. Today, with Gemini's multimodal capabilities and Google Cloud's infrastructure, we're building a future where no one has to face that helplessness again.
Appoint isn't just our hackathon project. It's our promise to make healthcare human again.
Built With
- css
- django
- firestore
- framer
- gemini-2.0-flash
- gemini-live-api
- gemini-vision-api
- google-ai-studio
- google-cloud
- google-cloud-run
- html
- playwright
- python
- react
- typescript
- vertex-ai