Inspiration

India is the "Pharmacy of the World," yet 98% of Adverse Drug Reactions (ADRs) go unreported. In rural India, the barriers are massive: low literacy, language fragmentation, and complex paper forms. A simple side effect can turn fatal if not reported, but the current systems require a computer and English proficiency.

We asked ourselves: What if reporting a medicine side effect was as easy as sending a WhatsApp voice note to a friend? This inspired Vani—a system built for the next billion users who are coming online via voice, not text.

What it does

Vani is a voice-native Pharmacovigilance Assistant that lives entirely on WhatsApp:

  1. Voice-to-Data: Patients simply speak in their local language (Hindi, Marathi, Tamil, etc.) describing their issue. Vani listens, transcribes, and translates it instantly.
  2. Multimodal Reporting: Users can send voice notes OR photos of medicine strips/prescriptions.
  3. Smart Triage: Our AI separates "meaningful noise" from "critical signals." It extracts:
    • Medicine Name (even from code-mixed "Hinglish")
    • Symptoms
    • Severity (calculating a risk score: Low/Medium/Critical)
  4. Interactive Follow-up: If a user says "I felt dizzy" but forgets to mention the medicine, Vani doesn't reject the report. It gently asks back in their language: "Which medicine did you take?"
  5. Zero-Touch Dashboard: For pharma companies and doctors, Vani auto-populates a structured, E2B-compliant case report in a real-time dashboard.
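The triage and follow-up steps above can be pictured with a small sketch. This is not Vani's actual schema: the `AdrReport` fields, the `FOLLOW_UPS` table, and the symptom question are hypothetical names chosen for illustration (only the medicine question comes from the description above), and translating the question into the user's language is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical follow-up prompts; in practice these would be localized.
FOLLOW_UPS = {
    "medicine": "Which medicine did you take?",
    "symptoms": "What symptoms did you notice?",  # illustrative wording
}


@dataclass
class AdrReport:
    """Assumed shape of the fields extracted from one voice note."""
    medicine: Optional[str] = None
    symptoms: List[str] = field(default_factory=list)
    severity: str = "Low"  # one of Low / Medium / Critical


def next_follow_up(report: AdrReport) -> Optional[str]:
    """Return the next question to ask, or None when the report is complete."""
    if not report.medicine:
        return FOLLOW_UPS["medicine"]
    if not report.symptoms:
        return FOLLOW_UPS["symptoms"]
    return None
```

With this shape, a report that only says "I felt dizzy" is kept and triggers a question instead of being rejected; once the medicine is filled in, `next_follow_up` returns `None` and the case can go to the dashboard.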

How we built it

We architected Vani with a "Zero Cost, High Availability" philosophy, crucial for developing nations:

  • Hybrid STT Pipeline: We built an orchestrator that first tries Bhashini (the Government of India's language API) for high-accuracy Indian-language transcription. If that call fails or is slow, it falls back to a constrained, locally running OpenAI Whisper instance, so a voice note is not dropped when the primary service is unavailable.
  • Intelligent Fallback Architecture:
    • Primary Brain: We use Groq (Llama 3.1) for near-instant (<1s) JSON extraction and logic.
    • Backup Brain: If Groq hits a rate limit, the system hot-swaps to Google Gemini.
  • Lazy-Loading AI: To make this run on free-tier servers (like Render), we implemented lazy-loading for heavy models (Whisper/OCR). They only load into RAM when a request hits, keeping our baseline memory footprint minimal.
  • Custom "Hinglish" NLP: Standard models struggle with code-mixed speech (e.g., "Sir dard ho raha hai after taking Crocin"). We designed a specific prompting strategy that lets the LLM parse this natural Indian speaking style reliably.
  • OCR Pipeline: We integrated Tesseract OCR with an image preprocessing layer to read crumpled medicine wrappers sent via low-res WhatsApp images.
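The fallback and lazy-loading ideas in the list above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not Vani's code: `first_success` and `get_model` are hypothetical helpers, the providers are plain callables standing in for Bhashini/Whisper or Groq/Gemini clients, and the "model" is a placeholder dictionary.

```python
import concurrent.futures
from functools import lru_cache
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first_success(providers: Sequence[Callable[[], T]], timeout_s: float) -> T:
    """Try each provider in order; move on if one raises or exceeds timeout_s."""
    last_error = None
    for provider in providers:
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(provider).result(timeout=timeout_s)
        except Exception as exc:  # also covers the futures timeout error
            last_error = exc
        finally:
            # Do not wait for a slow provider; its thread finishes on its own.
            pool.shutdown(wait=False)
    raise RuntimeError("all providers failed") from last_error


load_calls: List[str] = []  # records real loads, for demonstration only


@lru_cache(maxsize=None)
def get_model(name: str) -> dict:
    """Load a heavy model on first use; later calls reuse the cached object."""
    load_calls.append(name)
    return {"name": name}  # placeholder for a Whisper or OCR model object
```

A call such as `first_success([call_primary, call_backup], timeout_s=5.0)` (both names hypothetical) expresses "primary first, backup on failure or slowness" in one place, and `get_model("whisper")` keeps memory low until the first request that actually needs the model.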

Challenges we ran into

  • The "Marathi vs. Hindi" Identity Crisis: The standard Whisper model frequently misidentified Marathi voice notes as Hindi due to vocabulary overlap. We wrote a custom linguistic heuristic layer (_is_marathi_text) that checks for specific grammatical markers (like 'aahe', 'nako') to force-correct the language tag before processing.
  • LLM "JSON Fatigue": Getting consistent JSON output for database storage was tricky. Llama 3 would sometimes add conversational filler. We wrote a robust regex salvage layer that extracts valid JSON objects from broken responses so we don't waste tokens on retries.
  • Latency vs. Experience: Voice processing is heavy. To keep the WhatsApp "typing..." status from timing out, we moved the heavy lifting to background tasks (FastAPI BackgroundTasks) while sending an immediate "Listening..." acknowledgement to the user.
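The language-tag correction and JSON salvage described above might look roughly like the sketch below. It is a reconstruction under assumptions, not the project's implementation: the marker set contains only the two words named above plus their Devanagari spellings, `correct_language_tag` and `salvage_json` are hypothetical names, and the salvage step uses `json.JSONDecoder.raw_decode` to validate candidates instead of relying on a regex alone.

```python
import json
import re
from typing import Optional

# Illustrative marker words only; a real list would be longer.
MARATHI_MARKERS = {"aahe", "nako", "आहे", "नको"}


def is_marathi_text(text: str) -> bool:
    """Return True if any whitespace-separated word is a Marathi marker."""
    words = {w.strip(".,!?।\"'") for w in text.lower().split()}
    return bool(words & MARATHI_MARKERS)


def correct_language_tag(detected: str, text: str) -> str:
    """Override a Hindi tag ("hi") with Marathi ("mr") when markers are found."""
    if detected == "hi" and is_marathi_text(text):
        return "mr"
    return detected


def salvage_json(reply: str) -> Optional[dict]:
    """Return the first valid JSON object embedded in an LLM reply, if any."""
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", reply):
        try:
            obj, _ = decoder.raw_decode(reply, match.start())
        except json.JSONDecodeError:
            continue  # this brace did not start a valid object; try the next
        if isinstance(obj, dict):
            return obj
    return None
```

Words are split on whitespace, not with `\w+`, because Python's `\w` does not match Devanagari vowel signs and would cut words like "आहे" short. Letting the JSON decoder validate each candidate also handles nested braces, which a single regex pattern tends to get wrong.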

Accomplishments that we're proud of

  • The 30-Second Metric: We reduced the time to report an ADR from ~15 minutes (on standard portals) to under 30 seconds via voice.
  • Truly Multilingual: We optimized specifically for "Hinglish," the language actually spoken in much of urban India, rather than just "textbook Hindi."
  • 100% Free Architecture: We showed that a life-saving, production-grade AI system can be built entirely from free-tier and open-source components (Groq, standard Whisper, Tesseract, MongoDB Atlas).

What we learned

  • UX > Tech: The most advanced AI is useless if the user feels intimidated. Moving the interface to WhatsApp (where users already are) made the system feel far more approachable than a web app would have.
  • Resilience Engineering: Building for "free tier" means building for failure. Handling timeouts, rate limits, and cold starts gracefully was a masterclass in resilient backend engineering.

What's next for Vani

  • Offline-First Mode: Integrating SMS-based reporting for regions with only 2G coverage or no internet access.
  • Voice Biometrics: Using voice signatures to identify repeat reporters or authenticate doctors.
  • Regulatory Integration: Direct API pipes into the Indian Pharmacopoeia Commission's database for real-time national surveillance.
