SAKHI AI

Inspiration

Healthcare information should empower patients, yet medical reports are often dense, text-heavy, and written in technical language that many patients cannot understand. This problem is amplified in multilingual regions like India, where reports are usually provided in English while patients may be more comfortable in regional languages. Our inspiration came from observing how patients and caregivers struggle to interpret diagnostic reports, which can lead to anxiety, delayed treatment, or incorrect medication usage. We realized that the issue was not a lack of medical data but a lack of comprehensible explanation. This motivated us to build a system that could act as a compassionate intermediary between complex medical documents and patients.

What it does

SAKHI AI automatically transforms medical reports into simple, patient-friendly explanations and delivers them as both text and multilingual audio. Patients can upload a medical report image or PDF, and SAKHI AI:

- Extracts text using OCR
- Simplifies complex medical content using a Large Language Model
- Converts the explanation into speech in multiple Indian languages
- Delivers the output directly via WhatsApp

This enables patients, including low-literacy and non-English speakers, to understand their health information without relying on intermediaries.

How we built it

SAKHI AI is designed as an end-to-end AI pipeline:

- OCR Layer: Extracts text from scanned medical reports and images.
- LLM Processing Layer: A Large Language Model restructures and simplifies the extracted content into clear, patient-oriented summaries while preserving medical accuracy.
- Multilingual TTS Layer: The simplified text is converted into speech using Indic multilingual text-to-speech models.
- Delivery Layer: WhatsApp integration via Twilio gives users seamless, familiar access without requiring additional apps.

This modular design allows each component to be improved or replaced independently.
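
As a rough illustration of the modular design described above, the four layers can be modeled as interchangeable callables wired together by a small orchestrator. This is only a sketch under stated assumptions, not the project's actual code: the `ReportPipeline` class, the stage signatures, and the `fake_*` stubs are hypothetical names invented for this example, and a real deployment would plug in an OCR engine, an LLM client, an Indic TTS model, and a Twilio WhatsApp sender in their place.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Each stage is a plain callable so it can be replaced independently.
# These signatures are assumptions made for this sketch.
OcrFn = Callable[[bytes], str]                  # report bytes -> raw text
SimplifyFn = Callable[[str], str]               # raw text -> patient-friendly text
TtsFn = Callable[[str, str], bytes]             # (text, language code) -> audio
DeliverFn = Callable[[str, str, bytes], None]   # (recipient, text, audio)


@dataclass
class ReportPipeline:
    ocr: OcrFn
    simplify: SimplifyFn
    tts: TtsFn
    deliver: DeliverFn

    def run(self, report: bytes, language: str, recipient: str) -> str:
        raw_text = self.ocr(report)
        if not raw_text.strip():
            # Stop early instead of asking the LLM to explain nothing.
            raise ValueError("OCR produced no text")
        summary = self.simplify(raw_text)
        audio = self.tts(summary, language)
        self.deliver(recipient, summary, audio)
        return summary


# Hypothetical stand-ins for the real OCR, LLM, TTS, and WhatsApp stages.
def fake_ocr(report: bytes) -> str:
    return report.decode("utf-8", errors="ignore")


def fake_simplify(text: str) -> str:
    return "Summary: " + " ".join(text.split())


def fake_tts(text: str, language: str) -> bytes:
    return f"[{language}] {text}".encode("utf-8")


sent: List[Tuple[str, str, bytes]] = []


def fake_deliver(recipient: str, text: str, audio: bytes) -> None:
    sent.append((recipient, text, audio))


pipeline = ReportPipeline(fake_ocr, fake_simplify, fake_tts, fake_deliver)
result = pipeline.run(b"Hemoglobin   11.2 g/dL", "hi", "whatsapp:+910000000000")
print(result)
```

Because the orchestrator depends only on the stage signatures, swapping one stub for a real implementation would not require changes to the other stages.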

Challenges we ran into

- Handling noisy OCR outputs from low-quality scanned reports
- Balancing simplification with medical accuracy when using LLMs
- Ensuring multilingual clarity, especially for medical terms across Indian languages
- Latency and orchestration: coordinating multiple AI components in real time
- Ethical considerations: ensuring the system explains reports without making diagnoses or clinical decisions
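
To make the noisy-OCR challenge concrete, here is one possible pre-processing step that could run before the text reaches the LLM. It is a sketch of a generic heuristic, not the cleanup SAKHI AI actually uses: the function name `clean_ocr_text` and the `min_alnum_ratio` threshold are assumptions chosen for illustration. It collapses whitespace and drops lines that are mostly symbols, which are often scanner artifacts.

```python
import re
import unicodedata


def clean_ocr_text(raw: str, min_alnum_ratio: float = 0.5) -> str:
    """Collapse whitespace and drop lines that look like scan noise.

    The 0.5 threshold is an illustrative assumption, not a tuned value.
    """
    # NFC is used rather than NFKC so that characters such as superscripts
    # in lab units are not folded into ordinary digits.
    text = unicodedata.normalize("NFC", raw)
    kept = []
    for line in text.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if not line:
            continue
        alnum = sum(ch.isalnum() for ch in line)
        if alnum / len(line) < min_alnum_ratio:
            # Mostly punctuation or symbols: likely a border or smudge.
            continue
        kept.append(line)
    return "\n".join(kept)


noisy = "Hemoglobin    11.2  g/dL\n~~|| ..__ ;;\n\nPlatelets 250\n"
cleaned = clean_ocr_text(noisy)
print(cleaned)
```

A filter like this is deliberately conservative: it removes only lines with very little alphanumeric content, since dropping a real lab value would be worse than passing some noise through to the next stage.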

Accomplishments that we're proud of

- Built a fully functional end-to-end prototype within hackathon constraints
- Successfully generated multilingual audio explanations for real medical reports
- Designed a patient-centric system that prioritizes accessibility over complexity
- Integrated OCR, LLMs, TTS, and WhatsApp into a single, seamless workflow
- Positioned the project as medical explainability, not automated diagnosis

What we learned

- Accessibility is as important as accuracy in healthcare AI
- LLMs are powerful tools for explanation, not just prediction
- Multimodal AI systems require careful orchestration and error handling
- Designing for real users means meeting them where they already are, such as WhatsApp
- Ethical boundaries are critical when working with medical data

What's next for SAKHI AI

- Support for more Indian languages and dialects
- Personalization based on patient age, literacy level, and preferences
- Integration with hospital and lab information systems
- Adding voice-based interaction for follow-up questions
- Clinical validation studies to evaluate impact on patient understanding and adherence