Inspiration
We live in the "Era of AI," where technology is moving at light speed. Yet, in South Asia alone, 244 million people remain in the dark. These are our mothers, our village elders, and our daily wage laborers who own smartphones but cannot read or write.
For them, the internet isn't a tool; it's a locked door.
- They can't read medicine labels, leading to dangerous health risks.
- They fall victim to SMS phishing scams because they can't distinguish between a bank alert and a fraud attempt.
- They rely on others for basic tasks like reading utility bills or farming instructions.
We asked ourselves: What if AI could be their eyes and voice? Saathi AI (meaning "Companion") was born from the desire to build a bridge over this digital divide using the power of Google Gemini 3.
What it does
Saathi AI is a Voice-First Operating System designed specifically for the illiterate. It replaces complex text menus with a simple, 4-button interface that uses Multimodal AI (Vision & Voice) to solve daily survival problems.
- Health Mode (Sehat): A user snaps a photo of a medicine strip. The AI identifies the drug, dosage, and purpose, and explains it in simple local audio (Urdu/Punjabi).
- Fraud Shield (Hifazat): The user uploads a screenshot of a suspicious SMS. Gemini uses its Reasoning capabilities to analyze the sender and content, instantly warning the user if it's a scam.
- Farming Mode (Kisan): Farmers can scan crops to identify diseases like Wheat Rust and get audio-based treatment advice.
- Read for Me: The AI acts as a universal translator, turning any document, bill, or notice into a simple spoken summary.
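One way to picture the four-button layout is as a small lookup table from button to task instruction. The sketch below is illustrative only: the `Mode` type, the `MODES` table, the `instruction_for` helper, and the instruction strings are all hypothetical and are not taken from the Saathi AI codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """One of the four buttons. All fields are illustrative assumptions."""
    local_name: str   # name used for the mode in the app
    input_kind: str   # what the user provides: "photo" or "screenshot"
    instruction: str  # task instruction sent to the model with the image


# Hypothetical registry; the instruction wording is made up for this sketch.
MODES = {
    "health": Mode("Sehat", "photo",
                   "Identify the medicine, its dosage, and its purpose."),
    "fraud": Mode("Hifazat", "screenshot",
                  "Check the sender and content of this SMS and say "
                  "whether it looks like a scam."),
    "farming": Mode("Kisan", "photo",
                    "Identify any crop disease and suggest a treatment."),
    "read": Mode("Read for Me", "photo",
                 "Summarize this document in simple spoken language."),
}


def instruction_for(button: str) -> str:
    """Return the task instruction for a button; raises KeyError if unknown."""
    return MODES[button].instruction
```

Keeping the table flat like this means a new mode is one more entry rather than a new screen, which fits the goal of a fixed, icon-only interface.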
How we built it
We built Saathi AI with a focus on speed and accessibility.
- The Brain: We utilized Google Gemini 3 (and Gemini 2.0 Flash) via the API. Its Multimodal capabilities were essential for processing images (OCR + Object Detection), while its Reasoning engine was critical for the Fraud Check module.
- The Interface: We used Streamlit (Python) to build a responsive web app that mimics a native mobile experience. We removed all text input fields, replacing them with large Camera and Microphone triggers.
- The Voice: We integrated the gTTS (Google Text-to-Speech) library to convert Gemini's text responses into distinct, human-like Urdu audio files that play automatically.
- Prompt Engineering: We spent significant time refining system instructions to ensure Gemini speaks in "Roman Urdu" (conversational style) rather than complex literary Urdu, making it intelligible to uneducated villagers.
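The prompt and voice steps could be wired together roughly as follows. This is a minimal sketch, assuming gTTS's `gTTS(text=..., lang=...)` constructor, its `write_to_fp` method, and the `ur` language code. The `build_system_prompt` and `synthesize_speech` helpers and the prompt wording are hypothetical, not the project's actual system instructions, and the Gemini call itself is left out.

```python
import io


def build_system_prompt(language: str = "Roman Urdu") -> str:
    """Compose a hypothetical system instruction asking for plain spoken style."""
    rules = [
        f"Reply only in {language}, in a conversational style.",
        "Avoid literary or formal vocabulary.",
        "Use short sentences that are easy to follow when spoken aloud.",
    ]
    return " ".join(rules)


def synthesize_speech(text: str, lang: str = "ur") -> bytes:
    """Convert text to MP3 bytes with gTTS.

    Requires the third-party gtts package and network access.
    """
    from gtts import gTTS  # imported lazily so the prompt helper works without it

    buffer = io.BytesIO()
    gTTS(text=text, lang=lang).write_to_fp(buffer)
    return buffer.getvalue()
```

The returned bytes can be passed to any audio player; in a Streamlit app, `st.audio` accepts raw bytes directly.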
Challenges we faced
- The "No-Text" Constraint: Designing a UI for someone who cannot read was incredibly difficult. We had to resist the urge to add "Help" text or "Settings" menus. Every feature had to be intuitive through iconography alone.
- Latency vs. Engagement: Illiterate users are impatient with technology they don't understand. Waiting 5-6 seconds for an audio response felt too long. We optimized the pipeline by switching to Gemini 2.0 Flash for faster inference and by generating audio asynchronously.
- Dialect Nuances: Getting the AI to sound like a "friend" (Saathi) and not a "robot" required extensive prompt tuning. We had to instruct the model to use empathetic phrases like "Amma ji" (Mother) or "Bhai" (Brother) to build trust.
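Asynchronous audio generation can take several forms. One option is to split the reply into sentences and synthesize them concurrently, so the first clip is ready sooner. The sketch below works under that assumption and uses only the standard library (Python 3.9+ for `asyncio.to_thread`). `fake_tts` is a stand-in for a real text-to-speech call, the sample text is made up, and the sentence splitter is deliberately naive.

```python
import asyncio
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naively split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


async def synthesize_all(text: str, tts: Callable[[str], bytes]) -> List[bytes]:
    """Run the blocking tts callable for each sentence in worker threads.

    Results come back in sentence order, so clips can be played in sequence.
    """
    sentences = split_sentences(text)
    tasks = [asyncio.to_thread(tts, s) for s in sentences]
    return list(await asyncio.gather(*tasks))


def fake_tts(sentence: str) -> bytes:
    """Stand-in for a real TTS call; returns the UTF-8 bytes of the sentence."""
    return sentence.encode("utf-8")


# Made-up sample reply, used only to exercise the pipeline.
clips = asyncio.run(
    synthesize_all("Salaam. Yeh aap ka bijli ka bill hai.", fake_tts)
)
```

Because the clips are returned in order, playback of the first sentence can begin while later sentences are still being synthesized, provided the caller consumes them incrementally.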
What we learned
We learned that Gemini 3 is not just for coding assistants or data analysis; it is a tool for social empowerment. The model's ability to "see" an image and "reason" about safety (e.g., detecting a scam SMS) is a game-changer for digital safety. We also realized that the "Next Billion Users" don't need more features; they need accessible features.
What's next for Saathi AI
- Gemini Nano Integration: To make this work in deep rural areas with poor internet, we plan to move the core logic to on-device processing using Gemini Nano.
- Voice-to-Action: Enabling the user to perform tasks (like sending money or booking a ride) entirely through voice commands.
- Regional Expansion: Adding support for the Sindhi, Pashto, and Saraiki languages to cover all of Pakistan.
Built With
- api
- google-gemini-2.5-flash-(tts)
- google-gemini-3-flash
- google-gemini-3-pro
- google-search-grounding
- mediastream
- pollinations-ai
- react
- tailwind-css
- typescript
- web-audio-api

