💡 Inspiration (The "Why")

India has over 300 million women who own smartphones but cannot use them effectively. Why? Because the digital world is built on text, but their reality is oral.

We saw our own mothers and grandmothers struggle—asking for help to scan a QR code, terrified of making a wrong click on a banking app, or unable to search for basic health information because they couldn't type in English.

We realized this isn't just a "UX problem"; it's a human rights crisis. The benefits of the digital economy—UPI payments, telemedicine, government schemes—are locked behind a "literacy wall."

We built Sakhi to tear that wall down. We wanted to create not just an app, but a friend (Sakhi) who is always there, speaks their language, and guides them with infinite patience.


🤖 What it does (The Solution)

Sakhi is a voice-first AI companion designed specifically for the next billion users. It completely bypasses the need for typing or complex navigation.

  • One-Touch Voice Interface: Users press a single large button to talk. No menus, no confusion.
  • Vernacular & Context-Aware: Powered by Google Gemini, Sakhi understands Hindi (and Hinglish) dialects, context, and emotion. She doesn't just answer; she explains like a patient teacher.
  • Interactive Learning Modules: Sakhi teaches essential life skills through voice-guided lessons:
    • 🏥 Health: Menstrual hygiene, pregnancy care, and nutrition myths.
    • 💰 Finance: How to use UPI safely, save money, and avoid scams.
    • ⚖️ Rights: Legal awareness and government scheme eligibility.
  • Offline-Ready: Critical content works even in low-connectivity rural areas.
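The offline fallback described above can be sketched as a cache-first content layer. This is a hedged illustration, not Sakhi's actual code: the `Lesson` shape and `LessonCache` name are assumptions, and in the real app a `Map` would be replaced by persistent storage such as AsyncStorage.

```typescript
// Illustrative sketch (assumed design): cache lesson content when online,
// serve the cached copy when the network fails.
type Lesson = { id: string; title: string; audioScript: string };

class LessonCache {
  private store = new Map<string, Lesson>(); // stand-in for persistent storage

  // Save a lesson after a successful fetch so it survives connectivity loss.
  save(lesson: Lesson): void {
    this.store.set(lesson.id, lesson);
  }

  // Try the network first; on failure, fall back to the cached copy.
  async get(
    id: string,
    fetchRemote: (id: string) => Promise<Lesson>
  ): Promise<Lesson | undefined> {
    try {
      const fresh = await fetchRemote(id);
      this.save(fresh);
      return fresh;
    } catch {
      return this.store.get(id); // low-connectivity fallback
    }
  }
}
```

The key design choice is that callers never branch on connectivity themselves; the cache absorbs the failure and the voice flow continues uninterrupted.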

🛠️ How we built it (The Tech Stack)

We focused on building a "rugged" architecture that prioritizes accessibility and low latency.

  1. Frontend: We used React Native (Expo) to build a highly optimized, cross-platform mobile app. We implemented a custom VoiceVisualizer to give users immediate feedback that "Sakhi is listening."
  2. The Brain (AI): We leveraged the multimodal capabilities of Google Gemini 1.5 Flash. Its speed and ability to handle vernacular reasoning were game-changers for us. We custom-prompted Gemini to adopt the persona of an empathetic Indian "Didi" (elder sister).
  3. Voice Stack: We integrated expo-speech for Text-to-Speech (TTS) and native speech recognition modules to handle real-time audio processing.
  4. Backend: Firebase handles authentication and real-time database needs, allowing us to sync user progress across simple, intermittent connections.
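The "Didi" persona prompting in step 2 can be sketched as a system-prompt builder. This is a hedged illustration only: the prompt wording, `PersonaConfig` shape, and function name are assumptions, not the team's published prompt.

```typescript
// Illustrative sketch of persona prompting for Gemini 1.5 Flash.
// The prompt text and helper names are hypothetical.
interface PersonaConfig {
  language: string;      // e.g. "Hindi (Devanagari), Hinglish allowed"
  maxSentences: number;  // keep replies short to reduce TTS latency
}

function buildDidiSystemPrompt(cfg: PersonaConfig): string {
  return [
    "You are Sakhi, a patient, warm Indian elder sister ('Didi').",
    `Always reply in ${cfg.language}, using simple everyday words.`,
    `Keep answers under ${cfg.maxSentences} sentences so they can be`,
    "spoken aloud quickly, and explain one step at a time.",
    "Never shame the user for not knowing something.",
  ].join(" ");
}
```

With the official Gemini SDKs, a string like this would typically be supplied as the model's system instruction; constraining reply length in the prompt is also what keeps the downstream TTS step feeling responsive.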

🧗 Challenges we ran into

  • The "Native" Hurdle: Implementing robust Speech-to-Text (STT) inside Expo Go was difficult due to native module restrictions. We had to architect a resilient fallback system that works seamlessly across development and production environments.
  • Latency vs. Empathy: LLMs can be slow. To keep the conversation natural, we optimized our prompt engineering to generate concise, warm responses and used a custom UI state machine (Listening -> Thinking -> Speaking) to keep the user engaged during processing.
  • Designing for the "Next Billion": Standard UI patterns (hamburger menus, small icons) failed during our initial user testing. We had to unlearn standard design and embrace "Neumorphism" with large, tactile buttons and high-contrast visuals.
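The Listening -> Thinking -> Speaking loop mentioned above can be modeled as a tiny finite-state reducer. The state and event names here are illustrative assumptions, not the shipped code:

```typescript
// Minimal finite-state machine for the conversational UI loop.
type UiState = "idle" | "listening" | "thinking" | "speaking";
type UiEvent = "PRESS_TALK" | "SPEECH_END" | "REPLY_READY" | "TTS_DONE" | "ERROR";

const transitions: Record<UiState, Partial<Record<UiEvent, UiState>>> = {
  idle:      { PRESS_TALK: "listening" },
  listening: { SPEECH_END: "thinking", ERROR: "idle" },
  thinking:  { REPLY_READY: "speaking", ERROR: "idle" },
  speaking:  { TTS_DONE: "idle", ERROR: "idle" },
};

// Ignore events that are invalid for the current state (e.g. a second
// button press while Sakhi is already thinking) instead of crashing.
function next(state: UiState, event: UiEvent): UiState {
  return transitions[state][event] ?? state;
}
```

Driving the avatar animation and the VoiceVisualizer directly from this single state value is what keeps the user engaged during LLM processing: there is always exactly one visible state, and stray events cannot wedge the UI.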

🏆 Accomplishments that we're proud of

  • True Voice-First Experience: We achieved a flow where a user can navigate the entire core value loop without reading a single word.
  • Gemini Integration: Successfully taming a powerful LLM to speak simply, empathetically, and in a culturally resonant way.
  • The "Sakhi" Persona: Creating an avatar and interaction model that feels like a friend, not a bot. The feedback on the "Didi" avatar has been overwhelmingly positive.

🧠 What we learned

  • Empathy is a Tech Stack: For this demographic, the tone of the AI is as important as its intelligence. A correct answer delivered robotically is a failure; a helpful answer delivered warmly is a success.
  • Visuals Speak Louder: Combining voice with synchronized visuals (like the avatar speaking or relevant icons popping up) drastically improves comprehension for semi-literate users.

🚀 What's next for Sakhi

  • Multilingual Expansion: Rolling out support for 9 major Indian languages (Tamil, Telugu, Bengali, etc.).
  • Hyper-Local Services: Connecting users to local ASHA workers or bank correspondents directly through voice command.
  • Image-to-Voice: Using Gemini Vision to let users snap a photo of a medicine strip or a government form and have Sakhi explain it to them.

Sakhi is more than code; it's a movement to give every woman her own voice in the digital world.

Built With

react-native · expo · expo-speech · google-gemini · firebase