Inspiration
There are over 40 million Kurdish speakers globally, yet we often remain "digital second-class citizens." Major AI platforms frequently produce hallucinated or garbled output in our dialects, or ignore them entirely. My inspiration was to bridge this divide. I wanted to build a single Multimodal SuperApp that brings the full power of Google's Gemini 3 ecosystem to my people, supporting both Sorani and the often-neglected Badini dialect.
What it does
Kurdish AI is not just a chatbot; it is a comprehensive productivity engine designed for a specific linguistic demographic. It listens, speaks, sees, and creates in Kurdish:
- 🎙️ Smart Voice Chat: Speaks and listens in Kurdish using Gemini 3's advanced reasoning and Gemini 2.5's TTS capabilities.
- 🌍 Real-time Voice Translator: Acts as a live interpreter for travelers, translating spoken conversation instantly between Kurdish and 30+ languages.
- 📄 PDF to Word OCR: Uses Gemini 3's vision capabilities to digitize scanned Kurdish documents while preserving complex RTL layouts and tables.
- 🎥 Generative Media Studio: Creates images (Gemini 2.5) and videos (Veo 3.1) directly from Kurdish text prompts.
- 💊 Medicine Guard: Analyzes medicine packaging to identify details and manufacturing origin, helping users verify pharmaceuticals in a region where counterfeit drugs are a concern.
How we built it
The core engine is built on React, Vite, and the @google/genai SDK.
- Advanced Reasoning (Gemini 3): We used `gemini-3-flash-preview` with extensive System Instructions. We injected a strict "Linguistic Ruleset" into every API call to enforce correct orthography for Sorani and Badini dialects.
- Video Generation (Veo): We integrated Veo (`veo-3.1-fast-generate-preview`). Since Veo is optimized for English prompts, we built a "Prompt Chain" where Gemini 3 first translates the user's Kurdish imagination into a detailed English cinematic prompt before feeding it to Veo.
- Speech Architecture: We utilized `gemini-2.5-flash-preview-tts` for high-quality audio generation, effectively giving the AI a native Kurdish voice.
- Vision & OCR: We used `gemini-2.5-flash-image` for editing/colorizing historical black-and-white Kurdish photos and Gemini 3 for analyzing complex visual data like PDFs.
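The "Prompt Chain" described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the `TextModel` and `VideoModel` function types, the `kurdishToVideo` name, and the wording of the rewrite instruction are all hypothetical. The two model calls are passed in as plain functions so the chaining logic can be read and tested without network access; in the real app they would presumably wrap the `@google/genai` SDK.

```typescript
// Hypothetical function types standing in for the real model calls.
// TextModel: (system instruction, user text) -> generated text.
// VideoModel: (English prompt) -> some handle or URL for the finished video.
type TextModel = (systemInstruction: string, userText: string) => Promise<string>;
type VideoModel = (englishPrompt: string) => Promise<string>;

// Assumed wording; the original instruction text is not given in the write-up.
const CINEMATIC_REWRITE_INSTRUCTION =
  "Translate the user's Kurdish description into one detailed English " +
  "cinematic prompt. Describe subject, setting, lighting, and camera movement. " +
  "Return only the prompt text.";

interface ChainResult {
  englishPrompt: string; // intermediate output of step 1, useful for debugging
  video: string;         // whatever the video model returned in step 2
}

// Step 1: Kurdish text -> English cinematic prompt (text model).
// Step 2: English cinematic prompt -> video (video model).
async function kurdishToVideo(
  kurdishPrompt: string,
  translate: TextModel,
  generateVideo: VideoModel,
): Promise<ChainResult> {
  const trimmed = kurdishPrompt.trim();
  if (trimmed.length === 0) {
    throw new Error("Prompt must not be empty");
  }

  const englishPrompt = (await translate(CINEMATIC_REWRITE_INSTRUCTION, trimmed)).trim();
  if (englishPrompt.length === 0) {
    // Fail early rather than sending an empty prompt to the video model.
    throw new Error("Translation step returned no text");
  }

  const video = await generateVideo(englishPrompt);
  return { englishPrompt, video };
}
```

Returning the intermediate English prompt alongside the video makes it easier to see whether a poor result came from the translation step or from the video model.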
Challenges we ran into
- Dialect Mixing: Gemini sometimes mixed Sorani and Badini vocabularies. We solved this by creating a dynamic context injector that feeds specific grammar rules (e.g., using "Ez" vs "Min") based on the user's selected dialect.
- RTL Formatting: Converting PDF to Word for Right-to-Left languages is notoriously difficult. We utilized Gemini 3 to analyze the layout structure and reconstructed the document using the `docx` library with specific BiDi (Bidirectional) flags.
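A dynamic context injector of the kind described under Dialect Mixing might look something like the sketch below. The `Dialect` type, the `DIALECT_RULES` table, and `buildSystemInstruction` are hypothetical names, and the rule texts are illustrative placeholders; only the "Ez" vs "Min" pronoun distinction comes from the description above. The idea is simply to append the rules for the selected dialect to the base system instruction before each request.

```typescript
type Dialect = "sorani" | "badini";

interface DialectRules {
  label: string;
  rules: string[];
}

// Placeholder ruleset: a real one would carry many more orthography and
// grammar rules. Only the pronoun example is taken from the write-up.
const DIALECT_RULES: Record<Dialect, DialectRules> = {
  sorani: {
    label: "Sorani",
    rules: [
      'Use "Min" for the first-person singular pronoun.',
      "Do not mix in Badini vocabulary.",
    ],
  },
  badini: {
    label: "Badini",
    rules: [
      'Use "Ez" for the first-person singular subject pronoun.',
      "Do not mix in Sorani vocabulary.",
    ],
  },
};

// Builds the system instruction for one request by appending the numbered
// rules of the user's selected dialect to a shared base instruction.
function buildSystemInstruction(dialect: Dialect, baseInstruction: string): string {
  const { label, rules } = DIALECT_RULES[dialect];
  const numbered = rules.map((rule, i) => `${i + 1}. ${rule}`).join("\n");
  return `${baseInstruction}\n\nLinguistic Ruleset (${label}):\n${numbered}`;
}
```

Keeping the rules in a lookup table keyed by dialect means a further dialect could be supported by adding one entry, without touching the code that sends the request.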
Accomplishments that we're proud of
- Successfully integrating Veo for video generation in a Kurdish app for the first time.
- Building a zero-latency feel by optimizing audio streaming buffers.
- Creating a tool that preserves Kurdish heritage through AI (colorizing old photos and digitizing books).
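The write-up does not say how the audio streaming buffers were optimized. One common approach, shown here purely as an assumption, is to schedule each incoming audio chunk to start exactly when the previous one ends, with a small lead time so playback does not stutter. `GaplessScheduler` and its `leadSec` parameter are hypothetical names for this sketch.

```typescript
// Computes start times for streamed audio chunks so they play back to back.
// Times are in seconds on whatever clock the audio output uses.
class GaplessScheduler {
  private nextStart = 0;
  private readonly leadSec: number;

  // leadSec: small safety margin between "now" and the earliest start time.
  constructor(leadSec: number = 0.125) {
    this.leadSec = leadSec;
  }

  // Returns the time at which a chunk of the given duration should start.
  schedule(durationSec: number, nowSec: number): number {
    if (durationSec <= 0) {
      throw new Error("Chunk duration must be positive");
    }
    // Normally the chunk starts where the previous one ended. If playback has
    // fallen behind (an underrun), restart slightly ahead of the current time.
    const start = Math.max(this.nextStart, nowSec + this.leadSec);
    this.nextStart = start + durationSec;
    return start;
  }
}
```

In a browser, the returned value could be passed to `AudioBufferSourceNode.start()` with `AudioContext.currentTime` supplied as `nowSec`, so playback begins as soon as the first chunk arrives instead of waiting for the whole response.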
What's next for Kurdish AI
We plan to expand the dataset to include Zazaki and Hawrami dialects and introduce an offline mode for rural areas with poor internet connection.
Built With
- firebase
- gemini-2.5-flash-image
- gemini-2.5-tts
- gemini-3-flash
- react
- tailwindcss
- typescript
- veo-3.1
- vite