My Journey with Kural Kuriyitu: A Voice-First Coding Companion

What Inspired Me

Growing up in Kerala and seeing how many talented people around me — friends, family, students in rural areas — struggled with English-dominated tech tools, I became obsessed with the idea of democratizing coding. Typing in a foreign language, dealing with syntax while learning concepts, or even physical barriers like repetitive strain injuries — these exclude millions.

The real spark came from two directions in 2025–2026:

India's massive push for Indic voice technologies (Bhashini, VoiceERA, IndicVoices datasets, Sarvam AI's Bulbul TTS supporting Tamil and 10+ languages). I saw governments and startups finally treating regional languages as first-class citizens in AI.
Watching futuristic voice agents (Gemini Live, Claude voice mode) and realizing: why can't this power go to coding? Tools like Talon Voice existed for hands-free control, but nothing felt native for Indian developers who speak Tamil, Malayalam, or Hindi at home.

I wanted to build something personal: an app where someone in Tirunelveli or Kochi could just speak in Tamil, for example "ஒரு ரியாக்ட் லாகின் ஃபார்ம் உருவாக்கு" ("create a React login form"), and watch code appear, hear it explained aloud in their accent, and even get proactive fixes spoken back. No keyboard required. That vision of inclusion + magic drove every late night.

What I Learned

This project taught me far more than I expected:

Voice is hard, especially Indic. The Web Speech API is decent for English but struggles with Tamil accents, code-specific terms ("useEffect", "flexDirection"), and code-mixed speech (Tamil + English keywords). The Gemini Live API changed everything: native audio streaming, interruptions, better phoneme handling.
Accessibility is engineering + empathy. It's not enough to add TTS; you need speed control, accent choice, proactive spoken feedback, and graceful error recovery ("Sorry, I didn't catch that — can you repeat?").
Monaco + streaming is powerful but finicky. Suppressing diagnostics (like TS7027 unreachable code), applying streaming deltas without cursor jumps, highlighting lines during TTS — these small details make or break the "alive" feeling.
Client-side limits force creativity. No backend meant IndexedDB for project storage, Pyodide for the Python sandbox, and careful Gemini prompt engineering for structured outputs (issues + fixes JSON).
India-first means multilingual everything. Prompts, confirmations ("ஆம்" for yes, "இல்லை" for no), and TTS voices all need localizing; testing with friends in Malayalam and Tamil revealed how much cultural nuance matters.
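
On the Monaco point above, here is a minimal sketch of one way to stream text into an editor without cursor jumps: instead of replacing the whole buffer on every delta, compute the smallest range that actually changed and replace only that. The names `TextEdit`, `minimalEdit`, and `applyEdit` are hypothetical and not taken from the app's code; this only illustrates the idea under that assumption.

```typescript
// A single replacement expressed as offsets into the OLD text (UTF-16 code units).
interface TextEdit {
  start: number; // offset where the replaced range begins
  end: number;   // offset where the replaced range ends (exclusive)
  text: string;  // text to put in place of old.slice(start, end)
}

// Find the smallest edit that turns oldText into newText by trimming
// the common prefix and the common suffix. Returns null if nothing changed.
function minimalEdit(oldText: string, newText: string): TextEdit | null {
  if (oldText === newText) return null;

  const maxPrefix = Math.min(oldText.length, newText.length);
  let prefix = 0;
  while (prefix < maxPrefix && oldText[prefix] === newText[prefix]) prefix++;

  // The suffix may not overlap the prefix that was already matched.
  const maxSuffix = maxPrefix - prefix;
  let suffix = 0;
  while (
    suffix < maxSuffix &&
    oldText[oldText.length - 1 - suffix] === newText[newText.length - 1 - suffix]
  ) {
    suffix++;
  }

  return {
    start: prefix,
    end: oldText.length - suffix,
    text: newText.slice(prefix, newText.length - suffix),
  };
}

// Apply an edit to a plain string (stand-in for the editor model in this sketch).
function applyEdit(oldText: string, edit: TextEdit): string {
  return oldText.slice(0, edit.start) + edit.text + oldText.slice(edit.end);
}
```

In Monaco, the two offsets could be converted to positions with the model's `getPositionAt` and the edit applied through the editor's `executeEdits`, which should leave the cursor alone when it sits outside the replaced range.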

Most importantly: the future of tools isn't English-first with translation bolted on. It's native, voice-first, and culturally rooted.

How I Built It

Kural Kuriyitu ("Voice → Code" in Tamil) is 100% client-side:

Frontend: React 19 + TypeScript + Vite + Tailwind CSS + @monaco-editor/react
Editor & preview: Monaco for multi-file editing/navigation ("go to function login"), iframe sandbox for HTML/JSX preview, later added Web Worker JS execution + Pyodide for Python
Voice I/O:
- Started with Web Speech API (input) + SpeechSynthesis (TTS)
- Migrated to Gemini Live API (WebSocket audio streaming) for low-latency, interruptible, native-voice conversation
- MediaRecorder → 20–40 ms audio chunks → Gemini → PCM audio playback
AI core: @google/generative-ai → Gemini 2.5 Flash (later the Live variant)
- System prompt for code tasks + proactive analysis
- Structured outputs for fixes (JSON issues list)
- Session history + code context for multi-turn
Proactive loop: Idle timer (~5 s) → snapshot code → Gemini analysis → speak top issue → wait for voice confirmation → apply diff
Polish: Glassmorphism UI (backdrop-blur, cyan #00d4ff glows), live mic waveform, full-screen preview toggle, screen recorder, PWA basics
Localization: Tamil UI strings, voice commands, TTS rate/accent control
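
The decision points in the proactive loop (parse the structured reply, pick the top issue, interpret the spoken confirmation) can be sketched as pure functions. Everything here is an assumption for illustration: the `Issue` shape, the severity ordering, and the function names are hypothetical, since the app's real JSON schema is not shown. The word lists use only the two Tamil confirmations mentioned earlier plus their English equivalents.

```typescript
// Hypothetical shape of one issue in the model's JSON reply (assumed schema).
interface Issue {
  line: number;
  severity: "error" | "warning" | "info";
  message: string;
  fix: string;
}

const SEVERITY_RANK: Record<Issue["severity"], number> = { error: 0, warning: 1, info: 2 };

// Runtime check, because model output cannot be trusted to match the type.
function isIssue(value: unknown): value is Issue {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.line === "number" &&
    typeof v.message === "string" &&
    typeof v.fix === "string" &&
    (v.severity === "error" || v.severity === "warning" || v.severity === "info")
  );
}

// Extract the outermost JSON object from the reply, tolerating code fences
// or chatter around it, and keep only well-formed issues.
function parseIssues(reply: string): Issue[] {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return [];
  try {
    const data = JSON.parse(reply.slice(start, end + 1)) as { issues?: unknown };
    return Array.isArray(data.issues) ? data.issues.filter(isIssue) : [];
  } catch {
    return []; // malformed JSON: treat as "nothing to report"
  }
}

// Most severe issue first; ties broken by the earliest line.
function topIssue(issues: Issue[]): Issue | undefined {
  return [...issues].sort(
    (a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] || a.line - b.line
  )[0];
}

type Confirmation = "yes" | "no" | "unclear";
const YES_WORDS = new Set(["ஆம்", "yes"]);
const NO_WORDS = new Set(["இல்லை", "no"]);

// Classify a transcript; anything ambiguous is "unclear" so the app can ask again.
function classifyConfirmation(transcript: string): Confirmation {
  const words = transcript.normalize("NFC").toLowerCase().split(/[\s.,!?]+/).filter(Boolean);
  const yes = words.some((w) => YES_WORDS.has(w));
  const no = words.some((w) => NO_WORDS.has(w));
  if (yes === no) return "unclear";
  return yes ? "yes" : "no";
}
```

Returning "unclear" rather than guessing fits the error-recovery behavior described earlier: the app can ask the user to repeat instead of applying a diff by mistake.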

Built in phases over weeks: basic voice → streaming → multi-file → proactive → Live API → UI glow-up.
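
For the audio path listed under Voice I/O, the sketch below shows one way to turn captured samples into small 16-bit PCM chunks. It assumes raw samples are already available as a `Float32Array` (for example from Web Audio), since MediaRecorder by default produces encoded blobs rather than raw samples. The function names are hypothetical, and the 16 kHz rate in the usage note is an assumed value; the 20 ms chunk size comes from the 20–40 ms range mentioned above.

```typescript
// Convert Web Audio style samples in [-1, 1] to signed 16-bit PCM.
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative and positive halves have different magnitudes in int16.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Slice PCM into fixed-duration chunks; the last chunk may be shorter.
function chunkPcm(pcm: Int16Array, sampleRate: number, chunkMs: number): Int16Array[] {
  const size = Math.max(1, Math.round((sampleRate * chunkMs) / 1000));
  const chunks: Int16Array[] = [];
  for (let i = 0; i < pcm.length; i += size) {
    chunks.push(pcm.subarray(i, i + size)); // view into the same buffer, no copy
  }
  return chunks;
}
```

At an assumed 16 kHz, a 20 ms chunk is 320 samples, so 50 ms of audio (800 samples) would be split into chunks of 320, 320, and 160 samples.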

Challenges I Faced

Latency & barge-in: Early Web Speech + separate TTS felt robotic and lagged. Gemini Live solved most of it, but audio chunk handling + echo cancellation took debugging.
Code-specific speech recognition: "useReducer" was transcribed as "use reducer", which led to wrong imports. I had to teach Gemini to correct transcriptions from context.
Monaco quirks: Streaming deltas messed up the cursor and scroll position; red squiggles for unreachable code annoyed users, so I suppressed TS7027.
Tamil TTS quality: Native voice quality varies, so the app falls back to adjustable SpeechSynthesis voices.
Sandbox security/performance: Pyodide load time + no-network worker restrictions.
Testing: Self-testing in English is easy; real Tamil/Malayalam feedback from friends revealed edge cases (code-mixing, dialect variation).
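
The transcription problem above ("use reducer" instead of "useReducer") was handled in the app by having Gemini correct from context. As a complementary illustration only, and not the approach described here, a simple local pre-pass could restore known identifiers before the transcript reaches the model. The function names and the identifier list are hypothetical.

```typescript
// Spoken form of a camelCase identifier: "useReducer" -> "use reducer".
function spokenForm(identifier: string): string {
  return identifier.replace(/([a-z0-9])([A-Z])/g, "$1 $2").toLowerCase();
}

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace spoken forms in the transcript with the real identifiers.
function restoreIdentifiers(transcript: string, identifiers: string[]): string {
  // Longest spoken forms first, so a shorter identifier never rewrites
  // part of a longer one.
  const sorted = [...identifiers].sort(
    (a, b) => spokenForm(b).length - spokenForm(a).length
  );
  let result = transcript;
  for (const id of sorted) {
    // Allow any amount of whitespace between the spoken words.
    const pattern = spokenForm(id).split(" ").map(escapeRegExp).join("\\s+");
    // Word boundaries keep matches from starting or ending mid-word.
    result = result.replace(new RegExp("\\b" + pattern + "\\b", "gi"), id);
  }
  return result;
}
```

For example, `restoreIdentifiers("add a use reducer hook", ["useReducer"])` yields "add a useReducer hook". A pass like this only covers identifiers it already knows about, which is why correction from context by the model is still needed for everything else.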

Every challenge was worth it — seeing someone speak in Tamil and get a working React component with spoken explanation felt like sci-fi becoming real.

Kural Kuriyitu isn't finished — it's a beginning. I hope it inspires more tools that let every Indian coder create without barriers.

Thank you for the journey. 🚀

என் குரலால் உருவாகும் குறியீடு — இது தொடக்கம் மட்டுமே!
("Code created by my voice: this is only the beginning!")