DocuVoice

Inspiration

Many people struggle to understand essential documents such as Government Orders (GOs), medical bills, and legal notices because of literacy challenges or language barriers. DocuVoice gives users independence by translating these documents, explaining them in simple terms, extracting the key actions they require, and reading them aloud in the user's native language.

What it does

Multimodal Upload - Upload images or PDFs of any document
Smart Translation - Translate into 10+ Indian languages, with simplified summaries
Voice Reader - High-quality text-to-speech (TTS) with adjustable playback speed
Action Items - Automatically extract tasks and deadlines as checklists
Official Document Detection - Extract GO numbers, department names, and dates
Location Grounding - Show addresses from the document on an interactive Google Map
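The Action Items and Official Document Detection features imply that the model returns a structured result rather than free text. The sketch below assumes one possible shape for that result; the interface names, the field names (goNumber, department, deadline), and the toChecklist helper are hypothetical, not DocuVoice's actual schema. It shows how extracted tasks could be turned into a checklist, with dated items ordered by deadline and undated items last.

```typescript
// Hypothetical shape of the structured result requested from the model.
// Field names are illustrative assumptions, not the project's actual schema.
interface ActionItem {
  task: string;
  deadline?: string; // ISO 8601 date (YYYY-MM-DD), if the document states one
}

interface OfficialDocumentInfo {
  goNumber?: string;
  department?: string;
  date?: string;
}

interface DocumentAnalysis {
  translatedText: string;
  summary: string;
  actionItems: ActionItem[];
  official?: OfficialDocumentInfo;
}

interface ChecklistEntry {
  label: string;
  done: boolean;
}

// Convert extracted action items into unchecked checklist entries.
// Dated tasks come first (earliest deadline first); undated tasks keep
// their original relative order at the end (Array.prototype.sort is stable).
function toChecklist(analysis: DocumentAnalysis): ChecklistEntry[] {
  const sorted = [...analysis.actionItems].sort((a, b) => {
    if (a.deadline && b.deadline) return a.deadline.localeCompare(b.deadline);
    if (a.deadline) return -1; // dated before undated
    if (b.deadline) return 1;
    return 0;
  });
  return sorted.map((item) => ({
    label: item.deadline ? `${item.task} (by ${item.deadline})` : item.task,
    done: false,
  }));
}
```

For example, tasks due 2025-03-31 and 2025-03-15 plus one undated task come out as the 2025-03-15 task, then the 2025-03-31 task, then the undated one. Comparing deadlines as strings works only because YYYY-MM-DD is fixed-width; free-form dates from a document would need to be normalized first.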

How we built it

Frontend - React 19, Tailwind CSS, Lucide React
AI Models - gemini-3-flash-preview for text extraction, gemini-2.5-flash-preview-tts for audio
Audio Engine - Custom Web Audio API decoder for raw PCM data
Maps - gemini-2.5-flash with Google Maps grounding
Deep Reasoning - Dynamic thinkingConfig with a 2048-token thinking budget for complex forms
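The Audio Engine step has to turn raw PCM bytes into samples the Web Audio API can play. A minimal sketch, assuming the TTS response carries headerless mono, signed 16-bit little-endian PCM (Gemini's TTS documentation describes 24 kHz, 16-bit mono output, but the response's mimeType is the thing to check): every two bytes become one Float32 sample in the range [-1, 1). The function name pcm16ToFloat32 is illustrative.

```typescript
// Convert raw signed 16-bit little-endian PCM bytes into Float32 samples
// in [-1, 1), the sample format an AudioBuffer channel expects.
// Assumes mono audio; a trailing odd byte (incomplete sample) is ignored.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  // Respect byteOffset/byteLength so subarray views decode correctly.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // true = little-endian; dividing by 32768 maps [-32768, 32767] to [-1, 1)
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```

In the browser, the samples could then be copied into an AudioBuffer with ctx.createBuffer(1, samples.length, 24000) followed by buffer.copyToChannel(samples, 0), and played through an AudioBufferSourceNode; the 24000 Hz rate is the assumption stated above. A manual path like this is needed because AudioContext.decodeAudioData expects an encoded file such as WAV or MP3, not headerless PCM.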

Challenges we ran into

Raw Audio Decoding - Built a custom TypeScript decoder for the raw PCM data returned by Gemini TTS
JSON Reliability - Extensive prompt engineering for consistently structured outputs across RTL languages
Official Data Extraction - Used Gemini 3's reasoning to distinguish GO numbers from surrounding text
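Prompt engineering can be backed up by validating the model's output before the UI consumes it. The sketch below is an assumption about how such a guard might look, not the project's code: it strips Markdown code fences, falls back to the outermost {...} span when the model adds surrounding prose, and checks that required keys are present. The function name parseModelJson and the key names used with it are hypothetical.

```typescript
// Defensively parse model output that is supposed to be a JSON object.
// Assumption: the model may wrap JSON in Markdown code fences or add stray prose.
function parseModelJson<T>(raw: string, requiredKeys: string[]): T {
  let text = raw.trim();

  // Strip a surrounding Markdown fence (three backticks, optional "json" tag).
  const fenced = text.match(/^`{3}(?:json)?\s*([\s\S]*?)\s*`{3}$/);
  if (fenced) text = fenced[1];

  // Fall back to the outermost {...} span if prose surrounds the object.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("No JSON object found in model output");
  }

  // JSON.parse throws on truncated or malformed output.
  const parsed: unknown = JSON.parse(text.slice(start, end + 1));
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Model output is not a JSON object");
  }

  // Reject responses that omit fields the UI depends on.
  for (const key of requiredKeys) {
    if (!(key in parsed)) throw new Error(`Missing required key: ${key}`);
  }
  return parsed as T;
}
```

Setting responseMimeType to "application/json" (optionally with a responseSchema) in the Gemini generation config usually removes the need for fence stripping, but a guard like this still catches incomplete responses before they reach the checklist or summary views.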

Accomplishments that we're proud of

Native Indian Language Support - RTL layouts for Urdu and Arabic, plus broad Indic script support
Intent Extraction - An action-items feature that goes beyond translation
Custom Audio Player - Variable playback speeds (0.5x-2.0x) with raw buffer handling
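The RTL layouts and the variable-speed player both come down to small pieces of UI logic. A minimal sketch with hypothetical helper names: clampPlaybackRate keeps a requested speed inside the 0.5x-2.0x range, and textDirection picks an HTML dir value from a language tag. The RTL set is an assumption covering only Urdu and Arabic; other Perso-Arabic-script languages would need to be added if they are supported.

```typescript
// Hypothetical helpers for the reader UI; names and constants are illustrative.
const MIN_RATE = 0.5;
const MAX_RATE = 2.0;

// Clamp a requested speed into the supported 0.5x-2.0x range.
// Non-finite input (NaN, Infinity) falls back to normal speed.
function clampPlaybackRate(requested: number): number {
  if (!Number.isFinite(requested)) return 1.0;
  return Math.min(MAX_RATE, Math.max(MIN_RATE, requested));
}

// Right-to-left languages among the supported outputs (assumed subset).
const RTL_LANGUAGES = new Set(["ur", "ar"]);

// Choose the HTML dir attribute from a BCP 47 tag such as "ur-IN" or "hi".
function textDirection(langTag: string): "rtl" | "ltr" {
  const primary = langTag.toLowerCase().split("-")[0];
  return RTL_LANGUAGES.has(primary) ? "rtl" : "ltr";
}
```

The clamped value could be assigned to AudioBufferSourceNode.playbackRate.value. Changing playbackRate on a buffer source resamples the audio, so pitch rises and falls with speed; HTMLMediaElement's preservesPitch property is one alternative if pitch-corrected playback matters.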