Inspiration

Millions of people in India struggle with forms every day. Government applications, banking documents, and school admission forms are often English-only, visually complex, and inaccessible to non-English speakers and low-literacy users. India has 22 constitutionally scheduled languages and over 250 million adults with limited literacy. We asked: what if AI could make any form accessible to anyone, in their own language, using just their voice?

What it does

FormMitra converts any PDF form into a voice-first, multilingual interaction:

  1. Upload any PDF form (no templates needed)
  2. Vision AI detects all fillable fields automatically
  3. Speak your answers in 10+ Indian languages
  4. Download a filled PDF ready for submission

Any form works out of the box. No setup, no field mapping, no technical knowledge required.

How we built it

  • Gemini Flash (via OpenRouter) analyzes each PDF page and detects form fields with bounding box coordinates
  • Sarvam AI provides TTS (Bulbul) and STT (Saarika v2.5) for 10+ Indian languages including Hindi, Tamil, Telugu, Bengali, Kannada, and more
  • Databricks Platform: Unity Catalog Volumes for PDF storage, Delta Lake tables for submission logging and audit trails, SQL Warehouse for real-time analytics, Databricks Apps for deployment with built-in auth and secrets
  • React frontend with i18n support (4 UI languages) and a FastAPI backend, with PyMuPDF for PDF rendering and text overlay
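
For the text overlay step, one simple way to pick a font size per field is to bound it by the box height and by an estimated text width. The sketch below is illustrative only and is not FormMitra's actual code: the function `fit_font_size`, its parameters, and the average-character-width ratio are all assumptions. A real implementation would likely measure glyph widths from the font itself, which matters for Indic scripts.

```python
def fit_font_size(text, box_width, box_height, max_size=12.0, min_size=5.0,
                  char_width_ratio=0.5, height_ratio=0.8):
    """Return a font size (in points) for drawing `text` inside a box.

    char_width_ratio is an assumed average glyph width as a fraction of the
    font size; height_ratio is the share of the box height the text may use.
    The result is clamped to [min_size, max_size], so very long text can
    still overflow the box at min_size.
    """
    if box_width <= 0 or box_height <= 0:
        raise ValueError("box must have positive width and height")

    # Height bound: text must not be taller than the usable box height.
    size = min(max_size, box_height * height_ratio)

    # Width bound: estimated text width is len(text) * ratio * size.
    if text:
        size = min(size, box_width / (len(text) * char_width_ratio))

    return max(min_size, size)


short_name = fit_font_size("Ravi", box_width=200, box_height=20)
long_line = fit_font_size("A very long address line, Bengaluru",
                          box_width=100, box_height=20)
tiny_box = fit_font_size("x", box_width=50, box_height=6)
```

Clamping to a minimum size trades a possible overflow for legibility; a stricter variant could instead wrap or truncate the text.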

Challenges we ran into

  • Vision LLMs return imprecise coordinates, requiring robust normalization, padding, and degenerate box filtering
  • Dense forms exceeded token limits, causing truncated JSON. Built graceful partial-JSON recovery
  • Fixed font sizes don't work when field boxes range from tiny underlines to large text areas. Built adaptive font sizing
  • CPU-only constraint meant no local inference for a 72B model, pushing us to an API-based architecture
  • Chaining TTS, STT, and LLM cleaning across 10+ languages introduced latency and edge cases with dialect variations
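
As an illustration of the partial-JSON recovery mentioned above, the following minimal sketch assumes the model returns a JSON array of field objects and that truncation cuts the response somewhere inside that array. The function name `recover_fields` and the `label` and `bbox` keys are hypothetical, chosen only for this example. It decodes one array element at a time with the standard library's `json.JSONDecoder.raw_decode` and keeps every element that parsed completely.

```python
import json


def recover_fields(raw: str) -> list:
    """Return all complete elements of a possibly truncated JSON array."""
    # Fast path: the response is valid JSON.
    try:
        data = json.loads(raw)
        return data if isinstance(data, list) else []
    except json.JSONDecodeError:
        pass

    start = raw.find("[")
    if start == -1:
        return []

    decoder = json.JSONDecoder()
    pos = start + 1
    fields = []
    while pos < len(raw):
        # raw_decode does not skip separators, so step over them here.
        while pos < len(raw) and raw[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(raw) or raw[pos] == "]":
            break
        try:
            item, pos = decoder.raw_decode(raw, pos)
        except json.JSONDecodeError:
            break  # the element was cut off: keep what we already have
        fields.append(item)
    return fields


truncated = (
    '[{"label": "Name", "bbox": [0.1, 0.2, 0.5, 0.25]}, '
    '{"label": "Date of birth", "bbox": [0.1, 0.3, 0.5, 0.35]}, '
    '{"label": "Addr'
)
recovered = recover_fields(truncated)
```

Here `recovered` holds the two complete field objects and drops the third, which was cut off mid-string. Fields lost this way would still need a second request or a per-region retry.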

Accomplishments that we're proud of

  • Zero-config form detection: upload any PDF and it just works
  • 10+ Indian languages with voice input and multilingual UI
  • Full audit trail in Delta Lake with ACID guarantees
  • Production-ready: single app.yaml deployment on Databricks Apps with auth and secrets built in

What we learned

  • Databricks is a full-stack platform, not just a data tool. Volumes + Delta Lake + SQL Warehouse + Apps replace what would otherwise be 5+ separate AWS services
  • Vision LLMs need heavy post-processing for spatial accuracy
  • Indian language AI (Sarvam) is genuinely production-ready now
  • Building for voice-first and multilingual from day one shaped every design choice positively
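
To make the post-processing for spatial accuracy more concrete, here is a minimal sketch of box normalization, padding, and degenerate-box filtering. It assumes boxes arrive as `(x0, y0, x1, y1)` fractions of the page; the real model output format, the padding value, the minimum size, and the names `normalize_box` and `clean_boxes` are all assumptions made for illustration.

```python
def normalize_box(box, page_width, page_height, pad=1.5, min_size=4.0):
    """Convert a model-reported box to page points, or None if degenerate.

    box: (x0, y0, x1, y1) as fractions of the page (assumed input format).
    """
    x0, y0, x1, y1 = (float(v) for v in box)

    # Reorder corners in case the model swapped them.
    x0, x1 = min(x0, x1), max(x0, x1)
    y0, y1 = min(y0, y1), max(y0, y1)

    # Scale to points and clamp to the page.
    left = max(0.0, min(page_width, x0 * page_width))
    right = max(0.0, min(page_width, x1 * page_width))
    top = max(0.0, min(page_height, y0 * page_height))
    bottom = max(0.0, min(page_height, y1 * page_height))

    # Drop boxes too small to hold any text.
    if right - left < min_size or bottom - top < min_size:
        return None

    # Pad slightly, then clamp again so padding never leaves the page.
    return (
        max(0.0, left - pad),
        max(0.0, top - pad),
        min(page_width, right + pad),
        min(page_height, bottom + pad),
    )


def clean_boxes(boxes, page_width, page_height):
    """Normalize every box and discard the degenerate ones."""
    cleaned = (normalize_box(b, page_width, page_height) for b in boxes)
    return [b for b in cleaned if b is not None]


page_boxes = clean_boxes(
    [(0.25, 0.5, 0.75, 0.625), (0.5, 0.5, 0.5, 0.5)],
    page_width=600, page_height=800,
)
```

The size check runs before padding so that a zero-area box cannot be padded into looking valid.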

What's next for FormMitra

  • Handwriting recognition for partially completed forms
  • IndicTrans2 for offline translation without API dependency
  • Batch processing for government offices handling hundreds of forms
  • Form memory to remember common fields (name, Aadhaar) across sessions
  • Mobile PWA for low-end Android devices with intermittent connectivity
