Inspiration

India runs on forms. Ration cards, bank accounts, insurance claims, hospital intake, government welfare schemes - hundreds of millions every year. Almost all on paper.

The people who need these services most are often the least able to fill the forms. A 65-year-old farmer applying for PM-KISAN who reads Marathi but the form is in English. A first-generation bank customer who has never typed into a web form.

The gap isn't technology. Everyone can have a conversation. Not everyone can fill a form.

That was the spark - what if filling a form felt like telling someone your details over the phone? No fields, no formatting rules, no "DD/MM/YYYY". Just talk.


What it does

Vaarta has two sides:

For the agent (bank branch, NGO, CSC operator, hospital):

  • Upload any form - digital PDF (AcroForm), scanned PDF, or image
  • AI automatically extracts every field: names, types, bounding boxes, validation rules
  • Visual editor to drag-reposition fields and preview exactly how the filled form will look - live sample data rendered on canvas in real time
  • Share via chat link, QR code, or WhatsApp link

For the user (form filler):

  • Open the link - no app download, no login
  • Talk naturally in English, Marathi, Hindi, Hinglish, Tamil, Telugu, or Gujarati - by text or voice
  • Vaarta guides them through the form one question at a time, in their language
  • Enter phone number → receive the filled PDF directly on WhatsApp

WhatsApp is the last-mile delivery channel. The completed PDF arrives where users already live - no email, no portal, no file management.


How I built it

Field Extraction

  • Digital AcroForm PDF: PyMuPDF reads all widget fields across every page, normalized to a 0-1000 coordinate space. ~100% reliable.
  • Scanned / image PDF: Page rendered and sent to Claude Vision (claude-sonnet-4-20250514). Returns field names, semantic labels, types, bounding boxes, and validation rules. Accuracy: 50–70% depending on form quality - agents use the visual editor to correct the rest.

Visual Field Editor

Built entirely in the browser - zero server round-trips for preview:

  • Drag-and-reposition bounding boxes on the actual form image
  • HTML5 Canvas live preview - renders the real form with sample Indian data in real time, respecting font, size, alignment, and overflow behaviour
  • Overflow detection flags fields where text won't fit
  • Per-field styling: font family, size (6-24pt), Bold/Italic, H/V alignment, ink colour
  • Undo/Redo with 30-step history, keyboard shortcuts ⌘Z / ⌘⇧Z
  • Auto-save after 30 seconds of idle edits

Chat Engine

Powered by GPT-4o with tool-calling (update_form_fields):

  • Each turn: user message + full form schema + collected values so far → GPT-4o → tool call → validated key-value pairs stored
  • Smart inference: "Rahul Kumar Sharma" fills first_name, middle_name, last_name simultaneously
  • Indian document validation in system prompt: PAN, Aadhaar, GSTIN, IFSC, TAN, mobile, pincode, etc
  • Language auto-detected per turn - replies naturally in Hindi, Hinglish, Tamil, Telugu, Bengali, Gujarati
  • Voice input via Web Speech API (en-IN / hi-IN) - browser-native, no external STT service
  • Session resume via ?session=... URL - full chat history and collected values restored

Fill-back

  • AcroForm: PyMuPDF sets all widget values across every page - true digital fill
  • Scanned forms: Text overlaid at bounding box coordinates, font scaled to image resolution, checkboxes render ✓/x, exported as clean PDF
  • Partial fill highlights unfilled required fields in yellow

WhatsApp PDF Delivery

User enters mobile number → Vaarta fills the PDF → sends via Twilio WhatsApp Business API. A dedicated endpoint serves the file as Twilio's media URL - no S3 or CDN required.

Form Health Score

Automated quality check across 5 dimensions: field clarity, required ratio, type variety, confusion risk, estimated completion time. Outputs a 0-100 score, grade A-F, with actionable suggestions - before the agent shares the form to users.

Analytics

Field-level drop-off funnel, completion rate, average time, language distribution, CSV export of all collected data.


Challenges we ran into

Bounding box accuracy on scanned forms - Claude Vision's coordinates aren't pixel-perfect. Solved with a visual drag-to-correct editor and a 0–1000 normalized coordinate system that works at any image resolution.

Stateless AI across multi-turn conversation - Each GPT-4o call has no memory. We maintain full chat_history per session and inject current collected state into every system prompt.

Multilingual document extraction - "Mera Aadhaar 1234 5678 9012 hai" requires parsing a mixed-language sentence, extracting a structured value, and validating a 12-digit format simultaneously. All Indian document rules are encoded in the system prompt.

Fill-back on image PDFs - Scanned forms are images, not fillable PDFs. Text overlaid at bounding box coordinates, font scaled to image resolution (794px baseline = 96 DPI A4), with special rendering for checkboxes and signature fields.

WhatsApp PDF delivery without cloud storage - Twilio requires a public media URL. We expose a dedicated endpoint that serves the filled PDF directly - works with any public deployment or ngrok in dev.


Accomplishments that we're proud of

  • The full loop works - upload a real scanned form, extract fields, fill via Hindi voice chat, receive the completed PDF on WhatsApp. End to end in under 5 minutes.
  • Live canvas preview - form editor renders filled output client-side with zero server calls, giving agents an exact print preview before sharing
  • Bilingual chat - not translated UI strings, but a model that reasons and responds naturally in Hindi, handles Hinglish, and validates Indian document formats inline
  • Form health scoring - automated feedback that tells agents their form has problems before users see it
  • Solo full-stack in - backend (Python/FastAPI), frontend (Next.js/TypeScript), AI (Claude + GPT-4o), PDF processing, WhatsApp delivery, full design system

What we learned

Prompt engineering is architecture. Field extraction and conversation quality depend almost entirely on how the system prompt is structured - schema injection, validation rules, language instructions, and tool definitions must work as one coherent context.

Coordinate systems matter more than you expect. Choosing 0-1000 normalized space early meant extraction, editor, and fill-back all speak the same language regardless of image resolution.

WhatsApp is infrastructure, not a feature. In India, WhatsApp is where documents live. Building delivery there - not email - is what makes the loop complete for real users.

AI accuracy needs human correction paths. Claude Vision at 50-70% on scanned forms isn't a failure - it's a starting point. The visual editor that lets agents correct the rest is what makes it trustworthy and shippable.


What's next for Vaarta - Just Talk, We'll Handle the Form

  • Multi-page preview and overlay fill - AcroForm fill already handles all pages; editor and image overlay are currently first-page only
  • WhatsApp inbound chat - allow the conversation itself to happen over WhatsApp (requires WABA approval)
  • Agent authentication - form ownership, tenant isolation, access control
  • On-device voice - Whisper model for STT in low-bandwidth / offline environments
  • Language expansion - Kannada, Odia, Punjabi
  • Form template library - pre-extracted common Indian government forms (PM-KISAN, Aadhaar correction, bank account opening)

Built With

Share this project:

Updates