Inspiration
India runs on forms. Ration cards, bank accounts, insurance claims, hospital intake, government welfare schemes - hundreds of millions every year. Almost all on paper.
The people who need these services most are often the least able to fill the forms. A 65-year-old farmer applying for PM-KISAN who reads Marathi but the form is in English. A first-generation bank customer who has never typed into a web form.
The gap isn't technology. Everyone can have a conversation. Not everyone can fill a form.
That was the spark - what if filling a form felt like telling someone your details over the phone? No fields, no formatting rules, no "DD/MM/YYYY". Just talk.
What it does
Vaarta has two sides:
For the agent (bank branch, NGO, CSC operator, hospital):
- Upload any form - digital PDF (AcroForm), scanned PDF, or image
- AI automatically extracts every field: names, types, bounding boxes, validation rules
- Visual editor to drag-reposition fields and preview exactly how the filled form will look - live sample data rendered on canvas in real time
- Share via chat link, QR code, or WhatsApp link
For the user (form filler):
- Open the link - no app download, no login
- Talk naturally in English, Marathi, Hindi, Hinglish, Tamil, Telugu, or Gujarati - by text or voice
- Vaarta guides them through the form one question at a time, in their language
- Enter phone number → receive the filled PDF directly on WhatsApp
WhatsApp is the last-mile delivery channel. The completed PDF arrives where users already live - no email, no portal, no file management.
How I built it
Field Extraction
- Digital AcroForm PDF: PyMuPDF reads all widget fields across every page, normalized to a 0-1000 coordinate space. ~100% reliable.
- Scanned / image PDF: Page rendered and sent to Claude Vision (
claude-sonnet-4-20250514). Returns field names, semantic labels, types, bounding boxes, and validation rules. Accuracy: 50–70% depending on form quality - agents use the visual editor to correct the rest.
Visual Field Editor
Built entirely in the browser - zero server round-trips for preview:
- Drag-and-reposition bounding boxes on the actual form image
- HTML5 Canvas live preview - renders the real form with sample Indian data in real time, respecting font, size, alignment, and overflow behaviour
- Overflow detection flags fields where text won't fit
- Per-field styling: font family, size (6-24pt), Bold/Italic, H/V alignment, ink colour
- Undo/Redo with 30-step history, keyboard shortcuts
⌘Z / ⌘⇧Z - Auto-save after 30 seconds of idle edits
Chat Engine
Powered by GPT-4o with tool-calling (update_form_fields):
- Each turn: user message + full form schema + collected values so far → GPT-4o → tool call → validated key-value pairs stored
- Smart inference: "Rahul Kumar Sharma" fills
first_name,middle_name,last_namesimultaneously - Indian document validation in system prompt: PAN, Aadhaar, GSTIN, IFSC, TAN, mobile, pincode, etc
- Language auto-detected per turn - replies naturally in Hindi, Hinglish, Tamil, Telugu, Bengali, Gujarati
- Voice input via Web Speech API (en-IN / hi-IN) - browser-native, no external STT service
- Session resume via
?session=...URL - full chat history and collected values restored
Fill-back
- AcroForm: PyMuPDF sets all widget values across every page - true digital fill
- Scanned forms: Text overlaid at bounding box coordinates, font scaled to image resolution, checkboxes render ✓/x, exported as clean PDF
- Partial fill highlights unfilled required fields in yellow
WhatsApp PDF Delivery
User enters mobile number → Vaarta fills the PDF → sends via Twilio WhatsApp Business API. A dedicated endpoint serves the file as Twilio's media URL - no S3 or CDN required.
Form Health Score
Automated quality check across 5 dimensions: field clarity, required ratio, type variety, confusion risk, estimated completion time. Outputs a 0-100 score, grade A-F, with actionable suggestions - before the agent shares the form to users.
Analytics
Field-level drop-off funnel, completion rate, average time, language distribution, CSV export of all collected data.
Challenges we ran into
Bounding box accuracy on scanned forms - Claude Vision's coordinates aren't pixel-perfect. Solved with a visual drag-to-correct editor and a 0–1000 normalized coordinate system that works at any image resolution.
Stateless AI across multi-turn conversation - Each GPT-4o call has no memory. We maintain full chat_history per session and inject current collected state into every system prompt.
Multilingual document extraction - "Mera Aadhaar 1234 5678 9012 hai" requires parsing a mixed-language sentence, extracting a structured value, and validating a 12-digit format simultaneously. All Indian document rules are encoded in the system prompt.
Fill-back on image PDFs - Scanned forms are images, not fillable PDFs. Text overlaid at bounding box coordinates, font scaled to image resolution (794px baseline = 96 DPI A4), with special rendering for checkboxes and signature fields.
WhatsApp PDF delivery without cloud storage - Twilio requires a public media URL. We expose a dedicated endpoint that serves the filled PDF directly - works with any public deployment or ngrok in dev.
Accomplishments that we're proud of
- The full loop works - upload a real scanned form, extract fields, fill via Hindi voice chat, receive the completed PDF on WhatsApp. End to end in under 5 minutes.
- Live canvas preview - form editor renders filled output client-side with zero server calls, giving agents an exact print preview before sharing
- Bilingual chat - not translated UI strings, but a model that reasons and responds naturally in Hindi, handles Hinglish, and validates Indian document formats inline
- Form health scoring - automated feedback that tells agents their form has problems before users see it
- Solo full-stack in - backend (Python/FastAPI), frontend (Next.js/TypeScript), AI (Claude + GPT-4o), PDF processing, WhatsApp delivery, full design system
What we learned
Prompt engineering is architecture. Field extraction and conversation quality depend almost entirely on how the system prompt is structured - schema injection, validation rules, language instructions, and tool definitions must work as one coherent context.
Coordinate systems matter more than you expect. Choosing 0-1000 normalized space early meant extraction, editor, and fill-back all speak the same language regardless of image resolution.
WhatsApp is infrastructure, not a feature. In India, WhatsApp is where documents live. Building delivery there - not email - is what makes the loop complete for real users.
AI accuracy needs human correction paths. Claude Vision at 50-70% on scanned forms isn't a failure - it's a starting point. The visual editor that lets agents correct the rest is what makes it trustworthy and shippable.
What's next for Vaarta - Just Talk, We'll Handle the Form
- Multi-page preview and overlay fill - AcroForm fill already handles all pages; editor and image overlay are currently first-page only
- WhatsApp inbound chat - allow the conversation itself to happen over WhatsApp (requires WABA approval)
- Agent authentication - form ownership, tenant isolation, access control
- On-device voice - Whisper model for STT in low-bandwidth / offline environments
- Language expansion - Kannada, Odia, Punjabi
- Form template library - pre-extracted common Indian government forms (PM-KISAN, Aadhaar correction, bank account opening)

Log in or sign up for Devpost to join the conversation.