CiviGuide AI — Project Story

What Inspired Us

The idea came from a frustration that millions of Indians experience every single day — standing in a government office, form in hand, confused by language like "remunerative employment during the preceding fiscal year" when all they needed to know was "did you earn money last year?"

India processes over 400 million government applications annually — from Aadhaar enrollments and PAN card applications to ITR filings, passport renewals, and ration card updates. A significant portion of these get rejected or delayed not because the applicant was ineligible, but because they filled a field incorrectly, missed a document, or misunderstood a legal instruction.

Government forms in India are written by bureaucrats, for bureaucrats. But they are filled by everyone: farmers in rural Maharashtra, elderly pensioners in Tamil Nadu, first-generation graduates in Bihar applying for their first job, small shop owners in Uttar Pradesh who have never seen a legal document in their lives.

The language barrier makes this worse. Forms are in English, but most Indians are not comfortable reading it.

We asked ourselves: what if the form adapted to the person, instead of the person adapting to the form?

That question became CiviGuide AI.


What We Built

CiviGuide AI is an intelligent government form navigation system built on a 6-engine AI architecture, with each engine handling a distinct layer of the problem. It is designed specifically for the Indian bureaucratic ecosystem and covers forms such as ITR-1, passport applications, Aadhaar enrollment, driving license (RTO) applications, and more.

The 6 Engines

| Engine | Role |
| --- | --- |
| Engine 1 — Legal Simplifier | Converts legal field descriptions into plain human questions |
| Engine 2 — Entity Extractor | Pulls structured data from natural language responses |
| Engine 3 — Field Mapper | Maps extracted data to the correct form schema fields |
| Engine 4 — Logic Validator | Cross-checks answers for contradictions and errors |
| Engine 5 — Document Recommender | Determines required supporting documents from answers |
| Engine 6 — Risk Scorer | Calculates submission confidence before final export |

How the Risk Score Works

The submission confidence score $S$ is calculated as:

$$S = 100 - \sum_{i} w_i \cdot e_i$$

Where:

  • $e_i$ = number of issues of type $i$
  • $w_i$ = penalty weight for that issue type

| Issue Type | Weight $w_i$ |
| --- | --- |
| Missing required field | 15 |
| Logical contradiction | 30 |
| Format error | 15 |
| Missing document | 10 |

A score $S \geq 80$ indicates low risk, $55 \leq S < 80$ is medium risk, and $S < 55$ is high risk — meaning the application is likely to be rejected at a government counter.
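As a minimal sketch of the formula and thresholds above, the score could be computed as follows. The dictionary keys and function names are illustrative assumptions, not the project's actual identifiers; only the weights and cut-offs come from the text.

```python
# Penalty weights w_i from the weight table. The key names are hypothetical.
WEIGHTS = {
    "missing_field": 15,
    "contradiction": 30,
    "format_error": 15,
    "missing_document": 10,
}


def confidence_score(issue_counts: dict) -> int:
    """Return S = 100 - sum(w_i * e_i) for the given issue counts e_i."""
    penalty = sum(WEIGHTS[issue] * count for issue, count in issue_counts.items())
    return 100 - penalty


def risk_band(score: int) -> str:
    """Map a score to the low / medium / high bands described in the text."""
    if score >= 80:
        return "low"
    if score >= 55:
        return "medium"
    return "high"


# One missing field and one missing document: 100 - (15 + 10) = 75.
score = confidence_score({"missing_field": 1, "missing_document": 1})
print(score, risk_band(score))  # 75 medium
```

The formula as written has no lower bound, so an application with many issues scores below zero. Whether to floor the score at 0 is a design choice the text does not specify.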

Indian Forms We Support

  • ITR-1 — Income Tax Return for salaried individuals
  • Passport Application — Fresh and renewal
  • Driving License — RTO application and renewal
  • Aadhaar Enrollment / Update — Address and demographic changes
  • PAN Card Application — Form 49A
  • Ration Card Application — State-specific forms
  • Any other government form via PDF/DOCX upload

Tech Stack

  • AI Layer — LLaMA 3.3 70B via Groq API (free, fast, works in India)
  • Backend — Python + Flask REST API with CORS support
  • Frontend — React with a custom dark government-portal aesthetic
  • Document I/O — PyPDF2 + python-docx for input, ReportLab for printable PDF output
  • Dev Environment — NixOS with direnv + shell.nix for reproducible builds

How We Built It

We split the project into three clean layers with clear boundaries:

AI Intelligence Layer (our core focus) — built as isolated, callable Python functions. Each engine is independent and testable on its own. The entire AI layer is exposed as a REST API that the frontend consumes.

Backend API — a lightweight Flask server with 7 endpoints covering the full pipeline: /upload, /analyze, /answer, /validate, /documents, /score, and /ask.

Frontend — a React chat interface that mirrors the conversational experience of talking to a knowledgeable assistant at a Jan Seva Kendra, with live engine status indicators, inline document checklists, and a visual risk scorer.

The key architectural decision was separation of concerns — the AI logic has zero knowledge of the frontend, and the frontend has zero knowledge of the AI. The backend is the only bridge. This made parallel development possible and the system easy to debug.
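One way to keep an engine independent and testable is to pass the model call in as a parameter. The sketch below assumes a hypothetical `simplify_field` function for Engine 1 and a stand-in `fake_llm`. Neither name comes from the project, and a real engine would pass in a function that calls the Groq API instead.

```python
from typing import Callable


def simplify_field(legal_text: str, llm: Callable[[str], str]) -> str:
    """Engine 1 sketch: ask the model to rewrite a legal field as a plain question."""
    prompt = (
        "Rewrite this government form field as one short, plain question:\n"
        f"{legal_text}"
    )
    # The engine only deals with text in and text out; it has no web or UI code.
    return llm(prompt).strip()


def fake_llm(prompt: str) -> str:
    """Stand-in for the real model call, so the engine can be tested offline."""
    return " Did you earn money last year? "


question = simplify_field(
    "remunerative employment during the preceding fiscal year", fake_llm
)
print(question)  # Did you earn money last year?
```

Because the engine never imports the web framework, the same function can be called from a Flask route, a unit test, or a script without changes.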


Challenges We Faced

1. Indian government forms are structurally inconsistent

No two Indian government forms follow the same structure. ITR-1 is a structured table, the passport application is a mix of prose and fields, and RTO forms vary by state. Getting Engine 1 to reliably extract a clean schema from any arbitrary form required significant prompt engineering and fallback handling.

2. Natural language is ambiguous — especially in Indian English

When someone says "I earn around 20 thousand" — is that monthly or yearly? Is it salary or business income? Does it include agricultural income, which is tax-exempt in India? Engine 2 had to detect uncertainty markers and either resolve them through context or ask targeted follow-up questions rather than silently making assumptions.
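To illustrate the idea of asking instead of assuming, here is a deliberately simple rule-based sketch. It is not how Engine 2 works (the project describes an LLM-driven extractor); the word lists, the function name `follow_up_questions`, and the question wording are all assumptions made for the example.

```python
import re

# Words that suggest the speaker is estimating rather than stating a figure.
HEDGE_WORDS = re.compile(
    r"\b(?:around|about|roughly|approximately|approx|maybe|nearly)\b",
    re.IGNORECASE,
)

# Phrases that state whether an amount is monthly or yearly.
PERIOD_WORDS = re.compile(
    r"\b(?:per|a|every|each)\s+(?:month|year|annum)\b"
    r"|\b(?:monthly|yearly|annually|annual)\b",
    re.IGNORECASE,
)


def follow_up_questions(answer: str) -> list:
    """Return clarifying questions for an income answer, or [] if none are needed."""
    questions = []
    if not PERIOD_WORDS.search(answer):
        questions.append("Is that amount per month or per year?")
    if HEDGE_WORDS.search(answer):
        questions.append("Do you know the exact amount, for example from a salary slip?")
    return questions


for question in follow_up_questions("I earn around 20 thousand"):
    print(question)
```

For the example sentence, both checks fire: no period is stated, and "around" marks the figure as an estimate. An answer such as "I earn 20000 per month" produces no follow-up questions.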

3. India-specific validation rules are complex

Indian tax law has dozens of interdependent rules — HRA exemptions, Section 80C deduction limits of ₹1.5 lakh, different ITR forms for different income types, agricultural income treatment, and more. Building Engine 4 to reason about these correctly required careful prompt design.
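Some of these rules are simple enough to express as deterministic checks alongside the prompts. The sketch below covers only the Section 80C limit quoted above; the function name `check_80c_claim` and the message wording are hypothetical, and the text does not say whether the project uses hard-coded checks like this one.

```python
SECTION_80C_LIMIT = 150_000  # ₹1.5 lakh, the limit quoted in the text


def check_80c_claim(claimed: int) -> tuple:
    """Return (allowed_deduction, issues) for a claimed Section 80C amount in rupees."""
    issues = []
    if claimed < 0:
        issues.append("Section 80C deduction cannot be negative.")
        return 0, issues
    if claimed > SECTION_80C_LIMIT:
        issues.append(
            f"Claimed {claimed} under Section 80C, but the limit is {SECTION_80C_LIMIT}."
        )
    # Never allow more than the statutory limit, whatever was claimed.
    return min(claimed, SECTION_80C_LIMIT), issues


allowed, issues = check_80c_claim(200_000)
print(allowed)  # 150000
print(issues)
```

A check like this gives the validator a fixed answer for the numeric cases and leaves the model to handle the rules that need interpretation.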

4. Free-tier API regional restrictions

Our first choice (Google Gemini) had zero free-tier quota available in India. We pivoted to Groq, which turned out to be faster, genuinely free, and had no regional restrictions — a better outcome than our original plan.

5. NixOS development environment

Building on NixOS meant standard Python package management didn't work out of the box. We had to learn shell.nix to create a reproducible development environment — painful at first, but it meant our setup worked identically on every machine with zero configuration.


What We Learned

  • Prompt engineering is software engineering. Small changes in how you frame a question to an LLM produce dramatically different quality outputs.
  • Structured JSON outputs from AI are far more useful than freeform text when building systems — they compose cleanly with the rest of your stack.
  • Separation of concerns matters even more in AI systems than in traditional software, because debugging AI behavior is hard enough without tangled architecture making it worse.
  • The best hackathon pivot is the one that makes your project stronger — our forced switch from Gemini to Groq was exactly that.
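The point about structured JSON outputs can be sketched as a small parsing step between the model and the rest of the pipeline. The required keys and the function name `parse_engine_reply` below are assumptions for illustration; the project's actual schema is not given in this write-up.

```python
import json

# Hypothetical schema: the keys an engine reply is expected to contain.
REQUIRED_KEYS = {"field", "value", "confidence"}


def parse_engine_reply(raw: str) -> dict:
    """Parse a model reply that should contain exactly one JSON object."""
    # Models sometimes wrap JSON in extra prose, so keep only the outermost braces.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(raw[start:end + 1])  # raises ValueError on malformed JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data


reply = 'Here is the result: {"field": "annual_income", "value": 240000, "confidence": 0.9}'
parsed = parse_engine_reply(reply)
print(parsed["field"], parsed["value"])  # annual_income 240000
```

Failing loudly on a malformed or incomplete reply lets the caller retry or fall back, instead of passing bad data to the next engine.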

Impact for India

India's Digital India initiative aims to bring government services online — but digitizing a broken experience does not fix it. A confused citizen filling a form on a screen faces the same barriers as one filling it on paper.

CiviGuide AI addresses the last mile problem of digital governance — the gap between a form existing online and a citizen being able to correctly complete it.

With multilingual support on the roadmap, CiviGuide AI can eventually serve citizens in Hindi, Tamil, Bengali, Telugu, Marathi and more — reaching the 900 million Indians who are not comfortable in English, making government services truly accessible for the first time.


What's Next

  • Multi-language support — fill forms and ask questions in Hindi, Tamil, Bengali, Telugu, Marathi and other regional languages
  • State-specific form support — each Indian state has its own versions of ration cards, caste certificates, income certificates
  • Direct form overlay — fill answers directly onto the original form PDF layout instead of generating a new document
  • WhatsApp integration — so citizens can fill forms over a chat interface they already use every day, with zero app download
  • Offline mode — a compressed local model for areas with poor internet connectivity in rural India
  • DigiLocker integration — auto-fetch verified documents directly from the citizen's DigiLocker account to attach with submissions
