Inspiration

In India, government and institutional forms are often difficult to understand, especially for students, senior citizens, and non-English speakers. Many people make mistakes, miss required fields, or rely on others for help.
I wanted to build a solution that uses AI to simplify this process by turning complex forms into guided, understandable steps.

FormSathi was created to bridge this gap by combining OCR and multimodal AI to interpret forms and assist users in filling them correctly.


What it does

FormSathi allows users to upload a photo or PDF of any form. The system:

  • Extracts text using OCR
  • Explains each field in simple language
  • Suggests what information to fill in
  • Detects missing details
  • Generates a filled sample output
  • Translates Hindi ↔ English
  • Provides checklists and correction letters

The goal is to reduce confusion, errors, and time spent on paperwork.
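
As an illustration of the "detects missing details" and checklist features listed above, here is a minimal sketch of how such a check could work. The `FormField` class and the `find_missing` and `build_checklist` helpers are hypothetical names chosen for this example, not FormSathi's actual code, and the sketch assumes the fields have already been extracted from the form.

```python
from dataclasses import dataclass


@dataclass
class FormField:
    """Hypothetical representation of one field found on a form."""
    label: str             # field name as printed on the form
    value: str = ""        # what the user has entered so far
    required: bool = True  # assumption: the analysis step marks required fields


def find_missing(fields: list[FormField]) -> list[str]:
    """Return labels of required fields that are empty or whitespace-only."""
    return [f.label for f in fields if f.required and not f.value.strip()]


def build_checklist(fields: list[FormField]) -> list[str]:
    """Render one checklist line per field, marking what is still missing."""
    missing = set(find_missing(fields))
    lines = []
    for f in fields:
        if f.label in missing:
            mark = "[ ]"   # required but not filled in yet
        elif f.value.strip():
            mark = "[x]"   # filled in
        else:
            mark = "[-]"   # optional and left blank
        lines.append(f"{mark} {f.label}")
    return lines
```

A checklist built this way can be shown to the user before they submit the form, so that empty required fields are caught early.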


How we built it

  • Frontend deployed via Netlify
  • Python backend using FastAPI/Flask
  • Gemini API for reasoning and explanations
  • OCR using pytesseract / EasyOCR
  • PDF parsing using PyMuPDF / pdfplumber
  • Storage via SQLite/Firebase

Pipeline:

  1. User uploads document
  2. OCR (or PDF parsing, for digital PDFs) extracts the raw text
  3. Gemini analyzes fields
  4. Results displayed interactively
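
The four steps above can be sketched roughly as follows. This is a simplified outline under stated assumptions, not the project's real backend: `extract_text` and `ask_model` are hypothetical callables standing in for the OCR library and the Gemini API call, and the JSON keys (`field`, `explanation`, `suggestion`) are invented for the example.

```python
import json
from typing import Callable


def build_prompt(form_text: str) -> str:
    """Ask the model for a JSON list so the reply can be validated."""
    return (
        "You are helping someone fill in a form.\n"
        "For each field in the text below, return a JSON list of objects "
        'with the keys "field", "explanation" and "suggestion".\n'
        "Use simple language.\n\n"
        f"FORM TEXT:\n{form_text}"
    )


def parse_reply(reply: str) -> list[dict]:
    """Parse the model reply, keeping only well-formed entries."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return []  # the model did not return valid JSON
    if not isinstance(data, list):
        return []
    required = {"field", "explanation", "suggestion"}
    return [d for d in data if isinstance(d, dict) and required.issubset(d)]


def run_pipeline(
    document: bytes,
    extract_text: Callable[[bytes], str],  # hypothetical OCR / PDF step
    ask_model: Callable[[str], str],       # hypothetical model call
) -> list[dict]:
    """Upload -> text extraction -> model analysis -> results for display."""
    text = extract_text(document)
    reply = ask_model(build_prompt(text))
    return parse_reply(reply)


# Stand-ins so the sketch runs without any OCR library or API key.
def fake_ocr(document: bytes) -> str:
    return document.decode("utf-8")


def fake_model(prompt: str) -> str:
    return json.dumps([
        {"field": "Name", "explanation": "Your full name.", "suggestion": "As on your ID"},
        {"field": "broken entry"},  # dropped by parse_reply
    ])


results = run_pipeline(b"Name: ____", fake_ocr, fake_model)
```

Passing the OCR and model steps in as functions keeps the pipeline testable with stubs like these, and makes it easy to swap one OCR library for another.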

Challenges we ran into

  • Handling inconsistent OCR outputs
  • Structuring unformatted form text
  • Designing prompts for reliable explanations
  • Creating a smooth UX for demo interaction

These required iterative testing and prompt refinement.
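
To make the first two challenges concrete, here is a small sketch of one way to clean up noisy OCR lines and turn "Label: value" text into structured fields. It uses only the Python standard library. The helper names and the assumption that labels end with a colon are illustrative choices, not a description of how FormSathi actually does it, and real forms will need more rules than this.

```python
import re
import unicodedata

# Assumption: a field looks like "Label: value", with a label of 2 to 60 characters.
LABEL_RE = re.compile(r"^(?P<label>[^:]{2,60}?)\s*:\s*(?P<value>.*)$")


def normalize_ocr_line(line: str) -> str:
    """Reduce common OCR noise in a single line of text."""
    # NFKC folds look-alike characters, e.g. a full-width colon becomes ":".
    line = unicodedata.normalize("NFKC", line)
    # Fill-in blanks such as "_____" or "....." carry no information.
    line = re.sub(r"[_.]{3,}", " ", line)
    # Collapse runs of whitespace left by uneven scans.
    line = re.sub(r"\s+", " ", line)
    return line.strip()


def structure_form_text(raw: str) -> list[dict]:
    """Turn raw OCR text into a list of {"label", "value"} dictionaries."""
    fields = []
    for raw_line in raw.splitlines():
        line = normalize_ocr_line(raw_line)
        match = LABEL_RE.match(line)
        if match:  # lines without a colon (titles, notes) are skipped
            fields.append({
                "label": match.group("label").strip(),
                "value": match.group("value").strip(),
            })
    return fields
```

For example, a line such as `Date of Birth: ________` becomes a field with the label `Date of Birth` and an empty value, which a later step can flag as missing.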


Accomplishments that we're proud of

  • Building a working end-to-end prototype
  • Combining vision + language AI
  • Creating a solution with real-world impact
  • Deploying a live demo

What we learned

  • Multimodal AI integration
  • Prompt engineering strategies
  • OCR limitations and preprocessing
  • Full-stack deployment workflow

What's next for FormSathi

  • Voice-guided assistance
  • Mobile camera scan integration
  • Field highlighting on documents
  • Support for more regional languages
  • Real user testing and feedback

Built With

  • fastapi
  • gemini-api
  • html/css
  • javascript
  • netlify
  • ocr-(pytesseract/easyocr)
  • pdfplumber
  • pymupdf
  • python
  • sqlite/firebase