Inspiration

Financial document fraud is a real and growing problem — forged bank slips, edited transaction amounts, and manipulated invoices are commonly used in loan fraud, insurance fraud, and payment disputes.

During hackathons and real-world discussions, I noticed that most verification is still manual, slow, and error-prone. Existing OCR tools extract text, but they don’t explain whether a document is suspicious or forged.

That inspired me to build LegalDoc Guardian — a system that doesn’t just read documents, but reasons about them and produces explainable evidence of forgery.

What it does

LegalDoc Guardian is an AI-powered document analysis system that:
- Accepts scanned or photographed financial documents (bank slips, payment receipts)
- Extracts text using an intelligent OCR pipeline
- Analyzes extracted values to detect tampering or inconsistencies
- Classifies documents as CLEAN, POSSIBLE, or FORGED
- Highlights suspicious fields and explains why a document is flagged

Instead of a black-box result, it provides transparent evidence, such as:
- Multiple conflicting amounts
- Missing or inconsistent account numbers
- Numeric anomalies inside the document

How we built it

The project follows a multi-stage AI pipeline:

->OCR Pipeline
- PaddleOCR is used as the primary OCR engine for fast and accurate text detection.
- If PaddleOCR fails or returns empty output, the system automatically falls back to Tesseract OCR.
- Image enhancement (resizing, contrast improvement) improves OCR reliability for real-world photos.
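The fallback behaviour described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: `run_ocr_with_fallback` and the `OcrEngine` type are hypothetical names, and each engine is assumed to be wrapped in a plain callable that takes an image path and returns a list of text lines. The real PaddleOCR and Tesseract adapters are not shown.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Assumption: every OCR engine is wrapped as a callable that takes an image
# path and returns the recognized text lines. Adapters for PaddleOCR and
# Tesseract would live elsewhere and are not part of this sketch.
OcrEngine = Callable[[str], List[str]]


def run_ocr_with_fallback(
    image_path: str,
    engines: Iterable[Tuple[str, OcrEngine]],
) -> Tuple[Optional[str], List[str]]:
    """Try each engine in order and return (engine_name, lines) for the
    first one that yields non-empty text, or (None, []) if all fail."""
    for name, engine in engines:
        try:
            raw_lines = engine(image_path)
        except Exception:
            # A crash in one engine should not stop the whole pipeline.
            continue
        # Drop empty or whitespace-only lines before deciding on fallback.
        lines = [ln.strip() for ln in raw_lines if ln and ln.strip()]
        if lines:
            return name, lines
    return None, []
```

Passing the engines as an ordered list keeps the "primary first, fallback second" policy in one place, so adding a third engine would only mean appending another entry.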

->Forgery Analysis Engine
- Extracted text is parsed into structured fields (name, account, amount, date).
- Numeric patterns are normalized and validated.
- A rule-based AI detector flags suspicious cases, such as:
  - Multiple distinct monetary values
  - Unexpected numeric overlaps (e.g., account numbers mistaken as amounts)
- Results are returned with confidence scores and evidence lists.
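A rough sketch of how such rules might fit together is shown below. Everything in it is an assumption made for illustration: the regular expressions, the `Report` structure, the thresholds that map evidence counts to CLEAN, POSSIBLE, or FORGED, and the placeholder confidence formula are not taken from the project.

```python
import re
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List

# Hypothetical patterns: an amount must follow a currency marker, and an
# account number must follow an "account"-like label.
AMOUNT_RE = re.compile(
    r"(?:\b(?:Rs\.?|INR|USD)|[$₹])\s*([0-9][0-9,]*(?:\.[0-9]{1,2})?)",
    re.IGNORECASE,
)
ACCOUNT_RE = re.compile(r"(?:a/c|acct|account)\D{0,15}(\d{6,18})", re.IGNORECASE)


@dataclass
class Report:
    label: str          # "CLEAN", "POSSIBLE", or "FORGED"
    confidence: float   # placeholder score, not calibrated
    evidence: List[str] = field(default_factory=list)


def analyze(text: str) -> Report:
    # Normalize "5,000.00" and "5000" to the same numeric value.
    amounts = {Decimal(raw.replace(",", "")) for raw in AMOUNT_RE.findall(text)}
    accounts = set(ACCOUNT_RE.findall(text))

    evidence: List[str] = []
    if len(amounts) > 1:
        listed = ", ".join(str(a) for a in sorted(amounts))
        evidence.append(f"Multiple distinct amounts: {listed}")
    if not accounts:
        evidence.append("No account number found")
    if any(Decimal(acc) in amounts for acc in accounts):
        evidence.append("An amount is identical to an account number")

    # Assumed mapping: no evidence is CLEAN, one item is POSSIBLE, more is FORGED.
    label = "CLEAN" if not evidence else "POSSIBLE" if len(evidence) == 1 else "FORGED"
    confidence = min(1.0, 0.5 + 0.2 * len(evidence))
    return Report(label, confidence, evidence)
```

Returning the evidence list alongside the label is what lets the UI explain a decision instead of only showing a verdict.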

->Web Application
- Built using Streamlit for rapid prototyping.
- Users can upload documents and instantly see:
  - OCR output
  - Highlighted bounding boxes
  - Forgery classification
  - Extracted structured fields

Challenges we ran into

OCR inconsistency on low-quality or blurred images

  • Solved by introducing fallback OCR and image enhancement.

Distinguishing real amounts from other numbers

  • Addressed with spatial heuristics and contextual rules.

Cloud deployment limitations

  • Streamlit Cloud does not support heavy native binaries, so the app supports a demo mode while keeping full functionality locally.

False positives

  • Reduced by designing evidence rules that avoid blind classification and always explain each decision.
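To make the "spatial heuristics" used for separating amounts from other numbers more concrete, here is one possible sketch. It assumes OCR tokens come with bounding-box positions and picks the number that sits closest to the right of an amount-like label on the same row. The `Token` structure, the `AMOUNT_LABELS` keyword list, and the half-height row tolerance are all hypothetical choices, not the project's actual rules.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

AMOUNT_LABELS = ("amount", "total", "amt")  # hypothetical keyword list
NUMBER_RE = re.compile(r"^[0-9][0-9,]*(?:\.[0-9]{1,2})?$")


@dataclass
class Token:
    text: str
    x: float       # left edge of the bounding box
    y: float       # vertical center of the bounding box
    height: float  # bounding-box height


def pick_amount(tokens: List[Token]) -> Optional[Token]:
    """Return the numeric token nearest to the right of an amount label on
    the same text row, or None if no such token exists."""
    labels = [t for t in tokens if any(k in t.text.lower() for k in AMOUNT_LABELS)]
    best: Optional[Token] = None
    best_gap = float("inf")
    for num in tokens:
        if not NUMBER_RE.match(num.text):
            continue
        for lab in labels:
            # Same row: vertical centers differ by less than half a line height.
            same_row = abs(num.y - lab.y) <= 0.5 * max(num.height, lab.height)
            gap = num.x - lab.x
            if same_row and 0 < gap < best_gap:
                best, best_gap = num, gap
    return best
```

With a rule like this, a long digit string next to an "Account" label is never chosen as the amount, even though it looks numerically valid.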

Accomplishments that we're proud of

Built an end-to-end AI system that detects forged financial documents using a robust OCR pipeline, explainable analysis, and a live web demo, while overcoming real-world OCR and deployment challenges.

What we learned

Through this project, we learned:
- How OCR engines behave differently under real-world conditions
- How to design explainable AI systems, not just prediction models
- How to build fallback pipelines for robustness
- How small UI decisions dramatically improve trust in AI systems
- Practical deployment constraints of AI apps in cloud environments

Why this matters (Impact)

LegalDoc Guardian can be applied in:
- Banks and fintech verification teams
- Loan and insurance fraud detection
- MSME invoice verification
- Legal and compliance audits

It can reduce manual verification time by up to 80% and improve trust in digital document processing.

What's next for LegalDoc Guardian

Future improvements include:
- Fine-tuning PaddleOCR-VL on real bank slip datasets
- Integrating ERNIE multimodal reasoning for semantic validation
- Signature verification and handwriting analysis
- Batch processing and enterprise API support
- Full Docker-based deployment with GPU acceleration
