Inspiration

AuditArk was inspired by the everyday struggle of manually processing receipt data. Teams often spend hours typing totals, vendor names, and dates into spreadsheets, and small mistakes can cause major reporting issues later. We wanted to build a practical desktop tool that automates receipt extraction while still giving users full control to review and correct data before final reports are generated.

What it does

AuditArk helps users upload receipt batches, extract structured fields using OCR, review and edit records, and generate clean exports for reporting. It is designed for offline-first use, so data stays on the local system and remains available without a network connection.

How we built it

We built AuditArk as a desktop app with a frontend for batch management and record review, and a backend service for OCR, parsing, and reporting logic. The system runs OCR on receipt images, converts the recognized text into structured entries, and stores the results in a local database. We also added export workflows so users can move validated data directly into reporting pipelines.
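
To make the "text to structured entries to local database" step concrete, here is a minimal sketch in Python. It is an illustration and not AuditArk's actual code: the `ReceiptRecord` fields, the regular expressions, the `receipts` table layout, and the choice of SQLite are all assumptions. It also assumes the OCR step has already produced plain text, that the first non-empty line is the vendor name, and that dates appear in ISO format.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReceiptRecord:
    """Hypothetical structured entry; fields stay None when not found."""
    vendor: Optional[str]
    date: Optional[str]
    total: Optional[float]


# Assumed patterns: a line that starts with "total" (so "Subtotal" is skipped)
# and an ISO-style date. Real receipts would need many more variants.
TOTAL_RE = re.compile(
    r"^[ \t]*total\b[^\d\n]*(\d+(?:\.\d{2})?)", re.IGNORECASE | re.MULTILINE
)
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


def parse_receipt(ocr_text: str) -> ReceiptRecord:
    """Turn raw OCR text into a structured record, leaving gaps as None."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    date_match = DATE_RE.search(ocr_text)
    total_match = TOTAL_RE.search(ocr_text)
    return ReceiptRecord(
        vendor=lines[0] if lines else None,  # assumption: vendor is line one
        date=date_match.group(1) if date_match else None,
        total=float(total_match.group(1)) if total_match else None,
    )


def save_record(conn: sqlite3.Connection, record: ReceiptRecord) -> int:
    """Insert one record into a local SQLite table and return its row id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS receipts ("
        "id INTEGER PRIMARY KEY, vendor TEXT, date TEXT, total REAL)"
    )
    cur = conn.execute(
        "INSERT INTO receipts (vendor, date, total) VALUES (?, ?, ?)",
        (record.vendor, record.date, record.total),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    sample = "Corner Cafe\n2024-03-15\nCoffee 3.50\nSubtotal 3.50\nTOTAL $3.85\n"
    conn = sqlite3.connect(":memory:")  # a file path would persist the data
    row_id = save_record(conn, parse_receipt(sample))
    print(conn.execute("SELECT * FROM receipts WHERE id = ?", (row_id,)).fetchone())
```

Leaving missing fields as `None` rather than guessing keeps the gaps visible, which fits a workflow where users review and correct records before export.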

Challenges we ran into

  • OCR inconsistency across different receipt layouts, print quality, and image angles
  • Packaging backend dependencies into a stable desktop build
  • Balancing extraction speed with accuracy and editability
  • Designing a workflow that is fast for bulk processing but still transparent and trustworthy

Accomplishments that we're proud of

  • Built an end-to-end receipt workflow from upload to export
  • Created a usable correction flow so users can confidently validate extracted data
  • Delivered an offline-capable desktop setup with integrated backend processing
  • Improved reliability of structured extraction for real, messy receipts

What we learned

We learned that production OCR is not just about model quality; it is also about strong fallback logic, validation workflows, and user experience. We also learned that packaging and deployment are core engineering challenges in desktop products, not just final steps. Measuring extraction quality with a consistent metric, the F1 score, helped us prioritize practical improvements:

$$ F_1 = \frac{2PR}{P + R} $$

where P is precision and R is recall.
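
As a rough illustration of how the formula above could be applied to extracted fields, the Python sketch below compares one extracted record against a hand-labeled reference. The counting rules are our assumption for this example and not necessarily how AuditArk scores its output: a correct value is a true positive, a wrong value counts as both a false positive and a false negative, and a missing value is a false negative. The function names and sample records are hypothetical.

```python
from typing import Dict, Optional, Tuple

Record = Dict[str, Optional[object]]


def count_field_matches(predicted: Record, gold: Record) -> Tuple[int, int, int]:
    """Return (true_positives, false_positives, false_negatives) for one record."""
    tp = fp = fn = 0
    for field, gold_value in gold.items():
        pred_value = predicted.get(field)
        if pred_value is None:
            fn += 1          # field was not extracted at all
        elif pred_value == gold_value:
            tp += 1          # extracted and correct
        else:
            fp += 1          # extracted but wrong...
            fn += 1          # ...and the true value was still missed
    # Extracted fields that the reference does not contain are spurious.
    fp += sum(1 for f, v in predicted.items() if f not in gold and v is not None)
    return tp, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), returning 0.0 when the score is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = {"vendor": "Corner Cafe", "date": "2024-03-15", "total": 3.85}
    predicted = {"vendor": "Corner Cafe", "date": None, "total": 3.58}
    tp, fp, fn = count_field_matches(predicted, gold)
    print(tp, fp, fn, round(f1_score(tp, fp, fn), 3))
```

In this sample the vendor is correct, the date is missing, and the total is wrong, so P = 1/2 and R = 1/3, which gives F1 = 0.4. Summing the counts over a fixed set of labeled receipts before computing the score would give a single number to compare across versions.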

What's next for AuditArk

  • Improve extraction for edge-case and low-quality receipts
  • Expand vendor normalization and category intelligence
  • Add richer analytics and reporting templates
  • Introduce regression benchmarks to continuously track OCR and parsing quality
