The Inspiration (The SEA Reality)
In Southeast Asia, financial due diligence faces a hard truth: financial statements aren't clean APIs or digital spreadsheets. They are messy, scanned PDFs with red stamps and misaligned columns.
Analysts at PE funds, banks, and Big Four audit firms spend 80% of their time on manual data entry and formatting, leaving only 20% for actual risk analysis. We built Clarif.ai to invert this ratio. We want to democratize McKinsey-level financial reasoning for every SME, credit officer, and M&A analyst.
What it does
Clarif.ai is an Autonomous Financial Due Diligence Copilot. You upload a raw, scanned financial statement (PDF/Image). Within 30 seconds, the system:
Visually scans and extracts core financial metrics (without losing spatial alignment).
Structures the raw data into a strictly validated JSON schema.
Acts as an elite M&A analyst to calculate risk ratios: Current Ratio and Debt-to-Equity (D/E).
Outputs a highly actionable Red Flag Report, highlighting liquidity warnings, cash flow anomalies, and credit risks.
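The ratio and red-flag steps above can be sketched in a few lines of Python. This is a minimal illustration, not Clarif.ai's actual code: the `BalanceSheet` field names and the thresholds (a Current Ratio below 1.0, a D/E above 2.0) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class BalanceSheet:
    """Hypothetical subset of extracted fields, all in the same currency unit."""
    current_assets: float
    current_liabilities: float
    total_liabilities: float
    total_equity: float


def current_ratio(bs: BalanceSheet) -> float:
    """Current Ratio = current assets / current liabilities."""
    return bs.current_assets / bs.current_liabilities


def debt_to_equity(bs: BalanceSheet) -> float:
    """D/E = total liabilities / total shareholders' equity."""
    return bs.total_liabilities / bs.total_equity


def red_flags(bs: BalanceSheet,
              min_current_ratio: float = 1.0,    # assumed threshold
              max_debt_to_equity: float = 2.0):  # assumed threshold
    """Return human-readable warnings for ratios outside the assumed limits."""
    flags = []
    # Skip the liquidity check when there are no current liabilities to cover.
    if bs.current_liabilities > 0 and current_ratio(bs) < min_current_ratio:
        flags.append(
            f"Liquidity warning: current ratio {current_ratio(bs):.2f} "
            f"is below {min_current_ratio:.2f}"
        )
    if bs.total_equity <= 0:
        # D/E is not meaningful with zero or negative equity.
        flags.append("Credit risk: shareholders' equity is zero or negative")
    elif debt_to_equity(bs) > max_debt_to_equity:
        flags.append(
            f"Credit risk: D/E {debt_to_equity(bs):.2f} "
            f"is above {max_debt_to_equity:.2f}"
        )
    return flags


# Made-up figures for illustration only.
sample = BalanceSheet(current_assets=500.0, current_liabilities=1000.0,
                      total_liabilities=3000.0, total_equity=1000.0)
sample_flags = red_flags(sample)
for flag in sample_flags:
    print(flag)
```

Keeping the arithmetic in plain code means the ratios are reproducible on every run, while the language model only has to explain what the flags mean.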
How we built it (The Hybrid Architecture)
We intentionally avoided the "lazy" approach of just feeding a PDF into ChatGPT, which is prone to hallucinated figures. Instead, we built a defensible, deterministic pipeline in Dify with three stages:
The "Eye" (Gemini 1.5 Pro): We utilized Gemini's native multimodal spatial understanding to read scanned PDFs visually, preserving the row/column integrity of complex balance sheets.
The "Filter" (Interfaze API): We routed the raw extracted text through Interfaze to strictly enforce a JSON schema. This protects data integrity: the system either outputs JSON that conforms to the schema or fails gracefully, so malformed or missing values never reach the reasoning step.
The "Brain" (GPT-4o): With clean, structured JSON, we deployed GPT-4o purely for logical reasoning, prompting it to act as an elite risk detector.
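To make the "validate or fail gracefully" idea in the Filter stage concrete, here is a small standard-library sketch. It does not use or reproduce the Interfaze API; `validate_extraction` and the field names in `REQUIRED_FIELDS` are hypothetical stand-ins for whatever schema the real pipeline enforces.

```python
import json

# Hypothetical schema: every field is required and must be numeric.
REQUIRED_FIELDS = (
    "current_assets",
    "current_liabilities",
    "total_liabilities",
    "total_equity",
)


def validate_extraction(raw_text: str) -> dict:
    """Return {"ok": True, "data": {...}} or {"ok": False, "errors": [...]}.

    Nothing is guessed or defaulted: any missing or non-numeric field
    makes the whole extraction fail instead of passing bad data along.
    """
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return {"ok": False, "errors": [f"invalid JSON: {exc.msg}"]}

    if not isinstance(payload, dict):
        return {"ok": False, "errors": ["top-level value must be an object"]}

    errors = []
    for name in REQUIRED_FIELDS:
        if name not in payload:
            errors.append(f"missing field: {name}")
            continue
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name} must be a number")

    # Reject fields the schema does not know about.
    for name in sorted(set(payload) - set(REQUIRED_FIELDS)):
        errors.append(f"unexpected field: {name}")

    if errors:
        return {"ok": False, "errors": errors}
    return {"ok": True, "data": {n: float(payload[n]) for n in REQUIRED_FIELDS}}


good = validate_extraction(
    '{"current_assets": 500, "current_liabilities": 1000, '
    '"total_liabilities": 3000, "total_equity": 1000}'
)
# "5OO" uses the letter O, a typical OCR slip; it must not be coerced to 500.
bad = validate_extraction('{"current_assets": "5OO", "current_liabilities": 1000}')
print(good["ok"], bad["ok"], bad["errors"])
```

A check like this confirms the shape and types of the data, not that each number matches the source document, so it complements the extraction accuracy of the vision step rather than replacing it.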
Challenges we ran into
The "Scanned PDF" Wall: Early on, standard document extractors completely broke down. They scraped text sequentially, merging the "Last Year" and "This Year" columns into a chaotic mess. The Pivot: We made a critical architectural decision to drop traditional text parsers and pivot to Vision AI (Gemini 1.5 Pro). By treating the PDF as an image, we preserved spatial relationships and achieved 99% extraction accuracy on financial statements prepared under Vietnamese accounting standards.
Accomplishments that we're proud of
We successfully built a production-ready Agentic Workflow that guarantees Deterministic Output (JSON) from Non-Deterministic Inputs (Scanned PDFs). We solved the hallucination problem in AI financial analysis.
What we learned
Vision > Text: For financial documents in emerging markets, visual spatial understanding is vastly superior to traditional OCR text extraction.
Structure is Everything: An LLM reasons far more reliably when it is forced to consume and output strictly typed data (JSON).
What's next for Clarif.ai
We aim to scale this into a B2B SaaS for Credit Risk Assessment. The next technical milestone is integrating Real-time Web Search (Exa/Bright Data) to cross-reference the extracted balance sheet data with live legal scandals, tax debt news, and market sentiment.
Built With
- codex
- dify
- exa
- gemini-1.5-pro
- gpt
- gpt-4o
- interfaze
- json
- python
- python-package-index
- trae