Finance PDF RAG QA Evaluator
Inspiration
Finance teams already run RAG chatbots against 200-page reports, yet no one can quickly prove the answers are correct. One wrong figure can move markets, so we built a lightweight evaluator that tells you, in minutes, how trustworthy your own model is.
What it does
- Point to a folder of PDFs and pick a question count.
- We chunk every report, then call Perplexity Sonar to write finance-aware questions that name the company and fiscal year.
- Your RAG model answers; a second Sonar call scores each answer for factual accuracy, completeness, and clarity.
- Results land in a CSV and an interactive plot, highlighting exactly where the model hallucinates and what to fix next.
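The last step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `ScoredAnswer` fields, the 0-to-1 score scale, and the `flag_below` cut-off are all assumptions made for the example.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class ScoredAnswer:
    # Hypothetical record layout; the real CSV columns may differ.
    question: str
    answer: str
    factual_accuracy: float  # assumed to be on a 0-1 scale
    completeness: float
    clarity: float


def write_results(rows, handle, flag_below=0.5):
    """Write scored answers as CSV and return how many were flagged.

    An answer is flagged when its factual accuracy is under flag_below,
    an assumed stand-in for "likely hallucination".
    """
    columns = [f.name for f in fields(ScoredAnswer)] + ["flagged"]
    writer = csv.DictWriter(handle, fieldnames=columns)
    writer.writeheader()
    flagged = 0
    for row in rows:
        record = asdict(row)
        record["flagged"] = row.factual_accuracy < flag_below
        flagged += int(record["flagged"])
        writer.writerow(record)
    return flagged


sample = [
    ScoredAnswer("What was FY2023 revenue?", "$4.2B", 0.9, 0.8, 0.9),
    ScoredAnswer("What was FY2023 net income?", "$9B", 0.2, 0.6, 0.8),
]
buffer = io.StringIO()
flagged_count = write_results(sample, buffer)
```

Writing to any file-like handle keeps the sketch easy to test; in practice the handle would be a file opened with `newline=""`.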
How we built it
LangChain wires everything together. PyMuPDF extracts text; RecursiveCharacterTextSplitter makes 5k-token chunks. Sonar serves as both question writer and judge, and the whole loop runs locally in a few minutes.
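To show the idea behind recursive splitting, here is a simplified, standard-library stand-in. It is not LangChain's implementation: it measures length in characters rather than tokens, has no overlap, and the function name `split_recursive` and the separator order are assumptions for illustration.

```python
def split_recursive(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first (paragraphs), and only falls back
    to finer ones (lines, then words) for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # Nothing left to split on: fall back to hard cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            chunks.extend(split_recursive(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


report = "Revenue rose 12%.\n\nNet income fell.\n\n" + "word " * 50
pieces = split_recursive(report, chunk_size=40)
```

Keeping paragraphs together where they fit is what lets a question writer see a coherent passage instead of a sentence cut in half.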
Challenges
- Context balance: giving Sonar enough text so questions make sense without leaking answers.
- Messy data: cleaning scanned PDFs and broken tables.
- Performance: keeping evaluation fast on a laptop.
- Sonar gaps: no embeddings endpoint and thin docs, so we spent extra hours researching work-arounds and prompt formats.
Accomplishments
Despite only one teammate knowing RAG, we ramped up in 10 evenings and delivered a one-command demo, from ingest to dashboard.
What we learned
Real documents are far messier than any tutorial; most engineering time goes to parsing and optimization, not to glamorous LLM calls.
What’s next
- Batch evaluation: send whole batches of question-answer pairs to the judge in one call, cutting evaluation time from minutes to seconds.
- PyPI package: pip install rag-qa-eval for instant drop-in use.
- Domain plug-ins: legal, healthcare, retail modules with their own prompts and checks.
- Drag-and-drop web UI: non-devs can upload PDFs and get scores.
- CI badge: build fails automatically when trust scores fall below a threshold.
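As one possible shape for the planned CI check, the sketch below turns a list of trust scores into a process exit code. Nothing here exists yet in the project: the function name `ci_exit_code`, the 0-to-1 score scale, the use of the mean, and the 0.8 default threshold are all assumptions.

```python
import statistics


def ci_exit_code(trust_scores, threshold=0.8):
    """Return 0 (pass) or 1 (fail) for a CI step.

    Fails when the mean trust score is below the threshold, and also
    when there are no scores at all, so an empty run cannot pass silently.
    """
    if not trust_scores:
        return 1
    return 0 if statistics.mean(trust_scores) >= threshold else 1


# In a CI script this would typically end with:
#   import sys
#   sys.exit(ci_exit_code(scores_loaded_from_csv))
passing = ci_exit_code([0.9, 0.85])
failing = ci_exit_code([0.9, 0.4])
```

A non-zero exit code is what most CI systems treat as a failed step, so the badge would follow from the build status rather than needing its own logic.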