RAG your Reports

Annual reports hold tons of useful information but are often hard to explore. We wanted a lightweight way to surface key insights without wading through pages of text and tables.

What it does

- Ingests PDF reports and breaks them into searchable chunks
- Lets users ask simple questions and get back short, context‑aware answers
- Offers on‑demand summaries of sections like financials or risk disclosures

How we built it

- Basic PDF extraction for text and tables
- Vector embeddings for semantic search
- A minimal Retrieval‑Augmented Generation flow to pull in relevant passages
- A simple front‑end to handle queries and show answers
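The retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `embed` function is a toy word-count stand-in for a real embedding model, and the names `embed`, `cosine`, `retrieve`, and `build_prompt`, as well as the sample chunks, are assumptions made for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (assumed stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble retrieved passages and the question into one prompt string."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"


# Hypothetical report chunks, for illustration only.
chunks = [
    "Revenue grew 12 percent driven by cloud services.",
    "Key risk factors include currency fluctuations and supply chain disruption.",
    "The board approved a new share buyback program.",
]
question = "What are the main risk factors?"
prompt = build_prompt(question, retrieve(question, chunks, k=1))
```

In a real pipeline the word-count vectors would be replaced by model embeddings, and `prompt` would be sent to a language model to generate the answer.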

Challenges we ran into

- Reports vary a lot in formatting
- Balancing completeness with context‑window limits
- Tuning search relevance for different types of content
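One way to think about the completeness versus context‑window trade‑off is as a packing problem: take the highest‑scoring passages first and skip any that would exceed the budget. The sketch below is an assumption about how this could be handled, not the approach the project necessarily used; `pack_context` is a hypothetical helper, and token counts are approximated by whitespace‑separated words.

```python
def pack_context(scored_passages: list[tuple[float, str]], max_tokens: int) -> tuple[list[str], int]:
    """Greedily select passages by descending score until the budget is used up.

    Token cost is a crude estimate (word count); a real tokenizer would differ.
    """
    selected = []
    used = 0
    for score, text in sorted(scored_passages, key=lambda p: p[0], reverse=True):
        cost = len(text.split())
        if used + cost <= max_tokens:
            selected.append(text)
            used += cost
    return selected, used
```

Because passages that do not fit are skipped rather than ending the loop, a short lower‑ranked passage can still fill leftover space in the budget.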

Accomplishments that we’re proud of

- Fast prototype that returns relevant snippets
- Simple UI that lets testers ask questions right away
- Initial feedback suggests it’s easier than manual report review

What we learned

- Cleaning and organizing the raw text is more important than model tweaks
- Prompt design greatly affects answer clarity
- Chunking strategy makes a big difference for longer documents
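To illustrate why chunking strategy matters, here is a small sketch of fixed‑size chunking with overlap, so that a sentence straddling a boundary still appears whole in at least one chunk. The function name `chunk_words` and the default sizes are assumptions for the example; the project may have chunked by section, page, or tokens instead.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Larger chunks keep more surrounding context per passage but leave room for fewer passages in the prompt, so the size and overlap are worth tuning per report type.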

What’s next for RAG your Reports

- Streamline PDF parsing for more report styles
- Improve answer formatting and citation references
- Expand UI features (e.g., saved queries, basic chart previews)
- Gradually add live data ingestion and more comprehensive summaries

Built With

jupyterlab
python

Updates

Suhaibuddin Ahmed started this project — Apr 19, 2025 11:42 PM EDT
