Annual reports hold tons of useful information but are often hard to explore. We wanted a lightweight way to surface key insights without wading through pages of text and tables.
What it does
- Ingests PDF reports and breaks them into searchable chunks
- Lets users ask simple questions and get back short, context-aware answers
- Offers on-demand summaries of sections such as financials or risk disclosures
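Breaking a report into searchable chunks can be as simple as a sliding word window with some overlap, so that a sentence cut at one boundary still appears whole in the neighbouring chunk. The sketch below illustrates that idea under stated assumptions: `chunk_text` is a hypothetical helper, the window is measured in words rather than model tokens, and the default sizes are illustrative rather than what the project used.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size words.

    Hypothetical helper: sizes are counted in whitespace-separated words,
    not model tokens, and the defaults are illustrative only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

With `chunk_size=4` and `overlap=1`, ten words become three chunks, each sharing one word with the next.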
How we built it
- Basic PDF extraction for text and tables
- Vector embeddings for semantic search
- A minimal Retrieval-Augmented Generation (RAG) flow to pull in relevant passages
- A simple front end to handle queries and show answers
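The retrieval half of a minimal RAG flow embeds the question, ranks chunks by similarity, and places the best ones into a prompt for the language model. The sketch below shows that shape with one plainly swapped-in technique: instead of learned vector embeddings it uses bag-of-words term counts with cosine similarity, so it runs with only the standard library. `embed`, `retrieve`, and `build_prompt` are hypothetical names, the prompt wording is an assumption, and the actual model call is left out because the project does not say which model it used.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Number the retrieved passages so the model can cite them."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the numbered passages below. "
        "Cite passage numbers in brackets.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Swapping `embed` for a real embedding model (and the linear scan for a vector index) would keep the same `retrieve` and `build_prompt` structure.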
Challenges we ran into
- Reports vary a lot in formatting
- Balancing completeness with context-window limits
- Tuning search relevance for different types of content
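One common way to balance completeness against a context-window limit is to pack ranked chunks greedily until a token budget is used up. The sketch below is an illustration, not the project's code: `pack_context` and its default budget are hypothetical, and token counts are approximated by word counts, which a real tokenizer would not match exactly.

```python
def pack_context(ranked_chunks: list[str], max_tokens: int = 1500) -> list[str]:
    """Greedily keep the highest-ranked chunks that fit within a token budget.

    Assumption: tokens are approximated by whitespace-separated words;
    a real tokenizer would give different (usually larger) counts.
    """
    selected = []
    used = 0
    for chunk in ranked_chunks:
        cost = len(chunk.split())
        if used + cost > max_tokens:
            continue  # this chunk would overflow; a smaller later one may still fit
        selected.append(chunk)
        used += cost
    return selected
```

Skipping an oversized chunk instead of stopping lets shorter, lower-ranked passages fill the remaining space; stopping at the first overflow would keep the context strictly in rank order at the cost of leaving budget unused.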
Accomplishments that we’re proud of
- A fast prototype that returns relevant snippets
- A simple UI that lets testers ask questions right away
- Initial feedback suggests it’s easier than reviewing reports manually
What we learned
- Cleaning and organizing the raw text matters more than model tweaks
- Prompt design greatly affects answer clarity
- Chunking strategy makes a big difference for longer documents
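As an example of what "cleaning the raw text" can involve, the sketch below removes running headers and footers (lines that repeat on many pages), drops bare page numbers, rejoins words hyphenated across line breaks, and collapses repeated spaces. It assumes the extractor returns one string per page; `clean_pages` and the `min_repeats` threshold are hypothetical choices, not the project's pipeline.

```python
import re
from collections import Counter

PAGE_NUMBER = re.compile(r"(page\s+)?\d+(\s+of\s+\d+)?", re.IGNORECASE)


def clean_pages(pages: list[str], min_repeats: int = 3) -> str:
    """Normalize per-page PDF text before chunking (hypothetical helper)."""
    # Count how many pages each distinct stripped line appears on.
    page_counts = Counter()
    for page in pages:
        page_counts.update({line.strip() for line in page.splitlines() if line.strip()})

    kept_pages = []
    for page in pages:
        kept = []
        for line in page.splitlines():
            stripped = line.strip()
            if not stripped:
                continue  # note: this also drops blank lines between paragraphs
            if page_counts[stripped] >= min_repeats:
                continue  # likely a running header or footer
            if PAGE_NUMBER.fullmatch(stripped):
                continue  # bare page number such as "12" or "Page 3 of 40"
            kept.append(stripped)
        kept_pages.append("\n".join(kept))

    text = "\n".join(kept_pages)
    # Rejoin words split across lines ("finan-\ncial" -> "financial").
    # Caveat: this also merges genuine compounds broken at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs left over from table layout.
    text = re.sub(r"[ \t]+", " ", text)
    return text
```

Running this before the chunking step means headers and page numbers never end up in the vector index, where they would otherwise match many queries.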
What’s next for RAG your Reports
- Streamline PDF parsing to handle more report styles
- Improve answer formatting and citation references
- Expand UI features (e.g., saved queries, basic chart previews)
- Gradually add live data ingestion and more comprehensive summaries
Built With
- jupyterlab
- python