DocuMind

Inspiration

I wanted to build something that could help people understand documents faster. Reading through long PDFs is tedious, so I thought - why not let AI do the heavy lifting?

What it does

DocuMind is a multi-agent document analysis system. You upload a PDF, and it:

Extracts text using PaddleOCR
Analyzes the content with ERNIE
Generates summaries
Lets you ask questions about the document

How we built it

We started with the warm-up task (a PDF-to-web converter) to get familiar with the APIs, then built out the multi-agent system using CAMEL-AI patterns. Each agent has a specific job (OCR, analysis, summary, or QA), and a coordinator manages them all.
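
To make the coordinator idea concrete, here is a minimal sketch in plain Python. It does not use CAMEL-AI classes or call PaddleOCR or ERNIE; the agent functions, the `Coordinator` class, and the shared-state keys are all hypothetical stand-ins, so treat it as an illustration of the pattern rather than the project's actual code.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
Agent = Callable[[State], State]  # an agent reads shared state and returns new keys


# Placeholder agents: the real ones would call PaddleOCR and ERNIE.
def ocr_agent(state: State) -> State:
    return {"text": f"text extracted from {state['pdf_path']}"}


def analysis_agent(state: State) -> State:
    return {"analysis": f"{len(state['text'].split())} words"}


def summary_agent(state: State) -> State:
    return {"summary": state["text"][:40]}


def qa_agent(state: State) -> State:
    # Placeholder answer that only shows which inputs a QA agent would see.
    return {"answer": f"Q: {state['question']} | context: {state['summary']}"}


class Coordinator:
    """Runs agents in order, merging each agent's output into shared state."""

    def __init__(self, pipeline: List[Tuple[str, Agent]], qa: Agent):
        self.pipeline = pipeline
        self.qa = qa

    def run(self, pdf_path: str) -> State:
        state: State = {"pdf_path": pdf_path}
        for _name, agent in self.pipeline:
            state.update(agent(state))
        return state

    def ask(self, state: State, question: str) -> str:
        # QA runs on demand against the state produced by run().
        return self.qa({**state, "question": question})["answer"]


def build_default_coordinator() -> Coordinator:
    pipeline = [("ocr", ocr_agent), ("analysis", analysis_agent), ("summary", summary_agent)]
    return Coordinator(pipeline, qa_agent)
```

Keeping each agent as a function over a shared dictionary means the coordinator only has to know the order of the steps, which makes it easy to swap a single agent out while debugging.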

The backend is FastAPI, and the frontend is plain HTML/CSS/JS. Nothing too fancy.

Challenges we ran into

The ERNIE API auth format changed, and it took a while to figure out the new Bearer token setup
PaddleOCR needs Poppler for PDF conversion, so we ended up using PyPDF2 as a fallback
Getting the agents to work together smoothly took some trial and error
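
The first two challenges can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `bearer_headers` and `extract_with_fallback` are hypothetical helper names, no ERNIE endpoint is shown because the writeup does not give one, and `pypdf2_extract` assumes PyPDF2 2.0 or later (where `PdfReader` and `extract_text` are available).

```python
import logging
from typing import Callable, Dict, Tuple

logger = logging.getLogger(__name__)


def bearer_headers(token: str) -> Dict[str, str]:
    """Build request headers using the standard Bearer scheme."""
    if not token:
        raise ValueError("missing API token")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


def pypdf2_extract(pdf_path: str) -> str:
    """Read the PDF's embedded text layer (no OCR). Assumes PyPDF2 >= 2.0."""
    from PyPDF2 import PdfReader  # lazy import: only needed on the fallback path

    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_with_fallback(
    pdf_path: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the primary extractor; use the fallback if it fails or returns nothing."""
    try:
        text = primary(pdf_path)
        if text.strip():
            return text, "primary"
        logger.warning("primary extractor returned no text for %s", pdf_path)
    except Exception as exc:  # e.g. a missing Poppler install
        logger.warning("primary extractor failed for %s: %s", pdf_path, exc)
    return fallback(pdf_path), "fallback"
```

In use, the OCR path would be passed as `primary` and `pypdf2_extract` as `fallback`. Returning which path produced the text makes it easier to tell later whether a document went through OCR or only had its text layer read.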