Inspiration
I wanted to build something that could help people understand documents faster. Reading through long PDFs is tedious, so I thought: why not let AI do the heavy lifting?
What it does
DocuMind is a multi-agent document analysis system. You upload a PDF, and it:
- Extracts text using PaddleOCR
- Analyzes the content with ERNIE
- Generates summaries
- Lets you ask questions about the document
How we built it
We started with the warm-up task (a PDF-to-web converter) to get familiar with the APIs, then built out the multi-agent system using CAMEL-AI patterns. Each agent has a specific job (OCR, analysis, summary, or QA), and a coordinator manages them all.
The backend is FastAPI, frontend is plain HTML/CSS/JS. Nothing too fancy.
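To make the agent and coordinator split described above more concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the names `DocumentState`, `Coordinator`, and the `*_agent` functions are hypothetical, and the agent bodies are stand-ins where the real system would call PaddleOCR and ERNIE. It only shows the idea of one coordinator passing shared state through single-purpose agents in order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DocumentState:
    """Shared state handed from one agent to the next (hypothetical shape)."""
    source: str
    text: str = ""
    analysis: str = ""
    summary: str = ""
    log: List[str] = field(default_factory=list)


# In this sketch an agent is just a function that reads and updates the state.
Agent = Callable[[DocumentState], DocumentState]


def ocr_agent(state: DocumentState) -> DocumentState:
    # Stand-in for the OCR step; real code would run OCR on state.source.
    state.text = f"text extracted from {state.source}"
    return state


def analysis_agent(state: DocumentState) -> DocumentState:
    # Stand-in for the analysis step; real code would send state.text to an LLM.
    state.analysis = f"analysis of {len(state.text)} characters"
    return state


def summary_agent(state: DocumentState) -> DocumentState:
    # Stand-in for the summary step; here it simply truncates the text.
    state.summary = state.text[:40]
    return state


def qa_agent(state: DocumentState, question: str) -> str:
    # Stand-in for question answering, called on demand after the pipeline ran.
    return f"answer to {question!r} based on {len(state.text)} characters"


class Coordinator:
    """Runs the registered agents in insertion order and records each step."""

    def __init__(self, agents: Dict[str, Agent]) -> None:
        self.agents = agents

    def run(self, source: str) -> DocumentState:
        state = DocumentState(source=source)
        for name, agent in self.agents.items():
            state = agent(state)
            state.log.append(name)
        return state


coordinator = Coordinator(
    {"ocr": ocr_agent, "analysis": analysis_agent, "summary": summary_agent}
)
result = coordinator.run("report.pdf")
print(result.log)
print(qa_agent(result, "What is this document about?"))
```

Keeping each agent as a small function with the same signature means a step can be swapped or reordered without touching the coordinator, which is probably the main benefit of this kind of split.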
Challenges we ran into
- The ERNIE API auth format changed, and it took a while to figure out the new Bearer token setup
- PaddleOCR needs PDF pages converted to images first, which required Poppler in our setup, so we ended up using PyPDF2 as a fallback
- Getting the agents to work together smoothly required some trial and error
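The first two challenges above can be illustrated with a small sketch, again under assumptions. `bearer_headers` builds the standard HTTP `Authorization: Bearer <token>` header; the exact endpoint and payload the ERNIE API expects are not shown here. `extract_with_fallback` tries a list of extractors in order and moves on when one fails or returns nothing. The two extractor functions are hypothetical stand-ins that simulate a missing Poppler install and a working text-layer reader; they do not call PaddleOCR or PyPDF2.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# An extractor takes a file path and returns the extracted text.
Extractor = Callable[[str], str]


def bearer_headers(api_key: str) -> Dict[str, str]:
    """Build request headers using the standard Bearer token scheme."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def extract_with_fallback(
    path: str, extractors: Sequence[Tuple[str, Extractor]]
) -> Tuple[str, str]:
    """Return (extractor_name, text) from the first extractor that yields text."""
    errors: List[str] = []
    for name, extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # a failing backend should not stop the pipeline
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: returned no text")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


def ocr_extractor(path: str) -> str:
    # Hypothetical stand-in: simulates OCR failing because Poppler is missing.
    raise RuntimeError("Poppler not installed")


def text_layer_extractor(path: str) -> str:
    # Hypothetical stand-in for reading the PDF's embedded text layer.
    return "plain text layer"


used, text = extract_with_fallback(
    "report.pdf",
    [("ocr", ocr_extractor), ("text_layer", text_layer_extractor)],
)
print(used, text)
```

Returning the name of the extractor that succeeded makes it easy to log when the fallback path was taken, which helps when OCR quality and text-layer quality differ.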
Accomplishments that we're proud of
- Got the full pipeline working end-to-end
- The QA feature actually gives useful answers
- Warm-up task works reliably
What we learned
- How to use the ERNIE API properly
- Multi-agent system design patterns
- PaddleOCR is pretty powerful for document extraction
What's next for DocuMind
- Add support for more document types (Word, images)
- Improve the analysis accuracy
- Maybe add a Chrome extension