Inspiration
As a second-year IT student building projects entirely from my phone, I constantly struggled with long PDFs, research papers, and dense documents. There was no quick way to extract what actually mattered.
I built this for every student, researcher, and professional who doesn't have hours to read — but still needs to understand.
What It Does
An AI-powered document analysis API that supports:
- 📄 PDF & DOCX parsing with full text extraction
- 🖼️ OCR for scanned/image-based documents
- 📝 Summarization — key points in seconds
- 🏷️ Named Entity Recognition (NER) — people, places, orgs
- 💬 Sentiment Analysis — tone detection across content
- 🗂️ Topic Classification — auto-categorize documents
- 🔍 Document Comparison — diff two docs intelligently
- ❓ Q&A — ask any question, get answers from your document
How I Built It
Built entirely on Google Colab from my phone — no laptop, no desktop.
Tech Stack:
- Python — core logic
- Groq API (llama-3.3-70b-versatile) — LLM backbone
- PyMuPDF / pdfplumber — PDF extraction
- python-docx — DOCX parsing
- pytesseract — OCR for scanned files
- FastAPI — REST API layer
- HuggingFace Transformers — initial NER pipeline
The Q&A feature splits the document into chunks, passes the most relevant chunks and the user's question to the LLM, and returns an answer grounded in that text, which limits hallucination from outside context.
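The chunking step could look roughly like the sketch below. It is an illustration only, not the project's actual code: the function name `chunk_text`, the word-based splitting, and the `chunk_size` and `overlap` defaults are all assumptions.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks that share `overlap` words with their neighbor.

    chunk_size and overlap are illustrative defaults, not values from the project.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Overlapping windows are one common way to avoid cutting a sentence's context in half at a chunk boundary; the right sizes depend on the document and the model's context limit.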
Mathematically, for a document $D$ split into chunks $\{c_1, c_2, \ldots, c_n\}$, the model retrieves the most relevant chunk:
$$c^* = \arg\max_{c_i} \text{sim}(q, c_i)$$
where $q$ is the user query and $\text{sim}$ is cosine similarity over embeddings. The answer is then generated conditioned on $c^*$:
$$A = \text{LLM}(q \mid c^*)$$
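A minimal sketch of the retrieval step $c^* = \arg\max_{c_i} \text{sim}(q, c_i)$ is shown here. To stay self-contained it swaps in a toy bag-of-words vector for the embedding, which is an assumption for illustration; a real pipeline would use a learned embedding model. The names `embed`, `cosine_similarity`, and `retrieve_best_chunk` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase term counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_best_chunk(query: str, chunks: list[str]) -> str:
    """Return the chunk c* that maximizes sim(q, c_i)."""
    if not chunks:
        raise ValueError("no chunks to search")
    query_vec = embed(query)
    return max(chunks, key=lambda chunk: cosine_similarity(query_vec, embed(chunk)))
```

The selected chunk would then be placed in the prompt alongside the question, which corresponds to $A = \text{LLM}(q \mid c^*)$.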
Challenges
🔴 Model Deprecations: Started with local HuggingFace models — ran into memory crashes on Colab's free tier. Switched to Groq API for reliability and speed.
🔴 API Credit Exhaustion: Hit rate limits mid-build during the hackathon. Had to restructure calls to batch efficiently and reduce redundant requests.
🔴 OCR Accuracy: Scanned documents with low DPI returned garbled text. Added preprocessing (grayscale + threshold) before passing to tesseract.
🔴 Endpoint Format Mismatches: FastAPI schema mismatches caused silent failures. Debugged purely from Colab output logs — no browser dev tools on mobile.
🔴 Dual Deadline Pressure: This was submitted for both a GUVI hackathon and an HCL hackathon on the same day — built, tested, and submitted solo under pressure.
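The OCR preprocessing mentioned above (grayscale + threshold) might be sketched as follows with Pillow and pytesseract. The function names, the use of Pillow, and the cutoff of 128 are assumptions for illustration, not details taken from the project.

```python
from PIL import Image


def preprocess_for_ocr(image: Image.Image, threshold: int = 128) -> Image.Image:
    """Convert to grayscale, then binarize: pixels above `threshold` become white.

    The default of 128 is an assumed midpoint; real scans may need tuning.
    """
    gray = image.convert("L")
    return gray.point(lambda p: 255 if p > threshold else 0)


def ocr_image(path: str) -> str:
    """Run Tesseract on a preprocessed image file."""
    # Imported lazily because pytesseract also needs the Tesseract binary installed.
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(preprocess_for_ocr(img))
```

A fixed global threshold is the simplest option; unevenly lit scans may need a different cutoff or an adaptive method.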
What I Learned
- Groq's free tier is a lifesaver for hackathon-scale LLM apps
- OCR pipelines need image preprocessing — raw scans don't work
- Chunking strategy directly impacts Q&A accuracy
- Building mobile-only teaches you to be ruthlessly efficient
- Dual deadlines are brutal — but doable with clean architecture
What's Next
- [ ] Vector DB integration (FAISS) for smarter chunk retrieval
- [ ] Multi-document Q&A across a folder
- [ ] Streamlit frontend for non-technical users
- [ ] Hindi + Tamil language document support