Inspiration
The idea for DocuLens AI came from a real problem. A family member signed a 28-page apartment lease without reading it. Hidden on page 19 was a clause requiring 60-day move-out notice — they missed it and lost their security deposit.
Legal documents are everywhere: employment contracts, rental agreements, HR policies, eviction notices. But lawyers cost $300+/hour, and legal aid is overwhelmed. Most people sign without understanding what they're agreeing to.
We wanted to change that. Make legal understanding accessible to everyone — not just those who can afford a lawyer.
What it does
DocuLens AI is a legal document assistant that helps people understand contracts, policies, and notices without needing a legal background.
Upload any PDF — lease, employment contract, HR policy, eviction notice.
Get instant insights:
- 📋 Key clauses extracted and explained in plain English
- ⚠️ Risk alerts highlighting hidden penalties and unfair terms
- 🛡️ Your rights — what you should know, check, or ask about
- 💡 Actionable next steps
Smart features:
- "Explain in simpler words" button for any clause
- Ask questions about the document ("What's the late fee policy?")
- Works 100% locally with no data leaving your computer
How we built it
Tech Stack:
- Frontend: React with Tailwind CSS
- Backend: FastAPI (Python)
- RAG Pipeline: LangChain for document chunking + retrieval
- Vector Database: ChromaDB for semantic search
- LLM: Ollama with Mistral 7B (runs locally, zero API costs)
- PDF Processing: PyPDFLoader
Architecture:
- User uploads PDF → Backend saves and parses document
- LangChain splits text into 1500-token chunks with 200-token overlap
- Each chunk is embedded and stored in ChromaDB vector database
- Mistral LLM extracts key clauses, risks, and generates analysis in JSON
- Results displayed in React frontend with risk badges (🟢🟡🔴)
- Users can ask follow-up questions via RAG retrieval
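A rough sketch of that ingestion-and-analysis flow, assuming the langchain-community integrations for PyPDFLoader, Chroma, and Ollama (the function name, prompt wording, and chunk counts here are illustrative, not the exact project code):

```python
# Minimal sketch of the DocuLens-style pipeline: load -> chunk -> embed -> analyze.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama

def analyze_pdf(path: str):
    # 1. Load the PDF and split it into overlapping chunks
    #    (the splitter counts characters by default; the write-up targets ~1500 tokens)
    pages = PyPDFLoader(path).load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=200)
    chunks = splitter.split_documents(pages)

    # 2. Embed each chunk and store it in a local Chroma vector database
    store = Chroma.from_documents(chunks, OllamaEmbeddings(model="mistral"))

    # 3. Ask Mistral (via Ollama) for a structured analysis of the leading chunks
    llm = Ollama(model="mistral")
    context = "\n\n".join(c.page_content for c in chunks[:6])
    prompt = (
        "Extract the key clauses, risks, and the reader's rights from this document. "
        "Respond with valid JSON only.\n\n" + context
    )
    analysis = llm.invoke(prompt)
    return store, analysis
```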
Why this stack:
- FastAPI: Async support, automatic API docs, Python-native
- ChromaDB: Local-first, no cloud dependencies
- Ollama + Mistral: Completely free, private, runs on any laptop
- LangChain: Simplifies complex RAG workflows
Challenges we ran into
1. Switching from OpenAI to local LLM
Initially we used GPT-4o-mini, but API costs added up during testing. We migrated to Ollama + Mistral, which is completely free but required optimizing prompts for structured JSON output and handling slower inference (3-5 seconds per analysis).
2. JSON parsing failures
Mistral sometimes added markdown formatting or extra text before JSON. We built a clean_json_response() function that strips markdown and extracts only the JSON block.
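The helper boils down to something like this (a simplified sketch, not the exact implementation):

```python
import json
import re

def clean_json_response(raw: str) -> dict:
    """Parse the first JSON object in an LLM reply, ignoring markdown fences and chatter."""
    # Drop ```json / ``` code fences that Mistral sometimes wraps around its answer
    text = re.sub(r"`{3}(?:json)?", "", raw)
    # Keep only the span from the first '{' to the last '}' and parse it
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in model output")
    return json.loads(text[start : end + 1])
```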
3. PDF text extraction
Some PDFs were scanned images, not searchable text. We added error handling and clear messaging when a document can't be parsed.
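The check amounts to roughly this (a sketch; load_pdf_or_fail and the error message are illustrative names, not the project's actual code):

```python
from fastapi import HTTPException
from langchain_community.document_loaders import PyPDFLoader

def load_pdf_or_fail(path: str):
    pages = PyPDFLoader(path).load()
    # Scanned PDFs parse into pages with little or no extractable text
    if not any(page.page_content.strip() for page in pages):
        raise HTTPException(
            status_code=422,
            detail="This PDF looks like a scanned image; no selectable text was found.",
        )
    return pages
```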
4. Long document context limits
Mistral has an 8K-token context window. We truncate documents to 10,000 characters for the initial analysis and use RAG to retrieve only the relevant sections for Q&A, instead of sending the entire document.
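The Q&A path then looks roughly like this (a sketch reusing the Chroma store and Ollama LLM from the earlier snippet; k=4 and the prompt wording are assumptions):

```python
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma

def answer_question(store: Chroma, llm: Ollama, question: str) -> str:
    # Pull only the few chunks most relevant to the question, so the prompt
    # stays far below the model's 8K-token window even for long documents
    docs = store.similarity_search(question, k=4)
    context = "\n\n".join(d.page_content for d in docs)
    prompt = (
        "Answer the question using only the document excerpts below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return llm.invoke(prompt)
```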
5. Frontend-backend integration
Our teammate built the React frontend, and connecting it to the FastAPI backend required careful CORS configuration and a JSON schema both sides agreed on.
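The CORS piece is a few lines of FastAPI middleware (http://localhost:3000 is an assumed dev-server origin, not necessarily ours):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Whitelist the React dev server so the browser doesn't block cross-origin requests;
# swap in whatever origin your frontend actually runs on
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_methods=["*"],
    allow_headers=["*"],
)
```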
Accomplishments that we're proud of
✅ Working RAG system with local LLM
No cloud costs, no API keys, complete privacy. Anyone can run DocuLens AI on their laptop.
✅ Caught hidden risks in real documents
We tested on an actual employment contract and flagged a $25,000 penalty clause buried on page 4, plus a 3-year non-compete that banned any job within 100 miles.
✅ Plain English that actually works
Mistral generates explanations at a 5th-grade reading level. Users don't need legal degrees to understand their rights.
✅ Clean, professional UI
Risk badges (🟢🟡🔴), collapsible clauses, one-click explanations, mobile-responsive design.
✅ Complete pipeline in 3 weeks
From zero to fully functional RAG application — PDF upload, vector search, LLM extraction, interactive Q&A.
What we learned
Technical lessons:
- LLM prompting is an art. Small changes in prompt structure drastically impact JSON output quality.
- ChromaDB is surprisingly easy to set up; local vector search is viable for documents under 100 pages.
- Ollama makes local LLMs accessible — no GPU required for 7B models on a modern laptop.
- RAG isn't just for chatbots; it's perfect for document analysis where you need to cite specific sections.
Teamwork lessons:
- Clear API contracts (endpoints + JSON shapes) prevent frontend-backend conflicts.
- Version control everything, including prompts — small prompt changes can break outputs silently.
Product lessons:
- Users don't want "AI magic" — they want citations, quotes, and clear reasoning they can verify.
- Legal tech has huge responsibility. We added a disclaimer: "Not legal advice. Consult an attorney for binding decisions."
What's next for DocuLens AI
Short-term (next month):
- Deploy to free tier (Render + Vercel) for public demo
- Add support for DOCX and TXT files
- Save analysis history with user accounts
- Dark mode 🌙
Medium-term (3-6 months):
- Browser extension for Google Docs, email attachments, and online contracts
- Batch upload for HR teams and small businesses
- Export analysis as PDF report with highlighted clauses
- Support for more languages (Spanish, Hindi, Mandarin)
Long-term vision:
- Open source the core engine for community contributions
- API for developers to integrate DocuLens into their apps
- Fine-tune Mistral on 10,000 legal documents for better extraction
- Partnership with legal aid organizations to help low-income individuals
We built DocuLens AI because legal help shouldn't be a luxury. It should be a right.

