Inspiration

Indian entrepreneurs often struggle to navigate an overwhelming and confusing legal landscape, especially around company registration, tax compliance, and government schemes. Legal information is either scattered across outdated PDFs or buried inside long government websites. We wanted to build a legal assistant that speaks clearly and simply and is contextually aware of Indian laws, so founders can focus on building instead of bureaucracy.

What it does

LegalEase is an AI-powered legal assistant for Indian startups. You can ask it legal, tax, and registration-related questions — and it answers in simple language, supported by official sources like:

  • StartupIndia.gov.in
  • IncomeTaxIndia.gov.in

Under the hood, LegalEase uses RAG (Retrieval-Augmented Generation) with OpenAI’s GPT-4o model and a local ChromaDB vector store, so answers are grounded in retrieved, up-to-date source text instead of being hallucinated the way generic chatbots often do.

How we built it

Frontend: Built with Streamlit for a clean, no-frills chat UI.

Backend: A Python-based RAG pipeline built on the Pydantic AI framework, with ChromaDB as our local vector database for storing and searching document chunks and OpenAI GPT-4o as the core LLM.
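
To make the retrieve-then-prompt flow concrete, here is a minimal sketch of it. It is not our actual backend: a toy word-count similarity stands in for OpenAI embeddings, a plain list stands in for ChromaDB, and the names `Chunk`, `embed`, `retrieve`, and `build_prompt`, along with the example.com URLs, are assumptions made up for illustration. The resulting prompt is what would be handed to the LLM.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source_url: str  # hypothetical metadata field, used for link-backed answers


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: list[Chunk]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n".join(
        f"[{i}] {c.text} (source: {c.source_url})" for i, c in enumerate(hits, 1)
    )
    return (
        "Answer using only the numbered context below and cite the sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Illustrative placeholder chunks, not real scraped content.
CHUNKS = [
    Chunk("Placeholder text about startup recognition and eligibility.",
          "https://example.com/startup"),
    Chunk("Placeholder text about income tax return filing deadlines.",
          "https://example.com/tax"),
    Chunk("Placeholder text about trademark registration steps.",
          "https://example.com/ip"),
]

question = "What is the eligibility for startup recognition?"
hits = retrieve(question, CHUNKS, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

Keeping the source URL next to each chunk is what makes link-backed answers possible: the model sees the link in its context and can cite it.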

Data ingestion: Used Crawl4AI to scrape and chunk structured content from trusted government portals. Used OpenAI embeddings, which gave us better retrieval precision than the SentenceTransformer models we tried.
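
As a rough illustration of chunking with overlap, the sketch below splits text into fixed-size word windows that share a few words with their neighbours. This is an assumption about the general technique, not a description of what Crawl4AI does internally; `chunk_text` and its default sizes are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(words):
            break
    return chunks


# Synthetic input so the window boundaries are easy to see.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_text(sample, chunk_size=4, overlap=1)
print(chunks)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.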

Challenges we ran into

  • PDFs don’t work well: Legal PDFs like the Income Tax Act are extremely long and semantically unstructured, which made chunking and retrieval very hard.
  • Poor government site structure: Some official websites use outdated or JS-heavy structures, making crawling or parsing difficult.
  • Relevance of retrieval: We had to tune chunk sizes, overlaps, and even embeddings to improve RAG accuracy, especially on legal queries.
  • Time constraints: Building a full-stack working legal agent with curated data and dynamic retrieval was intense to pull off during a weekend.
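
One way to make chunk-size and overlap tuning less of a guessing game is to score each setting against a small labelled set of questions. The sketch below is an assumed approach, not our measured setup: it uses a synthetic document, a toy word-overlap retriever, and a hypothetical `hit_rate_at_k` metric that counts a question as a hit when the expected passage appears intact in one of the top-k chunks.

```python
from itertools import product


def chunk_words(words, size, overlap):
    """Fixed-size word windows; consecutive windows share `overlap` words."""
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]


def make_retriever(chunks):
    """Toy retriever: rank chunks by how many question words they contain."""
    def retrieve(question, k):
        q = set(question.lower().split())

        def score(chunk):
            return len(q & set(chunk.lower().split()))

        return sorted(chunks, key=score, reverse=True)[:k]
    return retrieve


def hit_rate_at_k(retrieve, eval_set, k):
    """Fraction of questions whose expected passage is inside a top-k chunk."""
    if not eval_set:
        return 0.0
    hits = sum(
        any(expected in chunk for chunk in retrieve(question, k))
        for question, expected in eval_set
    )
    return hits / len(eval_set)


# Synthetic document and a single labelled query, for illustration only.
WORDS = "alpha beta gamma delta epsilon zeta eta theta iota kappa lambda mu".split()
EVAL_SET = [("delta epsilon", "delta epsilon")]

results = {}
for size, overlap in product([4, 6], [0, 2]):
    chunks = chunk_words(WORDS, size, overlap)
    results[(size, overlap)] = hit_rate_at_k(make_retriever(chunks), EVAL_SET, k=1)

for setting, rate in sorted(results.items()):
    print(setting, rate)
```

In this toy run the only failing setting is size 4 with no overlap, because the expected passage is cut in half at a chunk boundary. A real evaluation would need many more questions and the actual embedding model.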

Accomplishments that we're proud of

  • Successfully built a working legal chatbot prototype tailored for Indian founders.
  • Got high-precision results from the StartupIndia data source with link-backed answers.
  • Built a modular backend that can easily scale to other Indian legal domains.
  • Shipped a clean, minimal, and working front-end demo that is deployable and testable!

What we learned

  • Generic LLMs are not enough: you must use RAG with curated data to get reliable answers for serious domains like law.
  • PDFs are hard: HTML pages with clean structure perform way better in real-world retrieval tasks.
  • Embeddings matter: OpenAI’s embeddings significantly improved retrieval quality compared to SentenceTransformer models.
  • Simple UIs win: Streamlit let us move fast and demo quickly.

What's next for LegalEase

  • 🔍 Add more sources: MCA, GST portal, RBI circulars, SEBI guidelines.
  • 🗂 Categorized modes: Toggle between Tax, Startup, Funding, IP, etc.
  • 🧾 Compliance reminders: Let users track filings like PAN, TDS, or 80-IAC applications.
  • 📱 WhatsApp or API integration: For legal help on-the-go.
  • 👨‍⚖️ AI + Expert loop: Let users escalate hard questions to verified CA/CS/lawyers.

Built With

  • chatgpt
  • chromadb
  • langchain
  • pydantic
  • python
  • streamlit