What is it

The app demonstrates a legal RAG (Retrieval-Augmented Generation) system. It splits legal documents into paragraphs and stores them in a vector database. When a user provides a feature description, the system embeds it and compares it against the stored legal paragraphs using MaxSim similarity search. The top-matching paragraphs are passed to a reranker, and the reranked results are used to construct prompts for an LLM. The LLM then generates an output stating whether the feature is geo-compliant, along with its reasoning and references. The results can be validated against ground-truth data and metrics.
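The MaxSim retrieval step can be sketched as follows. This is a minimal illustration, not the project's actual code. It assumes each paragraph and each feature description is represented by a list of token-level vectors, as in late-interaction retrieval, and that similarity is cosine. The function names (`maxsim`, `top_k`) and the toy 2-D vectors are hypothetical, and every paragraph is assumed to have at least one vector.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim(query_vecs, doc_vecs):
    """MaxSim score: for each query vector, take its best cosine
    similarity against any document vector, then sum those maxima."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


def top_k(query_vecs, paragraphs, k=2):
    """Rank paragraphs (hypothetical dict: id -> list of vectors) by
    MaxSim score and return the ids of the k best matches."""
    scored = [(maxsim(query_vecs, vecs), pid) for pid, vecs in paragraphs.items()]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:k]]


# Toy data standing in for real embeddings (assumption for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
paragraphs = {
    "para_a": [[1.0, 0.0], [0.0, 1.0]],
    "para_b": [[1.0, 0.0]],
    "para_c": [[-1.0, 0.0]],
}
candidates = top_k(query, paragraphs, k=2)
```

In a pipeline like the one described, the ids in `candidates` would be the paragraphs handed to the reranker before prompt construction.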

Challenges and Future Ideas

  • parsing and chunking PDFs was very difficult
  • explore the use of active learning + HITL (human-in-the-loop)
  • automatically update the knowledge base
  • integrate with GitHub Discussions

Built With

  • chromadb
  • fastapi
  • langchain
  • nextjs
  • python