Inspiration

Navigating Portuguese legal documents can be overwhelming for citizens and small businesses. We were inspired to democratize access to legal information after seeing how difficult it is for non-lawyers to understand their rights and obligations. The complexity of legal language creates a barrier that AI can help break down.

What it does

Our Portuguese Legal Assistant transforms how people interact with legal information. Users can ask questions in plain Portuguese and receive accurate, cited answers from official sources. The system:

  • Scrapes and indexes legal documents from Diário da República
  • Provides natural language Q&A with source citations
  • Enables document upload for analysis (Stage 2)
  • Identifies relevant laws for contracts and documents
  • Detects potential legal issues in uploaded documents

How we built it

We architected a scalable RAG system using Google Cloud and MongoDB Atlas:

  • Data Pipeline: Cloud Run services scrape diariodarepublica.pt and split the documents into semantic chunks
  • Vector Storage: MongoDB Atlas stores embeddings with vector search capabilities
  • AI Integration: Vertex AI generates embeddings (gemini-embedding-001) and responses (Gemini Pro)
  • Frontend: Streamlit interface for intuitive user interaction
  • Backend: FastAPI services handle retrieval and processing
  • Infrastructure: Fully containerized with Docker, deployed on Cloud Run for auto-scaling
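
The retrieval step described above can be sketched as an Atlas `$vectorSearch` aggregation pipeline. This is a minimal illustration, not our exact code: the index name `legal_chunks_index` and the field names `embedding`, `text`, and `source_url` are assumptions made for the example, and the query vector is expected to come from the embedding model.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "legal_chunks_index",  # hypothetical Atlas index name
    path: str = "embedding",                 # hypothetical field holding the chunk vector
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline that returns the chunks nearest to the query."""
    if limit > num_candidates:
        raise ValueError("limit must not exceed num_candidates")
    return [
        {
            # Approximate nearest-neighbour search over the stored embeddings.
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            # Keep only what the answer generator needs, plus the similarity score.
            "$project": {
                "_id": 0,
                "text": 1,          # hypothetical field: chunk text
                "source_url": 1,    # hypothetical field: citation link
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], limit=3)
```

With pymongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`, and the returned chunks, together with their source links, would go into the prompt for the response model.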

Challenges we ran into

  • Portuguese Language Processing: Tooling with solid Portuguese support was limited, so we relied on careful prompt engineering
  • Legal Document Structure: Complex hierarchical laws needed smart chunking strategies
  • Vector Search Optimization: Balancing semantic search accuracy with response time
  • Scraping Dynamics: Handling website changes and rate limiting
  • Context Window Management: Ensuring relevant legal context fits within LLM limits
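
To make the chunking and context-window challenges concrete, here is a rough sketch of one possible approach: split a law on its "Artigo N.º" headings, then keep the top-ranked chunks until a size budget is used up. The heading pattern, the function names, and the character budget (a stand-in for a real token count) are all assumptions for illustration, not the strategy we actually shipped.

```python
import re

# Assumed heading shape, e.g. "Artigo 1.º"; real documents may vary.
ARTICLE_HEADING = re.compile(r"^Artigo\s+\d+\.º.*$", re.MULTILINE)


def split_into_articles(text: str) -> list[dict[str, str]]:
    """Split a law into one chunk per article, keeping the heading as metadata.

    Any text before the first heading (for example a preamble) is ignored here.
    """
    matches = list(ARTICLE_HEADING.finditer(text))
    chunks = []
    for i, match in enumerate(matches):
        # An article runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append(
            {"heading": match.group(0).strip(), "text": text[match.end():end].strip()}
        )
    return chunks


def pack_context(ranked_texts: list[str], max_chars: int) -> list[str]:
    """Keep chunks in ranked order until the character budget would be exceeded."""
    packed: list[str] = []
    used = 0
    for chunk in ranked_texts:
        if used + len(chunk) > max_chars:
            break
        packed.append(chunk)
        used += len(chunk)
    return packed


sample = (
    "Artigo 1.º\nObjeto\nO presente diploma estabelece o regime.\n"
    "Artigo 2.º\nÂmbito\nAplica-se a todos os contratos.\n"
)
articles = split_into_articles(sample)
```

Splitting on article boundaries keeps each chunk aligned with a citable unit of the law, which also makes the source citation for an answer easier to state.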

Accomplishments that we're proud of

  • Successfully indexed thousands of Portuguese legal documents with 95%+ accuracy
  • Achieved sub-2 second response times for complex legal queries
  • Built a hybrid search system combining keyword and semantic search
  • Created an intuitive interface that non-technical users can navigate
  • Implemented proper citation tracking for legal compliance
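
One common way to combine keyword and semantic results is reciprocal rank fusion (RRF). The sketch below shows that technique as an example of how such a hybrid ranking could work; it is an assumption for illustration, not a description of our exact fusion logic, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists into one list, best first.

    Each document scores 1 / (k + rank) per list it appears in; scores are summed.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]   # hypothetical IDs from keyword search
semantic_hits = ["b", "c", "d"]  # hypothetical IDs from vector search
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
# fused == ["b", "c", "a", "d"]
```

Documents that appear in both lists rise to the top, which is the behavior we want when an exact legal term and a semantically similar passage point to the same article.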

What we learned

  • MongoDB's vector search capabilities integrate seamlessly with Google Cloud
  • Chunking strategies significantly impact retrieval quality
  • Portuguese legal terminology benefits from domain-specific fine-tuning of embeddings
  • Users prefer cited, verifiable answers over generic legal advice
  • Incremental indexing is crucial for maintaining up-to-date legal databases
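
Incremental indexing can be approximated by hashing each document's text and re-embedding only what changed. The following is a minimal sketch under that assumption; the function names and the `dl-1`-style IDs are hypothetical, and in practice the stored hashes would live alongside the chunks in the database.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def select_changed(documents: dict[str, str], indexed_hashes: dict[str, str]) -> list[str]:
    """Return the IDs of documents that are new or whose text changed since indexing."""
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )


scraped = {"dl-1": "texto A", "dl-2": "texto B"}       # hypothetical scrape result
already_indexed = {"dl-1": content_hash("texto A")}    # hashes saved at last run
to_reindex = select_changed(scraped, already_indexed)
# to_reindex == ["dl-2"]
```

Only the documents returned here would need new embeddings, which keeps re-scrapes cheap when most published texts have not changed.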

What's next for Portuguese Legal Assistant

  • Fine-tuned Models: Train Portuguese legal-specific embeddings
  • Multi-format Support: Process PDFs, Word documents, and scanned documents
  • Legal Timeline: Track law changes and amendments over time
  • Collaboration Features: Allow lawyers to annotate and verify responses
  • API Access: Enable integration with legal practice management systems
  • European Expansion: Extend to other countries' legal systems

Built With

  • beautifulsoup4
  • cloud-build
  • cloud-run
  • cloud-storage
  • docker
  • fastapi
  • gemini-pro
  • google-cloud
  • google-cloud-aiplatform
  • iam
  • mongodb-atlas
  • portuguese-nlp
  • pymongo
  • python
  • rag
  • secret-manager
  • streamlit
  • vector-search
  • vertex-ai
  • vpc