The Inspiration

Legal fees are a major barrier to justice: 71% of Canadians cannot afford a lawyer for every question they have. LLMs offer a potential solution, but they have a fatal flaw in high-stakes settings: hallucinations. We saw general-purpose AIs cite non-existent legal codes and mislead users. That inspired us to build SpecterBot, which aims to democratize access to the law with an AI that doesn't just talk but provides the "receipts".

What We Learned

We learned that building for the legal domain requires a "trust but verify" architecture. Moving beyond simple RAG, we discovered the importance of Hybrid Search, combining the semantic power of vector embeddings with the keyword precision of full-text search. We also learned that the official Justice Canada XML corpus is a goldmine of structured data, but parsing it into a clean, queryable format requires a robust ETL pipeline.

How We Built It

Our architecture is designed for speed and verifiability:

ETL Pipeline: We ingested more than 400 federal statutes from Justice Canada’s XML repository into a PostgreSQL database.

Hybrid Retrieval: We use a weighted scoring system for search. If $S_v$ is the vector similarity score and $S_t$ is the full-text search score, our final rank $R$ is calculated as:

$$R = 0.7\,S_v + 0.3\,S_t$$
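As a rough illustration of the weighted rank above, the sketch below combines the two scores and sorts candidates by the result. It is not our production code: the `Candidate` class, the `hybrid_rank` and `rank_candidates` helpers, and the assumption that both scores are already normalized to the same 0 to 1 range are all hypothetical.

```python
from dataclasses import dataclass

# Weights taken from the formula above.
VECTOR_WEIGHT = 0.7
TEXT_WEIGHT = 0.3


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for one retrieved statute section."""
    section_id: str
    vector_score: float  # S_v, assumed normalized to [0, 1]
    text_score: float    # S_t, assumed normalized to [0, 1]


def hybrid_rank(candidate: Candidate) -> float:
    """Compute R = 0.7 * S_v + 0.3 * S_t for one candidate."""
    return VECTOR_WEIGHT * candidate.vector_score + TEXT_WEIGHT * candidate.text_score


def rank_candidates(candidates: list[Candidate], top_k: int = 5) -> list[Candidate]:
    """Return the top_k candidates, highest combined rank first."""
    return sorted(candidates, key=hybrid_rank, reverse=True)[:top_k]
```

With these weights, a strong semantic match can still outrank an exact keyword match, so the two scores need to be on comparable scales before they are combined.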

The Hallucination Protection: Every response from our LLM (Gemini 2.5 Flash / Llama 3.3) is generated in a strict JSON format. A validator cross-checks every cited section ID against the retrieved context; if a citation wasn't in the source window, it is blocked before it reaches the UI.
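The cross-check described above can be sketched roughly as follows. The JSON shape (`answer` plus a `citations` list whose entries carry a `section_id`) and the `filter_citations` helper are assumptions made for illustration, not our exact schema. The sketch also assumes that only the unverified citations are dropped, rather than the whole response.

```python
import json
from typing import Any


def filter_citations(raw_response: str, retrieved_ids: set[str]) -> dict[str, Any]:
    """Keep only citations whose section ID appears in the retrieved context.

    Assumed (hypothetical) response shape:
    {"answer": "...", "citations": [{"section_id": "..."}, ...]}
    """
    payload = json.loads(raw_response)  # raises json.JSONDecodeError on malformed output
    if not isinstance(payload, dict) or not isinstance(payload.get("citations"), list):
        raise ValueError("response does not match the expected JSON shape")

    verified, blocked = [], []
    for citation in payload["citations"]:
        section_id = citation.get("section_id") if isinstance(citation, dict) else None
        if isinstance(section_id, str) and section_id in retrieved_ids:
            verified.append(citation)
        else:
            blocked.append(citation)  # never forwarded to the UI

    return {"answer": payload.get("answer", ""), "citations": verified, "blocked": blocked}
```

Returning the blocked citations separately makes it possible to log them or trigger a retry without showing them to the user.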

The Experience: The frontend features a three-panel layout with SSE streaming, a Legal Graph to visualize statutory connections, and ElevenLabs voice integration for hands-free research.

Challenges We Faced

XML Ingestion: The Justice Canada XML files are highly complex. Building a parser that could accurately extract nested sections while maintaining legal hierarchy was our first major hurdle.
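To give a sense of the hierarchy problem, here is a minimal sketch that flattens nested sections into rows while keeping a pointer to each parent. The element names (`Section`, `Subsection`, `Paragraph`, `Label`, `Text`) and the tiny `SAMPLE` document are simplified assumptions for illustration; the real Justice Canada files are considerably richer.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical sample; not an excerpt from a real statute.
SAMPLE = """<Statute>
  <Section>
    <Label>2</Label>
    <Text>Definitions</Text>
    <Subsection>
      <Label>(1)</Label>
      <Text>In this Act,</Text>
      <Paragraph><Label>(a)</Label><Text>example text</Text></Paragraph>
    </Subsection>
  </Section>
</Statute>"""

# Tags treated as hierarchy levels (an assumption for this sketch).
NESTED_TAGS = ("Section", "Subsection", "Paragraph")


def flatten(element: ET.Element, parent_path: str = "") -> list[dict]:
    """Walk nested provisions depth-first, emitting one row per provision."""
    rows = []
    for child in element:
        if child.tag not in NESTED_TAGS:
            continue
        label = (child.findtext("Label") or "").strip()
        path = parent_path + label  # e.g. "2" -> "2(1)" -> "2(1)(a)"
        rows.append({
            "path": path,
            "parent": parent_path or None,
            "level": child.tag,
            "text": (child.findtext("Text") or "").strip(),
        })
        rows.extend(flatten(child, path))
    return rows


rows = flatten(ET.fromstring(SAMPLE))
```

Each row carries its full path and its parent path, so the hierarchy can be rebuilt after the rows are loaded into a relational table.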

Multi-Turn Accuracy: Keeping vector search accurate during long conversations was difficult. We solved this by using the LLM to reformulate follow-up questions into standalone queries based on conversation memory.
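The reformulation step might look something like the sketch below. The prompt wording, the `max_turns` cut-off, and the `complete` callable (a stand-in for whatever LLM client is used) are all hypothetical; only the idea of rewriting a follow-up into a standalone query comes from our pipeline.

```python
from typing import Callable


def build_reformulation_prompt(
    history: list[tuple[str, str]], follow_up: str, max_turns: int = 4
) -> str:
    """Build a prompt asking the model to rewrite follow_up as a standalone query."""
    recent = history[-max_turns:]  # only the most recent turns, to keep the prompt short
    transcript = "\n".join(f"{role}: {text}" for role, text in recent)
    return (
        "Rewrite the final user question as a standalone search query. "
        "Resolve pronouns and references using the conversation. "
        "Return only the rewritten query.\n\n"
        f"Conversation:\n{transcript}\n\nFinal question: {follow_up}"
    )


def reformulate(
    history: list[tuple[str, str]], follow_up: str, complete: Callable[[str], str]
) -> str:
    """Return a standalone query; fall back to the original question if needed."""
    if not history:
        return follow_up  # first turn: nothing to resolve
    rewritten = complete(build_reformulation_prompt(history, follow_up)).strip()
    return rewritten or follow_up
```

The rewritten query, rather than the raw follow-up, is then what gets embedded and sent to the retriever, so a question like "what about the penalty?" is searched with its full context.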

Latency vs. Verification: Running a validation layer on top of a streaming LLM response added latency. We optimized our asyncpg connection pool and backend logic to ensure the citation "Firewall" didn't compromise the real-time feel of the chat.

What’s Next for SpecterBot

Provincial Law Expansion: While we currently focus on the 400+ federal statutes in the Justice Canada corpus, the next step is to ingest and index provincial statutes to provide a complete legal picture across all jurisdictions.

Case Law Integration: We plan to move beyond static statutes by integrating CanLII data, allowing the AI to cite specific court rulings that interpret the laws, providing deeper context for legal research.

Advanced Legal Graph: We intend to upgrade our ReactFlow visualization to map complex relationships between different statutes, showing how one federal act references or overrides another.

Professional Export Tools: To support law students and small firms, we are looking into a feature that allows users to export verified research sessions into formatted legal briefs or PDF summaries.

Multilingual Support: Given Canada’s bilingual nature, we want to enable seamless switching between English and French statutory texts, maintaining the same rigorous hallucination checks for both official languages.
