🏛️ SF Legal Assistant
A multi-step agentic AI system for San Francisco legal research, built for the TiDB AgentX Hackathon. This project demonstrates a real-world legal assistant that can answer questions about San Francisco municipal laws and codes.
TiDB Cloud account Email: clai74@mail.ccsf.edu
🎯 Project Overview
Multi-Step Agentic Workflow:
- Data Ingest & Index → Download SF laws → Process into chunks → Generate embeddings → Store in TiDB
- Vector Search → Match user questions against the law database using TiDB vector search
- LLM Analysis → Use Google Gemini to synthesize responses with specific legal citations
- Complete Automation → End-to-end pipeline from user question to legal analysis
🏗️ Architecture
- Frontend: Next.js 15 with React 19, professional legal assistant UI
- Backend: Next.js API routes for search and chat functionality
- Database: TiDB Serverless with vector search capabilities
- AI Models: Google Gemini for embeddings and text generation
- Data Pipeline: Python scripts for downloading and processing SF laws
🚀 Quick Start
Prerequisites
- Node.js 18+ and pnpm
- Python 3.8+
- TiDB Serverless account
- Google AI API key (Gemini)
1. Environment Setup
# Clone and install dependencies
git clone <your-repo>
cd tidb
pnpm install
Create .env.local:
# TiDB Serverless connection
TIDB_HOST=your-host.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME=your-username
TIDB_PASSWORD=your-password
TIDB_DATABASE=your-database
# Google Gemini API
GEMINI_API_KEY=your-api-key-here
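For the Python data pipeline scripts, the same settings can be validated before any work starts. The following is a minimal sketch, not the project's actual loader: the `AppConfig` class and `load_config` function are hypothetical names, and the fallback port of 4000 is an assumption taken from the sample values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variable names taken from the .env.local example above.
REQUIRED_KEYS = (
    "TIDB_HOST",
    "TIDB_USERNAME",
    "TIDB_PASSWORD",
    "TIDB_DATABASE",
    "GEMINI_API_KEY",
)


@dataclass(frozen=True)
class AppConfig:
    """Hypothetical container for the settings listed in .env.local."""
    host: str
    port: int
    username: str
    password: str
    database: str
    gemini_api_key: str


def load_config(env: Optional[Mapping[str, str]] = None) -> AppConfig:
    """Read settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return AppConfig(
        host=env["TIDB_HOST"],
        # Assumption: fall back to 4000, the port shown in the example file.
        port=int(env.get("TIDB_PORT", "4000")),
        username=env["TIDB_USERNAME"],
        password=env["TIDB_PASSWORD"],
        database=env["TIDB_DATABASE"],
        gemini_api_key=env["GEMINI_API_KEY"],
    )
```

Next.js reads .env.local on its own, but plain Python does not, so a script using this sketch would need the variables exported in the shell or loaded by a dotenv-style helper first.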
2. Data Pipeline Setup
# Complete data setup (all steps)
pnpm run setup-data
# Or run individual steps:
pnpm run download-laws # Download SF law files
pnpm run process-laws # Process into chunks
pnpm run load-data # Upload to TiDB with embeddings
3. Run the Application
pnpm run dev
Visit http://localhost:3000 to use the Legal Assistant!
📊 Project Statistics
- 18 SF Law Files downloaded and processed
- 54,000+ Law Chunks with vector embeddings
- Vector Search with similarity matching
- Professional Legal UI with source citations
- Real-time Status monitoring and error handling
🔧 Available Scripts
pnpm run dev # Start development server
pnpm run build # Build for production
pnpm run setup-data # Complete data pipeline setup
pnpm run download-laws # Download SF laws
pnpm run process-laws # Process laws into chunks
pnpm run load-data # Load data to TiDB
pnpm run load-data:clear # Clear and reload data
🎪 Demo Features
Multi-Step AI Agent Demo
Ask a Legal Question:
- "What are the parking regulations in San Francisco?"
- "What permits do I need to start a business?"
- "What are the noise ordinance rules?"
AI Pipeline Executes:
- Generates query embedding using Gemini
- Searches 54K+ law chunks in TiDB vector database
- Finds most relevant legal sections
- Generates comprehensive response with citations
Professional Output:
- Detailed legal analysis
- Specific SF municipal code references
- Source citations with relevance scores
- Professional legal disclaimer
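To illustrate what "finds most relevant legal sections" means in the pipeline above, here is a small in-memory sketch of similarity ranking. It is a stand-in for the TiDB vector search, not the project's code: the function names are hypothetical, cosine similarity is an assumed metric, and the 2-dimensional vectors are toy data in place of real 768-dimensional embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    k: int = 8,
) -> List[Tuple[str, float]]:
    """Return the k (text, score) pairs most similar to the query."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors only; real embeddings would come from the embedding model.
    toy_chunks = [
        ("parking rules", [1.0, 0.0]),
        ("noise rules", [0.0, 1.0]),
        ("business permits", [1.0, 1.0]),
    ]
    for text, score in top_k([1.0, 0.0], toy_chunks, k=2):
        print(f"{score:.2f}  {text}")
```

In the real system the database performs this ranking over all stored chunks, so the application only needs to send the query embedding and read back the best matches.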
Example Query Flow
User: "What are the parking regulations in downtown SF?"
AI Agent:
1. 🧠 Generate embedding for "parking regulations downtown"
2. 🔍 Search TiDB vector database → Find 8 relevant chunks
3. 📝 Gemini analyzes: Transportation Code, Police Code sections
4. ✅ Return: "Based on SF Transportation Code Section 7.2.5..."
With sources: [Transportation.txt, Police.txt] + relevance scores
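Steps 3 and 4 of the flow above depend on handing the retrieved chunks to the model together with their sources. The sketch below shows one possible way to assemble such a prompt and the deduplicated source list. The `RetrievedChunk` type, the function names, and the prompt wording are all hypothetical; the project's actual prompt is not shown in this write-up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievedChunk:
    """Hypothetical shape of one vector search hit."""
    file_name: str
    content: str
    score: float  # relevance score, higher means more relevant


def build_prompt(question: str, chunks: List[RetrievedChunk]) -> str:
    """Combine numbered excerpts and the user question into one prompt."""
    lines = [
        "You are a legal research assistant for San Francisco municipal law.",
        "Answer using only the numbered excerpts below and cite them as [n].",
        "",
        "Excerpts:",
    ]
    for index, chunk in enumerate(chunks, start=1):
        lines.append(
            f"[{index}] {chunk.file_name} "
            f"(relevance {chunk.score:.2f}): {chunk.content}"
        )
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


def list_sources(chunks: List[RetrievedChunk]) -> List[str]:
    """File names in first-seen order, without duplicates."""
    seen: List[str] = []
    for chunk in chunks:
        if chunk.file_name not in seen:
            seen.append(chunk.file_name)
    return seen
```

Numbering the excerpts gives the model a compact way to cite, and the separate source list can be shown next to the answer along with the relevance scores.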
🏆 Hackathon Requirements Met
✅ TiDB Serverless - Vector search database
✅ Multi-step Workflow - Data ingest → Vector search → LLM response
✅ Real-world Application - Functional legal assistant
✅ Innovative Solution - Professional legal research tool
✅ Quality Implementation - Production-ready code
🛠️ Technical Implementation
Database Schema
CREATE TABLE law_chunks (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  content TEXT NOT NULL,
  embedding VECTOR(768) NOT NULL,
  metadata JSON NOT NULL,
  file_name VARCHAR(255) NOT NULL,
  -- Vector index for similarity search.
  -- TiDB defines vector indexes over a distance function; cosine distance is assumed here.
  VECTOR INDEX idx_embedding ((VEC_COSINE_DISTANCE(embedding)))
);
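A similarity query against this table could look like the sketch below. It only builds the SQL text and parameters, so it runs without a database. The helper names are hypothetical, the choice of `VEC_COSINE_DISTANCE` is an assumption about the metric the project uses, and the `%s` placeholders assume a driver with format-style parameters such as PyMySQL.

```python
from typing import Sequence, Tuple


def to_vector_literal(vec: Sequence[float]) -> str:
    """Serialize floats as a bracketed list string, e.g. '[0.1,2.0]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def build_search_query(
    query_embedding: Sequence[float], limit: int = 8
) -> Tuple[str, Tuple[str, int]]:
    """Return (sql, params) selecting the chunks nearest to the query."""
    sql = (
        "SELECT id, file_name, content, "
        "VEC_COSINE_DISTANCE(embedding, %s) AS distance "
        "FROM law_chunks "
        "ORDER BY distance "
        "LIMIT %s"
    )
    return sql, (to_vector_literal(query_embedding), int(limit))


if __name__ == "__main__":
    # Toy 3-dimensional vector; the real column expects 768 dimensions.
    sql, params = build_search_query([0.1, 2, -3.5], limit=8)
    print(sql)
    print(params)
```

Passing the vector as a bound parameter rather than formatting it into the SQL string keeps the query text constant and avoids quoting problems.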
API Endpoints
- POST /api/chat - Main legal assistant endpoint
- POST /api/search - Direct vector search
- GET /api/status - System status and data metrics
Data Processing Pipeline
- Download: 18 SF law TXT files from GitHub
- Process: Clean text, extract metadata, chunk documents
- Embed: Generate 768-dim vectors using Gemini
- Load: Store in TiDB with vector indexes
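The "Process" step can be pictured with the sketch below, which normalizes whitespace and splits a document into overlapping fixed-size chunks. This is an illustration under stated assumptions, not the project's processing script: the function names, the character-based chunk size and overlap, and the `chunk_index` metadata field are all hypothetical.

```python
from typing import Dict, List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    cleaned = " ".join(text.split())  # collapse runs of whitespace
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(cleaned), step):
        chunks.append(cleaned[start:start + chunk_size])
        if start + chunk_size >= len(cleaned):
            break  # the last chunk already reaches the end of the text
    return chunks


def build_records(file_name: str, text: str, chunk_size: int = 1000,
                  overlap: int = 200) -> List[Dict[str, object]]:
    """Attach file name and position metadata to each chunk."""
    return [
        {
            "file_name": file_name,
            "content": chunk,
            # Hypothetical metadata; the real schema stores a JSON column.
            "metadata": {"chunk_index": index},
        }
        for index, chunk in enumerate(chunk_text(text, chunk_size, overlap))
    ]
```

Each record still needs an embedding before it is loaded; that step calls the embedding model and is left out here.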
🎬 Demo Script
- Show System Status: Data loaded indicator (54K+ chunks)
- Ask Legal Question: Use example questions or custom queries
- Highlight Multi-Step Process:
  - Vector search in TiDB
  - AI analysis with Gemini
  - Professional response with citations
- Show Source Citations: Specific SF municipal codes referenced
- Test Different Areas: Parking, business permits, noise ordinances
⚠️ Legal Disclaimer
This application is for informational purposes only and does not constitute legal advice. Always consult with a qualified attorney for specific legal matters.
🏅 Hackathon Submission Details
- Team: Cline AI Assistant
- Category: Agentic AI with Real-World Impact
- Tech Stack: Next.js, TiDB Serverless, Google Gemini
- Demo: Functional SF Legal Assistant
- Data: 54,000+ SF law chunks with vector search
- Multi-Step Agent: Query → Search → Analyze → Respond
Built for the TiDB AgentX Hackathon 2025 🚀
Built With
- tidb