🏛️ SF Legal Assistant

A multi-step agentic AI system for San Francisco legal research, built for the TiDB AgentX Hackathon. This project demonstrates a real-world legal assistant that can answer questions about San Francisco municipal laws and codes.

TiDB Cloud account Email: clai74@mail.ccsf.edu

🎯 Project Overview

Multi-Step Agentic Workflow:

Data Ingest & Index: Download SF laws → Process into chunks → Generate embeddings → Store in TiDB
Vector Search: Match user questions against the law database using TiDB vector search
LLM Analysis: Use Google Gemini to synthesize responses with specific legal citations
Complete Automation: End-to-end pipeline from user question to legal analysis

🏗️ Architecture

Frontend: Next.js 15 with React 19, professional legal assistant UI
Backend: Next.js API routes for search and chat functionality
Database: TiDB Serverless with vector search capabilities
AI Models: Google Gemini for embeddings and text generation
Data Pipeline: Python scripts for downloading and processing SF laws

🚀 Quick Start

Prerequisites

Node.js 18.18+ (the minimum for Next.js 15) and pnpm
Python 3.8+
TiDB Serverless account
Google AI API key (Gemini)

1. Environment Setup

# Clone and install dependencies
git clone <your-repo>
cd tidb
pnpm install

Create .env.local:

# TiDB Serverless connection
TIDB_HOST=your-host.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME=your-username
TIDB_PASSWORD=your-password
TIDB_DATABASE=your-database

# Google Gemini API
GEMINI_API_KEY=your-api-key-here
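
A missing variable is easier to catch at startup than halfway through a data load. The sketch below shows one way a Python script could validate these settings. It assumes the variables are already exported into the process environment; `load_settings` and `REQUIRED_KEYS` are hypothetical names, not part of this repository.

```python
import os

# Key names match the .env.local example above; the helper itself is hypothetical.
REQUIRED_KEYS = (
    "TIDB_HOST",
    "TIDB_PORT",
    "TIDB_USERNAME",
    "TIDB_PASSWORD",
    "TIDB_DATABASE",
    "GEMINI_API_KEY",
)


def load_settings(env=None):
    """Return validated settings, raising if any required key is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Strip stray whitespace, which is easy to leave behind when editing .env files.
    settings = {key: env[key].strip() for key in REQUIRED_KEYS}
    settings["TIDB_PORT"] = int(settings["TIDB_PORT"])  # 4000 in the example above
    return settings
```

Calling `load_settings()` with no argument reads from `os.environ`; passing a plain dict makes the check easy to exercise in isolation.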

2. Data Pipeline Setup

# Complete data setup (all steps)
pnpm run setup-data

# Or run individual steps:
pnpm run download-laws    # Download SF law files
pnpm run process-laws     # Process into chunks
pnpm run load-data        # Upload to TiDB with embeddings

3. Run the Application

pnpm run dev

Visit http://localhost:3000 to use the Legal Assistant!

📊 Project Statistics

18 SF Law Files downloaded and processed
54,000+ Law Chunks with vector embeddings
Vector Search with similarity matching
Professional Legal UI with source citations
Real-time Status monitoring and error handling

🔧 Available Scripts

pnpm run dev              # Start development server
pnpm run build            # Build for production
pnpm run setup-data       # Complete data pipeline setup
pnpm run download-laws    # Download SF laws
pnpm run process-laws     # Process laws into chunks  
pnpm run load-data        # Load data to TiDB
pnpm run load-data:clear  # Clear and reload data

🎪 Demo Features

Multi-Step AI Agent Demo

1. Ask a Legal Question:
   - "What are the parking regulations in San Francisco?"
   - "What permits do I need to start a business?"
   - "What are the noise ordinance rules?"
2. AI Pipeline Executes:
   - Generates query embedding using Gemini
   - Searches 54K+ law chunks in TiDB vector database
   - Finds most relevant legal sections
   - Generates comprehensive response with citations
3. Professional Output:
   - Detailed legal analysis
   - Specific SF municipal code references
   - Source citations with relevance scores
   - Professional legal disclaimer
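
The relevance scores attached to each citation come from vector similarity. In the app the comparison runs inside TiDB. The in-memory sketch below only illustrates the idea, assuming cosine similarity as the metric (this page does not say which metric is used). `cosine_similarity` and `rank_chunks` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunks, top_k=8):
    """Return the top_k (score, chunk) pairs, highest score first.

    Each chunk is assumed to be a dict with an "embedding" list of floats.
    """
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

A brute-force scan like this is only practical for small collections; at 54K+ chunks a vector index in the database is the better fit.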

Example Query Flow

User: "What are the parking regulations in downtown SF?"

AI Agent:
1. 🧠 Generate embedding for "parking regulations downtown"
2. 🔍 Search TiDB vector database → Find 8 relevant chunks
3. 📝 Gemini analyzes: Transportation Code, Police Code sections
4. ✅ Return: "Based on SF Transportation Code Section 7.2.5..."
   With sources: [Transportation.txt, Police.txt] + relevance scores
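
The four steps above can be sketched as one function that takes the embedding, search, and generation steps as callables. This is a minimal illustration, not the app's actual route handler: `answer_question`, `build_prompt`, the prompt wording, and the chunk fields `file_name`, `content`, and `score` are all assumptions.

```python
def build_prompt(question, chunks):
    """Number each retrieved chunk so the model can cite it in its answer."""
    sources = "\n\n".join(
        f"[{i}] ({chunk['file_name']}) {chunk['content']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the San Francisco code excerpts below, "
        "citing them by number.\n\n"
        f"Excerpts:\n{sources}\n\nQuestion: {question}"
    )


def answer_question(question, embed, search, generate, top_k=8):
    """Run embed -> search -> generate and return the answer with its sources.

    embed(text) -> list of floats
    search(vector, top_k) -> list of chunk dicts
    generate(prompt) -> answer text
    """
    query_vec = embed(question)                       # step 1: query embedding
    chunks = search(query_vec, top_k)                 # step 2: vector search
    answer = generate(build_prompt(question, chunks))  # step 3: LLM analysis
    sources = [
        {"file_name": c["file_name"], "score": c["score"]} for c in chunks
    ]
    return {"answer": answer, "sources": sources}     # step 4: answer + citations
```

Passing the three steps in as arguments keeps the flow testable with stubs, without calling Gemini or TiDB.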

🏆 Hackathon Requirements Met

✅ TiDB Serverless - Vector search database
✅ Multi-step Workflow - Data ingest → Vector search → LLM response
✅ Real-world Application - Functional legal assistant
✅ Innovative Solution - Professional legal research tool
✅ Quality Implementation - Production-ready code

🛠️ Technical Implementation

Database Schema

CREATE TABLE law_chunks (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  content TEXT NOT NULL,
  embedding VECTOR(768) NOT NULL,
  metadata JSON NOT NULL,
  file_name VARCHAR(255) NOT NULL,
  -- Vector index for similarity search. TiDB requires the index to name a
  -- distance function; cosine distance is assumed here.
  VECTOR INDEX idx_embedding ((VEC_COSINE_DISTANCE(embedding)))
);
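
A similarity query against this table could be built as in the sketch below. It assumes cosine distance via TiDB's `VEC_COSINE_DISTANCE` function (the metric the app actually uses is not stated here) and a DB-API driver with `%s` placeholders, such as PyMySQL. `to_vector_literal` and `build_search_query` are hypothetical helper names.

```python
def to_vector_literal(vec):
    """Format floats as the '[x,y,...]' string form used for TiDB VECTOR values."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


# Smaller distance means a closer match, so results are ordered ascending.
SEARCH_SQL = (
    "SELECT id, file_name, content, "
    "VEC_COSINE_DISTANCE(embedding, %s) AS distance "
    "FROM law_chunks ORDER BY distance LIMIT %s"
)


def build_search_query(query_vec, top_k=8, dim=768):
    """Return (sql, params) for a parameterized nearest-neighbor query."""
    if len(query_vec) != dim:
        raise ValueError(f"expected a {dim}-dim vector, got {len(query_vec)}")
    return SEARCH_SQL, (to_vector_literal(query_vec), int(top_k))
```

With an open cursor, `cursor.execute(*build_search_query(vec))` would run the query. Keeping the vector as a bound parameter avoids assembling SQL from user-derived values.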

API Endpoints

POST /api/chat - Main legal assistant endpoint
POST /api/search - Direct vector search
GET /api/status - System status and data metrics

Data Processing Pipeline

Download: 18 SF law TXT files from GitHub
Process: Clean text, extract metadata, chunk documents
Embed: Generate 768-dim vectors using Gemini
Load: Store in TiDB with vector indexes
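
The Process step can be illustrated with a simple overlapping-window chunker. This is a sketch under stated assumptions, not the repository's actual script: the chunk size, overlap, character-based splitting, and the `chunk_index` metadata field are all illustrative choices, and `chunk_text` and `make_records` are hypothetical names.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not (0 <= overlap < chunk_size):
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    text = " ".join(text.split())  # clean: collapse runs of whitespace
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def make_records(file_name, text, **chunk_kwargs):
    """Pair each chunk with the fields the law_chunks table expects (minus embedding)."""
    return [
        {"file_name": file_name, "content": piece, "metadata": {"chunk_index": i}}
        for i, piece in enumerate(chunk_text(text, **chunk_kwargs))
    ]
```

Overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing some text twice.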

🎬 Demo Script

1. Show System Status: Data loaded indicator (54K+ chunks)
2. Ask Legal Question: Use example questions or custom queries
3. Highlight Multi-Step Process:
   - Vector search in TiDB
   - AI analysis with Gemini
   - Professional response with citations
4. Show Source Citations: Specific SF municipal codes referenced
5. Test Different Areas: Parking, business permits, noise ordinances

⚠️ Legal Disclaimer

This application is for informational purposes only and does not constitute legal advice. Always consult with a qualified attorney for specific legal matters.

🏅 Hackathon Submission Details

Team: Cline AI Assistant
Category: Agentic AI with Real-World Impact
Tech Stack: Next.js, TiDB Serverless, Google Gemini
Demo: Functional SF Legal Assistant
Data: 54,000+ SF law chunks with vector search
Multi-Step Agent: Query → Search → Analyze → Respond

Built for the TiDB AgentX Hackathon 2025 🚀

Built With

tidb

Updates

Nelson Lai started this project — Sep 15, 2025 08:35 PM EDT
