🏛️ SF Legal Assistant

A multi-step agentic AI system for San Francisco legal research, built for the TiDB AgentX Hackathon. This project demonstrates a real-world legal assistant that can answer questions about San Francisco municipal laws and codes.

TiDB Cloud account Email: clai74@mail.ccsf.edu

🎯 Project Overview

Multi-Step Agentic Workflow:

  1. Data Ingest & Index → Download SF laws → Process into chunks → Generate embeddings → Store in TiDB
  2. Vector Search → Embed the user's question and match it against the law database using TiDB vector search
  3. LLM Analysis → Use Google Gemini to synthesize responses with specific legal citations
  4. Complete Automation → End-to-end pipeline from user question to legal analysis

🏗️ Architecture

  • Frontend: Next.js 15 with React 19, professional legal assistant UI
  • Backend: Next.js API routes for search and chat functionality
  • Database: TiDB Serverless with vector search capabilities
  • AI Models: Google Gemini for embeddings and text generation
  • Data Pipeline: Python scripts for downloading and processing SF laws

🚀 Quick Start

Prerequisites

  • Node.js 18.18+ (the minimum supported by Next.js 15) and pnpm
  • Python 3.8+
  • TiDB Serverless account
  • Google AI API key (Gemini)

1. Environment Setup

# Clone and install dependencies
git clone <your-repo>
cd tidb
pnpm install

Create .env.local:

# TiDB Serverless connection
TIDB_HOST=your-host.tidbcloud.com
TIDB_PORT=4000
TIDB_USERNAME=your-username
TIDB_PASSWORD=your-password
TIDB_DATABASE=your-database

# Google Gemini API
GEMINI_API_KEY=your-api-key-here
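
A small check like the one below can catch a missing setting before the data pipeline starts. It is a sketch rather than the project's actual loader: the helper name `missing_settings` is hypothetical, only the variable names come from the listing above, and it assumes the values have already been exported into the process environment.

```python
import os

# Variable names taken from the .env.local listing above.
REQUIRED_SETTINGS = (
    "TIDB_HOST",
    "TIDB_PORT",
    "TIDB_USERNAME",
    "TIDB_PASSWORD",
    "TIDB_DATABASE",
    "GEMINI_API_KEY",
)


def missing_settings(env=None):
    """Return the required settings that are absent or blank in `env`.

    Hypothetical helper; defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [
        name for name in REQUIRED_SETTINGS
        if not str(env.get(name, "")).strip()
    ]
```

Calling `missing_settings()` with no arguments inspects `os.environ`; an empty list means every required value is present.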

2. Data Pipeline Setup

# Complete data setup (all steps)
pnpm run setup-data

# Or run individual steps:
pnpm run download-laws    # Download SF law files
pnpm run process-laws     # Process into chunks
pnpm run load-data        # Upload to TiDB with embeddings

3. Run the Application

pnpm run dev

Visit http://localhost:3000 to use the Legal Assistant!

📊 Project Statistics

  • 18 SF Law Files downloaded and processed
  • 54,000+ Law Chunks with vector embeddings
  • Vector Search with similarity matching
  • Professional Legal UI with source citations
  • Real-time Status monitoring and error handling

🔧 Available Scripts

pnpm run dev              # Start development server
pnpm run build            # Build for production
pnpm run setup-data       # Complete data pipeline setup
pnpm run download-laws    # Download SF laws
pnpm run process-laws     # Process laws into chunks
pnpm run load-data        # Load data to TiDB
pnpm run load-data:clear  # Clear and reload data

🎪 Demo Features

Multi-Step AI Agent Demo

  1. Ask a Legal Question:

    • "What are the parking regulations in San Francisco?"
    • "What permits do I need to start a business?"
    • "What are the noise ordinance rules?"
  2. AI Pipeline Executes:

    • Generates query embedding using Gemini
    • Searches 54K+ law chunks in TiDB vector database
    • Finds most relevant legal sections
    • Generates comprehensive response with citations
  3. Professional Output:

    • Detailed legal analysis
    • Specific SF municipal code references
    • Source citations with relevance scores
    • Professional legal disclaimer
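
To make the citation step more concrete, here is one way the retrieved chunks might be folded into a prompt so the model can cite them by number. This is an illustrative sketch, not the project's actual prompt: the function name `build_prompt` and the instruction wording are assumptions, and the `file_name` and `content` keys mirror the column names in the database schema.

```python
def build_prompt(question, chunks):
    """Assemble an LLM prompt that numbers each retrieved chunk as a source.

    `chunks` is a list of dicts with "file_name" and "content" keys.
    Hypothetical helper; the instruction wording is illustrative only.
    """
    sources = "\n\n".join(
        f"[{i}] {chunk['file_name']}\n{chunk['content']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered sources below, "
        "and cite them as [n].\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )
```

Numbering the sources in the prompt lets the UI map each `[n]` in the answer back to a file name and relevance score.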

Example Query Flow

User: "What are the parking regulations in downtown SF?"

AI Agent:
1. 🧠 Generate embedding for "parking regulations downtown"
2. 🔍 Search TiDB vector database → Find 8 relevant chunks
3. 📝 Gemini analyzes: Transportation Code, Police Code sections
4. ✅ Return: "Based on SF Transportation Code Section 7.2.5..."
   With sources: [Transportation.txt, Police.txt] + relevance scores
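
Steps 1 and 2 of this flow amount to ranking stored chunk vectors by similarity to the query vector. In the application that ranking happens inside TiDB; the in-memory sketch below only illustrates the idea. Cosine similarity is an assumption about the metric, and `cosine_similarity` and `top_k` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=8):
    """Return the k most similar chunks as (score, file_name) pairs.

    `chunks` is a list of (file_name, vector) pairs; k=8 matches the
    "8 relevant chunks" in the example flow above.
    """
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The scores returned here play the role of the relevance scores shown next to each source.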

🏆 Hackathon Requirements Met

  • TiDB Serverless - Vector search database
  • Multi-step Workflow - Data ingest → Vector search → LLM response
  • Real-world Application - Functional legal assistant
  • Innovative Solution - Professional legal research tool
  • Quality Implementation - Production-ready code

🛠️ Technical Implementation

Database Schema

CREATE TABLE law_chunks (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  content TEXT NOT NULL,
  embedding VECTOR(768) NOT NULL,
  metadata JSON NOT NULL,
  file_name VARCHAR(255) NOT NULL,
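  -- Vector index for similarity search. TiDB defines a vector index over a
  -- distance function; cosine distance is assumed here.
  VECTOR INDEX idx_embedding ((VEC_COSINE_DISTANCE(embedding)))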
);
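
A similarity query against `law_chunks` might look like the sketch below. It assumes cosine distance (TiDB's `VEC_COSINE_DISTANCE` function) and a MySQL driver that uses `%s` placeholders, such as PyMySQL; the names `to_vector_literal`, `SEARCH_SQL`, and `search_params` are hypothetical and not taken from the project.

```python
def to_vector_literal(embedding):
    """Format a sequence of numbers as a TiDB vector literal such as '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Smaller distance means a closer match, so results are ordered ascending.
SEARCH_SQL = """
SELECT id, file_name, content,
       VEC_COSINE_DISTANCE(embedding, %s) AS distance
FROM law_chunks
ORDER BY distance
LIMIT %s
"""


def search_params(embedding, limit=8):
    """Return the parameter tuple to pass alongside SEARCH_SQL."""
    return (to_vector_literal(embedding), int(limit))
```

With a DB-API cursor this would be run as `cursor.execute(SEARCH_SQL, search_params(query_embedding))`, where `query_embedding` is the 768-dimensional vector for the user's question.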

API Endpoints

  • POST /api/chat - Main legal assistant endpoint
  • POST /api/search - Direct vector search
  • GET /api/status - System status and data metrics

Data Processing Pipeline

  1. Download: 18 SF law TXT files from GitHub
  2. Process: Clean text, extract metadata, chunk documents
  3. Embed: Generate 768-dim vectors using Gemini
  4. Load: Store in TiDB with vector indexes
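
The chunking in step 2 could be done in many ways; the sketch below shows one common approach, fixed-size character windows with overlap. The chunk size, the overlap, and the name `chunk_text` are assumptions for illustration, not the project's actual settings.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into character windows of `chunk_size` that overlap by `overlap`.

    Hypothetical helper; the default sizes are illustrative only.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    cleaned = " ".join(text.split())  # collapse runs of whitespace
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(cleaned), step):
        chunks.append(cleaned[start:start + chunk_size])
        if start + chunk_size >= len(cleaned):
            break  # this window already reaches the end of the text
    return chunks
```

Overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of storing some text twice.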

🎬 Demo Script

  1. Show System Status: Data loaded indicator (54K+ chunks)
  2. Ask Legal Question: Use example questions or custom queries
  3. Highlight Multi-Step Process:
    • Vector search in TiDB
    • AI analysis with Gemini
    • Professional response with citations
  4. Show Source Citations: Specific SF municipal codes referenced
  5. Test Different Areas: Parking, business permits, noise ordinances

⚠️ Legal Disclaimer

This application is for informational purposes only and does not constitute legal advice. Always consult with a qualified attorney for specific legal matters.

🏅 Hackathon Submission Details

  • Team: Cline AI Assistant
  • Category: Agentic AI with Real-World Impact
  • Tech Stack: Next.js, TiDB Serverless, Google Gemini
  • Demo: Functional SF Legal Assistant
  • Data: 54,000+ SF law chunks with vector search
  • Multi-Step Agent: Query → Search → Analyze → Respond

Built for the TiDB AgentX Hackathon 2025 🚀

Built With

  • tidb