Docvoice


💡 Inspiration

Our inspiration came from the growing need for intelligent, accessible knowledge management in today's digital world. We observed that:

Customer Support Teams spend hours searching through documentation to answer questions
Developers struggle to find relevant information in vast codebases and documentation
End Users prefer natural voice interactions over complex search interfaces
Businesses need scalable solutions that can handle growing content without performance degradation

We were inspired by the potential of TiDB Cloud's Vector Search to revolutionise how people access and interact with information, combined with the power of OpenAI's language models and real-time voice communication.

🎯 What it does

docvoice ("Turn docs into voice") transforms static websites into intelligent, voice-enabled knowledge systems:

Core Functionality

🕷️ Intelligent Web Crawling: Automatically crawls websites and extracts meaningful content
🔍 Semantic Vector Search: Uses TiDB Cloud's vector search for context-aware information retrieval
🤖 AI-Powered Q&A: Generates intelligent answers using OpenAI's GPT-4 model
🎤 Voice Interface: Natural voice conversations with AI agents
📱 Widget Integration: Easy-to-embed voice agents on any website
📊 Real-time Indexing: Live content processing and vector database updates

Use Cases

Customer Support: Instant voice-based support using company documentation
Developer Documentation: Voice search through technical documentation and APIs
Educational Content: Interactive learning through voice conversations
Knowledge Management: Intelligent search through internal company knowledge bases
E-commerce Support: Voice assistance for product information and FAQs

🏗️ How we built it

Architecture Overview

We built a two-part system, a Next.js frontend and a Python backend, with TiDB Cloud as the shared vector database:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │    Backend      │    │   TiDB Cloud    │
│   (Next.js)     │◄──►│   (Python)      │◄──►│   Vector DB     │
│                 │    │                 │    │                 │
│ • Web Interface │    │ • Voice Agent   │    │ • Vector Search │
│ • Widget System │    │ • Speech-to-Text│    │ • Embeddings    │
│ • Content Mgmt  │    │ • AI Processing │    │ • Real-time     │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Frontend Development

Next.js 14: Modern React framework with App Router for optimal performance
TypeScript: Type-safe development ensuring code reliability
Tailwind CSS: Responsive design system for beautiful UI/UX
LiveKit Integration: Real-time voice communication components
Widget System: Iframe-based integration for easy website embedding

Backend Development

Python Backend: High-performance voice processing server
Deepgram Integration: Advanced speech-to-text capabilities
OpenAI API: GPT-4 integration for intelligent responses
LiveKit Server: Real-time communication infrastructure
TiDB Integration: Vector search and database operations

Vector Search Implementation

Content Chunking: Intelligent splitting of content into semantic chunks
OpenAI Embeddings: text-embedding-3-large for vector generation
TiDB Cloud: Enterprise-grade vector database with real-time indexing
Semantic Search: Context-aware information retrieval
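
The chunking step above can be sketched as follows. The write-up does not say how docvoice splits content, so this minimal sketch swaps in a plain fixed-size word window with overlap instead of true semantic splitting. The names chunk_words, max_words, and overlap are hypothetical and do not come from the project.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows (an assumed, simplified strategy)."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval at the cost of some duplicated storage.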

Voice Processing Pipeline

Speech Input → Deepgram STT → Text
Text Query → OpenAI Embedding → Vector
Vector Search → TiDB Cloud → Relevant Content
Content Processing → OpenAI GPT-4 → Intelligent Response
Response → Text-to-Speech → Voice Output
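
The five stages above can be read as one function composed of interchangeable steps. The sketch below is illustrative only: VoicePipeline and its field names are hypothetical, and each stage is passed in as a plain callable, so no Deepgram, OpenAI, LiveKit, or TiDB client call is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class VoicePipeline:
    """Hypothetical wiring of the stages; each field is a stand-in for a real service."""
    transcribe: Callable[[bytes], str]                    # speech-to-text
    embed: Callable[[str], Sequence[float]]               # text -> query vector
    search: Callable[[Sequence[float], int], list[str]]   # vector -> relevant chunks
    generate: Callable[[str, list[str]], str]             # question + context -> answer
    synthesize: Callable[[str], bytes]                    # text-to-speech
    search_limit: int = 5

    def handle_utterance(self, audio: bytes) -> bytes:
        """Run one spoken question through every stage and return spoken audio."""
        question = self.transcribe(audio)
        query_vector = self.embed(question)
        context = self.search(query_vector, self.search_limit)
        answer = self.generate(question, context)
        return self.synthesize(answer)
```

Keeping each stage behind a callable makes it possible to test the flow with stubs and to swap a provider without touching the orchestration.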

📊 How docvoice Uses TiDB Cloud

docvoice leverages TiDB Cloud's vector search as its core AI knowledge system:

Database Schema & Storage:

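enhanced_chunks: Stores website content chunks with 1536-dimensional vector embeddings
- content: Raw text chunks from web pages
- embedding VECTOR(1536): OpenAI text-embedding-3-large vectors for semantic search (the model returns 3072 dimensions by default, so the 1536-dimensional column presumably relies on the API's dimensions parameter to shorten the output)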
- metadata JSON: Page titles, URLs, and indexing information
- url & page_title: Source attribution for answers
url_sources: Manages websites to be indexed
- indexing_mode: Simple vs enhanced crawling options
- max_pages & max_depth: Crawling limits and depth control
- chunks_count & pages_count: Indexing statistics
agents: Stores AI voice agent configurations
- personality, conversation_style, system_prompt
- llm_model, stt_model, tts_voice_id settings
- search_limit, context_window for response generation
agent_url_assignments: Links agents to their knowledge bases
- assignment_type: Primary vs secondary URL sources
- search_priority: Order of importance for content retrieval
indexing_jobs: Tracks website crawling progress and status
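
As an illustration of how the enhanced_chunks table might be queried, the sketch below builds a parameterized similarity query. It uses TiDB's VEC_COSINE_DISTANCE function and the bracketed string form for vector values. The choice of cosine distance, the helper names, and the %s placeholder style (as used by common MySQL drivers for Python) are assumptions, not details from the project.

```python
from typing import Sequence

EMBEDDING_DIM = 1536  # matches the VECTOR(1536) column described above


def to_vector_literal(vector: Sequence[float]) -> str:
    """Format a vector as the '[x,y,...]' string form used for VECTOR values."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


def build_chunk_search(vector: Sequence[float], limit: int = 5) -> tuple[str, tuple]:
    """Return (sql, params) for a nearest-chunk lookup; hypothetical helper."""
    sql = (
        "SELECT content, url, page_title, "
        "VEC_COSINE_DISTANCE(embedding, %s) AS distance "
        "FROM enhanced_chunks "
        "ORDER BY distance "  # smaller distance means more similar
        "LIMIT %s"
    )
    return sql, (to_vector_literal(vector), limit)
```

The returned pair can be handed to a driver's cursor.execute(sql, params), which keeps the vector and the limit out of the SQL string itself.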

Vector Search Implementation:

Real-time Embedding: Content is chunked and embedded with OpenAI's text-embedding-3-large model as it is indexed
Semantic Retrieval: Vector similarity search finds most relevant content chunks
Hybrid Search: Combines vector search with traditional text search for comprehensive results
Performance: Sub-second queries with TiDB Cloud's distributed architecture
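
The hybrid search idea above can be illustrated with a small in-memory ranking function. The write-up does not say how vector and text scores are combined, so the weighted sum, the alpha parameter, and the word-overlap keyword score below are all assumptions made for the sake of a self-contained example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query words that also occur in the text."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(text.lower().split())) / len(query_words)


def hybrid_rank(
    query: str,
    query_vector: Sequence[float],
    chunks: list[tuple[str, Sequence[float]]],
    alpha: float = 0.7,
    limit: int = 5,
) -> list[str]:
    """Rank (text, vector) chunks by alpha * vector score + (1 - alpha) * keyword score."""
    scored = [
        (alpha * cosine_similarity(query_vector, vec)
         + (1 - alpha) * keyword_score(query, text), text)
        for text, vec in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:limit]]
```

In a real deployment the vector part would run inside the database, with only the final blending done in application code.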

TiDB Cloud Account: satyasandeep786@gmail.com

Built With

livekit
nextjs
openai
python
tidb

Updates

Shiva Kumar Mangina started this project — Sep 02, 2025 12:36 PM EDT
