Chat With ML Model

Inspiration

As a data scientist, I've consistently encountered a significant challenge in my projects. We develop sophisticated XGBoost or Deep Learning models for various business cases, meticulously testing and interpreting them during development using tools like SHAP. However, in production, end-users typically interact with these models through simple APIs that return only prediction values. This leaves a wealth of valuable insights untapped, both because of their highly technical nature and because there is no modular process for communicating them effectively. This gap between the rich analytical capabilities of our models and their limited practical application in production environments inspired me to seek a solution.

What it does

Chat With ML Model is an innovative tool that bridges the gap between complex ML models and end-users. It allows users to interact with ML models using natural language queries, providing not just predictions but also in-depth insights and explanations. Key features include:

  • Natural language interface for querying ML models
  • Comprehensive insights beyond simple predictions
  • Integration of SHAP values for model interpretability
  • DiCE counterfactuals for personalized recommendations
  • Customer Lifetime Value (CLV) impact analysis
  • Churn impact analysis for proposed treatments
  • Churn prediction and explanation
  • Feedback store for continuous improvement
  • Known Good SQL retrieval for efficient query generation
  • On-the-fly visualizations and EDA, generated simply by asking for what you need
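One of the core ideas behind these features can be sketched in a few lines: take a model's prediction together with per-feature attributions (such as SHAP values) and render them as a plain-English explanation. The feature names, values, and helper function below are hypothetical illustrations, not the project's actual code.

```python
# Sketch: turning a churn probability plus per-feature attributions
# (e.g. SHAP values) into a plain-English explanation. All names and
# numbers here are made up for illustration.

def explain_prediction(probability, attributions, top_n=2):
    """Summarize a prediction and its strongest feature attributions."""
    # Rank features by the magnitude of their attribution.
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    parts = []
    for feature, value in ranked[:top_n]:
        direction = "raises" if value > 0 else "lowers"
        parts.append(f"{feature} {direction} the risk by {abs(value):.2f}")
    return (f"Predicted churn probability: {probability:.0%}. "
            "Main drivers: " + "; ".join(parts) + ".")

shap_like = {"monthly_charges": 0.18, "tenure_months": -0.25, "num_support_calls": 0.07}
print(explain_prediction(0.72, shap_like))
```

In the full system, a step like this sits between the model's raw output and the LLM, which then phrases the summary conversationally.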

How I built it

  1. Core Components:

    • Tool Calling Gemini Agent: Orchestrates the overall process
    • TiDB Agent: Leverages chat2sql to generate and validate SQL queries
    • Feedback Vector Store: Captures and utilizes user feedback for future queries
    • Known Good SQL Retriever: Finds and reuses effective SQL queries from the vector store using cosine similarity
  2. Technology Stack:

    • TiDB Serverless for efficient data storage and retrieval
    • TiDB's chat2sql feature for SQL generation
    • TiDB Vector database for storing and retrieving feedback and known good SQL
    • Large Language Models (LLMs) for natural language processing
    • XGBoost, CatBoost, SHAP, and DiCE for ML model development and interpretation
    • Streamlit for the UI
  3. Workflow:

    • User submits a question
    • System checks for similar previous queries in the feedback store
    • If no match, question is reformulated for SQL compatibility
    • Known Good SQL Retriever checks for similar existing queries
    • If no match, SQL query is generated using chat2sql and validated
    • Data is retrieved and the agent passes it to the required tools
    • Results are presented to the user in natural language
    • User feedback is collected and stored for future improvement
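The retrieval steps in the workflow above can be sketched as follows: compare the incoming question's embedding against stored "known good" entries by cosine similarity, and reuse a stored SQL query only when the match clears a threshold, otherwise fall back to generation. The real system performs this lookup against TiDB's vector database; the embeddings, threshold, and store format below are made-up stand-ins.

```python
# Simplified sketch of the Known Good SQL retrieval step. The tiny
# 3-dimensional "embeddings" and the 0.9 threshold are hypothetical.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_known_sql(query_embedding, store, threshold=0.9):
    """Return the best-matching stored SQL, or None to trigger generation."""
    best = max(store, key=lambda e: cosine_similarity(query_embedding, e["embedding"]))
    score = cosine_similarity(query_embedding, best["embedding"])
    return best["sql"] if score >= threshold else None

store = [
    {"sql": "SELECT COUNT(*) FROM customers WHERE churned = 1", "embedding": [0.9, 0.1, 0.2]},
    {"sql": "SELECT AVG(clv) FROM customers", "embedding": [0.1, 0.8, 0.3]},
]
print(retrieve_known_sql([0.88, 0.12, 0.21], store))  # close to the first entry
```

When this returns None, the pipeline falls through to chat2sql generation and validation, exactly as in the workflow bullets above.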

Challenges I ran into

  1. SQL Query Generation Accuracy: Initially, ensuring accuracy and efficiency in SQL query generation was a concern. However, I found that TiDB's chat2sql conversion was remarkably accurate compared to other available options. It not only provided precise queries but also proved to be much faster in implementation, significantly streamlining my development process.

  2. Designing Prompts for Tool Calling Agent: Crafting effective prompts for the tool calling agent was a complex task. I had to ensure that the prompts were clear enough to guide the agent but flexible enough to handle a wide range of user queries. This required multiple iterations and fine-tuning to achieve the right balance of specificity and versatility.

  3. Implementing Feedback Store: Designing an effective system to capture, store, and utilize user feedback presented new challenges. I had to ensure that the feedback was properly embedded and indexed for quick retrieval, and that the system could effectively use this feedback to improve future responses.

  4. Optimizing Known Good SQL Retrieval: Creating an efficient system for storing and retrieving known good SQL queries required careful consideration of indexing and similarity search algorithms. Balancing between query accuracy and retrieval speed was a significant challenge.

  5. Integrating Multiple Components: Seamlessly connecting the various components of the system - from the natural language processing front-end to the TiDB backend, including the ML model interpretation tools, feedback store, and known good SQL retriever - presented integration challenges that required careful planning and execution. Using Streamlit for the UI, which reruns the entire script on every interaction, didn't help.

Accomplishments I'm proud of

  1. Successfully integrating TiDB Serverless with LLMs for powerful data querying
  2. Creating a modular system that can accommodate various ML models
  3. Developing a user-friendly interface for complex ML insights
  4. Achieving high accuracy in SQL generation using TiDB's chat2sql feature
  5. Implementing an effective feedback store for continuous system improvement
  6. Creating a robust Known Good SQL retrieval system for query optimization

What I learned

  1. The potential of combining TiDB Serverless with LLMs for enhanced data analysis
  2. Techniques for translating technical ML insights into user-friendly explanations
  3. The importance of modular design in creating flexible ML systems
  4. Strategies for effective SQL generation and validation in natural language systems
  5. The value of user feedback in improving and optimizing AI-driven systems
  6. Techniques for efficient storage and retrieval of vector embeddings for similar query matching

What's next for Chat With ML Model

  1. Enhancing the feedback store with more advanced NLP techniques for better similarity matching, adding constraints to feedback storage, and applying a time filter to the retrieval process so that outdated responses are not streamed
  2. Expanding the Known Good SQL database with more complex query patterns
  3. Developing industry-specific modules for targeted business applications
  4. Moving to a more capable UI framework than Streamlit as complexity increases
  5. Implementing more sophisticated user feedback mechanisms to continuously improve the system's accuracy and relevance
  6. Investigating ways to automatically generate and validate new SQL patterns for the Known Good SQL database
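The time filter planned in item 1 could look roughly like this: discard stored feedback older than a cutoff before any similarity ranking, so stale responses are never streamed back. The entry format and the 30-day window below are assumptions for illustration, not a committed design.

```python
# Hypothetical sketch of a freshness filter for the feedback store:
# drop entries older than max_age_days before ranking candidates.
from datetime import datetime, timedelta

def filter_fresh(entries, now, max_age_days=30):
    """Keep only feedback entries stored within the last max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [e for e in entries if e["stored_at"] >= cutoff]

now = datetime(2024, 9, 1)
entries = [
    {"response": "old schema answer", "stored_at": datetime(2024, 6, 1)},
    {"response": "current answer", "stored_at": datetime(2024, 8, 20)},
]
print([e["response"] for e in filter_fresh(entries, now)])  # only the recent entry survives
```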

NOTE:

  • A few features, such as visualization and model stat checks, couldn't be shown in the demo video due to time constraints.
