Chat With ML Model

Inspiration

As a data scientist, I've consistently encountered a significant challenge in my projects. We develop sophisticated XGBoost or Deep Learning models for various business cases, meticulously testing and interpreting them during development using tools like SHAP. However, in production, end-users typically interact with these models through simple APIs that return only prediction values. This leaves a wealth of valuable insights untapped, both because of their highly technical nature and because there is no modular process for communicating them effectively. This gap between the rich analytical capabilities of our models and their limited practical application in production environments inspired me to seek a solution.

What it does

Chat With ML Model is an innovative tool that bridges the gap between complex ML models and end-users. It allows users to interact with ML models using natural language queries, providing not just predictions but also in-depth insights and explanations. Key features include:

  • Natural language interface for querying ML models
  • Comprehensive insights beyond simple predictions
  • Integration of SHAP values for model interpretability
  • DiCE counterfactuals for personalized recommendations
  • Customer Lifetime Value (CLV) impact analysis
  • Churn impact analysis for proposed treatments
  • Churn prediction and explanation
  • Feedback store for continuous improvement
  • Known Good SQL retrieval for efficient query generation
  • On-the-fly visualizations and EDA, generated simply by asking for what you need
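One of the core ideas behind these features can be sketched in a few lines: take a model's prediction together with per-feature attributions (such as SHAP values) and render them as a plain-English explanation. The feature names, values, and helper function below are hypothetical illustrations, not the project's actual code.

```python
# Sketch: turning a churn probability plus per-feature attributions
# (e.g. SHAP values) into a plain-English explanation. All names and
# numbers here are made up for illustration.

def explain_prediction(probability, attributions, top_n=2):
    """Summarize a prediction and its strongest feature attributions."""
    # Rank features by the magnitude of their attribution.
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    parts = []
    for feature, value in ranked[:top_n]:
        direction = "raises" if value > 0 else "lowers"
        parts.append(f"{feature} {direction} the risk by {abs(value):.2f}")
    return (f"Predicted churn probability: {probability:.0%}. "
            "Main drivers: " + "; ".join(parts) + ".")

shap_like = {"monthly_charges": 0.18, "tenure_months": -0.25, "num_support_calls": 0.07}
print(explain_prediction(0.72, shap_like))
```

In the full system, a step like this sits between the model's raw output and the LLM, which then phrases the summary conversationally.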

How I built it

  1. Core Components:

    • Tool Calling Gemini Agent: Orchestrates the overall process
    • TiDB Agent: Leverages chat2sql to generate and validate SQL queries
    • Feedback Vector Store: Captures and utilizes user feedback for future queries
    • Known Good SQL Retriever: Finds and reuses effective SQL queries from the vector store using cosine similarity
  2. Technology Stack:

    • TiDB Serverless for efficient data storage and retrieval
    • TiDB's chat2sql feature for SQL generation
    • TiDB Vector database for storing and retrieving feedback and known good SQL
    • Large Language Models (LLMs) for natural language processing
    • XGBoost, CatBoost, SHAP, and DiCE for ML model development and interpretation
    • Streamlit for the UI
  3. Workflow:

    • User submits a question
    • System checks for similar previous queries in the feedback store
    • If no match, question is reformulated for SQL compatibility
    • Known Good SQL Retriever checks for similar existing queries
    • If no match, SQL query is generated using chat2sql and validated
    • Data is retrieved and the agent passes it to the required tools
    • Results are presented to the user in natural language
    • User feedback is collected and stored for future improvement
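The retrieval steps in the workflow above can be sketched as follows: compare the incoming question's embedding against stored "known good" entries by cosine similarity, and reuse a stored SQL query only when the match clears a threshold, otherwise fall back to generation. The real system performs this lookup against TiDB's vector database; the embeddings, threshold, and store format below are made-up stand-ins.

```python
# Simplified sketch of the Known Good SQL retrieval step. The tiny
# 3-dimensional "embeddings" and the 0.9 threshold are hypothetical.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_known_sql(query_embedding, store, threshold=0.9):
    """Return the best-matching stored SQL, or None to trigger generation."""
    best = max(store, key=lambda e: cosine_similarity(query_embedding, e["embedding"]))
    score = cosine_similarity(query_embedding, best["embedding"])
    return best["sql"] if score >= threshold else None

store = [
    {"sql": "SELECT COUNT(*) FROM customers WHERE churned = 1", "embedding": [0.9, 0.1, 0.2]},
    {"sql": "SELECT AVG(clv) FROM customers", "embedding": [0.1, 0.8, 0.3]},
]
print(retrieve_known_sql([0.88, 0.12, 0.21], store))  # close to the first entry
```

When this returns None, the pipeline falls through to chat2sql generation and validation, exactly as in the workflow bullets above.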

Challenges I ran into

  1. SQL Query Generation Accuracy: Initially, ensuring accuracy and efficiency in SQL query generation was a concern. However, I found that TiDB's chat2sql conversion was remarkably accurate compared to other available options. It not only provided precise queries but also proved to be much faster in implementation, significantly streamlining my development process.

  2. Designing Prompts for Tool Calling Agent: Crafting effective prompts for the tool calling agent was a complex task. I had to ensure that the prompts were clear enough to guide the agent but flexible enough to handle a wide range of user queries. This required multiple iterations and fine-tuning to achieve the right balance of specificity and versatility.

  3. Implementing Feedback Store: Designing an effective system to capture, store, and utilize user feedback presented new challenges. I had to ensure that the feedback was properly embedded and indexed for quick retrieval, and that the system could effectively use this feedback to improve future responses.

  4. Optimizing Known Good SQL Retrieval: Creating an efficient system for storing and retrieving known good SQL queries required careful consideration of indexing and similarity search algorithms. Balancing between query accuracy and retrieval speed was a significant challenge.

  5. Integrating Multiple Components: Seamlessly connecting the various components of the system - from the natural language processing front-end to the TiDB backend, including the ML model interpretation tools, feedback store, and known good SQL retriever - presented integration challenges that required careful planning and execution. Using Streamlit for the UI, which reruns the entire script on every interaction, didn't help.

Accomplishments I'm proud of

  1. Successfully integrating TiDB Serverless with LLMs for powerful data querying
  2. Creating a modular system that can accommodate various ML models
  3. Developing a user-friendly interface for complex ML insights
  4. Achieving high accuracy in SQL generation using TiDB's chat2sql feature
  5. Implementing an effective feedback store for continuous system improvement
  6. Creating a robust Known Good SQL retrieval system for query optimization

What I learned

  1. The potential of combining TiDB Serverless with LLMs for enhanced data analysis
  2. Techniques for translating technical ML insights into user-friendly explanations
  3. The importance of modular design in creating flexible ML systems
  4. Strategies for effective SQL generation and validation in natural language systems
  5. The value of user feedback in improving and optimizing AI-driven systems
  6. Techniques for efficient storage and retrieval of vector embeddings for similar query matching

What's next for Chat With ML Model

  1. Enhancing the feedback store with more advanced NLP techniques for better similarity matching, adding constraints to feedback storage, and applying a time filter to the retrieval process so that outdated responses are not streamed
  2. Expanding the Known Good SQL database with more complex query patterns
  3. Developing industry-specific modules for targeted business applications
  4. Moving to a more capable UI framework than Streamlit as complexity increases
  5. Implementing more sophisticated user feedback mechanisms to continuously improve the system's accuracy and relevance
  6. Investigating ways to automatically generate and validate new SQL patterns for the Known Good SQL database
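The time filter planned in item 1 could look roughly like this: discard stored feedback older than a cutoff before any similarity ranking, so stale responses are never streamed back. The entry format and the 30-day window below are assumptions for illustration, not a committed design.

```python
# Hypothetical sketch of a freshness filter for the feedback store:
# drop entries older than max_age_days before ranking candidates.
from datetime import datetime, timedelta

def filter_fresh(entries, now, max_age_days=30):
    """Keep only feedback entries stored within the last max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [e for e in entries if e["stored_at"] >= cutoff]

now = datetime(2024, 9, 1)
entries = [
    {"response": "old schema answer", "stored_at": datetime(2024, 6, 1)},
    {"response": "current answer", "stored_at": datetime(2024, 8, 20)},
]
print([e["response"] for e in filter_fresh(entries, now)])  # only the recent entry survives
```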

NOTE:

  • A few features, such as visualization and model stat checks, couldn't be shown in the demo video due to time constraints.
