Inspiration

The rise of IoT and sensor-driven applications highlighted the need for a system that can process data streams in real time, detect anomalies quickly, and make insights accessible through everyday conversation. We wanted to combine vector search, anomaly detection, and conversational AI into a single pipeline that not only monitors the data but also lets users interact with it directly in Slack.

What it does

  • Ingests streaming sensor readings (simulated with Python).
  • Stores each record in TiDB alongside a 1536-dimensional embedding for semantic similarity search.
  • Detects anomalies using bootstrap rules and cosine distance thresholds.
  • Triggers Slack alerts instantly when anomalies are detected.
  • Provides a Slack bot where users can request visualizations, KPIs, or insights.
  • Uses LLMs to classify each query as either a semantic (vector) search or a direct SQL query, and returns results as charts or numbers.
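
The Slack alerting step could look roughly like the sketch below. It assumes alerts go through a Slack incoming webhook (the write-up does not say whether a webhook or the bot's API token is used), and the function names, message wording, and fields are hypothetical. Only the Python standard library is used.

```python
import json
import urllib.request


def build_alert_payload(sensor_id: str, value: float, distance: float, threshold: float) -> dict:
    """Build a Slack message body for an anomaly alert (hypothetical fields)."""
    text = (
        f":rotating_light: Anomaly on sensor {sensor_id}: "
        f"value={value:.2f}, cosine distance={distance:.3f} (threshold {threshold:.3f})"
    )
    # Incoming webhooks accept a JSON body with a "text" field.
    return {"text": text}


def send_slack_alert(webhook_url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A caller would pass the webhook URL from configuration, for example `send_slack_alert(os.environ["SLACK_WEBHOOK_URL"], build_alert_payload("temp-01", 97.4, 0.41, 0.25))`, where the environment variable name is also an assumption. Keeping payload construction separate from the network call makes the message format easy to test without contacting Slack.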

How we built it

  • Simulated streaming data using Python to mimic real-time sensor readings.
  • Ingested the data into TiDB, generated an embedding for each reading, and created an HNSW-based vector index for fast similarity search.
  • Defined bootstrap rules for anomaly detection and built logic that compares new embeddings against existing ones using cosine distance.
  • Integrated Slack for real-time alerts and built a Slack bot for interactive queries.
  • Used LLMs for query classification and embedding generation, and ran semantic similarity checks against TiDB to retrieve results.
  • Delivered results back to Slack in user-friendly formats like charts and KPI summaries.
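
The TiDB side of this pipeline can be sketched as a handful of SQL statements driven from Python. The table name, column names, and helper functions below are hypothetical; the `VECTOR(n)` column type, `VEC_COSINE_DISTANCE`, and `CREATE VECTOR INDEX ... USING HNSW` follow TiDB's vector search syntax, but the exact requirements (for example, a TiFlash replica on self-managed clusters) vary by TiDB version and deployment, so check the docs for yours.

```python
from typing import Sequence

EMBEDDING_DIM = 1536  # matches the 1536-dimensional embeddings described above

# Hypothetical schema for the sensor readings table.
CREATE_TABLE_SQL = f"""
CREATE TABLE IF NOT EXISTS sensor_readings (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    sensor_id VARCHAR(64) NOT NULL,
    reading DOUBLE NOT NULL,
    recorded_at DATETIME(3) NOT NULL,
    embedding VECTOR({EMBEDDING_DIM})
)
"""

# HNSW vector index keyed on cosine distance.
CREATE_INDEX_SQL = """
CREATE VECTOR INDEX idx_readings_embedding
ON sensor_readings ((VEC_COSINE_DISTANCE(embedding))) USING HNSW
"""

# Top-k nearest neighbours: ORDER BY the indexed distance expression plus LIMIT
# is the query shape the vector index is meant to accelerate.
TOP_K_SQL = """
SELECT id, sensor_id, reading,
       VEC_COSINE_DISTANCE(embedding, %s) AS distance
FROM sensor_readings
ORDER BY VEC_COSINE_DISTANCE(embedding, %s)
LIMIT %s
"""


def to_vector_literal(values: Sequence[float], dim: int = EMBEDDING_DIM) -> str:
    """Format an embedding as the '[x,y,...]' string form TiDB accepts for VECTOR values."""
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def top_k_params(embedding: Sequence[float], k: int = 5, dim: int = EMBEDDING_DIM) -> tuple:
    """Parameters for TOP_K_SQL with a %s-style DB-API driver such as PyMySQL."""
    literal = to_vector_literal(embedding, dim)
    return (literal, literal, k)
```

With a PyMySQL connection, usage would be `cursor.execute(CREATE_TABLE_SQL)`, `cursor.execute(CREATE_INDEX_SQL)`, and then `cursor.execute(TOP_K_SQL, top_k_params(query_embedding))`, where `query_embedding` comes from whichever embedding model the pipeline uses.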

Challenges we ran into

  • Designing bootstrap rules that are flexible yet effective for anomaly detection.
  • Optimizing vector similarity search for both speed and accuracy using high-dimensional embeddings.
  • Handling real-time ingestion and ensuring anomaly alerts are triggered with minimal latency.
  • Seamlessly integrating Slack with both alerting and conversational query workflows.
  • Ensuring that the LLM correctly classifies queries and generates meaningful responses.
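
One way to combine bootstrap rules with a cosine distance threshold is sketched below. The write-up does not define its bootstrap rules, so this assumes they are simple per-metric range checks applied first, followed by a check of whether a new embedding is far from every known-normal embedding. `RangeRule`, `is_anomaly`, and the 0.25 threshold are hypothetical, and in the real pipeline the nearest neighbour would come from TiDB's vector index rather than a Python loop.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RangeRule:
    """Hypothetical bootstrap rule: a reading outside [low, high] is anomalous."""
    metric: str
    low: float
    high: float

    def violated(self, value: float) -> bool:
        return not (self.low <= value <= self.high)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the same quantity VEC_COSINE_DISTANCE computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def is_anomaly(
    metric: str,
    value: float,
    embedding: Sequence[float],
    normal_embeddings: Sequence[Sequence[float]],
    rules: Sequence[RangeRule],
    threshold: float = 0.25,
) -> bool:
    # Step 1: cheap, explainable bootstrap rules run first.
    for rule in rules:
        if rule.metric == metric and rule.violated(value):
            return True
    # Step 2: without a baseline there is nothing to compare against yet.
    if not normal_embeddings:
        return False
    # Step 3: anomalous if even the nearest normal embedding is too far away.
    nearest = min(cosine_distance(embedding, ref) for ref in normal_embeddings)
    return nearest > threshold
```

Running the rules first keeps obvious violations independent of embedding quality, while the distance check catches readings that pass the rules but do not resemble anything seen before.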

Accomplishments that we're proud of

  • Built an end-to-end pipeline combining streaming ingestion, vector search, anomaly detection, and conversational analytics.
  • Successfully integrated TiDB’s HNSW-based vector index to perform efficient similarity checks on embeddings.
  • Created a Slack bot that not only delivers anomaly alerts but also answers user queries with charts and KPIs.
  • Demonstrated a solution designed to scale and to extend to real-world IoT and monitoring use cases.

What we learned

  • How to leverage TiDB’s vector index for real-time semantic similarity searches.
  • Best practices for embedding generation and anomaly detection using cosine distance.
  • Practical integration of LLMs for query understanding and classification.
  • The importance of user experience: delivering insights directly in Slack makes the system far more accessible.
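
A minimal sketch of LLM-based query routing is shown below, assuming the model is asked to answer in JSON and that any unparseable or unexpected answer falls back to a default route. The prompt text, `classify_query`, and the `complete` callable (standing in for whatever LLM client the bot uses) are all hypothetical.

```python
import json
from typing import Callable

ROUTES = ("semantic", "sql")

# Hypothetical classification prompt; the user's question is appended to it.
CLASSIFY_PROMPT = (
    "Classify the user's question for a sensor-analytics bot. "
    'Reply with JSON only: {"route": "semantic"} for similarity or '
    '"find readings like this" questions, or {"route": "sql"} for exact '
    "aggregates, filters, and KPIs.\n\nQuestion: "
)


def classify_query(question: str, complete: Callable[[str], str], default: str = "sql") -> str:
    """Return 'semantic' or 'sql'; `complete` sends a prompt to an LLM and returns its text."""
    raw = complete(CLASSIFY_PROMPT + question)
    try:
        route = json.loads(raw).get("route")
    except (json.JSONDecodeError, AttributeError, TypeError):
        # Non-JSON output, or JSON that is not an object.
        return default
    # Guard against labels outside the allowed set.
    return route if route in ROUTES else default
```

Defaulting to the SQL route is one design choice among several: a direct SQL query is deterministic, so a misclassification degrades to an exact answer instead of a fuzzy one.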

What's next for Real-Time Data Monitoring & Interactive Analytics with TiDB

  • Expand the anomaly detection framework with adaptive thresholds and ML-based models.
  • Integrate real-world sensor data streams instead of simulation.
  • Enhance the Slack bot with richer visualizations and natural language explanations.
  • Add support for more collaboration platforms (e.g., Teams, Discord).
  • Scale the architecture to handle larger datasets and higher ingestion rates.
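
As one possible starting point for adaptive thresholds, the sketch below flags a reading whose z-score against a rolling window of recent normal readings exceeds a limit, so the threshold follows each sensor's own recent behaviour. The class name and the defaults (`window=50`, `z_limit=3.0`, `min_history=10`) are hypothetical; this is a simple statistical baseline, not the ML-based models mentioned above.

```python
import math
from collections import deque


class RollingZScoreDetector:
    """Flag a value whose z-score against the last `window` normal values exceeds `z_limit`."""

    def __init__(self, window: int = 50, z_limit: float = 3.0, min_history: int = 10):
        if window < 2 or not 2 <= min_history <= window:
            raise ValueError("need window >= 2 and 2 <= min_history <= window")
        self.history = deque(maxlen=window)
        self.z_limit = z_limit
        self.min_history = min_history

    def update(self, value: float) -> bool:
        """Return True if `value` is anomalous; only normal values join the baseline."""
        anomalous = False
        if len(self.history) >= self.min_history:
            n = len(self.history)
            mean = sum(self.history) / n
            std = math.sqrt(sum((x - mean) ** 2 for x in self.history) / (n - 1))
            if std > 0:
                anomalous = abs(value - mean) / std > self.z_limit
            else:
                # Perfectly flat history: any change at all is unusual.
                anomalous = value != mean
        if not anomalous:
            self.history.append(value)
        return anomalous
```

Excluding flagged values keeps a burst of spikes from inflating the baseline, at the cost of adapting slowly to a genuine level shift; a production version would need a policy for that case.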