TiDB Cloud account Email: sugumar.p@gmail.com

Inspiration

Why does fixing an incident feel like detective work every single time? If you've worked in AppOps, you know the drill:

  1. An alert pops up in Slack.
  2. You copy-paste logs into a search tool.
  3. You hunt for that one Confluence page from 2021.
  4. You ping three different engineers for "tribal knowledge".
  5. You finally fix it… but lose an hour (or more) in the process.

What it does

💡 What if Slack itself could tell us: "Hey, I've seen this before - here's how to fix it, and here's the Jira ticket button"?

How we built it

  • /triage <query>: Find similar incidents and get AI-generated triage notes.
  • /runbook <incident-id>: Show runbook and details for an incident.
  • One-click Jira ticket creation from Slack.
  • Hybrid search using OpenAI embeddings (or local mock embeddings).
  • Works with TiDB Cloud for scalable, vector-enabled storage.
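
To make the hybrid search idea concrete, here is a minimal sketch of how a keyword score and a vector score could be blended when ranking past incidents. It is not the project's actual code: the token-hashing `mock_embedding`, the `alpha` weight, and the function names are all assumptions chosen for illustration, standing in for real OpenAI embeddings and a TiDB query.

```python
import hashlib
import math


def mock_embedding(text: str, dim: int = 16) -> list[float]:
    """Hypothetical local embedding: hash each token into one of `dim` buckets."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # Normalise to unit length so a dot product equals cosine similarity.
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def keyword_score(query: str, doc: str) -> float:
    """Fraction of query tokens that also appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def hybrid_rank(query: str, incidents: dict[str, str], alpha: float = 0.5) -> list[str]:
    """Rank incident ids by a weighted mix of vector and keyword scores.

    `alpha` is an assumed weighting, not a value taken from the project.
    """
    qv = mock_embedding(query)
    scored = []
    for incident_id, text in incidents.items():
        vector_part = cosine(qv, mock_embedding(text))
        keyword_part = keyword_score(query, text)
        scored.append((alpha * vector_part + (1 - alpha) * keyword_part, incident_id))
    return [incident_id for _, incident_id in sorted(scored, reverse=True)]
```

A `/triage` handler could call something like `hybrid_rank(query, incidents)` and pass the top few matches to the model that writes the triage notes. In a real deployment the vector part would presumably be computed inside TiDB rather than in Python.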

Challenges we ran into

  1. Vector Index Requirements: TiDB requires a TiFlash (columnar) replica before creating vector indexes, which led to errors until we updated the schema to set the replica first.
  2. OpenAI API Quota: We hit OpenAI API quota limits, so we added support for local/mock embeddings to allow development without a paid API key.
  3. ngrok Authentication: ngrok now requires an authtoken and a verified account, which added extra steps for local Slack integration.
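
The ordering issue from the first challenge can be sketched as a small helper that emits the schema statements in the sequence TiDB expects. The table name, column names, and the 1536-dimension vector are assumptions for illustration, and the statement syntax reflects our reading of the TiDB vector search documentation, so verify it against your TiDB version before relying on it.

```python
def vector_schema_statements(table: str = "incidents", dim: int = 1536) -> list[str]:
    """Return DDL in the required order: table, TiFlash replica, then vector index.

    `table`, the column names, and `dim` are hypothetical placeholders.
    """
    return [
        # 1. Create the table with a VECTOR column for the embeddings.
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id BIGINT PRIMARY KEY AUTO_INCREMENT, "
        "title VARCHAR(255), "
        f"embedding VECTOR({dim}))",
        # 2. Add the TiFlash (columnar) replica before any vector index exists.
        f"ALTER TABLE {table} SET TIFLASH REPLICA 1",
        # 3. Only now create the vector index on the embedding column.
        f"CREATE VECTOR INDEX idx_{table}_embedding ON {table} "
        "((VEC_COSINE_DISTANCE(embedding)))",
    ]
```

Running the statements in this order with any MySQL-compatible client avoids the error described above, because the replica is requested before the index is created.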

Accomplishments that we're proud of

Ops teams shouldn't spend hours chasing fixes for problems they've already solved. With TiDB Serverless, Slack, and a pinch of AI, we turned alerts into guided, automated incident response - all without leaving chat. Because the fastest way to fix something… is to remember you've already fixed it before.

What we learned

  • Cloud-native DBs require adaptation: Not all MySQL features are available in TiDB; understanding cloud-native database constraints is crucial for smooth integration.
  • Modern search needs vector support: Implementing hybrid search with vector indexes and AI embeddings is powerful, but requires careful schema and infra setup.
  • API limits matter: Building fallback logic (like mock embeddings) is important for development when external API quotas are hit.
  • Security and connectivity: Secure connections (SSL) and proper environment configuration are essential for cloud DBs.
  • Slack app integration is nuanced: Setting up Slack bots involves permissions, event subscriptions, and endpoint exposure—each with its own learning curve.
  • Iterative debugging: Many issues (from schema to API to Slack) required iterative troubleshooting and reading docs, reinforcing the value of patience and persistence.
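
The fallback logic mentioned above could look roughly like the wrapper below. It is a sketch under stated assumptions: `with_fallback` and the `Embedder` type are hypothetical names, and the broad `except Exception` stands in for whatever specific quota or rate-limit error the real embedding client raises.

```python
from typing import Callable, Sequence

# An embedder is any function that maps text to a vector of floats.
Embedder = Callable[[str], Sequence[float]]


def with_fallback(primary: Embedder, fallback: Embedder) -> Embedder:
    """Wrap `primary` so that any failure falls through to `fallback`."""

    def embed(text: str) -> Sequence[float]:
        try:
            return primary(text)
        except Exception:
            # Assumption: production code would catch only the client's
            # quota/rate-limit exception instead of every Exception.
            return fallback(text)

    return embed
```

With this shape, the rest of the app calls a single `embed` function and does not need to know whether the vector came from the paid API or from the local mock.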

What's next for Slackcident

  • Currently, logs are monitored manually. Automating this with a bot that watches the logs, triggers triage and runbooks, and notifies the AppOps team would make the process much more efficient.
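
As a rough idea of what that automation might start from, the sketch below scans log lines for error markers and hands each hit to a callback, which is where triage and notification would plug in. Nothing here exists in the project yet: the pattern, `scan_logs`, and the callback shape are all assumptions.

```python
import re
from typing import Callable, Iterable

# Assumed severity markers; real logs may need a different pattern.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL|CRITICAL)\b")


def scan_logs(lines: Iterable[str], on_incident: Callable[[str], None]) -> int:
    """Call `on_incident` for every line that looks like an error; return the hit count."""
    hits = 0
    for line in lines:
        if ERROR_PATTERN.search(line):
            on_incident(line.strip())
            hits += 1
    return hits
```

The `on_incident` callback could run the same similarity search as `/triage` and post the result to the AppOps channel.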
