Slackcident

Slackcident - TiDB with a pinch of AI
In Slack, run the triage command with the log text to get details of similar past incidents.
An AppOps engineer monitors the logs table and copies the most recent log entry.
Run the runbook command to get further details about the root cause and fixes.

TiDB Cloud account Email: sugumar.p@gmail.com

Inspiration

Why does fixing an incident feel like detective work every single time? If you've worked in AppOps, you know the drill:
An alert pops up in Slack.
You copy-paste logs into a search tool.
You hunt for that one Confluence page from 2021.
You ping 3 different engineers for "tribal knowledge".
You finally fix it… but lose an hour (or more) in the process.

What it does

💡 What if Slack itself could tell us: "Hey, I've seen this before - here's how to fix it, and here's the Jira ticket button"?

How we built it

/triage <query>: Find similar incidents and get AI-generated triage notes.
/runbook <incident-id>: Show runbook and details for an incident.
One-click Jira ticket creation from Slack.
Hybrid search using OpenAI embeddings (or local mock embeddings).
Works with TiDB Cloud for scalable, vector-enabled storage.
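
As an illustration of the hybrid search idea, the sketch below merges a vector-search ranking and a full-text ranking into one list. It uses reciprocal rank fusion, which is an assumption made for this example and not necessarily how Slackcident combines results; the function name, the constant k=60, and the incident IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(vector_hits, keyword_hits, k=60):
    """Merge two ranked lists of incident IDs into a single ranking.

    Each list is ordered best-first. An incident earns 1 / (k + rank)
    from every list it appears in, so items ranked well by both
    searches rise to the top.
    """
    scores = defaultdict(float)
    for hits in (vector_hits, keyword_hits):
        for rank, incident_id in enumerate(hits, start=1):
            scores[incident_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))


# Hypothetical results from the two searches (best match first).
vector_hits = ["INC-7", "INC-2", "INC-9"]
keyword_hits = ["INC-2", "INC-4", "INC-7"]
fused = reciprocal_rank_fusion(vector_hits, keyword_hits)
```

Rank-based fusion avoids having to normalize cosine distances against full-text relevance scores, which live on different scales.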

Challenges we ran into

Vector Index Requirements: TiDB requires a TiFlash (columnar) replica before creating vector indexes, which led to errors until we updated the schema to set the replica first.
OpenAI API Quota: We hit OpenAI API quota limits, so we added support for local/mock embeddings to allow development without a paid API key.
ngrok Authentication: ngrok now requires an authtoken and a verified account, which added extra steps for local Slack integration.
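
The ordering problem with vector indexes can be sketched as a small helper that always emits the TiFlash replica statement before the index statement. The table name, column name, and index name are hypothetical, and the SQL reflects our understanding of TiDB's syntax, so it should be checked against the current TiDB documentation before use.

```python
def vector_index_statements(table="incidents", column="embedding"):
    """Return schema statements in the order TiDB expects.

    The TiFlash replica must be set on the table before the vector
    index is created, so the replica statement always comes first.
    Table and column names here are placeholders.
    """
    return [
        f"ALTER TABLE {table} SET TIFLASH REPLICA 1",
        f"CREATE VECTOR INDEX idx_{column} ON {table} "
        f"((VEC_COSINE_DISTANCE({column}))) USING HNSW",
    ]


def apply_statements(cursor, statements):
    """Run each statement in order on a DB-API cursor (e.g. a MySQL driver)."""
    for sql in statements:
        cursor.execute(sql)


statements = vector_index_statements()
```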

Accomplishments that we're proud of

Ops teams shouldn't spend hours chasing fixes they've solved before. With TiDB Serverless, Slack, and a pinch of AI, we turned alerts into guided, automated incident response - all without leaving chat. Because the fastest way to fix something… is to remember you've already fixed it before.

What we learned

Cloud-native DBs require adaptation: Not all MySQL features are available in TiDB; understanding cloud-native database constraints is crucial for smooth integration.
Modern search needs vector support: Implementing hybrid search with vector indexes and AI embeddings is powerful, but requires careful schema and infra setup.
API limits matter: Building fallback logic (like mock embeddings) is important for development when external API quotas are hit.
Security and connectivity: Secure connections (SSL) and proper environment configuration are essential for cloud DBs.
Slack app integration is nuanced: Setting up Slack bots involves permissions, event subscriptions, and endpoint exposure—each with its own learning curve.
Iterative debugging: Many issues (from schema to API to Slack) required iterative troubleshooting and reading docs, reinforcing the value of patience and persistence.
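
The fallback logic for embeddings could look roughly like the sketch below. It is not the project's actual code: `mock_embedding` and `embed_with_fallback` are hypothetical names, and the `primary` argument stands in for whatever function calls the real embedding API. The mock vectors are deterministic but carry no semantic meaning, so they are only useful for exercising the pipeline during development.

```python
import hashlib
import math


def mock_embedding(text, dim=8):
    """Build a deterministic unit-length vector from a SHA-256 hash.

    dim must be at most 32 because SHA-256 yields 32 bytes.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    raw = [digest[i] / 255.0 - 0.5 for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0
    return [x / norm for x in raw]


def embed_with_fallback(text, primary=None):
    """Try the primary embedder; use the mock if it is missing or fails."""
    if primary is not None:
        try:
            return primary(text)
        except Exception:
            # e.g. quota exceeded or network error: fall through to the mock.
            pass
    return mock_embedding(text)


def failing_embedder(text):
    """Stand-in for an external API call that has hit its quota."""
    raise RuntimeError("quota exceeded")


vector = embed_with_fallback("db connection timeout", primary=failing_embedder)
```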

What's next for Slackcident

Currently, an AppOps engineer monitors the logs manually. Automating this with a bot that watches the logs, triggers triage and runbook lookups, and notifies the AppOps team would make the process much more efficient.
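
One possible starting point for that bot is a small polling helper that picks out error lines added since the last check. This is only a sketch of the idea: the function name, the "ERROR" marker, and the list-of-strings log format are assumptions, and the real logs table would be queried instead.

```python
def find_new_errors(log_lines, last_seen=0, marker="ERROR"):
    """Return error lines added since the last check, plus the new cursor.

    log_lines is the full list of log entries so far, and last_seen is
    how many of them were already processed on the previous poll.
    """
    new_lines = log_lines[last_seen:]
    errors = [line for line in new_lines if marker in line]
    return errors, len(log_lines)


# Hypothetical log entries; the first two were handled by an earlier poll.
logs = ["INFO start", "ERROR db timeout", "INFO ok", "ERROR disk full"]
errors, cursor = find_new_errors(logs, last_seen=2)
```

Each returned error line could then be passed to the same search used by /triage, with the result posted to the AppOps channel.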

Built With

fastapi
genai
python
tidb

Submitted to

TiDB AgentX Hackathon 2025

Created by

I focused on bringing the different pieces of this project together:
Exploring TiDB Serverless — understanding how to use both vector search and full-text search within the same database to power hybrid incident lookup.
Python Development — building the FastAPI backend, search service, Slack routes, and integrations with Jira.
AI Integration — wiring OpenAI to summarize incidents into concise triage notes, while ensuring the system falls back gracefully when no LLM is available.
This mix of database, backend, and AI work helped us deliver an end-to-end AppOps assistant inside Slack.

Sugumar Panneerselvam

Updates

Sugumar Panneerselvam started this project — Aug 27, 2025 09:17 AM EDT
