Inspiration
We experienced the problem firsthand at LotusHacks 2026. We emailed the organizers, asked in the Discord #general channel - no reply. Participants were asking the same questions ("When's lunch?", "How do I submit?", "What's the WiFi?") across multiple channels, and organizers were either repeating themselves or missing questions entirely.
We looked at how enterprise systems solve this - CommsIQ by Centelli does AI email triage for hospitality, Wallu does Discord KB auto-answers - and realized no one combines triage intelligence with situational awareness and an insight-to-action loop. So we built HackOps AI.
What it does
HackOps AI is an AI triage bot for hackathon operations on Discord. It doesn't just answer questions - it triages the entire communication stream:
- Auto-answers participant questions from a knowledge base built from websites, live schedules, and Discord message history - with source citations
- Smart escalation - when it doesn't know, it escalates to organizers with full context including what it already tried. Organizer replies once → bot learns permanently
- Duplicate detection - same question asked by 10 people → organizer answers once, bot handles the rest
- Summarization & insights - `/digest` turns hundreds of messages into top topics, resolved vs unresolved, action items
- Insight to action - `/announce` drafts announcements from data, the organizer revises via reply and approves with a reaction. 10 seconds from insight to posted announcement that updates the KB
- Knowledge management - URLs crawled with LLM extraction, Discord messages mined for Q&A, organizer answers and announcements auto-feed the KB
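One way to implement the duplicate detection described above is a similarity check between a new question's embedding and the embeddings of questions already escalated. This is a hypothetical sketch, not the project's actual code; the `findDuplicate` helper and the 0.9 threshold are assumptions:

```typescript
// Hypothetical sketch of embedding-based duplicate detection.
// Threshold and names are illustrative assumptions.

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface OpenQuestion {
  text: string;
  embedding: number[];
}

// Returns the already-open question that duplicates the incoming one, if any.
function findDuplicate(
  incoming: number[],
  open: OpenQuestion[],
  threshold = 0.9, // assumed similarity cutoff
): OpenQuestion | undefined {
  return open.find((q) => cosineSimilarity(incoming, q.embedding) >= threshold);
}
```

With something like this in place, "same question asked by 10 people" collapses to one escalation: the nine later askers get the organizer's single answer.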
How we built it
- Architecture: Clean adapter/engine separation - Discord is a thin adapter, the triage engine is platform-agnostic. Swap Discord for Slack/Teams/email with zero engine changes.
- RAG pipeline: Puppeteer crawls event websites → LLM extracts structured knowledge → chunked and embedded into PostgreSQL with pgvector. Context-augmented retrieval (user's conversation history prepended to embedding query) replaces a separate query-rewriting LLM call.
- Tiered LLM routing: Fast model (gpt-5.4-mini) for classification and vision. Strong model (gpt-5.4) for answers, digests, announcements, and knowledge extraction. Cuts cost ~10x on high-volume classification.
- Escalation feedback loop: Organizer replies to escalation embeds → answer forwarded to participant → stored as verified KB entry with highest priority. Bot learns from every organizer interaction.
- Seed pipeline: Bulk-inserts crawled Discord messages, then runs LLM extraction on announcement channels, Q&A forums, and #general conversations to bootstrap the KB from real data.
- Stack: TypeScript, discord.js, OpenAI API, PostgreSQL + pgvector, Puppeteer, Docker Compose, Vitest.
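The tiered routing above can be sketched as a pure lookup. The task names and the exact routing table are illustrative assumptions; only the two model names come from our setup:

```typescript
// Sketch of tiered LLM routing: cheap model for high-volume
// classification and vision, strong model for generation-heavy tasks.

type Task = 'classify' | 'vision' | 'answer' | 'digest' | 'announce' | 'extract';

const FAST_MODEL = 'gpt-5.4-mini';
const STRONG_MODEL = 'gpt-5.4';

function modelFor(task: Task): string {
  switch (task) {
    case 'classify':
    case 'vision':
      return FAST_MODEL; // high-volume, cost-sensitive
    default:
      return STRONG_MODEL; // answers, digests, announcements, extraction
  }
}
```

Since classification runs on every message while generation runs only on the subset that needs an answer, routing the former to the cheap model is where the ~10x cost saving comes from.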
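The context-augmented retrieval trick is simple enough to show directly: instead of paying for a query-rewriting LLM call, prepend the user's recent turns to the text that gets embedded. Names and the window size here are illustrative:

```typescript
// Sketch of context-augmented retrieval: the embedding query carries
// the user's recent conversation so follow-ups ("what about prizes?")
// retrieve against the full context, not the bare fragment.

interface Turn {
  author: 'user' | 'bot';
  text: string;
}

function buildEmbeddingQuery(
  history: Turn[],
  question: string,
  maxTurns = 4, // assumed context window
): string {
  const recent = history
    .slice(-maxTurns)
    .map((t) => `${t.author}: ${t.text}`)
    .join('\n');
  return recent ? `${recent}\nuser: ${question}` : `user: ${question}`;
}
```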
Challenges we ran into
- Knowledge quality: Raw text extraction from websites was messy — schedules, tabs, and accordions produced garbled content. We added an LLM extraction step that structures crawled content before embedding, which dramatically improved answer accuracy.
- Conversation context pollution: With multiple users chatting in #general, the bot mixed up conversations. We built per-user context filtering - only the current user's messages and bot replies to them - to keep conversations isolated.
- Escalation bot_msg_id bug: Organizer replies weren't being forwarded because the escalation embed's message ID wasn't stored in three different code paths (triage, reaction handler, bug reports). Each had the same pattern - `createEscalation()` called but the ID not returned. Found through user testing, not caught by tests, because the handler layer had zero test coverage.
- Time hallucination: The bot said it was 18:52 when it was actually 15:54 - the LLM prompt included today's date but not the current time. Fixed by adding the full datetime in the Vietnam timezone.
- Stale dedup cache: We stored escalated questions with an `[Escalated to organizers]` prefix for dedup. That prefix leaked into user-facing answers when similar questions came in later. We rewrote escalation dedup to store clean text.
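The per-user context fix above boils down to a filter over channel history. The field names here are assumptions (discord.js exposes this differently), but the logic is the shape we used: keep only the current user's messages plus the bot's direct replies to them.

```typescript
// Sketch of per-user context filtering to keep conversations isolated
// in a busy channel. Message shape is illustrative, not discord.js's.

interface ChannelMessage {
  authorId: string;
  replyToAuthorId?: string; // set when the bot replies to someone
  content: string;
}

function contextFor(
  messages: ChannelMessage[],
  userId: string,
  botId: string,
): ChannelMessage[] {
  return messages.filter(
    (m) =>
      m.authorId === userId ||
      (m.authorId === botId && m.replyToAuthorId === userId),
  );
}
```

This is what lets corrections like "I mean prize, sorry" resolve against the right thread instead of whatever five messages happened to land in the channel last.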
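The time-hallucination fix is worth showing because it's one line of prompt plumbing. A minimal sketch of what we inject now, using the standard `Intl.DateTimeFormat` API with the Vietnam timezone (`Asia/Ho_Chi_Minh`); the function name is illustrative:

```typescript
// Sketch: give the prompt the full current datetime in the event's
// timezone, not just today's date, so the model never guesses the hour.

function promptTimestamp(now: Date = new Date()): string {
  return new Intl.DateTimeFormat('en-GB', {
    timeZone: 'Asia/Ho_Chi_Minh',
    dateStyle: 'full',
    timeStyle: 'short', // 24-hour HH:mm in en-GB
  }).format(now);
}
```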
Accomplishments that we're proud of
- The learning loop works end-to-end: Question → bot tries → escalates → organizer replies once → answer forwarded back → stored in KB → all future similar questions auto-answered. This is the core value proposition and it works reliably.
- 22 features shipped in ~20 build hours with a team of 3 - including features we initially scoped as stretch goals (dedup, announcements, KB refresh).
- 102 automated tests covering triage logic, escalation rollback, RAG error handling, formatter output, and handler integration.
- Real data, real answers: The bot runs on actual LotusHacks 2026 data - real website, real schedule, real Discord history - not synthetic test data.
- Clean architecture: The adapter/engine separation is real, not a slide. Handlers import zero LLM or DB functions directly. Swapping the Discord adapter for Slack would require zero engine changes.
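The adapter/engine boundary above can be illustrated with a minimal interface. This is a simplified sketch, not our actual types, and the real methods are async; the point is that the engine only ever sees the interface, so a Slack or Teams adapter slots in without engine changes:

```typescript
// Sketch of the adapter/engine separation. Kept synchronous and
// stubbed for brevity; names are illustrative assumptions.

interface ChatAdapter {
  reply(channelId: string, text: string): void;
  escalate(question: string, context: string): void;
}

class TriageEngine {
  constructor(private adapter: ChatAdapter) {}

  handleQuestion(channelId: string, question: string): void {
    const answer = this.lookup(question);
    if (answer !== null) {
      this.adapter.reply(channelId, answer);
    } else {
      this.adapter.escalate(question, 'no KB match');
    }
  }

  private lookup(_question: string): string | null {
    // Placeholder for the RAG lookup; always misses in this sketch.
    return null;
  }
}
```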
What we learned
- Test the integration layer, not just pure functions. Every bug we found in production was in the Discord handler layer, which had zero test coverage. Unit tests on the triage engine caught nothing because the wiring between Discord events and engine methods was where bugs lived.
- LLM extraction > raw text crawling. Passing crawled HTML through an LLM to extract structured knowledge was the single biggest quality improvement - better than tuning embeddings, chunk sizes, or retrieval parameters.
- Per-user context matters at scale. The naive approach (last 5 channel messages) breaks immediately with multiple users. Filtering to the current user's conversation thread was essential for corrections ("I mean prize, sorry") and follow-ups to work.
- Ship the feedback loop first. The organizer-reply-to-KB-entry loop is what makes the system get smarter over time. Everything else (dedup, digests, announcements) is nice-to-have by comparison.
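The "test the integration layer" lesson can be made concrete with a stub: exercise the glue between a platform event and the engine, which is exactly the layer where our bugs lived. This is an illustrative sketch, not our Vitest suite; the names are assumptions:

```typescript
// Sketch of a wiring-layer test: stub the engine, fire a fake event,
// assert the handler forwarded (or correctly ignored) it.

interface Engine {
  triage(userId: string, text: string): string;
}

// The "handler layer": glue between a platform event and the engine.
function onMessage(
  engine: Engine,
  event: { authorId: string; content: string; isBot: boolean },
): string | null {
  if (event.isBot) return null; // ignore the bot's own messages
  return engine.triage(event.authorId, event.content);
}
```

A test that only covers `Engine.triage` would never catch a handler that forgets to call it, swallows the bot-message guard, or drops the return value; testing through `onMessage` does.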
What's next for HackOps AI
- Slack and Teams adapters - the engine is already platform-agnostic, we just need thin adapters for each platform
- Enterprise verticals - IT helpdesk triage, hotel guest support, customer support widget - same triage engine, different adapter
- Community reaction voting - multiple ❌ from different users = stronger escalation signal
- Conversation monitoring - LLM auto-evaluates whether a conversation is truly resolved without explicit reactions
- File ingestion - PDF/DOCX upload to KB via Discord
Built With
- discord.js
- docker
- node.js
- openai
- openrouter
- pgvector
- postgresql
- puppeteer
- typescript
- vitest