Judge hook

This is a Slack agent safety and governance demo, not just another inbox summarizer. It proves the agent can classify noisy workspace messages, block unauthorized egress, repair malformed model output, and explain its decisions in a repeatable local demo with 35 passing tests.

Inspiration

Slack agents fail in two boring ways. They call APIs they should not (because the agent forgot to check its OAuth scope), or they exfiltrate data to hosts they should not reach (because the agent fetches whatever the LLM asks it to). Most demos hand-wave both. I wanted the small, boring reference where governance is the load-bearing design, not a bullet point in the README. The agent should refuse to call chat.postMessage on a channel it does not have the scope for, and it should refuse to fetch from any host that is not on the allowlist, with both denials captured in the audit log.

What it does

The /triage command classifies a channel backlog into recruiter, customer-support, internal-request, or noise categories, with a confidence score and a drafted reply per message. Every Slack API call routes through a scope allowlist that refuses any call whose required scope is not in the manifest. Every outbound HTTP request routes through a host allowlist that refuses unlisted hosts and writes a blocked row to the audit JSONL. Tool args are validated before the call; malformed args get an LLM-friendly retry hint. Output is schema-validated. It ships with a FakeSlackProvider so the demo runs offline without a Slack workspace.
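The argument check with a retry hint could look roughly like the sketch below. It is an illustration, not the project's code: validate_tool_args, POST_MESSAGE_SPEC, and the hint wording are all hypothetical names and choices made up for this example.

```python
from __future__ import annotations

from typing import Any

# Hypothetical argument spec for one tool: field name -> expected type.
POST_MESSAGE_SPEC: dict[str, type] = {"channel": str, "text": str}


def validate_tool_args(args: dict[str, Any], spec: dict[str, type]) -> str | None:
    """Return None when args match spec, otherwise a hint the model can act on."""
    problems = []
    for name, expected in spec.items():
        if name not in args:
            problems.append(f"missing required field '{name}'")
        elif not isinstance(args[name], expected):
            problems.append(
                f"field '{name}' must be {expected.__name__}, "
                f"got {type(args[name]).__name__}"
            )
    # Reject extra fields too, so a typo in a field name is not silently dropped.
    for name in args:
        if name not in spec:
            problems.append(f"unexpected field '{name}'")
    if not problems:
        return None
    return "Tool call rejected: " + "; ".join(problems) + ". Fix the arguments and retry."
```

The hint is a plain sentence rather than a stack trace so it can be fed straight back to the model as the next turn.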

How I built it

Pure Python 3.10+, with governance composed through governance.py. The 35 tests run in under a second. The Slack app manifest is committed at the repo root so reviewers can install the app in their own workspace. The scope allowlist parses the manifest once at startup, so the governance config is the same artifact a Slack admin reads when reviewing the app's permissions.
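A minimal sketch of a manifest-driven scope allowlist follows. It assumes the manifest has already been loaded into a dict (for example with a YAML parser) and that bot scopes live under oauth_config.scopes.bot, as in Slack's manifest format. ScopeAllowlist, ScopeDenied, and the REQUIRED_SCOPE table are hypothetical names for illustration; the real governance.py may be organized differently.

```python
from __future__ import annotations

# Assumed method -> scope table. Only two methods are shown; a real table
# would cover every Slack method the agent is able to call.
REQUIRED_SCOPE = {
    "chat.postMessage": "chat:write",
    "conversations.history": "channels:history",
}


class ScopeDenied(Exception):
    """Raised when a Slack API call is not covered by the manifest scopes."""


class ScopeAllowlist:
    def __init__(self, manifest: dict) -> None:
        # Read the granted bot scopes once, at construction time.
        scopes = manifest.get("oauth_config", {}).get("scopes", {}).get("bot", [])
        self.granted = frozenset(scopes)

    def check(self, method: str) -> None:
        required = REQUIRED_SCOPE.get(method)
        if required is None:
            # Default-deny: an unmapped method is treated as unauthorized.
            raise ScopeDenied(f"{method}: no known scope mapping, denied by default")
        if required not in self.granted:
            raise ScopeDenied(f"{method}: requires {required}, which is not in the manifest")
```

With a manifest that grants only channels:history, check("conversations.history") returns normally and check("chat.postMessage") raises ScopeDenied.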

Challenges I ran into

Making the scope allowlist data-driven (parsing slack_app_manifest.yml once) instead of hand-listing scopes was the unlock. Hand-listed scopes drift from the manifest the admin actually reviews; parsing the manifest keeps the two in lockstep. The second hard part was the schema-validated output: early prompts let the model emit free-form classifications that the downstream code had to coerce. I switched to strict schema validation with a one-retry contract on schema failure.
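The one-retry contract might be sketched as below. The field names (category, confidence, draft_reply), the 0 to 1 confidence range, and the call_model callable are assumptions for illustration; only the four category labels come from the write-up.

```python
from __future__ import annotations

import json
from typing import Callable

CATEGORIES = {"recruiter", "customer-support", "internal-request", "noise"}


def parse_classification(raw: str) -> dict:
    """Return the parsed object, or raise ValueError if it breaks the assumed schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    category = data.get("category")
    if not isinstance(category, str) or category not in CATEGORIES:
        raise ValueError(f"category must be one of {sorted(CATEGORIES)}")
    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if not isinstance(data.get("draft_reply"), str):
        raise ValueError("draft_reply must be a string")
    return data


def classify_with_retry(call_model: Callable[[str], str], prompt: str) -> dict:
    """Call the model, and on schema failure retry exactly once with the error attached."""
    try:
        return parse_classification(call_model(prompt))
    except ValueError as first_error:
        retry_prompt = (
            f"{prompt}\n\nYour previous output was rejected: {first_error}. "
            "Reply with JSON only."
        )
        # A second failure propagates to the caller; there is no third attempt.
        return parse_classification(call_model(retry_prompt))
```

Capping the loop at one retry keeps the failure mode bounded: a model that cannot produce valid output fails loudly instead of burning calls.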

Accomplishments I am proud of

In the demo, the agent silently refuses a chat.postMessage to a channel it does not have access to, with the denial captured in the audit log. Because the scope allowlist is parsed from the manifest, a Slack admin can read one file (the manifest) to know exactly what the agent can and cannot do. The 35 passing tests cover the governance surface explicitly, so a reviewer can verify the safety claims without running the demo.

What I learned

Slack governance is mostly about reading the manifest as the source of truth. Once the scope allowlist is data-driven, the rest of the governance falls into place. The egress allowlist is the second load-bearing piece because LLM tool calls can fan out arbitrarily; default-denying unlisted hosts is the right posture. The audit log is the artifact that turns the governance from a promise into a checkable claim.
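A default-deny egress check that writes to an audit JSONL could be sketched like this. EgressGuard, EgressDenied, and the row fields (ts, event, url, host, decision) are hypothetical, and logging allowed requests alongside blocked ones is an assumption; the write-up only states that blocked requests get a row.

```python
from __future__ import annotations

import json
import time
from pathlib import Path
from urllib.parse import urlsplit


class EgressDenied(Exception):
    """Raised when an outbound request targets a host outside the allowlist."""


class EgressGuard:
    def __init__(self, allowed_hosts, audit_path) -> None:
        self.allowed_hosts = frozenset(host.lower() for host in allowed_hosts)
        self.audit_path = Path(audit_path)

    def check(self, url: str) -> None:
        # hostname is None for URLs without a network location; treat that as blocked.
        host = (urlsplit(url).hostname or "").lower()
        allowed = host in self.allowed_hosts
        self._audit(url, host, "allowed" if allowed else "blocked")
        if not allowed:
            raise EgressDenied(f"host '{host}' is not on the egress allowlist")

    def _audit(self, url: str, host: str, decision: str) -> None:
        row = {"ts": time.time(), "event": "egress", "url": url,
               "host": host, "decision": decision}
        # One JSON object per line, appended, so the log is easy to grep and replay.
        with self.audit_path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(row) + "\n")
```

Matching on the exact hostname, rather than on a substring or suffix, avoids letting a lookalike host such as slack.com.evil.example through.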

What is next for slack-inbox-triage

A second persona that drafts the morning standup digest from yesterday's channel activity. Multi-workspace mode so the same agent can triage backlog across two or three orgs in one /triage call. Integration with the existing gemini-channels-agent for cross-tool consistency.
