Inspiration

We were inspired by how real organisations, not single geniuses, solve gnarly problems. When a disaster hits, insurance companies don’t “ask one person”; they mobilise a whole hierarchy of specialists with different incentives and checks. We asked: what if agentic AI systems looked more like that? And how do you keep them observable in a post-Covid, remote-first world? That led us to copy the enterprise itself—org chart, roles, and Slack—rather than just the “AI brain.”

What it does

CRD2 – Elemental Insurance Limited is a simulated claims department made of specialist AI agents that only communicate through Slack. We model a Claims Manager, Investigator, and Claims Adjuster, each with their own goals and biases, working together on complex insurance claims. All reasoning happens in public Slack channels, so humans can drop in, review discussions, ask questions, and steer outcomes in real time. On top of this sits a robustness layer called G-Brace, which defends against prompt injection and malicious inputs.

How we built it

We designed a small “enterprise blueprint” for an insurance claims department, then turned each role into a dedicated LLM-powered agent with its own system prompt, objectives, and constraints. We wired these agents into a Slack workspace so that every decision flows through channels and threads, giving us a live audit trail. A simple orchestration layer routes new cases into Slack, triggers the right agents, and logs all interactions for analysis. Finally, we built a red-team harness around the system so we could attack our own agents and iterate on the G-Brace defence.
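To make the shape of this concrete, here is a minimal sketch of the orchestration pattern described above. It is illustrative only: the role names match our agents, but the class and function names are invented for this sketch, the Slack posting is stubbed out as an in-memory channel log, and `respond` stands in for a real LLM call conditioned on each agent's system prompt.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """One role in the simulated claims department."""
    name: str
    system_prompt: str
    transcript: list = field(default_factory=list)  # messages this agent has seen

    def respond(self, message: str) -> str:
        # Placeholder for a real LLM call using self.system_prompt as context.
        self.transcript.append(message)
        return f"[{self.name}] acknowledging: {message}"

def route_claim(claim: str, agents: dict[str, Agent]) -> list[str]:
    """Post a new claim into the shared channel and trigger each agent in turn.

    In the real system this would post to a Slack channel via the Slack API;
    here the returned list stands in for the channel's message history.
    """
    channel_log = [f"[intake] new claim: {claim}"]
    for agent in agents.values():
        reply = agent.respond(channel_log[-1])
        channel_log.append(reply)  # every reply is visible to the next agent
    return channel_log

agents = {
    "manager": Agent("Claims Manager", "Coordinate the team; own the final decision."),
    "investigator": Agent("Investigator", "Probe for fraud indicators; be sceptical."),
    "adjuster": Agent("Claims Adjuster", "Estimate payout; weigh customer and legal risk."),
}
log = route_claim("Water damage, apartment 4B, no photos provided", agents)
```

Because every reply is appended to one shared log, the "audit trail" falls out for free: the channel history is the reasoning record.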

Challenges we ran into

Getting agents to really behave like their roles (and not collapse into one generic assistant) was harder than expected. Balancing autonomy with strict “talk only in Slack” constraints sometimes made agents feel slow or stuck. Prompt injection and subtle jailbreaking attempts were surprisingly effective at first, forcing us to rethink how we structure context and guardrails. We also had to tune how much context each agent sees so conversations stayed coherent without blowing up token limits.
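G-Brace itself is more involved than we can show here, but the simplest layer of this kind of defence is a screen over incoming messages before they enter an agent's context. The sketch below is our illustration, not the project's actual code: the function name and the pattern list are invented, and a real defence needs far more than a blocklist.

```python
import re

# Classic injection phrasings; a blocklist alone is easy to evade,
# so this is only the first and cheapest layer of a defence.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
    r"disregard your (role|rules)",
]

def screen_message(text: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a message before it is appended
    to an agent's context window."""
    lowered = text.lower()
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, lowered):
            return False, f"matched injection pattern: {pattern}"
    return True, "ok"
```

Flagged messages can be held for human review in Slack rather than silently dropped, which keeps the humans-in-the-loop property intact.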

Accomplishments that we're proud of

We shipped an end-to-end, live demo where a human can drop a messy claim into Slack and watch a whole mini-organisation spin up to handle it. Our agents genuinely disagree sometimes—the Investigator pushing on fraud while the Adjuster worries about customer satisfaction and legal risk—which makes the system feel more “organisational” than “chatbot.” We’re proud of G-Brace, our first stab at a reusable pattern for hardening chat-native agents against attacks. And honestly, turning Slack into a control room for a fictional insurance company in a weekend was just fun.

What we learned

Structure matters: the org chart, roles, and communication patterns are just as important as the base model. For observability, forcing everything through a shared medium (Slack) is incredibly powerful—suddenly you can see why decisions were made, not just the final answer. We also learned that robustness is not a one-off setting; it’s a continuous process of red-teaming, tightening prompts, and adjusting incentives. And finally, humans in the loop aren’t just “safety valves”—they can be active collaborators inside the agent organisation.

What’s next for CRD2 – Elemental Insurance Limited

Next, we want to plug Elemental into more realistic workflows: real (anonymised) claims data, document ingestion, and policy rules. We plan to expand the org with more roles (legal, reinsurance, customer success) and richer metrics for observability—dashboards of disagreements, escalation patterns, and risk scores. On the robustness side, we’ll evolve G-Brace into a more formal, testable framework for evaluating and hardening agentic systems. Longer term, we see this pattern as reusable: swap “insurance” for any complex domain and use Slack-native, hierarchical agents to keep powerful AI both robust and glass-box.
