🧠 Inspiration
I chose to build Multi-Agent Governance Middleware because in my experience working with LLMs, even well-aligned models occasionally generate unsafe or non-compliant outputs. While many think only permissive models are risky, I've seen even stricter models (like Claude, which I've used heavily for over 2 years) unexpectedly generate inappropriate content — including an F-word in a recent test. That sparked the need to build a protective layer that can audit LLM output in real time.
The idea was simple: what if multiple specialized agents could each assess output from their own lens — legal, ethical, safety, privacy, compliance — and collectively produce a governance verdict?
🛠️ How I Built It
I built the project using Claude Sonnet 4 via Cline, with a FastAPI backend and serverless deployment on AWS Lambda. The system consists of:
- A core FastAPI backend that receives LLM outputs
- Five agents: Platform Compliance, Data Privacy, Legal Risk, Standards & Quality, AI Risk & Ethics — each evaluating independently
- A governance engine that aggregates agent scores, highlights violations, and recommends severity
For testing, I used a basic /demo UI and also exposed a clean REST endpoint at /govern for API-based validation.
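To make the agent-plus-aggregator design above concrete, here is a minimal sketch in plain Python. Only the five agent names come from this write-up. The `AgentResult` and `Verdict` types, the 0.0 to 1.0 score scale, the severity thresholds, the keyword-matching stand-in for an LLM-backed agent, and the choice to aggregate by taking the worst score are all assumptions for illustration, not the project's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Agent names are taken from the write-up; everything else is hypothetical.
AGENT_NAMES = [
    "Platform Compliance",
    "Data Privacy",
    "Legal Risk",
    "Standards & Quality",
    "AI Risk & Ethics",
]


@dataclass
class AgentResult:
    agent: str
    score: float  # assumed scale: 0.0 (clean) to 1.0 (severe)
    violations: List[str] = field(default_factory=list)


@dataclass
class Verdict:
    overall_score: float
    severity: str
    violations: Dict[str, List[str]]  # per-agent findings, not a flat pass/fail


Agent = Callable[[str], AgentResult]


def keyword_agent(name: str, keywords: List[str]) -> Agent:
    """Stand-in for an LLM-backed agent: flags any listed keyword it finds."""

    def evaluate(text: str) -> AgentResult:
        lowered = text.lower()
        hits = [k for k in keywords if k in lowered]
        # Assumed scoring rule: 0.5 per hit, capped at 1.0.
        return AgentResult(name, min(1.0, 0.5 * len(hits)), hits)

    return evaluate


def govern(text: str, agents: List[Agent]) -> Verdict:
    """Run each agent independently, then aggregate into one verdict."""
    results = [agent(text) for agent in agents]
    # Assumed aggregation: the worst single score drives the verdict.
    worst = max((r.score for r in results), default=0.0)
    if worst >= 0.8:
        severity = "high"
    elif worst >= 0.4:
        severity = "medium"
    else:
        severity = "low"
    violations = {r.agent: r.violations for r in results if r.violations}
    return Verdict(worst, severity, violations)


if __name__ == "__main__":
    agents = [
        keyword_agent("Data Privacy", ["password", "ssn"]),
        keyword_agent("Legal Risk", ["guaranteed returns"]),
    ]
    print(govern("Your password is hunter2", agents))
```

Taking the worst score means one strong objection from any agent cannot be averaged away by the others. A weighted mean would be an equally reasonable choice if some agents should count for less.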
🤯 Challenges
The build was rapid: I started on the evening of July 9th and finished by the evening of July 10th. While the architecture was simple and functional, debugging consumed 10x more time than expected.
- I skipped manually verifying the file logic during initial tests, which led to cascading failures
- Ensuring consistent and explainable outputs from agents under tight constraints was tricky
- I had no time to build a full RAG-based knowledge fetcher for official compliance documentation, so I relied on prompt engineering for now and aim to integrate RAG soon
Despite the crunch, I tested 50+ outputs, and 47 returned accurate results, validating the core approach.
👨‍💻 What I Learned
- Multi-agent evaluation improves interpretability and safety of LLM outputs
- Even a solo developer can build and deploy impactful AI middleware in under 24 hours
- Prompt design matters a lot when you're not using fine-tuning or RAG
- Governance is not one-size-fits-all: granular agent feedback is more actionable than a flat "pass/fail"
Built With
- amazon-web-services
- awslambda
- claudesonnet4
- fastapi
- python