🧠 Inspiration

I built Multi-Agent Governance Middleware because, in my experience working with LLMs, even well-aligned models occasionally generate unsafe or non-compliant output. Many people assume only permissive models are risky, but I've seen stricter models (like Claude, which I've used heavily for over two years) unexpectedly produce inappropriate content, including an F-word in a recent test. That convinced me to build a protective layer that audits LLM output in real time.

The idea was simple: what if multiple specialized agents could each assess output through their own lens (legal, ethical, safety, privacy, compliance) and collectively produce a governance verdict?

🛠️ How I Built It

I built the project with Claude Sonnet 4 via CLINE, using FastAPI and a serverless deployment. The system consists of:

  • A core FastAPI backend that receives LLM outputs
  • Five agents: Platform Compliance, Data Privacy, Legal Risk, Standards & Quality, AI Risk & Ethics — each evaluating independently
  • A governance engine that aggregates agent scores, highlights violations, and recommends severity
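
As a rough illustration of how independent agent results might be combined into one verdict, here is a minimal sketch. The five agent names come from the list above; the 0.0 to 1.0 score scale, the severity thresholds, the "worst score wins" rule, and the keyword-matching stand-in for an LLM-prompted agent are all assumptions made for this example, not the project's actual logic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Agent names come from the write-up; everything else here is hypothetical.
AGENTS = [
    "Platform Compliance",
    "Data Privacy",
    "Legal Risk",
    "Standards & Quality",
    "AI Risk & Ethics",
]


@dataclass
class AgentVerdict:
    agent: str
    score: float  # assumed scale: 0.0 (unsafe) to 1.0 (clean)
    violations: List[str] = field(default_factory=list)


@dataclass
class GovernanceVerdict:
    overall_score: float
    severity: str
    violations: Dict[str, List[str]]


def keyword_agent(agent: str, banned: List[str]) -> Callable[[str], AgentVerdict]:
    """Stand-in for an LLM-prompted agent: flags banned substrings."""
    def evaluate(text: str) -> AgentVerdict:
        lowered = text.lower()
        hits = [w for w in banned if w in lowered]
        score = max(0.0, 1.0 - 0.5 * len(hits))
        return AgentVerdict(agent, score, hits)
    return evaluate


def aggregate(verdicts: List[AgentVerdict]) -> GovernanceVerdict:
    """Combine independent agent verdicts into one governance verdict."""
    # Worst score wins, so one failing agent cannot be averaged away.
    overall = min(v.score for v in verdicts)
    if overall >= 0.8:
        severity = "low"
    elif overall >= 0.5:
        severity = "medium"
    else:
        severity = "high"
    # Keep per-agent detail so the result is more than a flat pass/fail.
    violations = {v.agent: v.violations for v in verdicts if v.violations}
    return GovernanceVerdict(overall, severity, violations)


def govern(text: str, agents: List[Callable[[str], AgentVerdict]]) -> GovernanceVerdict:
    """Run every agent on the same text, then aggregate."""
    return aggregate([agent(text) for agent in agents])
```

Taking the minimum score rather than the average is one possible design choice: it keeps a single serious violation from being diluted by four clean agents. A weighted average would be an equally reasonable alternative.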

For testing, I used a basic /demo UI and also exposed a clean REST endpoint at /govern for API-based validation.
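
A client call to the /govern endpoint could look roughly like the sketch below. Only the /govern path comes from the description above; the JSON field name `text` and the helper `build_govern_request` are hypothetical, so the real request schema may differ.

```python
import json
import urllib.request


def build_govern_request(base_url: str, llm_output: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /govern endpoint."""
    # "text" is a hypothetical field name; the real schema may differ.
    body = json.dumps({"text": llm_output}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/govern",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; building and sending are kept separate here so the request can be inspected first.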

🤯 Challenges

The build was rapid: I started on the evening of July 9th and finished by the evening of July 10th. The architecture was simple and functional, but debugging took 10x longer than expected.

  • I forgot to manually verify file logic during initial tests, which led to cascading failures
  • Ensuring consistent and explainable outputs from agents under tight constraints was tricky
  • Without time to build a full RAG-based knowledge fetcher for official compliance documentation, I relied on prompt engineering for now, and I aim to integrate RAG soon

Despite the crunch, I tested 50+ outputs, and 47 returned accurate results, validating the core approach.

👨‍💻 What I Learned

  • Multi-agent evaluation improves interpretability and safety of LLM outputs
  • Even with one person, you can build and deploy impactful AI middleware in under 24 hours
  • Prompt design matters a lot when you're not using fine-tuning or RAG
  • Governance is not one-size-fits-all: granular agent feedback is more actionable than a flat "pass/fail"


Updates

UPDATE

AI GOVERNANCE MIDDLEWARE

I found new interest in the project and started working on it again. I know this doesn’t count toward the original submission, but I ended up building an MVP 1.0 that actually works.

What's New?

I’ve added support for five major providers:

  • OpenAI
  • Anthropic
  • AWS Bedrock
  • Google Vertex AI
  • Gemini

It’s fully expanded and testable now. Users can bring their own API keys and plug them in to test requests across multiple LLM providers.
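
One way to route a request to a chosen provider with a user-supplied key is a small registry, sketched below. The provider list mirrors the one above, but the short keys (`"openai"`, `"bedrock"`, and so on), the `Adapter` signature, and the stub adapters are hypothetical. A real adapter would call each provider's SDK, and some providers (AWS Bedrock, for instance) authenticate with cloud credentials rather than a single key string.

```python
from typing import Callable, Dict

# Provider names are from the update; the adapter signature is hypothetical.
Adapter = Callable[[str, str], str]  # (api_key, prompt) -> completion text


def _stub_adapter(provider: str) -> Adapter:
    """Placeholder that stands in for a real SDK call to the provider."""
    def call(api_key: str, prompt: str) -> str:
        if not api_key:
            raise ValueError(f"missing API key for {provider}")
        return f"[{provider}] response to: {prompt}"
    return call


PROVIDERS: Dict[str, Adapter] = {
    name: _stub_adapter(name)
    for name in ("openai", "anthropic", "bedrock", "vertex", "gemini")
}


def complete(provider: str, api_key: str, prompt: str) -> str:
    """Route a request to the chosen provider using the caller's own key."""
    try:
        adapter = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unsupported provider: {provider}") from None
    return adapter(api_key, prompt)
```

Because the key is passed in on every call and never stored in the registry, this shape fits a bring-your-own-key setup where the middleware holds as little state as possible.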

Key Features

  • Functional and live:
    https://5viugclv4yi2jrgt5dlcjxe6se0ntoto.lambda-url.us-east-1.on.aws/

  • Focused on security and privacy:
    No user dashboard, no account linking. Users can create their own API keys just by hitting the endpoint. Everything is included in the docs.

  • Stateless by design:
    The only exception is API keys, which are encrypted and stored in DynamoDB (since AWS Lambda has no persistent local storage).

  • Minimalist, privacy-first architecture:
    The whole system is designed without tracking, dashboards, or analytics: just clean middleware to help you test, evaluate, or build with AI providers on your own terms.

More improvements coming soon.

Note:
The link I submitted earlier for the hackathon and the GitHub repo remain unchanged. This new work was done recently and is not part of the official submission.


Note Regarding Submission Details

I would like to clarify a couple of minor issues related to the project submission:

  • Project Name: The submitted project name may appear random. This was initially intended to be a placeholder for the team name. Due to time constraints during submission, it was not updated and was mistakenly used as the project title.

  • Description Typo: In the project description, "CLI" was mentioned, which was an error introduced by the AI tool I used. The correct term should be CLINE, not CLI.

These were unintentional oversights, and I appreciate your understanding. The project content itself remains unaffected and accurate.
