AgentGuard Lite - AI Agent Governance Studio

Screenshots: Readiness check tab; Scorecard showing 10/25 (Not Production Ready); Cost Calculator tab; Cost breakdown; Scenario Tester tab; History tab; Execution result.

Inspiration

80-95% of enterprise AI PoCs fail to reach production. While working on AgentGuard, a Python SDK for AI agent observability, we realized that the governance problem is not just technical. Delivery managers, project leads, and non-technical stakeholders need to understand agent readiness too. We built AgentGuard Lite to make AI governance accessible to every team member, not just engineers.

What it does

AgentGuard Lite is a four-screen AI agent governance studio for enterprise teams:

Readiness Check: an 8-question assessment that generates a deterministic Production Readiness Scorecard across 5 dimensions (Observability, Cost Control, Governance, Testing, and Data Security). Each dimension is scored by rule-based logic, so the same input always gives the same score, and the recommendations are AI-generated. It includes an Indian IT Context section that flags data residency compliance for BFSI (banking, financial services, and insurance) and government clients.
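The rule-based scoring described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual logic: the boolean answer fields, the rule table, the 0-5 range per dimension (inferred from the 10/25 scorecard), and the 80% verdict threshold are all assumptions.

```javascript
// Hypothetical rule table: each dimension maps the answers to a 0-5 score.
// The real app's questions, rules, and thresholds are not documented here.
const RULES = {
  observability: (a) => (a.hasTracing ? 3 : 0) + (a.hasAlerts ? 2 : 0),
  costControl: (a) => (a.hasBudgetLimit ? 5 : 1),
  governance: (a) => (a.hasNamedOwner ? 5 : 0),
  testing: (a) => (a.hasAutomatedTests ? 4 : 1),
  dataSecurity: (a) => (a.dataStaysInRegion ? 5 : 2),
};

const MAX_PER_DIMENSION = 5; // assumed from the 5-dimension, 25-point scorecard

// Pure function: no randomness and no LLM call, so the same answers
// always produce the same scorecard.
function scoreAssessment(answers) {
  const dimensions = {};
  let total = 0;
  for (const [name, rule] of Object.entries(RULES)) {
    // Clamp each rule's result into the 0..MAX_PER_DIMENSION range.
    const score = Math.min(MAX_PER_DIMENSION, Math.max(0, rule(answers)));
    dimensions[name] = score;
    total += score;
  }
  const max = MAX_PER_DIMENSION * Object.keys(RULES).length;
  // Assumed verdict threshold (80% of the maximum); not taken from the app.
  const verdict = total >= 0.8 * max ? "Production Ready" : "Not Production Ready";
  return { dimensions, total, max, verdict };
}
```

Keeping the rules in a plain table makes them easy to review with non-technical stakeholders, and the LLM can still be called separately to phrase recommendations without influencing the numbers.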

Cost Calculator: estimates daily and monthly costs for AI agent fleets across GPT-4o, Claude Sonnet, Llama 3.1, and Gemini Flash. Shows costs in both USD and INR. Triggers a budget warning when the projected monthly cost exceeds $100.
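A minimal sketch of the cost arithmetic, under stated assumptions: the input field names, the 30-day month, and a single blended per-token rate are illustrative choices, and the model price and exchange rate are passed in by the caller instead of being hard-coded, since the app's actual rates are not given here. Only the $100 monthly warning threshold comes from the description above.

```javascript
const BUDGET_WARNING_USD = 100; // monthly threshold stated in the description
const DAYS_PER_MONTH = 30;      // assumption: fixed 30-day month

// All parameter names are hypothetical. usdPerMillionTokens is a blended
// rate for the chosen model; inrPerUsd is the exchange rate to apply.
function estimateFleetCost({ agents, callsPerAgentPerDay, tokensPerCall, usdPerMillionTokens, inrPerUsd }) {
  const tokensPerDay = agents * callsPerAgentPerDay * tokensPerCall;
  const dailyUsd = (tokensPerDay / 1_000_000) * usdPerMillionTokens;
  const monthlyUsd = dailyUsd * DAYS_PER_MONTH;
  return {
    dailyUsd,
    monthlyUsd,
    dailyInr: dailyUsd * inrPerUsd,
    monthlyInr: monthlyUsd * inrPerUsd,
    // "Exceeds" is read strictly: exactly $100 does not trigger the warning.
    overBudget: monthlyUsd > BUDGET_WARNING_USD,
  };
}
```

For example, 10 agents making 100 calls a day at 2,000 tokens per call use 2 million tokens daily; at a placeholder rate of $5 per million tokens that is $10 a day and $300 a month, which would trigger the warning.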

Scenario Tester: simulates agent behavior using MeDo's LLM skill, validates the output against expected keywords, and checks the estimated cost against a configured budget limit. Shows PASS/FAIL with keyword-level analysis.
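The validation step can be sketched as below. The call to MeDo's LLM skill is not shown because its interface is not documented here, so the function takes the agent output as a plain string. The function name, the field names, the case-insensitive substring matching, and the rule that a cost equal to the limit still passes are assumptions.

```javascript
// Hypothetical evaluator: every expected keyword must appear in the output
// and the estimated cost must not exceed the budget for a PASS.
function evaluateScenario({ output, expectedKeywords, estimatedCostUsd, budgetLimitUsd }) {
  const haystack = output.toLowerCase();
  // Keyword-level analysis: record a found/missing flag per keyword.
  const keywords = expectedKeywords.map((keyword) => ({
    keyword,
    found: haystack.includes(keyword.toLowerCase()),
  }));
  const allFound = keywords.every((k) => k.found);
  const withinBudget = estimatedCostUsd <= budgetLimitUsd;
  return {
    status: allFound && withinBudget ? "PASS" : "FAIL",
    keywords,
    withinBudget,
  };
}
```

Returning the per-keyword flags alongside the overall status lets the UI show which expectation failed instead of only a bare FAIL.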

History: saves every assessment to localStorage so teams can track governance score improvements over time. Shows the total number of assessments and the average score across all runs.
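One way the persistence and summary could work is sketched below. The storage key, the function names, and the shape of a saved assessment (an object with a numeric `total`) are assumptions. The storage object is passed in explicitly, so in a browser you would pass `window.localStorage`, while the small in-memory stand-in makes the sketch runnable elsewhere.

```javascript
const STORAGE_KEY = "agentguard-lite-history"; // hypothetical key name

// In-memory stand-in exposing the two Web Storage methods used below.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
  };
}

function loadHistory(storage) {
  const raw = storage.getItem(STORAGE_KEY);
  if (raw === null) return [];
  try {
    const parsed = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // treat corrupt data as an empty history
  }
}

// Appends one assessment and writes the whole list back as JSON.
function saveAssessment(storage, assessment) {
  const history = loadHistory(storage);
  history.push(assessment);
  storage.setItem(STORAGE_KEY, JSON.stringify(history));
  return history;
}

// Total assessments and average score across all runs.
function summarizeHistory(history) {
  const count = history.length;
  const sum = history.reduce((acc, item) => acc + item.total, 0);
  return { count, averageScore: count === 0 ? 0 : sum / count };
}
```

Because localStorage only stores strings, the history is serialized as a single JSON array under one key, which keeps reads and writes simple at the cost of rewriting the full list on each save.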

How we built it

Built entirely with MeDo's Deep Build mode in one session:

Submitted a single detailed prompt describing the three initial screens, navigation, LLM integration, scoring logic, and design requirements
MeDo generated a requirements document for confirmation before building
MeDo generated the complete full-stack application including navigation, form logic, LLM skill integration, calculation engine, chart rendering, and responsive design
MeDo detected and automatically fixed a streaming API error in the LLM integration layer
Follow-up prompts added: deterministic scoring rules, Indian IT Context section, and History tab with localStorage persistence

The most impressive MeDo capability we used was the LLM skill integration, which takes 8 free-form answers and returns structured dimension scores with specific recommendations in real time.

Challenges we ran into

MeDo initially used LLM-generated scores, which were non-deterministic. We solved this by adding rule-based scoring logic through a follow-up prompt, keeping the LLM for recommendation text only.
The Governance rule needed case-insensitive matching for "Nobody" vs "nobody". We fixed this with a targeted follow-up prompt.
Balancing feature depth against credit consumption, given the limited credits available.
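The case-insensitive fix to the Governance rule could look something like the sketch below. The rule itself is hypothetical (the actual question wording and score values are not given here); the point is only that free-form answers are normalized before comparison.

```javascript
// Normalize a free-form answer so "Nobody", " nobody ", and "NOBODY"
// all compare equal.
function normalizeAnswer(answer) {
  return String(answer ?? "").trim().toLowerCase();
}

// Hypothetical governance rule: no named owner scores 0, otherwise 5.
function scoreGovernance(ownerAnswer) {
  const normalized = normalizeAnswer(ownerAnswer);
  const noOwner = normalized === "" || normalized === "nobody";
  return noOwner ? 0 : 5;
}
```

Normalizing once at the boundary, instead of inside each rule, avoids repeating the same fix for every dimension that compares free-form text.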

Accomplishments that we're proud of

Deterministic scoring that gives consistent, trustworthy results instead of variable LLM outputs
Indian IT Context section that directly addresses data residency requirements for BFSI and government clients in India
History tracking that turns a one-time check into a governance practice
The entire app was built, debugged, and extended through conversation with MeDo, with zero manual coding

What we learned

MeDo's Deep Build mode is genuinely capable of generating production-grade application logic, not just UI scaffolding. The key is giving it a precise, structured prompt with explicit business rules. Vague prompts produce vague apps. Specific prompts produce specific apps.

Multi-turn iteration is MeDo's most powerful feature: it lets you extend and fix a generated app through follow-up conversation without rebuilding from scratch.

What's next for AgentGuard Lite

Export scorecard as PDF for client reporting
Team sharing: save assessments to a shared workspace, not just localStorage
Connect to the AgentGuard Python SDK for real agent trace data instead of simulated responses
Benchmark database: compare your agent's score against industry averages for your sector

Built With

javascript
llm
medo
supabase

Updates

Abinandida R started this project — May 20, 2026 06:46 AM EDT
