Inspiration
Enterprise AI adoption faces a critical challenge: how do you deploy AI agents at scale while maintaining governance, security, and compliance?
As organizations rush to integrate LLMs like Gemini into production systems, they encounter real-world barriers:
- No centralized control over which agents can access what tools
- Missing audit trails for AI decisions in regulated industries (finance, healthcare, government)
- No emergency kill-switch when agents malfunction or go rogue
- Difficulty managing agent-to-agent collaboration at scale
- Complex multi-cloud deployment requirements across GKE, AKS, and EKS
- API keys and credentials baked into code instead of runtime configuration
We built RAVP (Regulated Agent Vending Platform) to solve this: a production-grade platform where you can create, govern, and deploy Gemini-powered AI agents with enterprise controls baked in from day one. Think of it as "Kubernetes for AI Agents" - a control plane that manages the full lifecycle of governed, auditable, killable agents.
What it does
RAVP is a governed agent orchestration platform powered by Google Gemini that enables enterprises to:
Core Capabilities
- Create AI Agents - Define agents via intuitive Streamlit UI or API with purpose, allowed tools, risk tier, and policies
- Enforce Governance - Built-in RBAC, Rego policy evaluation, comprehensive audit logging, and emergency kill-switch
- Deploy Anywhere - One-click deployment to GKE, AKS, EKS, or local Docker with auto-generated Kubernetes manifests
- Agent Mesh - Agents discover and invoke each other through capability-based routing (A2A protocol)
- Runtime LLM Config - Switch between Google AI Studio, Vertex AI, OpenAI, or Anthropic per environment without rebuilding images
- Skills Framework - Structured capability system for agent discovery and routing
Real-World Example
Cloud Reliability Agent powered by Gemini:
- Monitors GCP incidents via Cloud Monitoring API
- Uses Gemini to analyze logs and metrics for root cause diagnosis
- Evaluates Rego policies: "Can we auto-remediate this issue?"
- Invokes Cloud Healing Agent to execute fixes (with human-in-the-loop approval)
- Every action is audited; emergency kill-switch available if needed
Key Features
- Agent Registry: Versioned agent definitions with semantic versioning
- Tool Gateway: Enforces that agents can only call allowed tools
- Policy Engine: Rego-based decision evaluation before critical actions
- Audit Store: Immutable log of all agent actions for compliance
- Kill-Switch: Circuit breaker to disable agents or models instantly
- Multi-Provider LLM: Unified API for Gemini (AI Studio + Vertex AI), OpenAI, Anthropic
- Conversation History: Full context retention for interactive agents
- Auto-Codegen: Generate agent implementation from YAML definition
- MCP Support: Model Context Protocol server for external tool consumers
How we built it
Architecture
1. Control Plane (FastAPI)
The central nervous system managing all platform services:
- Agent Registry: Stores versioned agent definitions with RBAC (creator, visibility, domains)
- Tool Registry: Managed catalog of allowed tools per agent with versioning
- Policy Registry: Rego policy evaluation engine
- Audit Store: Immutable log of all agent decisions and tool calls
- Kill-Switch Service: Emergency disable for agents or models
- Deployment Manager: Records and tracks multi-cloud deployments
- Mesh Discovery: Agent-to-agent capability routing and invocation
- Docker Build Service: Auto-build and push agent images to registries
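The write-up describes the Audit Store as immutable but does not say how that is achieved. One common way to make tampering detectable is to chain entries by hash. The sketch below illustrates that idea only; `AuditLog` and its methods are hypothetical names, not RAVP's actual implementation.

```python
import hashlib
import json


class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64  # placeholder hash for the first entry

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(body):
        # Canonical JSON so the same body always hashes to the same value.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, agent_id, action, detail):
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = {"agent_id": agent_id, "action": action, "detail": detail, "prev": prev_hash}
        entry = dict(body, hash=self._digest(body))
        self._entries.append(entry)
        return entry

    def verify(self):
        """Return True only if no entry was altered, removed, or reordered."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev_hash or entry["hash"] != self._digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```

Editing any stored entry after the fact changes its recomputed hash, so `verify()` returns `False` for the whole chain.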
2. Agent SDK (Python)
The enforcement layer that every agent uses:
```python
from org_agent_sdk import RegulatedAgent

agent = RegulatedAgent(
    agent_id="cloud_reliability",
    control_plane_url="http://localhost:8010"
)
# SDK enforces:
# - Kill-switch check before every run
# - Tool calls only from allowed_tools list
# - Policy evaluation before critical decisions
# - Audit logging of all actions
```
Core components:
- RegulatedAgent: Base class enforcing governance
- ToolGateway: Restricts tools to allowed list
- PolicyClient: Evaluates Rego policies via control plane
- AuditClient: Logs all tool calls asynchronously
- LLMClient: Multi-provider abstraction
- AgentClient: Mesh discovery and A2A invocation
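To make the enforcement order concrete, here is a minimal sketch of how a kill-switch check, a tool allowlist, and audit logging can wrap a tool call. It is not the real `org_agent_sdk` code: `GovernedAgentSketch`, the exception names, and the injected callables are all assumptions chosen for illustration.

```python
class KillSwitchActive(RuntimeError):
    """Raised when the agent has been disabled by the kill-switch."""


class ToolNotAllowed(PermissionError):
    """Raised when an agent requests a tool outside its allowlist."""


class GovernedAgentSketch:
    def __init__(self, agent_id, allowed_tools, is_killed, audit):
        self.agent_id = agent_id
        self._tools = dict(allowed_tools)  # tool name -> callable
        self._is_killed = is_killed        # callable(agent_id) -> bool
        self._audit = audit                # callable(event dict) -> None

    def _record(self, tool, outcome):
        self._audit({"agent_id": self.agent_id, "tool": tool, "outcome": outcome})

    def run(self, tool, **kwargs):
        # 1. The kill-switch is checked on every run, not only at startup.
        if self._is_killed(self.agent_id):
            self._record(tool, "killed")
            raise KillSwitchActive(self.agent_id)
        # 2. Only tools on the allowlist may be invoked.
        if tool not in self._tools:
            self._record(tool, "denied")
            raise ToolNotAllowed(tool)
        # 3. The call is executed and audited.
        result = self._tools[tool](**kwargs)
        self._record(tool, "ok")
        return result
```

Passing the kill-switch lookup and audit sink in as callables keeps the sketch independent of any control-plane client; in a real SDK those would presumably be network calls.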
3. Gemini Integration
Gemini is the reasoning engine powering agent intelligence:
```python
# Flexible Gemini configuration
import os

from org_agent_sdk.llm_client import LLMClient

llm = LLMClient(
    model="gemini-2.0-flash-exp",
    api_key=os.getenv("GOOGLE_API_KEY"),
    provider=os.getenv("LLM_PROVIDER", "google"),  # or vertex_ai
    endpoint=os.getenv("GOOGLE_API_ENDPOINT"),     # optional custom endpoint
    project=os.getenv("GOOGLE_CLOUD_PROJECT")      # for Vertex AI
)

# Gemini function calling for tool invocation.
# `gateway`, `agent`, and `user_query` come from the surrounding agent code.
tools_schema = gateway.get_tools_schema()  # Convert to Gemini format
response = llm.generate(
    prompt=user_query,
    tools=tools_schema,
    system_instruction=agent.purpose
)
```
Gemini Features We Leverage:
- Function Calling: Native tool invocation with structured schemas
- Streaming: Real-time response streaming for interactive agents
- System Instructions: Inject agent purpose, skills, and constraints
- Long Context: Analyze full incident logs and stack traces
- Multi-turn: Maintain conversation history for iterative problem-solving
- Vertex AI: GCP-native deployment for enterprise customers
4. Platform UI (Streamlit)
No-code interface for agent lifecycle:
- Create Agent: Form-based agent definition with model dropdown (including Gemini models)
- My Agents: Personal agent list with edit/deploy actions
- Browse Agents: Agents grouped by domain (Payments, Cloud, Fraud, etc.)
- Deploy Agent: Multi-cloud deployment wizard with LLM runtime config
- Manage Tools/Policies: Admin interface for registry management
- Version History: Semantic versioning with changelog per agent
- Interactive Chat: Test agents with conversation history
5. Multi-Cloud Deployment
- Auto-generated Dockerfiles: Per-agent containerization
- Kubernetes Manifests: ConfigMaps, Secrets, Deployments, Services
- Cloud Support: GKE (Google), AKS (Azure), EKS (AWS)
- Runtime Config Injection: LLM credentials passed as env vars, not baked into images
- Helm Charts: For complex multi-agent deployments
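As an illustration of manifest generation with runtime config injection, the sketch below builds a Kubernetes Deployment as a plain dictionary whose container reads its LLM credentials from a Secret. The function name and parameters are hypothetical; the manifest fields are standard `apps/v1` Deployment fields.

```python
import json


def build_deployment(agent_id, image, provider, secret_name, secret_key="api-key"):
    """Return a Deployment manifest that injects LLM config as env vars."""
    # Kubernetes object names must be DNS-compatible, so underscores are replaced.
    name = agent_id.replace("_", "-")
    env = [
        {"name": "LLM_PROVIDER", "value": provider},
        {
            "name": "GOOGLE_API_KEY",
            "valueFrom": {"secretKeyRef": {"name": secret_name, "key": secret_key}},
        },
    ]
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image, "env": env}]},
            },
        },
    }


if __name__ == "__main__":
    manifest = build_deployment(
        "cloud_reliability", "example.invalid/cloud-reliability:1.0.0",
        provider="vertex_ai", secret_name="gemini-api-key",
    )
    print(json.dumps(manifest, indent=2))  # kubectl also accepts JSON manifests
```

The image carries no credentials; swapping `provider` or `secret_name` per environment changes the LLM backend without a rebuild.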
6. Tools Layer
Domain-specific tools that agents invoke:
- mcp_gcp_tools: GCP incident management, log analysis, metrics
- mcp_payment_tools: Payment exceptions, retry logic
- mcp_customer_tools: Customer profiles, payment history
- mcp_fraud_tools: Fraud scoring, risk assessment
- mcp_healing_tools: Auto-remediation actions (restart VMs, scale resources)
Tools can call existing APIs/Apigee proxies with proper auth.
Tech Stack
- LLM: Google Gemini (AI Studio + Vertex AI), OpenAI, Anthropic
- Backend: Python 3.9+, FastAPI, Pydantic
- Frontend: Streamlit
- Policy Engine: Open Policy Agent (Rego)
- Orchestration: Kubernetes, Docker
- Storage: File-based registry (extensible to Cloud Storage, PostgreSQL)
- Protocols: REST, Agent-to-Agent (A2A), Model Context Protocol (MCP)
- CI/CD: GitHub Actions, Cloud Build
Development Process
- Built control plane with FastAPI route modules
- Created Agent SDK with governance primitives
- Integrated Gemini with function calling and streaming
- Developed Streamlit UI with agent creation wizard
- Implemented auto-codegen for agent boilerplate
- Added multi-cloud deployment with Kubernetes manifest generation
- Built agent mesh discovery and A2A invocation
- Integrated runtime LLM configuration for multi-tenancy
Challenges we ran into
Challenge 1: Unified Multi-Provider LLM Client
Problem: Support Google AI Studio, Vertex AI, OpenAI, and Anthropic with one interface while respecting their different authentication, function calling formats, and error handling.
Solution:
- Built LLMClient abstraction layer normalizing:
- Authentication (API keys vs GCP ADC vs bearer tokens)
- Function calling schemas (Gemini format vs OpenAI format)
- Streaming responses with different chunking strategies
- Provider-specific error codes and retries
- Result: Same agent code works with any provider; switch via env vars
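A rough sketch of the schema-normalization part: one neutral tool spec is converted into either the OpenAI-style `tools` entry or a Gemini-style tool with function declarations. The dictionary shapes follow the providers' commonly documented formats as we understand them, and the function names and the `"openai"` provider key are assumptions; check current provider docs before relying on the exact keys.

```python
def to_openai_tool(spec):
    """Wrap a neutral tool spec in the OpenAI chat-completions `tools` shape."""
    return {
        "type": "function",
        "function": {
            "name": spec["name"],
            "description": spec["description"],
            "parameters": spec["parameters"],
        },
    }


def to_gemini_tools(specs):
    """Group neutral tool specs into one Gemini tool with function declarations."""
    declarations = [
        {"name": s["name"], "description": s["description"], "parameters": s["parameters"]}
        for s in specs
    ]
    return [{"function_declarations": declarations}]


def convert_tools(specs, provider):
    """Pick the tool format for the configured provider."""
    if provider in ("google", "vertex_ai"):
        return to_gemini_tools(specs)
    if provider == "openai":
        return [to_openai_tool(s) for s in specs]
    raise ValueError(f"unsupported provider: {provider}")
```

Agent code only ever builds the neutral spec, which is what lets the provider be switched through an environment variable.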
Challenge 2: Runtime LLM Configuration Without Rebuilding
Problem: Hard-coding API keys in agent definitions is insecure; baking them into Docker images breaks multi-tenancy (same agent, different customers).
Solution:
- Agent definitions specify model as "auto" or model name (no credentials)
- LLM config injected as Kubernetes env vars at deployment time:
```yaml
env:
  - name: GOOGLE_API_KEY
    valueFrom:
      secretKeyRef:
        name: gemini-api-key
        key: api-key
  - name: LLM_PROVIDER
    value: "vertex_ai"
```
- Same Docker image can use AI Studio in dev, Vertex AI in prod, or OpenAI in testing
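On the agent side, resolving that injected configuration might look like the sketch below. The `LLM_MODEL` variable and the way `"auto"` falls back to a default model are assumptions made for the example; the write-up does not specify how `"auto"` is resolved. Only the two Google providers are handled here.

```python
import os


def _require(env, name):
    value = env.get(name)
    if not value:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value


def resolve_llm_config(model="auto", env=None):
    """Build LLM settings from environment variables at container start."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "google")
    if model == "auto":
        # Assumption: "auto" defers to a deployment-level default.
        model = env.get("LLM_MODEL", "gemini-2.0-flash-exp")
    config = {"provider": provider, "model": model}
    if provider == "vertex_ai":
        config["project"] = _require(env, "GOOGLE_CLOUD_PROJECT")
    elif provider == "google":
        config["api_key"] = _require(env, "GOOGLE_API_KEY")
        if env.get("GOOGLE_API_ENDPOINT"):
            config["endpoint"] = env["GOOGLE_API_ENDPOINT"]
    else:
        raise ValueError(f"provider not covered by this sketch: {provider}")
    return config
```

Failing fast on a missing variable surfaces a misconfigured deployment at startup rather than on the first LLM call.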
Challenge 3: Agent Code Generation from YAML
Problem: Manually writing 80+ lines of SDK boilerplate for each agent (load definition, init tools, wire LLM, handle conversation) is tedious and error-prone.
Solution:
- Template-based code generation using agents/template/ as base
- Jinja2 templating with placeholder replacement:
- {{agent_id}} → actual agent ID
- {{allowed_tools}} → tool import statements
- {{purpose}} → agent purpose string
- Auto-generates: agent.py, interactive.py, __init__.py, README.md
- Triggered on agent creation or via POST /api/v2/code-gen/generate
- Developers can still customize generated code
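The placeholder replacement can be illustrated without Jinja2. The sketch below uses a small regex substitution from the standard library as a stand-in for the real templating; the template text, `render`, and `tool_imports` are hypothetical and much shorter than the generated files described above.

```python
import re

# Hypothetical, heavily shortened stand-in for a file under agents/template/.
TEMPLATE = '''"""Generated agent: {{agent_id}}"""
{{allowed_tools}}

AGENT_ID = "{{agent_id}}"
PURPOSE = {{purpose}}
'''


def render(template, values):
    """Replace every {{name}} placeholder, failing loudly on unknown names."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder: {key}")
        return values[key]
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def tool_imports(tools):
    """Turn {tool_name: module_name} into sorted import statements."""
    return "\n".join(f"from {module} import {name}" for name, module in sorted(tools.items()))


def generate_agent_source(agent_id, purpose, tools):
    return render(TEMPLATE, {
        "agent_id": agent_id,
        "allowed_tools": tool_imports(tools),
        "purpose": repr(purpose),  # repr() yields a safely quoted Python literal
    })
```

Using `repr()` for the purpose string avoids generating invalid Python when the purpose contains quotes.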
Challenge 4: Policy Evaluation Latency
Problem: Calling control plane for Rego policy evaluation on every decision adds 50-100ms latency; unacceptable for high-frequency agents.
Solution:
- Async audit logging (don't block on writes)
- Policy result caching for idempotent decisions
- Future: OPA sidecar with policy bundles (pull vs push)
- Critical policies still evaluated in real-time; non-critical cached
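The caching rule can be sketched as a thin wrapper around whatever function calls the control plane. `CachedPolicyClient`, the TTL value, and the `critical` flag are assumptions for illustration; the real PolicyClient may differ.

```python
import json
import time


class CachedPolicyClient:
    """Cache non-critical policy decisions; always re-evaluate critical ones."""

    def __init__(self, evaluate, ttl_seconds=30.0, clock=time.monotonic):
        self._evaluate = evaluate  # callable(policy_id, input_data) -> bool
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}           # key -> (stored_at, decision)

    def is_allowed(self, policy_id, input_data, critical=False):
        if critical:
            # Critical decisions bypass the cache entirely.
            return self._evaluate(policy_id, input_data)
        # Identical inputs produce the same key regardless of dict ordering.
        key = (policy_id, json.dumps(input_data, sort_keys=True))
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        decision = self._evaluate(policy_id, input_data)
        self._cache[key] = (now, decision)
        return decision
```

A short TTL bounds how long a stale decision can survive after a policy changes, which is the trade-off for skipping the round trip.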
Challenge 5: Agent-to-Agent Circular Dependencies
Problem: Agent A invokes Agent B, which invokes Agent C, which invokes Agent A → infinite loop.
Solution:
- Invocation policy in config/agent_invocation.yaml:
```yaml
cloud_reliability:
  can_invoke:
    - cloud_healing
cloud_healing:
  can_invoke: []  # leaf agent, no further invocation
```
- SDK enforces allowlist; blocks unauthorized A2A calls
- Max invocation depth limit (default: 5)
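A minimal sketch of the check, assuming the call chain is passed along with each A2A request. The function and exception names are hypothetical, and the explicit cycle check is an extra safeguard added for the example on top of the allowlist and depth limit described above.

```python
class InvocationDenied(Exception):
    """Raised when an agent-to-agent call violates the invocation policy."""


# Mirrors the allowlist structure of config/agent_invocation.yaml.
POLICY = {
    "cloud_reliability": {"can_invoke": ["cloud_healing"]},
    "cloud_healing": {"can_invoke": []},
}


def check_invocation(call_chain, target, policy=POLICY, max_depth=5):
    """Validate one A2A hop and return the extended call chain."""
    caller = call_chain[-1]
    allowed = policy.get(caller, {}).get("can_invoke", [])
    if target not in allowed:
        raise InvocationDenied(f"{caller} may not invoke {target}")
    if target in call_chain:
        # Assumption: re-entering an agent already in the chain is a cycle.
        raise InvocationDenied(f"cycle detected: {target} is already in the chain")
    if len(call_chain) > max_depth:
        # len(call_chain) is the number of hops once this call is added.
        raise InvocationDenied(f"max invocation depth {max_depth} exceeded")
    return call_chain + [target]
```

The depth limit acts as a backstop even if the allowlist itself is misconfigured to permit a loop.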
Challenge 6: Kubernetes Secret Management Across Clouds
Problem: GCP uses Secret Manager, Azure uses Key Vault, AWS uses Secrets Manager - different APIs, different auth.
Solution:
- Generate standard Kubernetes Secrets in manifests (lowest common denominator)
- Document external secret operator integrations per cloud:
- GCP: External Secrets Operator + Secret Manager
- Azure: CSI driver + Key Vault
- AWS: External Secrets Operator + Secrets Manager
- Deployment wizard injects secrets as env vars from Kubernetes Secrets
Challenge 7: Gemini Function Calling Schema Conversion
Problem: Our tool definitions use Python type hints; Gemini expects JSON Schema format with specific structure.
Solution:
- Built converter in ToolGateway.get_tools_schema():
```python
def to_gemini_schema(tool_func):
    return {
        "name": tool_func.__name__,
        "description": tool_func.__doc__,
        "parameters": {
            "type": "object",
            "properties": extract_params(tool_func),
            "required": get_required_params(tool_func)
        }
    }
```
- Automatically converts all allowed tools to Gemini-compatible format
- Handles nested objects, arrays, enums
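The helpers `extract_params` and `get_required_params` are not shown in the write-up. One plausible way to write them with the standard library is sketched below; it covers only flat primitive type hints, not the nested objects, arrays of typed items, or enums mentioned above, and it falls back to `"string"` for anything it does not recognize.

```python
import inspect
from typing import get_type_hints

# Mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    list: "array",
    dict: "object",
}


def extract_params(tool_func):
    """Map each parameter's type hint to a JSON Schema property."""
    hints = get_type_hints(tool_func)
    properties = {}
    for name in inspect.signature(tool_func).parameters:
        # Unannotated or unrecognized types fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
    return properties


def get_required_params(tool_func):
    """Parameters without a default value are treated as required."""
    return [
        name
        for name, param in inspect.signature(tool_func).parameters.items()
        if param.default is inspect.Parameter.empty
    ]
```

For a tool such as `def get_incident(incident_id: str, include_logs: bool = False)`, this marks only `incident_id` as required.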
Accomplishments that we're proud of
🏆 Complete Governed Agent Platform: We built a production-ready system that doesn't compromise on governance, security, or developer experience. This isn't a demo - it's a real platform.
🎯 Gemini Integration Excellence: Seamless support for both Google AI Studio (dev/testing) and Vertex AI (production) with function calling, streaming, and conversation history.
🔒 Security-First Architecture: RBAC, policy enforcement, audit logging, and kill-switch aren't afterthoughts - they're built into the SDK that every agent must use.
⚡ Auto-Codegen Magic: Create an agent in the UI, get 300+ lines of production-ready Python code auto-generated with proper SDK integration, tool wiring, and interactive REPL.
☁️ True Multi-Cloud: Same agent code and Docker image runs on GKE, AKS, EKS, or local Docker with environment-specific LLM configuration.
🤝 Agent Mesh: Agents discover and invoke each other based on skills/capabilities, with invocation policies preventing chaos.
🎨 Developer Experience: Both UI for non-technical users AND SDK for developers; comprehensive docs; sensible defaults; minimal boilerplate.
📊 Real-World Agents: Not just toy examples - we have production-ready agents for:
- Cloud reliability and incident response
- Payment failure investigation
- Fraud detection
- Customer support
- Multi-cloud healing
🔧 Extensibility: Plugin architecture for tools, policies, and LLM providers; easy to add custom domains and capabilities.
What we learned
1. Gemini API Deep Dive
- Function calling is production-ready: Reliable, handles complex schemas, proper error handling
- Streaming matters: Real-time responses drastically improve UX for long-running analyses
- System instructions are powerful: Injecting agent purpose and skills creates strong persona
- Context window is generous: Can analyze full incident logs (10K+ tokens) in one shot
- Vertex AI vs AI Studio: Studio is perfect for dev; Vertex AI for prod with better SLA and quotas
- Model selection: gemini-2.0-flash-exp provides best balance of speed/quality for agent tasks
2. Governance Cannot Be Optional
- You can't trust agents to "do the right thing" - governance must be SDK-level, not agent-level
- Kill-switch must be checked before every run, not just at startup
- Audit logging must be async; blocking on writes kills performance
- RBAC needs to be fine-grained: who can view vs use vs edit vs deploy
- Policy-as-code (Rego) is powerful but requires clear input/output contracts
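The async audit logging point can be sketched with a queue and a background thread: the agent enqueues an event and continues, while a worker writes it out. `AsyncAuditWriter` and the `sink` callable are hypothetical names; the real AuditClient may use a different mechanism.

```python
import queue
import threading


class AsyncAuditWriter:
    """Accept audit events without blocking; write them on a worker thread."""

    def __init__(self, sink):
        self._sink = sink            # callable(event) that performs the slow write
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, event):
        # Returns immediately; the write happens on the worker thread.
        self._queue.put(event)

    def _drain(self):
        while True:
            event = self._queue.get()
            try:
                if event is None:    # sentinel: stop the worker
                    return
                self._sink(event)
            except Exception:
                pass                 # a real writer would retry or dead-letter
            finally:
                self._queue.task_done()

    def close(self):
        """Flush pending events and stop the worker."""
        self._queue.put(None)
        self._queue.join()
        self._worker.join()
```

Calling `close()` during shutdown matters for compliance: without the flush, events still in the queue would be lost when the process exits.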
3. Agent Orchestration at Scale
- Skills vs Tools separation is critical:
- Skills = what agent can do (incident_investigation)
- Tools = how it does it (get_incident, analyze_logs)
- Purpose = why it exists (24/7 reliability)
- Capability-based routing beats hard-coding: Instead of "invoke agent_123", do "find agent with skill=root_cause_analysis"
- Invocation policies prevent chaos: Allowlist who can invoke whom; max depth limits
- Mesh discovery needs structure: Filtering by domain, persona, skill, risk_tier
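Capability-based routing can be sketched as a filter over registry entries. The registry contents and the `find_agents` function are illustrative assumptions; the agent IDs and the `incident_investigation` and `root_cause_analysis` skills come from this write-up, while the other skill names are invented for the example.

```python
# Hypothetical in-memory view of the agent registry.
REGISTRY = [
    {
        "agent_id": "cloud_reliability",
        "domain": "Cloud",
        "skills": ["incident_investigation", "root_cause_analysis"],
    },
    {"agent_id": "cloud_healing", "domain": "Cloud", "skills": ["auto_remediation"]},
    {
        "agent_id": "payment_failed",
        "domain": "Payments",
        "skills": ["payment_failure_investigation"],
    },
]


def find_agents(registry, skill, domain=None):
    """Return IDs of agents that advertise the skill, optionally within a domain."""
    matches = []
    for agent in registry:
        if skill not in agent["skills"]:
            continue
        if domain is not None and agent["domain"] != domain:
            continue
        matches.append(agent["agent_id"])
    return matches
```

A caller asks for `skill="root_cause_analysis"` and never needs to know which agent ID currently provides it, so agents can be replaced or versioned without touching their callers.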
4. Multi-Cloud Deployment Realities
- Kubernetes is the universal abstraction; cloud-specific features are nice-to-have
- Runtime configuration injection is non-negotiable for multi-tenancy
- Same image, different env vars = different LLM backends, different customers
- Secrets management is still painful; External Secrets Operator helps
- Helm adds complexity; start with raw manifests, graduate to Helm
5. Developer Experience Multiplier
- Auto-codegen saves hours per agent; developers can still customize
- UI + SDK dual interface reaches more users
- Conversation history is mandatory for debugging agent behavior
- Good defaults matter: model="auto", sensible tool timeouts, built-in retries
- Documentation and examples are as important as the code
6. LLM Provider Abstraction Challenges
- Every provider has quirks: auth, function calling format, error codes, rate limits
- In our testing, Gemini's native function calling was more reliable than OpenAI's (fewer formatting errors)
- Streaming implementations vary wildly; need robust chunk handling
- Provider-agnostic code is possible but requires careful abstraction design
What's next for RAVP: Regulated Agent Vending Platform
Near-Term (Next 3 Months)
1. Enhanced Skills Routing with Gemini
- Use Gemini to analyze user query and automatically select best agent based on skills
- Example: "My payment failed" → Gemini routes to payment_failed agent
- Natural language agent discovery instead of manual selection
2. Distributed Tracing
- OpenTelemetry integration for cross-agent request tracking
- Trace an incident from detection → diagnosis → healing → resolution
- Visualize agent-to-agent call graph with latency metrics
3. Cost and Performance Observability
- Dashboard showing per-agent Gemini API costs
- Token usage tracking and optimization recommendations
- Latency p50/p95/p99 for agent responses
- Cache hit rates for policy evaluation
4. Policy Marketplace
- Shareable Rego policies for common scenarios:
- PCI-DSS compliance for payment agents
- HIPAA compliance for healthcare agents
- SOC 2 audit trail requirements
- Community-contributed policies with ratings/reviews
Mid-Term (Next 6 Months)
5. Multi-LLM Orchestration
- Single agent dynamically chooses LLM per task:
- Gemini for fast incident triage
- GPT-4 for complex root cause analysis
- Claude for long-document summarization
- Cost-aware routing (use cheaper model when possible)
6. Gemini Multimodal Support
- Agents process images: architecture diagrams, error screenshots, dashboards
- Video analysis: screen recordings of incidents
- Structured data: CSV logs, JSON configs
7. Agentic Workflows
- Visual workflow builder (drag-drop agent connections)
- Conditional routing based on policy evaluation
- Parallel agent execution with result aggregation
- Human-in-the-loop approval gates
8. GitOps Integration
- Agent definitions stored in Git repos
- PR-based agent updates with approvals
- CI/CD pipeline for agent testing and deployment
- Rollback to previous agent versions
Long-Term (Next Year)
9. Federated Agent Mesh
- Cross-organization agent discovery (with permissions)
- Marketplace for agent templates
- Shared tool registry across companies
- Industry-specific agent packs (fintech, healthcare, retail)
10. Advanced Governance
- Real-time agent behavior monitoring with anomaly detection
- Automatic kill-switch triggers based on policy violations
- Compliance report generation (GDPR, SOC 2, ISO 27001)
- Fine-grained PII handling with data masking
11. Edge Agent Deployment
- Lightweight agents running on edge devices
- Local Gemini models for low-latency scenarios
- Sync with central control plane for updates
- On-device policy evaluation
12. Agent Learning and Improvement
- Track agent success/failure rates
- Use Gemini to analyze failed incidents and suggest tool improvements
- A/B testing for agent prompt variations
- Automatic retraining based on user feedback
Research Areas
13. Autonomous Agent Swarms
- Multiple agents collaborate on complex problems
- Self-organizing based on skills and workload
- Consensus mechanisms for conflicting recommendations
14. Explainable AI for Agents
- Gemini generates human-readable explanations for decisions
- Trace decision back to specific policy rules or data
- "Why did you choose this remediation?"
15. Agent Security Hardening
- Adversarial testing: red team agents trying to break policies
- Prompt injection detection and mitigation
- Sandboxing for untrusted tool execution
Conclusion
We believe RAVP represents the future of enterprise AI: governed, auditable, multi-cloud, and production-ready. With Gemini as the reasoning engine and comprehensive controls as the foundation, organizations can finally deploy AI agents at scale without sacrificing security or compliance.
Why Gemini?
We chose Gemini as our primary LLM because:
- Function Calling: Native, reliable, and well-documented
- Context Window: Long context for analyzing full incident logs
- Vertex AI Integration: Seamless GCP deployment for enterprises
- Multimodal Ready: Prepared for future image/video tool inputs
- Cost-Effective: Competitive pricing for production workloads
- Google Cloud Synergy: Natural fit for GCP-heavy organizations
Gemini isn't just an LLM provider for us - it's the reasoning engine that makes governed, production-grade AI agents possible.
Contact:
- Team: [RAVP team]
- Email: [visanthoxd@gmail.com]
Built With
- python 3.11
- fastapi
- pydantic
- uvicorn
- streamlit
- gemini
- google-genai
- vertex
- open policy agent
- rego
- docker
- kaniko
- kubernetes
- gke
- gcr
- bash/shell
- yaml
- api
- logging
- monitoring