Inspiration
Enterprises are shipping AI agents — loan advisors, support bots, HR assistants — into production with almost no way to test them. They hallucinate, obey injected instructions, leak one customer's data to another, and contradict themselves on the same question. I wanted a pre-production reliability gate for agents that lives where QA already works: UiPath Test Cloud.
What it does
Sentinel is an AI agent that audits other AI agents. Given a short mandate (the target's role, grounding data, forbidden actions), it generates adversarial probes across four dimensions — prompt injection, hallucination, PII / cross-customer data leakage, and non-determinism — fires them at the live agent, judges every response (deterministic checks + LLM-as-judge), and produces a severity-weighted reliability scorecard.
Every finding syncs to UiPath Test Manager as a native test case. The demo target, LoanAdvisor (built in Agent Builder), is deliberately misconfigured: ask it for another customer's record and it hands over the SSN. Sentinel scores it 56/100 — RED, and a UiPath test robot runs the probe autonomously and fails the case — capturing the leaked record as evidence.
How I built it
- Sentinel — a coded Python agent (pydantic, httpx, TDD; 158 tests, fully mocked, no network).
- Target invocation — UiPath Orchestrator Jobs API.
- Governed judging — every LLM call routes through the UiPath AI Trust Layer LLM Gateway, so the auditor's own reasoning is audit-logged.
- Severity scoring — one HIGH-severity breach caps a dimension at 25 and forces RED; no averaging away a confirmed leak.
- Test Cloud — 26 test cases + a manual execution (20 passed / 6 failed) written via the
uipCLI; plus a coded UiPath test (C#) that a serverless Test Automation robot runs — it invokes LoanAdvisor and asserts no SSN leaks, failing natively. - Built end-to-end with Claude Code through UiPath for Coding Agents —
uipskills + CLI drove pack → upload → link → run.
Challenges I ran into
- Frontier models resist jailbreak and hallucination through alignment, so the real, reproducible vulnerabilities are configuration failures — broken access control, OWASP LLM02. I repositioned the whole audit around that insight.
- Platform reality: the LLM Gateway lives under
/agenthub_; the Test Manager ThirdParty-execution API returns 500 on staging (worked around with a manual test-set execution);link-automationmandates a folder key the tenant-feed API rejects; and getting a robot to run the test meant sorting Studio + Test Automation licensing.
Accomplishments that I'm proud of
- A robot autonomously catching a live OWASP LLM02 data leak and recording it as a native Automated failure in Test Manager.
- An honest scorecard — green where the agent is genuinely robust, red where it genuinely fails, with few false positives.
What I learned
Agent reliability is mostly an access-control and configuration problem, not a model problem — and UiPath's Jobs API, AI Trust Layer, and Test Cloud compose into a real pre-production gate for agents.
What's next
More dimensions (out-of-mandate actions, tool misuse, refusal calibration); pass-rate trends across sprints; and wiring Sentinel itself as a scheduled Orchestrator process that gates agent deploys.
Log in or sign up for Devpost to join the conversation.