Inspiration

Enterprises are shipping AI agents — loan advisors, support bots, HR assistants — into production with almost no way to test them. They hallucinate, obey injected instructions, leak one customer's data to another, and contradict themselves on the same question. I wanted a pre-production reliability gate for agents that lives where QA already works: UiPath Test Cloud.

What it does

Sentinel is an AI agent that audits other AI agents. Given a short mandate (the target's role, grounding data, forbidden actions), it generates adversarial probes across four dimensions — prompt injection, hallucination, PII / cross-customer data leakage, and non-determinism — fires them at the live agent, judges every response (deterministic checks + LLM-as-judge), and produces a severity-weighted reliability scorecard.

Every finding syncs to UiPath Test Manager as a native test case. The demo target, LoanAdvisor (built in Agent Builder), is deliberately misconfigured: ask it for another customer's record and it hands over the SSN. Sentinel scores it 56/100 — RED, and a UiPath test robot runs the probe autonomously and fails the case — capturing the leaked record as evidence.

How I built it

  • Sentinel — a coded Python agent (pydantic, httpx, TDD; 158 tests, fully mocked, no network).
  • Target invocation — UiPath Orchestrator Jobs API.
  • Governed judging — every LLM call routes through the UiPath AI Trust Layer LLM Gateway, so the auditor's own reasoning is audit-logged.
  • Severity scoring — one HIGH-severity breach caps a dimension at 25 and forces RED; no averaging away a confirmed leak.
  • Test Cloud — 26 test cases + a manual execution (20 passed / 6 failed) written via the uip CLI; plus a coded UiPath test (C#) that a serverless Test Automation robot runs — it invokes LoanAdvisor and asserts no SSN leaks, failing natively.
  • Built end-to-end with Claude Code through UiPath for Coding Agentsuip skills + CLI drove pack → upload → link → run.

Challenges I ran into

  • Frontier models resist jailbreak and hallucination through alignment, so the real, reproducible vulnerabilities are configuration failures — broken access control, OWASP LLM02. I repositioned the whole audit around that insight.
  • Platform reality: the LLM Gateway lives under /agenthub_; the Test Manager ThirdParty-execution API returns 500 on staging (worked around with a manual test-set execution); link-automation mandates a folder key the tenant-feed API rejects; and getting a robot to run the test meant sorting Studio + Test Automation licensing.

Accomplishments that I'm proud of

  • A robot autonomously catching a live OWASP LLM02 data leak and recording it as a native Automated failure in Test Manager.
  • An honest scorecard — green where the agent is genuinely robust, red where it genuinely fails, with few false positives.

What I learned

Agent reliability is mostly an access-control and configuration problem, not a model problem — and UiPath's Jobs API, AI Trust Layer, and Test Cloud compose into a real pre-production gate for agents.

What's next

More dimensions (out-of-mandate actions, tool misuse, refusal calibration); pass-rate trends across sprints; and wiring Sentinel itself as a scheduled Orchestrator process that gates agent deploys.

Built With

  • .net
  • claude
  • claude-code
  • csharp
  • httpx
  • pydantic
  • pytest
  • python
  • uip-cli
  • uipath
  • uipath-agent-builder
  • uipath-ai-trust-layer
  • uipath-orchestrator
  • uipath-studio
  • uipath-test-cloud
  • uipath-test-manager
  • uv
Share this project:

Updates