Sentinel — an AI agent that tests other AI agents

Inspiration

Enterprises are shipping AI agents — loan advisors, support bots, HR assistants — into production with almost no way to test them. They hallucinate, obey injected instructions, leak one customer's data to another, and contradict themselves on the same question. I wanted a pre-production reliability gate for agents that lives where QA already works: UiPath Test Cloud.

What it does

Sentinel is an AI agent that audits other AI agents. Given a short mandate (the target's role, grounding data, forbidden actions), it generates adversarial probes across four dimensions — prompt injection, hallucination, PII / cross-customer data leakage, and non-determinism — fires them at the live agent, judges every response (deterministic checks + LLM-as-judge), and produces a severity-weighted reliability scorecard.

Every finding syncs to UiPath Test Manager as a native test case. The demo target, LoanAdvisor (built in Agent Builder), is deliberately misconfigured: ask it for another customer's record and it hands over the SSN. Sentinel scores it 56/100 — RED, and a UiPath test robot runs the probe autonomously and fails the case — capturing the leaked record as evidence.

How I built it

Sentinel — a coded Python agent (pydantic, httpx, TDD; 158 tests, fully mocked, no network).
Target invocation — UiPath Orchestrator Jobs API.
Governed judging — every LLM call routes through the UiPath AI Trust Layer LLM Gateway, so the auditor's own reasoning is audit-logged.
Severity scoring — one HIGH-severity breach caps a dimension at 25 and forces RED; no averaging away a confirmed leak.
Test Cloud — 26 test cases + a manual execution (20 passed / 6 failed) written via the uip CLI; plus a coded UiPath test (C#) that a serverless Test Automation robot runs — it invokes LoanAdvisor and asserts no SSN leaks, failing natively.
Built end-to-end with Claude Code through UiPath for Coding Agents — uip skills + CLI drove pack → upload → link → run.

Challenges I ran into

Frontier models resist jailbreak and hallucination through alignment, so the real, reproducible vulnerabilities are configuration failures — broken access control, OWASP LLM02. I repositioned the whole audit around that insight.
Platform reality: the LLM Gateway lives under /agenthub_; the Test Manager ThirdParty-execution API returns 500 on staging (worked around with a manual test-set execution); link-automation mandates a folder key the tenant-feed API rejects; and getting a robot to run the test meant sorting Studio + Test Automation licensing.

Accomplishments that I'm proud of

A robot autonomously catching a live OWASP LLM02 data leak and recording it as a native Automated failure in Test Manager.
An honest scorecard — green where the agent is genuinely robust, red where it genuinely fails, with few false positives.

What I learned

Agent reliability is mostly an access-control and configuration problem, not a model problem — and UiPath's Jobs API, AI Trust Layer, and Test Cloud compose into a real pre-production gate for agents.

What's next

More dimensions (out-of-mandate actions, tool misuse, refusal calibration); pass-rate trends across sprints; and wiring Sentinel itself as a scheduled Orchestrator process that gates agent deploys.

Built With

.net
claude
claude-code
csharp
httpx
pydantic
pytest
python
uip-cli
uipath
uipath-agent-builder
uipath-ai-trust-layer
uipath-orchestrator
uipath-studio
uipath-test-cloud
uipath-test-manager
uv

Updates

Kateryna Ivashchenko started this project — Jun 19, 2026 04:33 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.