Inspiration

Every team putting agents into production hits the same wall: it isn't the model that fails, it's the handoffs. In a real agentic process an AI agent, an RPA robot, and a human approver run in one flow — and testing today checks each one in isolation. The agent passes its eval, the robot passes its unit test, the human step is assumed correct. Then production breaks anyway, because the failure lives in the seam between two actors:

  • The agent emits structurally valid but semantically wrong output — right JSON, wrong number — and the robot faithfully executes it.
  • Output variability routes a case around a human approval that policy required.
  • A prompt or model change quietly doubles cost-per-run or blows the cycle-time SLO.

Concretely: in our reference case the agent reconciles an invoice to $5,400 when the line items sum to $4,200 — a $1,200 overpayment in valid JSON that posts to the ERP in under two seconds, no human in the loop. Isolated evals never see it. We built SeamProof to test exactly these seams and gate the release before they ship.

What it does

SeamProof is a release gate for agentic processes. It treats every handoff as a contract and tests it against the real run trace, asserting trace-level properties at each agent → robot → human boundary, then emits a go / no-go decision with the evidence.

  • Three seam contracts guard the bundled invoice-exception process: agent→robot data integrity (amount == Σ line_items), a routing→human checkpoint (a required approval must precede the post), and a cost / cycle-time SLO (advisory).
  • The gate blocks the release. Break a blocking seam and it exits with a non-zero code, which fails CI, and it posts a Failed result to UiPath Test Manager.
  • The Seam Analyst — an agent on the UiPath LLM Gateway — doesn't stop at no. For each failed seam it returns a root cause, a concrete fix, and a fragility rating. The tester is itself agentic: it finds the break and tells you how to close it.
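
The three seam contracts above can be sketched as plain checks over an ordered run trace. This is a minimal illustration, not SeamProof's actual engine: the `Span` type, the step names (`agent.reconcile`, `router.route`, `human.approve`, `robot.post`), the attribute keys, and the cost budget are all assumptions made for the example. The amounts reuse the reference case, with hypothetical line items of $3,000 and $1,200.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One step of the run trace (hypothetical shape, not the OTEL schema)."""
    name: str
    attrs: dict = field(default_factory=dict)


def amount_matches_line_items(trace):
    """agent→robot data integrity: amount == sum of line items."""
    span = next(s for s in trace if s.name == "agent.reconcile")
    return span.attrs["amount"] == sum(span.attrs["line_items"])


def approval_precedes_post(trace):
    """routing→human checkpoint: a required approval must precede the post."""
    names = [s.name for s in trace]
    required = any(
        s.attrs.get("approval_required") for s in trace if s.name == "router.route"
    )
    if not required or "robot.post" not in names:
        return True
    return "human.approve" in names[: names.index("robot.post")]


def within_cost_budget(trace, max_cost_usd=0.05):
    """Cost SLO (advisory): total cost across spans stays under a budget."""
    return sum(s.attrs.get("cost_usd", 0.0) for s in trace) <= max_cost_usd


def check_seams(trace):
    """Evaluate every seam and return a pass/fail flag per seam."""
    return {
        "data_integrity": amount_matches_line_items(trace),
        "human_checkpoint": approval_precedes_post(trace),
        "cost_slo": within_cost_budget(trace),
    }


# The reference failure: $5,400 posted against $4,200 of line items, no approval.
bad_trace = [
    Span("agent.reconcile", {"amount": 5400, "line_items": [3000, 1200], "cost_usd": 0.01}),
    Span("router.route", {"approval_required": True}),
    Span("robot.post"),
]

# A healthy run: the amount matches and the approval happens before the post.
good_trace = [
    Span("agent.reconcile", {"amount": 4200, "line_items": [3000, 1200], "cost_usd": 0.01}),
    Span("router.route", {"approval_required": True}),
    Span("human.approve"),
    Span("robot.post"),
]
```

Running `check_seams(bad_trace)` flags both blocking seams while the cost check still passes, which is the point of asserting on the trace rather than on each actor alone: every step in `bad_trace` is individually valid.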

It maps to all four of Track 3's asks: it validates the AI-infused workflow, recommends fixes, surfaces fragile seams, and treats contracts as executable requirements.

How we built it

  • System under test — a real UiPath coded automation: a recon agent (UiPath LLM Gateway, or an external LangChain agent), a router, an Action Center human approval, and a posting robot. Every step is wrapped in UiPath's @traced and the run is emitted as OpenTelemetry. It runs on the UiPath runtime via uipath run, and fully offline for development.
  • SeamProof engine: a small, data-only contract language and trace evaluator (no eval(), no arbitrary code execution) with six assertion kinds and a severity-aware gate. It ingests the OTEL trace and renders text, markdown, JSON, and JUnit reports, plus the CI exit code.
  • UiPath integration, both ends — ingest Maestro/agent OpenTelemetry traces; publish the gate result to Test Manager via its v2 REST API (test cases → test set → execution → per-seam results → finish). We posted a real Finished execution to a live tenant, captured it back from the API, and committed it as evidence.
  • Built with a coding agent — the seam contracts, the Seam Analyst, the adversarial scenarios, and the reporter were authored with Claude Code through UiPath for Coding Agents.
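
To illustrate what "data-only" and "severity-aware" can mean in practice, here is a small sketch under stated assumptions. Contracts are plain dictionaries, each naming one of a fixed set of assertion kinds that are looked up in a dispatch table, so nothing in a contract is ever executed as code. The kind names (`sum_equals`, `compare`), the contract fields, and the flattened `record` are hypothetical; the real engine has six assertion kinds whose names are not shown here.

```python
import operator

# Contract data may only name one of these whitelisted operators.
OPS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge}


def sum_equals(spec, record):
    """Pass when record[field] equals the sum of the list at record[items]."""
    return record[spec["field"]] == sum(record[spec["items"]])


def compare(spec, record):
    """Pass when record[field] <op> value holds, for a whitelisted op."""
    return OPS[spec["op"]](record[spec["field"]], spec["value"])


# Fixed dispatch table: an unknown kind raises KeyError instead of running code.
KINDS = {"sum_equals": sum_equals, "compare": compare}


def gate(contracts, record):
    """Return (exit_code, results); only failed blocking contracts block."""
    results = []
    for c in contracts:
        passed = KINDS[c["kind"]](c, record)
        results.append({"id": c["id"], "severity": c["severity"], "passed": passed})
    blocked = any(r["severity"] == "blocking" and not r["passed"] for r in results)
    return (1 if blocked else 0), results


# Hypothetical contracts; the 1.0 second cycle-time budget is an assumption.
CONTRACTS = [
    {"id": "agent-robot-integrity", "kind": "sum_equals", "severity": "blocking",
     "field": "amount", "items": "line_items"},
    {"id": "cycle-time-slo", "kind": "compare", "severity": "advisory",
     "field": "cycle_seconds", "op": "<=", "value": 1.0},
]

broken = {"amount": 5400, "line_items": [3000, 1200], "cycle_seconds": 1.8}
fixed = {"amount": 4200, "line_items": [3000, 1200], "cycle_seconds": 1.8}

broken_code, broken_results = gate(CONTRACTS, broken)
fixed_code, fixed_results = gate(CONTRACTS, fixed)
```

In this sketch `broken_code` is 1 because a blocking contract failed, while `fixed_code` is 0 even though the advisory cycle-time contract still fails. A CI wrapper could pass the code to `sys.exit` so the pipeline stops only on blocking seams.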

Challenges we ran into

  • Test Manager has no public "post external results" doc. We reverse-engineered the v2 REST API from the tenant's live Swagger and matched it exactly, including one lifecycle gotcha: a test-case log must be marked finished (setting its result is not enough) before the execution moves from "Running" to the terminal Finished status.
  • Making the whole pipeline demonstrable offline while it lights up fully in the tenant — solved with graceful degradation everywhere (the LLM Gateway agents fall back to deterministic logic with no credentials), so the demo never depends on the cloud being up.
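
The graceful-degradation pattern described above can be sketched as follows. This is an assumption-laden illustration, not the project's code: the environment variable name `SEAMPROOF_LLM_TOKEN`, the `llm_call` parameter, and the choice of summing line items as the deterministic fallback are all hypothetical.

```python
import os


def deterministic_reconcile(line_items):
    """Offline fallback: reconcile by summing the line items."""
    return sum(line_items)


def reconcile(line_items, llm_call=None, env=os.environ):
    """Use the LLM-backed path when credentials exist, else fall back.

    `llm_call` stands in for whatever client reaches the gateway; it is
    injected so the function stays runnable with no network access.
    """
    has_credentials = bool(env.get("SEAMPROOF_LLM_TOKEN"))  # hypothetical name
    if llm_call is not None and has_credentials:
        try:
            return {"amount": llm_call(line_items), "source": "llm"}
        except Exception:
            # Any gateway failure degrades to the deterministic path.
            pass
    return {"amount": deterministic_reconcile(line_items), "source": "fallback"}


def failing_llm(line_items):
    """Simulates the cloud being unreachable."""
    raise ConnectionError("gateway unreachable")


offline = reconcile([3000, 1200], env={})
degraded = reconcile([3000, 1200], llm_call=failing_llm, env={"SEAMPROOF_LLM_TOKEN": "x"})
online = reconcile([3000, 1200], llm_call=lambda items: 4200, env={"SEAMPROOF_LLM_TOKEN": "x"})
```

Recording the `source` alongside the result keeps a demo honest about which path actually ran, so an offline run is never mistaken for a gateway-backed one.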

Accomplishments that we're proud of

  • A working, tested engine — 74 passing tests, CI green on every push — that gates real UiPath runs and catches all three seam failures.
  • A genuine UiPath coded automation with @traced, the LLM Gateway, Action Center, uipath eval (agent quality scores 1.0), and an external LangChain agent — one platform, many surfaces.
  • The gate's seams created as managed test cases in a real Test Manager project, with a Finished execution carrying the per-seam Passed/Failed results.
  • The Seam Analyst, which turns a red gate into an actionable root cause + fix, on the LLM Gateway.

What we learned

Testing agentic systems is a different discipline from testing agents. An agent eval tells you the model is good; it says nothing about whether the composite process is safe. The unit of risk is the handoff, and the right place to assert it is the run trace — which also decouples the tester from any one platform's internals. And a gate is far more useful when it doesn't just say no, but recommends the fix.

What's next for SeamProof

  • Auto-discover seams from a Maestro process export.
  • A web report UI for the gate.
  • More assertion kinds (statistical drift, schema evolution) and baseline-vs-candidate A/B diffing.
  • Broader systems under test (transaction-dispute intake, claims). The seam-contract model generalises — adopt it for your own process in three steps (see docs/adopt-seamproof.md).

Built With

  • action-center
  • agent-builder
  • langchain
  • maestro
  • opentelemetry
  • python
  • test-manager
  • uipath
  • uipath-llm-gateway