Inspiration

When a production LLM provider degrades, teams scramble to switch models manually — no evidence trail, no approval gate, no rollback plan. We wanted a workflow that makes model routing a governed, auditable process instead of an ad-hoc decision.

What it does

Otter is a 10-node BPMN pipeline built with LangGraph and deployed on UiPath Maestro. Given an incident trigger (reactive degradation or proactive drift detection), it runs:

  1. Parallel evidence gathering — metrics agent and vendor status agent fan out simultaneously
  2. Conditional evaluation — eval and drift detection only fire when the trigger is proactive
  3. Diagnosis — Gemini-powered root cause analysis with severity classification
  4. Routing proposal — candidate model ranking with quality/cost tradeoffs
  5. Policy gate — deterministic DMN 1.3 business rule: auto-approve if confidence > 0.6 and severity != CRITICAL, else require human approval via Action Center
  6. Canary monitor — kill-switch that rolls back the route if metrics regress during the guard window

Every node exchanges typed Pydantic v2 contracts with extra='forbid' — the graph rejects malformed payloads at runtime. The key architectural decision: LLM agents propose routes, but deterministic services (PolicyGate, CircuitBreaker, BaselineProfile) make the final allow/deny. Governance decisions are reproducible, version-pinned, and auditable.

Business Impact

Who needs this: Any organization running LLM-powered agents in production — from startups with a single GPT integration to enterprises managing dozens of model endpoints across teams.

The cost of not having it: A single undetected model regression can mean hours of degraded user experience before someone notices, files a ticket, and an engineer manually investigates. For customer-facing AI (support bots, content generation, coding assistants), that's direct revenue and trust loss.

How Otter saves money:

  • Reactive: Cuts mean-time-to-recovery from hours (manual detection + manual failover) to seconds (automatic detection + governed re-route)
  • Proactive: Catches silent quality drift before users complain — the incident that never happened is the cheapest one
  • Audit: Compliance teams get a typed, versioned decision trail instead of Slack threads and post-mortem guesswork

Go-to-market: Open-source core (Apache 2.0) + managed SaaS for teams that want hosted monitoring, dashboard, and alert integrations. Per-model-monitored pricing aligned with usage.

How we built it

  • LangGraph for the BPMN topology — parallel gateways, conditional branches, call activities
  • Gemini Flash via Google AI Studio API for diagnosis agent reasoning with structured output
  • UiPath Maestro as the deployment runtime — packed as .nupkg, invoked via Orchestrator
  • 11 Pydantic v2 schemas (schemas.py) defining every inter-node contract
  • Fixture mode for deterministic testing — every model-calling node has a fixture branch that bypasses LLM calls
  • Claude Code (Claude Agent SDK) as the primary coding agent, with Codex for QA review

Challenges we ran into

  • UiPath SDK schema alignmentuipath.json agent registration format changed between SDK versions; bindings.json resources array had to match exactly or pack would silently fail
  • Cloud runtime doesn't Pydantic-marshaltrigger_intake received raw dicts from UiPath Cloud instead of Pydantic objects; added explicit coercion
  • Version caching — UiPath Cloud caches schema by version number; same-version republish doesn't re-parse, requiring version bumps to escape cache

What we learned

  • BPMN topology enforces governance better than prompts — the graph structure itself prevents shortcuts
  • Typed contracts between nodes catch integration bugs at deploy time, not at incident time
  • Fixture mode isn't just for testing — it's the reliable demo path when LLM providers are flaky
  • The separation of "LLM proposes" vs "deterministic rules decide" is what makes the system auditable — conflating the two would undermine the governance story

What's next

  • Wire remaining stub nodes to live Gemini calls (diagnosis is live, 6 others use fixtures with typed contracts ready)
  • Implement canary threshold logic with real metric comparison
  • Add notification agent for Slack/PagerDuty integration
  • Multi-tenant support with per-tenant policy rules
  • Hook into UiPath Action Center for human-in-the-loop approval on CRITICAL severity incidents

Built With

  • claude-code
  • codex
  • dmn
  • gemini
  • google-ai-studio
  • langchain
  • langgraph
  • pydantic
  • python
  • uipath-action-center
  • uipath-agent-builder
  • uipath-maestro
Share this project:

Updates

posted an update

Update — Video & Gallery Refresh

Rebranded demo video and Devpost gallery with UiPath visual identity:

  • Video: Updated color palette (teal + orange-red accents), added UiPath logo to title and closing scenes, white logo for dark background
  • Gallery: Replaced screenshots with 3 custom-designed infographics — project overview, technical architecture, and governance model
  • YouTube: New upload with branded thumbnail

No code changes — visual polish only.

Log in or sign up for Devpost to join the conversation.

posted an update

Submission Update — README & Deck Refresh

Thanks for the feedback! We've updated the submission:

README.md — Expanded with:

  • Detailed project description (problem + solution + differentiation)
  • Complete list of UiPath components used (Maestro BPMN, Agent Builder, Action Center, Business Rule Task, Storage Buckets, Queues, Context Grounding, and more)
  • Agent type: Coded Agents (LangGraph + uipath-python SDK)
  • Step-by-step setup instructions (from clone to UiPath Cloud deployment)

Presentation Deck — Migrated to the official UiPath AgentHack template with all required sections.

No changes to the core architecture or codebase — just better documentation for the judges.

Log in or sign up for Devpost to join the conversation.