About the project — AutoRescue: Agentic Post‑Purchase Rescue
One‑liner: When a shipment is delayed or a payment fails, an autonomous agent detects it in real time, contacts the customer by SMS/voice with compliant options (reship, credit, refund), then executes the decision end‑to‑end (update order, label, credit/refund, ticket notes) — no human in the loop.
🔦 What inspired us
- WISMO overload (“Where Is My Order?”) and delay‑related tickets dominate CX queues. We wanted an agent that acts before customers reach out.
- Teams love LLMs for answers, but business value lives in actions: issuing credits, reshipping, editing orders, and logging outcomes safely.
- We believed that combining deterministic workflows with LLM policy reasoning could create a dependable, production‑scented agent that judges would see as real‑world ready.
🛠 How we built it
- Event ingestion
- Carrier webhook posts status=delayed (sandbox tracking).
- A tiny webhook service validates and forwards the event.
- Orchestration (deterministic core)
- DetectDelay → FetchContext → Decide → Outreach → ApplyAction → Confirm → Log.
- Policy‑aware decisioning
- The agent returns a typed tool call: create_reshipment | create_coupon | create_refund | create_exchange, with parameters and a policy proof (thresholds, eligibility).
- Customer outreach (choice UX)
- Optional voice IVR: live transcription → intent → same action path.
- Side‑effects (the value)
- Shopify: new fulfillment for reship, exchange, or partial refund.
- Stripe: create coupon/credit or issue a refund.
- Shipping API: generate/void labels when needed.
- Ticketing (optional): log transcript + resolution in Gorgias/Zendesk.
- Observability & safety
- Temporal run history, idempotency keys, retries, and compensations.
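A minimal sketch of what a typed tool call with a policy proof could look like, assuming the model returns a single JSON object. The field names (action, params, proofs, rule, threshold, observed) and the helper parse_tool_call are hypothetical and chosen for illustration; only the four action names come from the write-up.

```python
import json
from dataclasses import dataclass

# Action names taken from the write-up; everything else here is illustrative.
ALLOWED_ACTIONS = {"create_reshipment", "create_coupon", "create_refund", "create_exchange"}


@dataclass(frozen=True)
class PolicyProof:
    rule: str         # hypothetical rule name, e.g. "max_credit_pct"
    threshold: float  # limit set by policy
    observed: float   # value the agent proposes

    def satisfied(self) -> bool:
        return self.observed <= self.threshold


@dataclass(frozen=True)
class ToolCall:
    action: str
    params: dict
    proofs: tuple


def parse_tool_call(raw: str) -> ToolCall:
    """Accept exactly one well-formed tool call; reject everything else."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not JSON") from exc
    if not isinstance(data, dict) or set(data) != {"action", "params", "proofs"}:
        raise ValueError("expected exactly the keys: action, params, proofs")
    if not isinstance(data["action"], str) or data["action"] not in ALLOWED_ACTIONS:
        raise ValueError("unknown action")
    if not isinstance(data["params"], dict):
        raise ValueError("params must be an object")
    if not isinstance(data["proofs"], list) or not data["proofs"]:
        raise ValueError("at least one policy proof is required")
    try:
        proofs = tuple(PolicyProof(**p) for p in data["proofs"])
    except TypeError as exc:
        raise ValueError("malformed policy proof") from exc
    failed = [p.rule for p in proofs if not p.satisfied()]
    if failed:
        raise ValueError("policy violated: " + ", ".join(failed))
    return ToolCall(data["action"], data["params"], proofs)
```

Under these assumptions, a 15% coupon against a 20% cap parses cleanly, while free-text output or a 25% coupon raises ValueError before any side effect runs.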
Architecture at a glance
📚 What we learned
- Agents must prove compliance, not just “sound smart.” Typed tool calls + policy proofs (e.g., “max credit 20%”) earn trust.
- Deterministic + probabilistic is the winning combo. Use workflows for state, retries, and idempotency; let the LLM choose which action under guardrails.
- Observability is a feature. Token/cost caps, audit logs, and replayable traces make demos calmer and production closer.
- Customer choice boosts acceptance. Offering 2–3 options (reship/credit/refund) increased action completion versus a single “we decided for you” path.
- Small prompts, strong schemas. We got better stability with compact policies + strict JSON schemas than with long, narrative instructions.
- Real‑time ≠ real‑nice by default. Webhooks, retries, and idempotent updates matter even in a hackathon — or you double‑issue refunds.
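The double-refund risk above can be illustrated with an idempotency key. This is a sketch under stated assumptions, not the project's implementation: RefundLedger is a hypothetical in-memory stand-in for a payment provider, and the key format (incident, action, amount) is one plausible choice.

```python
import hashlib


class RefundLedger:
    """Hypothetical in-memory stand-in for a payment provider."""

    def __init__(self):
        self._by_key = {}  # idempotency key -> refund id
        self.issued = []   # (refund_id, amount_cents) actually created

    @staticmethod
    def idempotency_key(incident_id: str, action: str, amount_cents: int) -> str:
        # Same incident + action + amount always hashes to the same key.
        raw = f"{incident_id}:{action}:{amount_cents}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def issue_refund(self, incident_id: str, amount_cents: int) -> str:
        key = self.idempotency_key(incident_id, "create_refund", amount_cents)
        if key in self._by_key:
            # A retried webhook or replayed workflow step lands here.
            return self._by_key[key]
        refund_id = f"rf_{len(self.issued) + 1}"
        self.issued.append((refund_id, amount_cents))
        self._by_key[key] = refund_id
        return refund_id
```

Calling issue_refund twice for the same incident and amount returns the same refund id and records a single refund, which is the behavior a retried webhook delivery needs.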
⚠️ Challenges we faced
- Bridging LLMs to safe actions. Early outputs were chatty; enforcing a single tool call with a JSON schema and rejecting anything else fixed it.
- Policy edge cases. Returnless refunds on low‑AOV items vs. high‑risk SKUs required explicit rules and deny‑lists.
- Async race conditions. Customer replies could arrive while reshipment was processing; we added a “decision lock” per incident.
- Integration friction. Mapping carrier events to a single order (multi‑package) and normalizing addresses took longer than expected.
- Voice timing. IVR barge‑in and transcription delays needed tighter timeouts and short, confirmatory prompts.
- Demo reliability. We built a simulate‑delay endpoint and a minimal “run timeline” UI to survive Wi‑Fi jitters.
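The "decision lock" per incident mentioned above could be sketched as a first-writer-wins registry. The class and method names are hypothetical, and a real deployment would likely keep this state in the workflow engine or a database rather than in process memory.

```python
import threading
from typing import Optional


class DecisionLock:
    """First decision per incident wins; later ones are rejected."""

    def __init__(self):
        self._guard = threading.Lock()
        self._decisions = {}  # incident id -> chosen action

    def try_claim(self, incident_id: str, action: str) -> bool:
        # Check and write under one lock so two replies cannot both win.
        with self._guard:
            if incident_id in self._decisions:
                return False
            self._decisions[incident_id] = action
            return True

    def decided(self, incident_id: str) -> Optional[str]:
        with self._guard:
            return self._decisions.get(incident_id)
```

With this in place, a customer reply that arrives while a reshipment is already processing fails try_claim and can be answered with the existing decision instead of triggering a second action.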
🗺️ API surface map
airia-web-apis.json mirrors how AutoRescue runs in production. These are the slices we rely on most:
Build and publish rescue agents
- GET /v1/AgentCard and POST /v1/AgentCard manage reusable policy-backed agent definitions.
- POST /v1/AgentTrigger stores delay and payout rules that emit incidents into orchestration.
- POST /v1/Deployments ships versioned runbooks, while /v1/Deployments/ApiKey/{agentId} issues scoped keys for downstream systems.
Run the incident loop
- POST /v1/Webhook/{tenantId}/{webhookId} ingests carrier and payment events without a custom gateway.
- POST /v1/JobOrchestration queues long-running rescues; /v1/JobOrchestration/{id}/retry and /v1/JobOrchestration/{id}/resume handle repair flows.
- /v2/PipelineExecution/{pipelineId} executes the typed workflow with SSE controls from /v2/PipelineExecution/ResumeStream/{executionId} and /v2/PipelineExecution/StopStream.
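As a rough illustration of the ingestion step, the sketch below builds (but does not send) a POST to the webhook path listed above. Only the path template and status=delayed come from this write-up; the base URL is a placeholder, and the payload field tracking_number and the helper name are assumptions, not the documented request schema.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://api.example.invalid"  # placeholder host, not a real endpoint


def build_delay_event_request(tenant_id: str, webhook_id: str, tracking_number: str) -> Request:
    """Build a POST for /v1/Webhook/{tenantId}/{webhookId} without sending it."""
    path = f"/v1/Webhook/{quote(tenant_id, safe='')}/{quote(webhook_id, safe='')}"
    # Assumed payload shape; a real carrier event will carry more fields.
    body = json.dumps({"status": "delayed", "tracking_number": tracking_number})
    return Request(
        BASE_URL + path,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Keeping request construction separate from sending makes it straightforward to replay the same event in a demo or test without network access.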
Connect data and tools
- POST /v1/CloudConnectors and /v1/CloudConnectors/{id}/test register Shopify, Stripe, and carrier credentials with heartbeat checks.
- POST /v1/Store/UploadFile and /v1/Store/{storeId}/graph/cypher load and query order knowledge inside the agent sandbox.
- /v1/Tools/testConnection and /v1/DataVectorSearch/search/{dataStoreId} verify the toolchain and surface the right memories for each incident.
Evaluate and guard decisions
- GET /v1/AgentEvaluation/Results plus /v1/AgentEvaluation/AggregatedResults/{evaluationJobId} provide pass/fail telemetry across policy regressions.
- POST /v1/AgentEvaluationDataset/validate enforces schema integrity before a run hits production.
- /v1/SmartScan and /v1/RedTeamingEvaluation/{id}/vulnerabilities harden prompts, while /v1/PipelineExecutionMetrics/model/usage tracks token and model spend.
Customer touchpoints and feedback
- POST /v1/ChatSpaces/CreateSpace creates the SMS/DM thread that Twilio and internal chat widgets reuse.
- POST /v1/VoiceChat/sessions and POST /v1/TextToSpeech handle IVR sessions and confirmations.
- POST /v1/AgentFeedback captures outcome ratings that loop back into evaluations.
Governance and integrations
- /v2/OAuth/initiate bootstraps partner connections, and /v1/TenantPermissions plus /v1/Roles/{id}/policies keep access scoped.
- GET /v1/AuditLog/entries and /v1/Alert/HasUnread feed the operations console.
- Marketplace endpoints such as /marketplace/v1/Library/agents and /marketplace/v1/Library/tools seed AutoRescue with curated playbooks and connectors.
Closing thought
AutoRescue showed us that the shortest path from LLM to business value is paved with policies, typed actions, and deterministic workflows. The result feels like software you could ship — and that’s exactly what we aimed to demonstrate.
Built With
- airia
- apify
