agent-stack: six MCP-agent reliability primitives

Six libraries (TS + Python + MCP) covering agent reliability concerns: fit, guard, snap, vet, cast, budget.

Inspiration

Reliability concerns for LLM agents are usually bundled into one heavy framework. Production teams building healthcare agents do not want to buy into a full framework just to get an egress allowlist or a tool-argument validator. They want à la carte primitives they can drop into existing code without adopting a new programming model.

Healthcare amplifies this. A FHIR-querying agent has to be defensive on several fronts: calling only sanctioned endpoints, never leaking PHI in logs, abstaining when the right tool is not on the list, validating that a patient_id looks like a real FHIR id before it reaches the tool, never exceeding a token or dollar budget on a single patient query, and producing structured outputs the downstream system can actually parse. Existing agent frameworks address one or two of these. None gives you all six as libraries you can adopt independently.
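To make the patient_id check concrete, here is a minimal Python sketch of validating an id before a tool runs. It is not AgentVet's actual API: the `check_patient_id` helper and this `ToolArgError` class are stand-ins defined for illustration. The pattern follows the FHIR `id` datatype (1-64 characters from letters, digits, "-" and "."), which only shows the value is well-formed, not that the patient exists.

```python
import re

# FHIR's `id` datatype: 1-64 characters drawn from letters, digits, '-' and '.'.
FHIR_ID = re.compile(r"[A-Za-z0-9\-.]{1,64}")


class ToolArgError(ValueError):
    """Illustrative stand-in: raised before the tool runs.

    `hint` is text intended to be fed back to the model so it can retry.
    """

    def __init__(self, message: str, hint: str) -> None:
        super().__init__(message)
        self.hint = hint


def check_patient_id(patient_id: object) -> str:
    """Return the id unchanged if it is a well-formed FHIR id, else raise."""
    if not isinstance(patient_id, str) or FHIR_ID.fullmatch(patient_id) is None:
        raise ToolArgError(
            f"invalid patient_id: {patient_id!r}",
            hint=(
                "patient_id must be 1-64 characters from A-Z, a-z, 0-9, '-' or '.'; "
                "pass the bare id, not a URL or a reference like 'Patient/123'."
            ),
        )
    return patient_id
```

Calling `check_patient_id("Patient/123")` raises, because "/" is outside the allowed character set; the caller can then return `err.hint` to the model instead of forwarding a malformed id to the FHIR server.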

What it does

agent-stack is six independently published libraries, each handling exactly one reliability concern. Every library ships in three forms: TypeScript on npm, Python on PyPI, and an MCP-server variant that exposes the primitive as a tool to any MCP client.

AgentFit — token-aware context-window fitting with multiple truncation strategies.
AgentGuard — declarative network egress allowlist. Throws on violation.
AgentSnap — snapshot tests for tool-call traces.
AgentVet — wraps tool functions with argument validation; throws ToolArgError with an LLM-friendly retry hint.
AgentCast — structured-output validate-and-retry loop for LLM JSON.
AgentBudget — per-run token + dollar caps with a hook for early termination.
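As one illustration of the list above, a structured-output validate-and-retry loop can be sketched in a few lines of Python. This is an assumption-laden sketch, not AgentCast's real interface: `cast_json`, the `ask` callable (anything that takes a prompt and returns raw model text), and the key-presence check standing in for schema validation are all hypothetical.

```python
import json
from typing import Callable


def cast_json(
    ask: Callable[[str], str],  # hypothetical: sends a prompt, returns raw model text
    prompt: str,
    required_keys: set[str],
    max_attempts: int = 3,
) -> dict:
    """Ask for JSON, validate it, and re-ask with the error appended on failure."""
    current = prompt
    last_error = "no attempts made"
    for _ in range(max_attempts):
        raw = ask(current)
        try:
            data = json.loads(raw)
            if not isinstance(data, dict):
                raise ValueError("top-level value must be a JSON object")
            missing = required_keys - set(data)
            if missing:
                raise ValueError(f"missing keys: {sorted(missing)}")
            return data
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = str(exc)
            # Feed the validation error back so the next attempt can correct it.
            current = (
                f"{prompt}\n\nYour previous reply was rejected: {last_error}. "
                "Reply with valid JSON only."
            )
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts: {last_error}")
```

The loop is bounded on purpose: an unbounded retry would work against the token and dollar caps that the budget primitive exists to enforce.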

How we built it

The canonical implementation is in TypeScript, with a Python port that matches its API surface 1:1 and MCP server variants built on @modelcontextprotocol/sdk. The healthcare demo is a minimal Express + TypeScript service that calls a public FHIR sandbox (HAPI FHIR R4) and threads all six primitives around the tool calls. For reproducibility we rely on CI on GitHub Actions, snapshot tests, a CITATION.cff file, Software Heritage archival, and a DataCite DOI on Zenodo.
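The demo's outbound FHIR calls are the kind of request an egress allowlist would gate. The Python sketch below shows the general shape of such a check with a caller-supplied log redactor. None of these names come from AgentGuard: `make_egress_check`, `EgressBlocked`, and `strip_query` are hypothetical, and the host `hapi.fhir.org` is assumed here to be the sandbox the demo targets.

```python
from typing import Callable
from urllib.parse import urlsplit


class EgressBlocked(Exception):
    """Illustrative error: the URL's host is not on the allowlist."""


def strip_query(url: str) -> str:
    """Example redactor: drop the query string, where search parameters may carry PHI."""
    return url.split("?", 1)[0]


def make_egress_check(
    allowed_hosts: set[str],
    redact: Callable[[str], str] = lambda text: text,
    log: Callable[[str], None] = print,
) -> Callable[[str], None]:
    """Build a checker that raises for any URL whose host is not allowlisted."""
    allowed = {host.lower() for host in allowed_hosts}

    def check(url: str) -> None:
        host = (urlsplit(url).hostname or "").lower()
        if host not in allowed:
            # Only the redacted form of the URL is ever written to the log.
            log(f"egress blocked: {redact(url)}")
            raise EgressBlocked(f"host {host!r} is not on the allowlist")

    return check


check = make_egress_check({"hapi.fhir.org"}, redact=strip_query)
check("https://hapi.fhir.org/baseR4/Patient/123")  # allowed, returns None
```

Matching on the exact hostname rather than a substring avoids accepting look-alike hosts such as `hapi.fhir.org.example.com`. A real deployment would likely also pin scheme and port, which this sketch leaves out.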

Challenges we ran into

Cross-runtime parity. Keeping the TypeScript and Python surfaces 1:1 meant deferring some idiomatic choices on each side.
MCP server packaging. Primitives are stateless functions, while MCP servers are typically stateful. We designed a "primitive-as-tool" wrapper to bridge the two.
PHI redaction in AgentGuard logs. We added a hook so the consuming app can supply its own redactor.
Budget bookkeeping under streamed token usage. AgentBudget normalizes usage counts across providers.
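The budget bookkeeping problem can be illustrated with a small Python sketch. It assumes, purely for illustration, that some providers stream per-chunk token increments while others stream a running total; whether AgentBudget draws the line this way is not stated above. `RunBudget`, `BudgetExceeded`, and the flat `usd_per_1k_tokens` price are all hypothetical.

```python
from typing import Callable, Optional


class BudgetExceeded(Exception):
    """Illustrative error: the run went over its token or dollar cap."""


class RunBudget:
    def __init__(
        self,
        max_tokens: int,
        max_usd: float,
        usd_per_1k_tokens: float,  # assumed flat price; real pricing varies by model
        on_exceeded: Optional[Callable[["RunBudget"], None]] = None,
    ) -> None:
        self.max_tokens = max_tokens
        self.max_usd = max_usd
        self.usd_per_1k_tokens = usd_per_1k_tokens
        self.on_exceeded = on_exceeded
        self.tokens = 0
        self._last_total = 0  # last running total seen in the current stream

    @property
    def usd(self) -> float:
        return self.tokens / 1000 * self.usd_per_1k_tokens

    def add_delta(self, tokens: int) -> None:
        """Record usage from a provider that reports per-chunk increments."""
        self.tokens += tokens
        self._enforce()

    def add_cumulative(self, total_so_far: int) -> None:
        """Record usage from a provider that reports a running total per chunk."""
        self.tokens += max(0, total_so_far - self._last_total)
        self._last_total = total_so_far
        self._enforce()

    def end_stream(self) -> None:
        """Reset the running-total baseline before the next streamed call."""
        self._last_total = 0

    def _enforce(self) -> None:
        if self.tokens > self.max_tokens or self.usd > self.max_usd:
            if self.on_exceeded is not None:
                self.on_exceeded(self)  # hook for early termination or cleanup
            raise BudgetExceeded(f"used {self.tokens} tokens (${self.usd:.4f})")
```

Converting running totals to increments at the edge means the cap check only ever sees one kind of number, so a provider that repeats its total on every chunk is not double-counted.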

Accomplishments

Six libraries, all published, all zero-dependency, all under 500 LOC. Three runtime surfaces (npm + PyPI + MCP). Working healthcare-FHIR demo. Zenodo DOI: 10.5281/zenodo.20074702.

What's next

Healthcare-aware AgentGuard preset
AgentTrace as a seventh primitive
A combined @mukundakatta/agent-stack meta-package

Try it

Repo: github.com/MukundaKatta/agent-stack
Zenodo DOI: https://doi.org/10.5281/zenodo.20074702
License: Apache 2.0

Built With

adk
ai-agents
cloud-run
gemini
mcp
n8n
python
rpa
streamlit
uipath
vertex-ai
workflow-automation

Updates

Mukunda Katta started this project — May 18, 2026 03:04 AM EDT
