Inspiration
Reliability concerns for LLM agents are usually bundled into one heavy framework. Production teams building healthcare agents do not want to buy into a full framework just to get an egress allowlist or a tool-argument validator. They want à la carte primitives they can drop into existing code without adopting a new programming model.
Healthcare amplifies this. A FHIR-querying agent has to be defensive about: only calling sanctioned endpoints, never leaking PHI in logs, abstaining when the right tool is not on the list, validating that a patient_id looks like a real FHIR id before it hits the tool, never exceeding a token or dollar budget on a single patient query, and producing structured outputs the downstream system can actually parse. Existing agent frameworks address one or two of these. None of them give you all six in libraries you can adopt independently.
What it does
agent-stack is six independently published libraries, each handling exactly one reliability concern. Every library ships in three forms: TypeScript on npm, Python on PyPI, and an MCP-server variant that exposes the primitive as a tool to any MCP client (Claude Desktop, Cursor, Continue, etc.).
- AgentFit — token-aware context-window fitting with multiple truncation strategies. Pluggable tokenizers for OpenAI / Anthropic / open models.
- AgentGuard — declarative network egress allowlist of domains agent tools can fetch. Throws on violation. Blocks the "agent suddenly POSTs PHI to attacker.com" failure mode.
- AgentSnap — snapshot tests for tool-call traces. Catches regressions where a model's tool-call shape silently changes between deploys.
- AgentVet — wraps tool functions with argument validation. Throws a
ToolArgErrorcarrying an LLM-friendly retry hint, so the next turn can self-correct. - AgentCast — structured-output validate-and-retry loop for LLM JSON. Bring your own LLM and validator.
- AgentBudget — per-run token + dollar caps with a hook for early termination.
For Agents Assemble we built a thin healthcare-agent demo on top: a FHIR query agent wrapped with all six primitives. The demo shows how a 30-line program gets PHI-redacted logs, a strict allowlist of FHIR endpoints, JSON outputs the downstream system can parse, and a budget that prevents runaway loops on a single patient.
How we built it
- TypeScript for the canonical implementation. Hand-maintained type declarations, zero runtime dependencies. Each library is under 500 LOC.
- Python port matches the surface 1:1, so a Python team and a TypeScript team can interop on the same primitives.
- MCP server variants built on the official
@modelcontextprotocol/sdk. Each library exposes its primitive as a tool callable from any MCP client. AgentVet-as-MCP, for instance, lets a remote LLM askagentvet.validate(tool, args)before executing. - Healthcare demo is a minimal Express + TypeScript service that calls a public FHIR sandbox (HAPI FHIR R4) and threads all six primitives around the tool calls.
- Reproducibility: every library has CI on GitHub Actions, snapshot tests, a CITATION.cff, and is archived in Software Heritage.
Challenges we ran into
- Cross-runtime parity. Keeping the TypeScript and Python surfaces 1:1 meant deferring some idiomatic choices on each side. We landed on a "boring core + idiomatic adapters" split.
- MCP server packaging. MCP servers are typically stateful processes, but our primitives are stateless functions. We had to design a "primitive-as-tool" wrapper that does not pretend to be a stateful server.
- PHI redaction in
AgentGuardlogs. Defaultconsole.erroron a violation could itself leak PHI. We added a hook so the consuming app supplies a redactor. - Budget bookkeeping under streamed token usage. Anthropic and OpenAI report streamed token counts differently.
AgentBudgethad to normalize.
Accomplishments that we're proud of
- Six libraries, all published, all zero-dependency, all under 500 LOC each.
- Three runtime surfaces (npm + PyPI + MCP) for every primitive.
- A working healthcare-FHIR demo with all six primitives wired together.
- A peer-reviewable artifact paper accepted track at ASE 2026 Tools (under review).
- DataCite DOI minted on Zenodo: 10.5281/zenodo.20074702.
- Software Heritage archival of all source repos.
What we learned
- "Composable by inclusion" beats "composable by framework." Teams adopt one library at a time.
- Cross-runtime parity forces good API design. If a primitive is hard to port, the API was probably wrong.
- Reliability primitives are best when they fail loudly with retry-friendly errors, not when they silently swallow.
What's next for agent-stack
- Healthcare-aware AgentGuard preset: a curated allowlist of FHIR endpoints + PHI-field redaction policies.
- AgentTrace as a seventh primitive for cost + latency telemetry.
- A combined
@mukundakatta/agent-stackmeta-package that imports all six with sensible defaults for healthcare agents.
Built With
- anthropic
- fhir
- healthcare
- mcp
- npm
- openai
- pypi
- python
- typescript

Log in or sign up for Devpost to join the conversation.