Inspiration

The hackathon framing — "the last mile" between AI capability and clinical deliverables — landed for a reason. Every healthcare AI demo I'd seen looked impressive in a vacuum and fell apart on contact with real workflows. The friction wasn't model quality; it was that the same FHIR query gets re-implemented in every project, the same SMART-on-FHIR token plumbing gets re-invented, and the same retry/timeout/concurrency logic gets re-glued for every new tool.

That's a composition problem, not an AI problem. And it happens to be exactly the problem weft — a small category-theoretic algebra for Go I'd been working on separately — was designed to solve. weft says: every step in an LLM/MCP/agent workflow is an Arrow[A, B], and the same combinators (Compose, Pipe3, Par, Traverse, Apply) operate uniformly on every arrow regardless of how it was constructed. Write the FHIR read once. Write the scoring rule once. Compose them into as many tools as your workflow needs.
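To make the idea concrete, here is a minimal sketch of what an arrow and two combinators could look like. The Arrow, Compose, Par, and Pair definitions below are hypothetical stand-ins written for this illustration; they are not weft's actual signatures, and describe is just a toy pipeline.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Arrow is a hypothetical stand-in for a typed workflow step (not weft's real type).
type Arrow[A, B any] func(ctx context.Context, in A) (B, error)

// Compose runs f, then feeds its output to g, stopping at the first error.
func Compose[A, B, C any](f Arrow[A, B], g Arrow[B, C]) Arrow[A, C] {
	return func(ctx context.Context, in A) (C, error) {
		mid, err := f(ctx, in)
		if err != nil {
			var zero C
			return zero, err
		}
		return g(ctx, mid)
	}
}

// Pair holds the two results produced by Par.
type Pair[B, C any] struct {
	First  B
	Second C
}

// Par runs f and g concurrently on the same input and returns both results.
func Par[A, B, C any](f Arrow[A, B], g Arrow[A, C]) Arrow[A, Pair[B, C]] {
	return func(ctx context.Context, in A) (Pair[B, C], error) {
		var (
			out        Pair[B, C]
			errF, errG error
			wg         sync.WaitGroup
		)
		wg.Add(2)
		go func() { defer wg.Done(); out.First, errF = f(ctx, in) }()
		go func() { defer wg.Done(); out.Second, errG = g(ctx, in) }()
		wg.Wait()
		if errF != nil {
			return Pair[B, C]{}, errF
		}
		if errG != nil {
			return Pair[B, C]{}, errG
		}
		return out, nil
	}
}

// describe builds a toy pipeline: two parallel "reads" followed by a formatting step.
func describe(id string) (string, error) {
	name := Arrow[string, string](func(_ context.Context, id string) (string, error) {
		return "patient-" + id, nil
	})
	length := Arrow[string, int](func(_ context.Context, id string) (int, error) {
		return len(id), nil
	})
	format := Arrow[Pair[string, int], string](func(_ context.Context, p Pair[string, int]) (string, error) {
		return fmt.Sprintf("%s (%d)", p.First, p.Second), nil
	})
	return Compose(Par(name, length), format)(context.Background(), id)
}

func main() {
	out, err := describe("abc")
	fmt.Println(out, err)
}
```

The point of the sketch is that Compose and Par only look at type parameters, so the same combinators would work whether a step is a FHIR read, a scoring rule, or an LLM call.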

Suture is what happens when you apply that idea to the Prompt Opinion challenge: a working demonstration that healthcare AI tools become dramatically easier to build, share, and harden when you stop writing one-off pipelines and start composing typed arrows. The submission is live, deployed to Fly.io at https://suture.fly.dev, registered as an MCP server inside a real Prompt Opinion workspace, with the integration verified by clinical conversations invoking the tools end-to-end.

What it does

Suture exposes five healthcare AI tools through a single MCP server, registered for invocation from the Prompt Opinion Marketplace:

| Tool | What it does | Composition shape |
| --- | --- | --- |
| get_patient_summary | Returns demographics + active problem list for the patient in FHIR context | Par + Map: two FHIR reads in parallel |
| calculate_cha2ds2_vasc | Computes the standard stroke-risk score for atrial fibrillation patients | Pipe3: FHIR reads → component extraction → scoring rules |
| get_cha2ds2_vasc_components | Returns the per-criterion breakdown without summing | Compose: reuses the upstream arrows from the score tool |
| summarize_recent_encounters | Pulls recent encounters in bounded-concurrent parallel and returns a timeline | Traverse(WithConcurrency(4), OnError(PartialResults)) |
| prior_auth_assistant | Multi-step LLM agent that orchestrates the four superpowers into a prior authorization letter | Loop over the other arrows as tool bindings |

The first four are Superpowers that Prompt Opinion's General Chat Agent and template agents can invoke directly. The fifth is a full multi-step agent loop that runs its own LLM orchestration inside a single MCP call — an agent within an agent. All five share the same building blocks. There is exactly one place in the codebase that knows how to read FHIR, one place that knows the CHA₂DS₂-VASc rules, one place that talks to Claude.
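The bounded-concurrency, partial-results shape used by summarize_recent_encounters can be sketched roughly as follows. This is an illustration under assumptions: TraversePartial, Result, and the simulated failure are hypothetical names invented here, not weft's Traverse, WithConcurrency, or OnError options.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Arrow is a hypothetical stand-in for a typed workflow step (not weft's real type).
type Arrow[A, B any] func(ctx context.Context, in A) (B, error)

// Result pairs one item's output with its error so a failure does not discard successes.
type Result[B any] struct {
	Value B
	Err   error
}

// TraversePartial applies step to every input with at most limit calls in flight.
// Results keep the order of the inputs.
func TraversePartial[A, B any](step Arrow[A, B], limit int) Arrow[[]A, []Result[B]] {
	return func(ctx context.Context, ins []A) ([]Result[B], error) {
		if limit < 1 {
			limit = 1
		}
		out := make([]Result[B], len(ins))
		sem := make(chan struct{}, limit) // counting semaphore
		var wg sync.WaitGroup
		for i, in := range ins {
			select {
			case sem <- struct{}{}:
			case <-ctx.Done():
				// Cancelled while waiting for a slot: record the reason and move on.
				out[i] = Result[B]{Err: ctx.Err()}
				continue
			}
			wg.Add(1)
			go func(i int, in A) {
				defer wg.Done()
				defer func() { <-sem }()
				v, err := step(ctx, in)
				out[i] = Result[B]{Value: v, Err: err}
			}(i, in)
		}
		wg.Wait()
		return out, nil
	}
}

// summarize counts successes and failures for a batch of made-up encounter IDs.
func summarize(ids []string) (ok, failed int) {
	fetch := Arrow[string, string](func(_ context.Context, id string) (string, error) {
		if id == "enc-2" {
			return "", errors.New("simulated FHIR 500")
		}
		return "summary of " + id, nil
	})
	results, _ := TraversePartial(fetch, 4)(context.Background(), ids)
	for _, r := range results {
		if r.Err != nil {
			failed++
		} else {
			ok++
		}
	}
	return ok, failed
}

func main() {
	ok, failed := summarize([]string{"enc-1", "enc-2", "enc-3"})
	fmt.Println("ok:", ok, "failed:", failed)
}
```

Keeping the per-item error next to the value lets a timeline tool return the encounters it did fetch instead of failing the whole call.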

The integration follows Prompt Opinion's published FHIR context spec exactly. The initialize response declares capabilities.extensions["ai.promptopinion/fhir-context"] with four SMART scopes (patient/Patient.rs, patient/Condition.rs, patient/Encounter.rs, patient/Observation.rs). On every tools/call, the platform attaches HTTP headers — X-FHIR-Server-URL, X-FHIR-Access-Token, X-Patient-ID — that Suture extracts in one middleware function and propagates through every weft arrow via Go's context.Context. The LLM never sees the auth data; the arrows never thread it through parameters. One file owns the entire Prompt Opinion-specific contract, and the rest of the codebase is platform-agnostic.
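A rough sketch of that middleware, using only the standard library, might look like this. The three header names come from the spec described above; the FHIRContext type, WithFHIRContext, FromContext, and the plain HTTP 400 on a missing patient are assumptions made for the example, not the contents of Suture's actual file.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// FHIRContext carries the per-request values delivered in the three headers.
type FHIRContext struct {
	ServerURL   string
	AccessToken string // may be empty: some sandboxes need no authorization
	PatientID   string
}

type ctxKey struct{}

// WithFHIRContext extracts the headers once and stores them in the request context.
func WithFHIRContext(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fc := FHIRContext{
			ServerURL:   r.Header.Get("X-FHIR-Server-URL"),
			AccessToken: r.Header.Get("X-FHIR-Access-Token"),
			PatientID:   r.Header.Get("X-Patient-ID"),
		}
		if fc.PatientID == "" {
			// Assumption: reject with a plain 400; a real server might return a JSON-RPC error.
			http.Error(w, "missing X-Patient-ID header", http.StatusBadRequest)
			return
		}
		next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), ctxKey{}, fc)))
	})
}

// FromContext is what downstream steps would call instead of taking auth parameters.
func FromContext(ctx context.Context) (FHIRContext, bool) {
	fc, ok := ctx.Value(ctxKey{}).(FHIRContext)
	return fc, ok
}

// call sends one in-memory request through the middleware and reports status and body.
func call(headers map[string]string) (int, string) {
	h := WithFHIRContext(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fc, _ := FromContext(r.Context())
		fmt.Fprint(w, fc.PatientID)
	}))
	req := httptest.NewRequest(http.MethodPost, "/mcp", nil)
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := call(map[string]string{
		"X-FHIR-Server-URL": "https://fhir.example/r4", // placeholder URL
		"X-Patient-ID":      "p1",
	})
	fmt.Println(code, body)
}
```

Because the values travel in context.Context, neither the tool arguments nor the LLM-visible payload ever needs to carry the token.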

Alongside the MCP server, the repo ships a local operator console (cmd/console/) — a Go binary that embeds a React UI via go:embed and serves a real-time visualization of what happens inside Suture for any tool invocation. The console fires a real HTTP POST against the deployed server, then renders a waterfall timeline showing the parallel weft.Par FHIR reads with overlapping bars, followed by the typed result as a clinical-style patient card. It's how the architectural thesis becomes visible: not "trust me that the reads are parallel," but "watch them overlap in time."

How we built it

The architecture stacks in five layers:

  1. HTTP transport: A minimal, dependency-free MCP server (internal/mcp, ~350 LOC) speaking JSON-RPC 2.0 over Streamable HTTP, plus a stdio variant for local debugging. We hand-rolled this rather than depend on mark3labs/mcp-go because the latest version requires Go 1.25 and our build environment was constrained to Go 1.22. The resulting code is small enough to audit in two files, with zero transitive dependencies. The handler interface is shape-compatible with mcp-go, so swapping is a one-package change. The server declares Prompt Opinion's FHIR context capability extension during initialize, so the platform recognizes it as a SMART-aware MCP server during registration.

  2. FHIR context middleware: A single file, internal/fhircontext/fhircontext.go, owns the entire Prompt Opinion contract. It extracts context fields from HTTP headers, validates that a patient is present, and injects a typed Context value into context.Context. If the spec changes, this file changes. Nothing else does. The token is optional per spec (some FHIR sandboxes don't require authorization), and Suture handles that case by omitting the Authorization header.

  3. FHIR client: internal/fhir/fhir.go provides typed FHIR R4 reads (ReadPatient, SearchConditions, SearchObservations) returned as weft.Arrow values. They pull FHIR context out of ctx, so they compose with every other arrow without parameter threading.

  4. Agent loop: pkg/agent/loop.go is ~100 LOC. It runs the standard LLM-call → tool-dispatch → result-feedback loop, but the loop itself is just an Arrow[llm.Prompt, llm.Response]. That means it composes with everything else in the algebra. The prior_auth_assistant tool uses this loop with bindings to the other four Superpowers: when invoked, it runs its own LLM orchestration (via Claude through weft/llm) inside a single MCP call.

  5. Tools: pkg/tools/ is where the value lives. Each tool is a composition of FHIR arrows, pure scoring/parsing arrows, and (for prior_auth_assistant) the agent loop. The CHA₂DS₂-VASc package is the headline example: three building-block arrows (fetchClinicalData, extractComponents, computeScore) get composed three different ways to produce two MCP tools plus the agent's binding.
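The pure scoring step in the CHA₂DS₂-VASc example is small enough to sketch in full. The point weights below are the standard published ones; the Components and Criterion types and the breakdown and score function names are hypothetical, chosen for this illustration rather than copied from the package.

```go
package main

import "fmt"

// Components is a hypothetical, already-extracted view of the patient's risk factors.
type Components struct {
	CHF             bool
	Hypertension    bool
	Age             int
	Diabetes        bool
	StrokeOrTIA     bool
	VascularDisease bool
	Female          bool
}

// Criterion is one line of the per-criterion breakdown.
type Criterion struct {
	Name   string
	Points int
}

// breakdown returns the points awarded per criterion without summing them.
func breakdown(c Components) []Criterion {
	pts := func(present bool, weight int) int {
		if present {
			return weight
		}
		return 0
	}
	age := 0
	switch {
	case c.Age >= 75:
		age = 2
	case c.Age >= 65:
		age = 1
	}
	return []Criterion{
		{"Congestive heart failure", pts(c.CHF, 1)},
		{"Hypertension", pts(c.Hypertension, 1)},
		{"Age (75 or older: 2, 65 to 74: 1)", age},
		{"Diabetes mellitus", pts(c.Diabetes, 1)},
		{"Stroke/TIA/thromboembolism", pts(c.StrokeOrTIA, 2)},
		{"Vascular disease", pts(c.VascularDisease, 1)},
		{"Sex category (female)", pts(c.Female, 1)},
	}
}

// score reuses breakdown, so the two tools cannot drift apart.
func score(c Components) int {
	total := 0
	for _, cr := range breakdown(c) {
		total += cr.Points
	}
	return total
}

func main() {
	c := Components{Hypertension: true, Age: 78, Diabetes: true, Female: true}
	for _, cr := range breakdown(c) {
		fmt.Printf("%-36s %d\n", cr.Name, cr.Points)
	}
	fmt.Println("total:", score(c))
}
```

Having the score built on top of the breakdown is the same reuse the two MCP tools rely on: one exposes the list, the other exposes the sum.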

Deployment is a multi-stage Docker build (~15MB final image, distroless, nonroot) deployed to Fly.io in the IAD region. Single binary, single config file, single Fly app. The repo includes a fly.toml and Dockerfile so deployment is fly launch --copy-config --no-deploy && fly secrets set ANTHROPIC_API_KEY=... && fly deploy. A health check at /healthz keeps the machine warm.

Challenges we ran into

Documentation archaeology to find the real spec. Early in the project, the codebase modeled the platform contract using a placeholder "SHARP" naming convention from the hackathon's marketing copy, with context-propagation through MCP _meta fields. Late in the project, after locating the actual published spec at docs.promptopinion.ai/fhir-context/mcp-fhir-context, we discovered the real wire format uses HTTP headers — X-FHIR-Server-URL, X-FHIR-Access-Token, X-Patient-ID — and requires declaring a capabilities.extensions["ai.promptopinion/fhir-context"] block during initialize to opt into the FHIR context flow. We renamed internal/sharp to internal/fhircontext, added the HTTP transport, declared the capability extension, and shipped a clean Path A patch. The fact that one middleware file owned every assumption made this a localized refactor rather than a rewrite.

Toolchain constraints. The latest mark3labs/mcp-go requires Go 1.25, and the build environment was constrained to Go 1.22. Rather than fight the toolchain, we wrote a minimal MCP server ourselves, which turned into a feature: zero external transport dependencies, two auditable files, and a project that is trivial to deploy in restricted environments. Constraint-driven design often produces a cleaner result than the path you originally wanted.

API-shape archaeology on a fast-moving library. The published weft v0.1.0 doesn't expose llm.Loop even though the README on main describes it. We could have pulled from main, but pinning to a published tag is the right call for reproducibility. So we implemented the loop ourselves — and that turned into the strongest part of the project. Having the agent loop be ~100 LOC of weft composition rather than an opaque framework dependency makes the architectural thesis legible.
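For readers who want to picture what a loop of that size looks like, here is a simplified sketch. Everything in it is an assumption for illustration: the Reply type, the string-based history, the Tool alias, and the Loop signature are invented here and do not reproduce pkg/agent/loop.go or weft's llm types.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Arrow is a hypothetical stand-in for a typed workflow step (not weft's real type).
type Arrow[A, B any] func(ctx context.Context, in A) (B, error)

// Reply is a hypothetical model turn: a final answer, or a request to call one tool.
type Reply struct {
	Text     string
	ToolName string // empty when the reply is final
	ToolArg  string
}

// Tool is any arrow from a string argument to a string result.
type Tool = Arrow[string, string]

// Loop wraps model-call -> tool-dispatch -> result-feedback into a single arrow.
func Loop(model Arrow[[]string, Reply], tools map[string]Tool, maxSteps int) Arrow[string, string] {
	return func(ctx context.Context, prompt string) (string, error) {
		history := []string{"user: " + prompt}
		for step := 0; step < maxSteps; step++ {
			reply, err := model(ctx, history)
			if err != nil {
				return "", err
			}
			if reply.ToolName == "" {
				return reply.Text, nil // final answer
			}
			tool, ok := tools[reply.ToolName]
			if !ok {
				return "", fmt.Errorf("unknown tool %q", reply.ToolName)
			}
			result, err := tool(ctx, reply.ToolArg)
			if err != nil {
				// Feed the failure back so the model can react to it.
				result = "tool error: " + err.Error()
			}
			history = append(history, "tool "+reply.ToolName+": "+result)
		}
		return "", errors.New("agent loop exceeded step budget")
	}
}

// runDemo drives the loop with a scripted model instead of a real LLM.
func runDemo() (string, error) {
	model := Arrow[[]string, Reply](func(_ context.Context, history []string) (Reply, error) {
		if len(history) == 1 {
			return Reply{ToolName: "calculate_cha2ds2_vasc", ToolArg: "patient-1"}, nil
		}
		return Reply{Text: "Letter drafted using " + history[len(history)-1]}, nil
	})
	tools := map[string]Tool{
		"calculate_cha2ds2_vasc": func(_ context.Context, _ string) (string, error) {
			return "score=5", nil
		},
	}
	return Loop(model, tools, 4)(context.Background(), "draft a prior authorization letter")
}

func main() {
	out, err := runDemo()
	fmt.Println(out, err)
}
```

Since Loop returns an ordinary arrow, it can be composed or traversed like any other step, which is the property the write-up leans on.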

CORS bug that curl couldn't see. The initial HTTP transport returned Access-Control-Allow-Origin: * only on OPTIONS preflight responses, not on the actual POST response. Server-to-server callers (curl, the Go test client, Prompt Opinion) didn't care. But browsers blocked the response body, breaking the operator console. The race detector caught a related issue in test helpers (shared bytes.Buffer across goroutines). Both fixed, both lessons in "if you have a thing called CORS, test it from a browser, not curl."
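The shape of the fix can be sketched with a small wrapper that sets the header on every response, not only on preflight. The withCORS and allowOrigin names, and the exact list of allowed headers, are assumptions for this example rather than the code that shipped.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// withCORS sets Access-Control-Allow-Origin on every response, then answers
// preflight requests itself and passes everything else to next.
func withCORS(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := w.Header()
		h.Set("Access-Control-Allow-Origin", "*")
		if r.Method == http.MethodOptions {
			h.Set("Access-Control-Allow-Methods", "POST, OPTIONS")
			// Assumption: a browser client sends the FHIR context headers itself.
			h.Set("Access-Control-Allow-Headers",
				"Content-Type, X-FHIR-Server-URL, X-FHIR-Access-Token, X-Patient-ID")
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// allowOrigin reports the Access-Control-Allow-Origin value returned for a method.
func allowOrigin(method string) string {
	h := withCORS(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, `{"jsonrpc":"2.0","id":1,"result":{}}`)
	}))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, "/mcp", nil))
	return rec.Header().Get("Access-Control-Allow-Origin")
}

func main() {
	fmt.Println("OPTIONS:", allowOrigin(http.MethodOptions))
	fmt.Println("POST:   ", allowOrigin(http.MethodPost))
}
```

A check like allowOrigin on the POST path is the kind of assertion that would have caught the bug without opening a browser.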

Race conditions in tests, not in code. The MCP server is goroutine-safe (mutex around the writer, atomic registration). But our initial test helpers shared a bytes.Buffer across the test goroutine and the server's response goroutine — clean code, race-y tests. The race detector caught it, we wrote a safeBuffer wrapper, the suite is now -race clean across all 7 packages. A good reminder that test code is production code.
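A wrapper of that kind is only a few lines. The version below is a sketch of the general technique, a mutex around bytes.Buffer; the writeConcurrently helper is hypothetical, and the real safeBuffer in the test suite may differ in detail.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// safeBuffer serializes access to a bytes.Buffer so one goroutine can write
// while another reads.
type safeBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

// Write implements io.Writer under the lock.
func (b *safeBuffer) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.Write(p)
}

// String returns a snapshot of the buffered data under the lock.
func (b *safeBuffer) String() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.String()
}

// writeConcurrently writes one byte from each of n goroutines and returns the total length.
func writeConcurrently(n int) int {
	var sb safeBuffer
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			fmt.Fprint(&sb, "x")
		}()
	}
	wg.Wait()
	return len(sb.String())
}

func main() {
	fmt.Println(writeConcurrently(50))
}
```

Running the same helper against a bare bytes.Buffer under go test -race is a quick way to reproduce the original report.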

Accomplishments that we're proud of

The composition story actually pays off. Three of the five tools share fetchClinicalData, and two share extractComponents. When we added the prior_auth_assistant agent, the four tool bindings were single-line wrappers — agent.BindArrow(spec, PatientSummaryArrow(c)) — because the arrows already existed. Adding a sixth tool tomorrow means writing one new file, not refactoring four others.

A working production integration. Suture is live at https://suture.fly.dev/mcp, registered as an MCP server in a Prompt Opinion workspace. When a clinician sends a message to PO's General Chat Agent like "Use the get_patient_summary tool to tell me about this patient", the platform sends an HTTPS POST to our Fly machine with the FHIR context headers, our weft pipeline reads the patient and conditions in parallel from PO's FHIR server, returns a typed PatientSummaryOut, and the agent narrates it back in clinical English. We've verified this end-to-end with the John Doe sample patient and observed it in the live Fly logs.

The agent loop as an arrow. The agent.Loop combinator is the clearest demonstration of weft's "role-erasure" claim. Traverse doesn't know it's running agents. The agents don't know they're being traversed. Every layer only sees its argument's type contract. We didn't expect this to feel as clean as it does in practice.

The operator console. ~700 lines of single-file React (no build step, React + Babel via CDN), embedded into a Go binary via go:embed. The console fires a real request against the deployed server and renders an animated waterfall showing the request → trace → typed result flow. The two FHIR reads appear as overlapping orange bars — that's weft.Par's parallelism made visible. We built it specifically as a teaching aid, and it doubles as the most cinematic part of the demo.

68 tests, race-clean, with average coverage above 80%. For a hackathon submission this is unusual, and it matters because the judging criteria include feasibility. A clinician (or their CTO) looking at this repo can see a test suite that exercises the real protocol, not a demo held together with prayers. The headline test is in cmd/suture-server/main_test.go: it spins up a real HTTP MCP server, simulates Prompt Opinion's exact calling pattern with the real headers from their spec, and verifies the full stack from header extraction through the FHIR client. If that test passes, the integration is real, and we've now also confirmed it against the actual platform.

What we learned

A small algebra beats a big framework. We didn't import a multi-agent orchestration framework. We imported weft, which is ~2,000 LOC, and built everything else from Arrow, Compose, Par, Pipe3, Traverse, Apply. The result is more flexible than what a heavier framework would have produced, because every behavior is just function composition — there's no framework opinion to fight.

The platform contract is the actual problem. Once the FHIR context middleware is right, every other healthcare AI tool you write becomes trivial. The hackathon's framing — that "the platform handles the plumbing problems that typically consume most of the engineering effort" — is real, and the cleanest implementation is one file of typed context injection. Getting the spec right matters more than getting the algebra clever.

MCP is genuinely small. The part of the protocol a tools-only server needs is JSON-RPC 2.0 over HTTP with three methods (initialize, tools/list, tools/call). Writing a server from scratch took ~350 LOC including transports, capability extensions, and CORS, and it gave us a complete understanding of the wire format. That's leverage worth having when debugging in production.
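As a rough picture of how little dispatch logic those three methods need, consider the sketch below. The error codes are the standard JSON-RPC 2.0 ones; the result payloads are abbreviated placeholders rather than complete MCP responses, and the dispatch, encode, and errorCode names are made up for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type request struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id,omitempty"`
	Method  string          `json:"method"`
	Params  json.RawMessage `json:"params,omitempty"`
}

type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type response struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id,omitempty"`
	Result  any             `json:"result,omitempty"`
	Error   *rpcError       `json:"error,omitempty"`
}

// encode marshals a response; these types cannot fail to marshal, so the error is ignored.
func encode(r response) []byte {
	b, _ := json.Marshal(r)
	return b
}

// dispatch routes one JSON-RPC request to one of the three methods.
func dispatch(raw []byte) []byte {
	var req request
	if err := json.Unmarshal(raw, &req); err != nil {
		return encode(response{
			JSONRPC: "2.0",
			ID:      json.RawMessage("null"),
			Error:   &rpcError{Code: -32700, Message: "parse error"},
		})
	}
	resp := response{JSONRPC: "2.0", ID: req.ID}
	switch req.Method {
	case "initialize":
		// Placeholder: a real reply also carries protocol version and capabilities.
		resp.Result = map[string]any{"serverInfo": map[string]string{"name": "sketch"}}
	case "tools/list":
		resp.Result = map[string]any{"tools": []map[string]string{{"name": "get_patient_summary"}}}
	case "tools/call":
		resp.Result = map[string]any{"content": []map[string]string{{"type": "text", "text": "ok"}}}
	default:
		resp.Error = &rpcError{Code: -32601, Message: "method not found"}
	}
	return encode(resp)
}

// errorCode returns the JSON-RPC error code of the reply, or 0 when the call succeeded.
func errorCode(raw string) int {
	var resp response
	if err := json.Unmarshal(dispatch([]byte(raw)), &resp); err != nil || resp.Error == nil {
		return 0
	}
	return resp.Error.Code
}

func main() {
	fmt.Println(string(dispatch([]byte(`{"jsonrpc":"2.0","id":1,"method":"tools/list"}`))))
}
```

Seeing the whole switch on one screen is what makes wire-level debugging tractable.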

Type-driven composition catches bugs before tests do. Several times during development, we refactored an arrow's input or output type and the Go compiler told us exactly which downstream tool stopped compiling. Type information flowing end-to-end through generic composition is a real productivity multiplier — no test can give you that kind of immediate feedback.

Watching the integration come alive is the demo. The most compelling part of the build wasn't writing the tools or the tests. It was the moment we registered Suture with Prompt Opinion, flipped on the FHIR Context Extension, authorized the SMART scopes, and saw the platform parse our initialize response and present the four scope names back to us in their UI — patient/Condition.rs, patient/Patient.rs, patient/Encounter.rs, patient/Observation.rs — each labeled "Required" or optional exactly as our Go code had declared them. The contract worked because both sides spoke the same wire shape.

What's next for Suture

More superpowers. The same building blocks support a long tail of useful tools: medication reconciliation (compare home meds vs. inpatient orders), problem-list grooming (flag stale conditions), allergy cross-checks against active prescriptions, lab-trend analysis (is this hemoglobin actually trending down or just noisy?). Each is one file under pkg/tools/ that composes existing FHIR arrows in a new way.

Production-grade LLM seam. The current implementation pins Anthropic Claude via weft/llm (for prior_auth_assistant's internal loop) and uses Prompt Opinion's Gemini for the orchestration. The Arrow[llm.Prompt, llm.Response] interface is provider-neutral by design: adding OpenAI, Gemini, or local-model support is one new file in weft/llm/ and a one-line swap at the call site. We'd expose this as an LLM_PROVIDER env-var dispatch.
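One way such a dispatch could look is sketched below. It is purely illustrative: the Prompt and Response types, the stub providers, the provider names in the map, and the "anthropic" default are all assumptions, and no real provider SDK is called.

```go
package main

import (
	"context"
	"fmt"
	"os"
)

// Arrow is a hypothetical stand-in for a typed workflow step (not weft's real type).
type Arrow[A, B any] func(ctx context.Context, in A) (B, error)

// Prompt and Response are simplified placeholders for the real llm types.
type Prompt struct{ Text string }
type Response struct{ Text string }

// Model is the provider-neutral seam: any arrow from Prompt to Response.
type Model = Arrow[Prompt, Response]

// stub returns a fake provider that echoes its own name; a real one would call an API.
func stub(name string) Model {
	return func(_ context.Context, p Prompt) (Response, error) {
		return Response{Text: name + ": " + p.Text}, nil
	}
}

// providers maps hypothetical LLM_PROVIDER values to model arrows.
var providers = map[string]Model{
	"anthropic": stub("anthropic"),
	"openai":    stub("openai"),
	"gemini":    stub("gemini"),
}

// modelFor picks a provider by name, defaulting to "anthropic" when unset.
func modelFor(name string) (Model, error) {
	if name == "" {
		name = "anthropic"
	}
	m, ok := providers[name]
	if !ok {
		return nil, fmt.Errorf("unknown LLM_PROVIDER %q", name)
	}
	return m, nil
}

// answer sends a fixed prompt through the selected provider.
func answer(provider string) (string, error) {
	m, err := modelFor(provider)
	if err != nil {
		return "", err
	}
	resp, err := m(context.Background(), Prompt{Text: "hi"})
	return resp.Text, err
}

func main() {
	out, err := answer(os.Getenv("LLM_PROVIDER"))
	fmt.Println(out, err)
}
```

The call site only ever sees a Model, so swapping providers would not touch the agent loop.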

Long-running workflows via Temporal. Real prior authorization isn't a 60-second LLM call — it's a multi-day workflow involving payer submission, denials, appeals, and human review. The agent loop in Suture is structured to drop into a Temporal workflow: each tool call becomes an activity, the conversation history becomes durable workflow state, human review becomes a signal. The MCP tool stays synchronous (returns a tracking ID); the workflow runs for days behind it.

Multi-agent orchestration. A supervisor agent spawning specialist children (cardiology, pharmacy, documentation), each running their own loop, fanning back in for synthesis. The shape is in the weft README's "compose multiple specialist agents" thesis; the implementation is a child-workflow pattern we'd build on top of the existing agent.Loop.

A FHIR-aware code-generation layer. The current FHIR client uses map[string]any because hand-typing every FHIR resource is not the value the project demonstrates. A v2 would generate typed Go structs from the FHIR StructureDefinitions, making the arrows fully type-safe end-to-end without sacrificing the composition story.
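To show what the typed direction buys, here is a hand-written sketch covering a tiny slice of the FHIR R4 Patient resource. The JSON field names are the standard ones; the Go struct layout and the displayName helper are hypothetical, and generated code would cover far more of the resource.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// HumanName covers only the two fields this example needs.
type HumanName struct {
	Family string   `json:"family"`
	Given  []string `json:"given"`
}

// Patient is a deliberately partial view of the FHIR R4 Patient resource.
type Patient struct {
	ResourceType string      `json:"resourceType"`
	ID           string      `json:"id"`
	Name         []HumanName `json:"name"`
	Gender       string      `json:"gender"`
	BirthDate    string      `json:"birthDate"`
}

// displayName decodes a Patient and formats its first name entry as "Given Family".
func displayName(raw string) (string, error) {
	var p Patient
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return "", err
	}
	if p.ResourceType != "Patient" {
		return "", fmt.Errorf("expected Patient, got %q", p.ResourceType)
	}
	if len(p.Name) == 0 {
		return "", errors.New("patient has no name")
	}
	n := p.Name[0]
	return strings.TrimSpace(strings.Join(n.Given, " ") + " " + n.Family), nil
}

func main() {
	raw := `{"resourceType":"Patient","id":"p1","name":[{"family":"Doe","given":["John"]}],"gender":"male","birthDate":"1970-01-01"}`
	name, err := displayName(raw)
	fmt.Println(name, err)
}
```

With map[string]any the same lookup needs a chain of type assertions, each of which can fail at runtime; the struct version moves those failures into decoding.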

Publishing to the marketplace. The hackathon submission is one binary exposing five tools. The natural follow-up is to publish each tool as its own marketplace entry so other agents can compose them — turning Suture into not just an agent but a library of agents. That's the ecosystem story the platform is built for.

