The Gap Forge Targets

Systemic inequity shows up in software as much as it does anywhere else. The communities with the greatest healthcare needs (low-income, uninsured, and immigrant populations served by safety-net clinics) are the ones whose providers have the least access to the tools that make care delivery efficient, trackable, and accountable. Forge is a direct response to that structural gap. It lets clinic staff build their own tools in plain language, without engineers, without enterprise contracts, and with nothing more than a description of what they need.

More than 30 million patients in the United States receive their primary care from Federally Qualified Health Centers and safety-net clinics. These organizations operate on federal grant margins, carry high staff turnover, and face a constant stream of operational software needs: quality measure reporting, care gap tracking, billing anomaly detection, and scheduling tools. Each of these requires custom logic that no one on their team can build.

The tools that exist today do not reach this population. EHR built-in dashboards (eClinicalWorks, NextGen, athenahealth) ship with 15–20 fixed quality reports. A quality coordinator with a question outside those reports has no path forward from within the system. Population health platforms like Arcadia, Innovaccer, and Health Catalyst solve this problem for large integrated health systems but require six-month implementations, IT-heavy onboarding, and enterprise pricing that a safety-net clinic cannot approach. Microsoft's natural language patient cohort builder in Microsoft Fabric was discontinued in November 2025. Epic's NL query layer exists only for Epic customers, who are a minority of FQHCs.

What remains is manual Excel exports, brittle volunteer-built spreadsheets, or outside consultants charging $5,000–$50,000 per engagement for tools the clinic needs every reporting cycle. The tools that would fix this never get built because each one is individually too small to justify engineering time the clinic doesn't have.

Consider a quality coordinator at a community health center. Her federal Uniform Data System (UDS) report is due in six weeks, and she has no data analyst or IT team on call.

She types into Forge: "Show me diabetic patients who haven't had an A1c in the last six months."

No tool for this exists in the clinic's library yet, so Forge synthesizes one from scratch. Once she approves the verified logic, 47 patient records appear in under a minute, and the tool installs permanently into the clinic's infrastructure.

Three weeks later, a care manager types a slightly different phrasing of the same need. Forge recognizes the intent of the request, finds the original tool at a 91% similarity match, skips the synthesis phase entirely, and returns the live, updated patient list in two seconds.

By skipping the synthesis phase, Forge saves the clinic the thousands of tokens that would otherwise go toward building a redundant tool whose capability already exists in the library.

The clinic's software library grows automatically with every question asked. Each installed tool gives the synthesizer more proven patterns to draw on when it builds the next one, and Forge wraps around the existing infrastructure of the clinic or health department.

What Forge Builds

Forge is a self-building operational tooling system. A clinic coordinator describes what they need in plain language. Forge synthesizes a verified, executable capability, installs it as a permanent reusable tool in the clinic's library, runs it against the clinic's patient data, and returns the result, all in under a minute without an engineer.

The mechanism is a six-stage pipeline:

Route. Every incoming intent is embedded using OpenAI's text-embedding-3-small model and compared against the clinic's existing capability library via HNSW vector similarity search in Redis. If a semantically equivalent tool already exists (cosine similarity ≥ 0.85), the system routes directly to it and executes without synthesis — near-zero token cost, sub-second latency.

Retrieve adjacent context. When routing misses, the same embedding — computed once and carried forward — queries the adjacent band of the vector space (similarity 0.5–0.84). Capabilities in that band are retrieved and injected into the synthesis prompt as proven patterns. A new capability asking about hypertensive patients benefits from context pulled from an already-proven diabetes denominator construction that shares the same structural shape.

Synthesize. Claude Opus 4 generates executable Python logic against the clinic's typed data layer, primed with any retrieved adjacent patterns and constrained by a system prompt that explicitly marks user input as untrusted data.

Verify. Two-stage verification before any logic runs on patient data. An AST walker structurally proves the generated code: zero imports, zero dunder attribute accesses, all data layer calls on a named allowlist, all name references either locally defined or explicitly permitted. Sandbox execution against mock data then confirms the function compiles, executes cleanly, and returns the declared output shape.

Approve. An async gate pauses the build loop. The verified logic is presented for human review. No AI-generated code touches real patient data until a human releases the gate.

Install and execute. The capability is stored in Redis as a permanent bundle with its embedding indexed in the vector store, then executed immediately. Every future request that semantically matches this tool routes to it directly.

The Compounding Intelligence Layer

The core architectural property of Forge is that the system gets more capable and more efficient as it is used. This is a direct consequence of how the vector index, embedding model, and synthesis pipeline interact.

When the clinic builds its first capability, the vector index has one entry. By the tenth capability, a new request arrives with a rich context band of adjacent patterns that the synthesizer uses to produce more accurate, more structurally sound logic, faster. The built_from field on each installed manifest records exactly which prior capabilities were retrieved and used as synthesis context, forming a permanent provenance chain stored in RedisJSON.

Token economics follow the same compounding logic. A fresh synthesis costs roughly 2,400–3,000 tokens of Claude Opus inference. A reuse costs an OpenAI embedding call and a Redis vector search. Across six monthly reporting cycles, re-synthesizing from scratch every time scales linearly in cost. Forge's build-reuse model flattens after the first build and stays flat. The efficiency ratio grows with every reuse event.
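The six-cycle comparison above can be worked through with a few lines of arithmetic. The sketch below counts Claude synthesis tokens only; the cost of the embedding call and the Redis search on each reuse is left out because the text gives no figure for it. The function name is hypothetical.

```python
# Per-synthesis token range quoted in the text.
SYNTHESIS_TOKENS_LOW = 2_400
SYNTHESIS_TOKENS_HIGH = 3_000


def cumulative_synthesis_tokens(cycles: int, tokens_per_synthesis: int, reuse: bool) -> int:
    """Total synthesis tokens after `cycles` runs of the same capability.

    With reuse, only the first cycle pays for synthesis; without it, every cycle does.
    """
    if cycles <= 0:
        return 0
    builds = 1 if reuse else cycles
    return builds * tokens_per_synthesis


for tokens in (SYNTHESIS_TOKENS_LOW, SYNTHESIS_TOKENS_HIGH):
    scratch = cumulative_synthesis_tokens(6, tokens, reuse=False)
    forge = cumulative_synthesis_tokens(6, tokens, reuse=True)
    print(f"{tokens} tokens/build: from scratch {scratch}, build-reuse {forge}, ratio {scratch / forge:.0f}x")
```

Under these assumptions, six cycles cost 14,400 to 18,000 synthesis tokens when rebuilt each time and 2,400 to 3,000 with reuse, so the ratio equals the number of cycles and keeps growing with each reuse.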

Intelligence Metrics Dashboard

The Intelligence Dashboard runs as a standalone application alongside the main demo, making Forge's Redis and Anthropic infrastructure directly observable in real time.

Every query Forge receives triggers a live vector search across the capability library using RedisVL's HNSW index. The Routing Log captures each decision: the cosine similarity score returned, the intent text that triggered it, and the outcome. When similarity clears the threshold, Redis routes directly to the existing capability, the query executes instantly, and no Claude inference happens at all. When it misses, the adjacent similarity band is queried to retrieve related capabilities as synthesis context, Claude Opus 4 builds a new one, and the token cost is recorded from the live SSE payload. This is Redis acting as agent memory: the vector index holds the clinic's accumulated operational knowledge, and every incoming query is matched against it before any model is called.

The Session Stats panel surfaces the token economics this produces in running totals: tokens spent on synthesis, tokens saved through dynamic capability reuse, and a live reuse rate that climbs as the library grows. These numbers are not projected. They are counted from real payloads across real queries in the session.

The Capability Provenance section shows what Redis stores beyond the routing layer. Each installed capability carries its full build trace, AST verification facts, and a built_from lineage recording which prior capabilities the HNSW adjacent retrieval pulled in as synthesis context during the build. That lineage is what makes Forge compound: Claude synthesizes each new capability with awareness of proven patterns already in the Redis registry, so the system builds on what it knows rather than re-deriving from scratch.

Taken together, the dashboard shows Redis doing three distinct jobs simultaneously: vector search for semantic routing, persistent JSON storage as the capability registry and agent memory, and Streams as the live event bus powering every stage of the build.

The dashboard makes that structural dependency visible, and shows in live numbers how it translates directly into reduced token usage, real-time dynamic capability selection, and a system that gets more efficient the more it is used.

Anthropic Track: Building with Claude for Social Impact

Claude Code as the Development Environment

The entire Forge system was built in roughly 24 hours by the two of us (Steven and Arjun), with Claude Code as the primary development tool. Claude Code held the full system architecture in context across files: it maintained schema consistency between the Python backend and TypeScript frontend, caught cross-module contract violations, and produced reference implementations directly from the PRD specification.

Claude Opus 4 and Sonnet 4.6 in the Synthesis Loop

Claude Opus 4 (claude-opus-4-8) handles capability synthesis, which is the step requiring genuine clinical reasoning about which LOINC codes map to A1c observations, which SNOMED codes identify hypertension, and what the correct denominator construction logic looks like for a given care gap query.

Claude Sonnet 4.6 (claude-sonnet-4-6) handles routing decisions. It is lighter and faster, which is appropriate for work that sits alongside the embedding-based similarity matching task.

Ethical Architecture for Clinical AI

Privacy is enforced at the AST level: the generated logic can only access the data sources declared in its manifest, and this is verified structurally before any execution against patient data. The human approval gate means no AI-generated code runs on real patient records without explicit human release. Forge does not make care decisions. It generates operational tooling for the staff who support care decisions, such as a quality coordinator, a care manager, or a billing lead. Every output is a patient list for human action, not a clinical recommendation.
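One way to picture the human approval gate is an `asyncio.Event` that the build loop awaits before it is allowed to execute anything. The sketch below is an assumption about the shape of such a gate, not Forge's implementation; `ApprovalGate`, `run_after_approval`, and `demo` are hypothetical names.

```python
import asyncio


class ApprovalGate:
    """Pauses a coroutine until a human decision is recorded."""

    def __init__(self) -> None:
        self._decided = asyncio.Event()
        self._approved = False

    def release(self, approved: bool) -> None:
        # Called from the review UI handler once a human has decided.
        self._approved = approved
        self._decided.set()

    async def wait(self) -> bool:
        await self._decided.wait()
        return self._approved


async def run_after_approval(gate: ApprovalGate, execute):
    """Run `execute` only if the gate is released with approval."""
    if not await gate.wait():
        return None  # rejected: the logic never runs
    return execute()


async def demo(approved: bool):
    gate = ApprovalGate()
    calls = []

    def execute():
        calls.append("executed")
        return ["patient-1", "patient-2"]  # placeholder result, not real data

    task = asyncio.create_task(run_after_approval(gate, execute))
    await asyncio.sleep(0)  # let the task start and block on the gate
    ran_before_release = bool(calls)
    gate.release(approved=approved)
    return ran_before_release, await task


print(asyncio.run(demo(approved=True)))
```

The point of the design is that execution sits behind the `await`: until `release` is called, there is no code path that reaches `execute`.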

Environmental impact is reduced structurally by the reuse mechanism. Every routing hit to an existing capability is an avoided Claude Opus inference call. As the capability library grows, the system does more work with less compute per question answered.

The evaluation pipeline uses Synthea synthetic patients exclusively. No PHI was used anywhere in the build, test, or validation process.

Redis Track: Three Pillars, One Backbone

Redis is not an add-on to Forge. The product cannot function without it. Three Redis capabilities are deployed simultaneously, each handling a distinct and load-bearing role.

Pillar 1: Vector Search — Semantic Routing and Compounding Retrieval

Every installed capability's description is embedded with text-embedding-3-small (1,536 dimensions) and indexed in a RedisVL HNSW vector index under cosine similarity. Incoming intents are embedded with the same model and searched against this index in milliseconds.

The HNSW algorithm provides approximate nearest-neighbor search in roughly O(log n) time. Cosine similarity measures semantic alignment between query and capability embeddings. For example, "show me diabetic patients with poor A1c" maps within 0.09 cosine distance (0.91 similarity) of "which diabetes patients have A1c above 9%", routing both to the same installed tool without re-synthesis.

The query embedding is computed once at routing time and carried forward for the adjacent band retrieval, so there is one embedding API call per intent regardless of downstream uses.
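The routing rule and the adjacent band can be illustrated with a small brute-force version. This sketch scans every vector instead of using an HNSW index, and it uses toy 3-dimensional vectors in place of 1,536-dimensional embeddings; only the 0.85 and 0.5 thresholds come from the text, and the function and capability names are hypothetical.

```python
import math

REUSE_THRESHOLD = 0.85  # similarity at or above this routes to the existing tool
ADJACENT_FLOOR = 0.5    # lower edge of the adjacent context band


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(query_vec, library):
    """Decide between reuse and synthesis using one query vector.

    `library` maps capability names to vectors. Returns a tuple of
    (decision, matched_name, adjacent_names).
    """
    scored = sorted(
        ((cosine_similarity(query_vec, vec), name) for name, vec in library.items()),
        reverse=True,
    )
    if scored and scored[0][0] >= REUSE_THRESHOLD:
        return "reuse", scored[0][1], []
    # Routing missed: the same scores select the adjacent band as synthesis context.
    adjacent = [name for score, name in scored if ADJACENT_FLOOR <= score < REUSE_THRESHOLD]
    return "synthesize", None, adjacent


query = [1.0, 0.0, 0.0]
library = {"hypertension_gap": [0.7, 0.7, 0.0], "billing_anomaly": [0.0, 0.0, 1.0]}
print(route(query, library))
print(route(query, {**library, "a1c_gap": [0.95, 0.3, 0.0]}))
```

Because the scores are computed once and then filtered twice, the sketch mirrors the single-embedding property described above: the miss path needs no second embedding call.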

Pillar 2: RedisJSON — Capability Registry and Provenance

Each capability's complete bundle, which includes the manifest, generated logic, ui_spec, and verification status, is stored as a JSON document in Redis, co-located with its embedding under the same key prefix. Redis is the registry, not a cache in front of a database. Reuse counting is an atomic JSON numeric increment. Provenance chains recording which prior capabilities informed each synthesis are stored on the manifest and queryable at any time.
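A registry write and the atomic reuse counter might look like the following with redis-py's JSON commands. This is a sketch under assumptions: the `capability:` key prefix, the `reuse_count` field, and both function names are hypothetical, and `client` is assumed to be a `redis.Redis` connection to a server with JSON support.

```python
def install_capability(client, capability_id: str, bundle: dict) -> str:
    """Store a capability bundle as one JSON document and return its key."""
    key = f"capability:{capability_id}"  # hypothetical key prefix
    document = {**bundle, "reuse_count": 0}
    client.json().set(key, "$", document)  # JSON.SET key $ <document>
    return key


def record_reuse(client, key: str):
    """Atomically bump the reuse counter on the stored document."""
    return client.json().numincrby(key, "$.reuse_count", 1)  # JSON.NUMINCRBY


# Example bundle using the fields named in the text; the values are placeholders.
example_bundle = {
    "manifest": {"description": "diabetic patients overdue for A1c", "built_from": []},
    "logic": "def run(data): ...",
    "ui_spec": {"type": "table"},
    "verification": {"ast": "passed", "sandbox": "passed"},
}
```

Keeping the counter inside the same document means a routing hit needs one increment command and no read-modify-write cycle in application code.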

Pillar 3: Redis Streams — Live Event Bus

Every build loop stage transition emits a BuildEvent to a Redis Stream. The FastAPI SSE endpoint polls this stream and forwards matching events to the browser as Server-Sent Events. This is the mechanism behind the live build choreography — every stage (routing, gap, synthesizing, verified, approved, installed, executing, done) is a real Redis Stream entry consumed in real time by the frontend. The intelligence dashboard's live stream panel shows these entries arriving as the demo runs.
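The path from a stream entry to a browser event can be sketched in three small functions. The stage names come from the text; the stream name, field names, and function names are hypothetical, and `client` is assumed to be a `redis.Redis` connection created with `decode_responses=True` so that IDs and fields arrive as strings.

```python
import json


def emit_build_event(client, stream: str, stage: str, capability_id: str):
    """Append one stage transition to the stream (XADD)."""
    return client.xadd(stream, {"stage": stage, "capability_id": capability_id})


def to_sse(entry_id: str, fields: dict) -> str:
    """Format one stream entry as a Server-Sent Events frame."""
    payload = json.dumps(fields, sort_keys=True)
    return f"id: {entry_id}\nevent: {fields['stage']}\ndata: {payload}\n\n"


def poll_once(client, stream: str, last_id: str = "0-0"):
    """Read entries newer than `last_id` (XREAD) and return SSE frames plus the new cursor."""
    frames = []
    for _stream_name, entries in client.xread({stream: last_id}, count=100) or []:
        for entry_id, fields in entries:
            frames.append(to_sse(entry_id, fields))
            last_id = entry_id
    return frames, last_id


print(to_sse("1-0", {"stage": "verified", "capability_id": "a1c_gap"}), end="")
```

An SSE endpoint would call something like `poll_once` in a loop, yield each frame, and carry `last_id` forward so that no entry is sent twice.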

The Three Roles Together

Role	Redis Feature	Remove It and...
Semantic routing and compounding retrieval	RedisVL HNSW vector index	Every query re-synthesizes from scratch, every time
Capability registry and provenance	RedisJSON	No persistent library; clinic accumulates nothing
Live build choreography	Redis Streams	Frontend has no visibility; approval gate has no mechanism

These three use cases map to the three Redis patterns foundational to production AI infrastructure: vector search for grounding responses in real data, semantic routing as a form of semantic caching that skips expensive inference when the answer already exists, and persistent searchable agent memory across all sessions.

Technical Stack

Layer	Technology
Capability synthesis	Claude Opus 4 (`claude-opus-4-8`)
Routing / coordination	Claude Sonnet 4.6 (`claude-sonnet-4-6`)
Semantic embeddings	OpenAI `text-embedding-3-small` (1,536-dim)
Vector index	RedisVL (HNSW, cosine similarity)
Capability registry	RedisJSON
Event bus	Redis Streams
Backend	Python 3.11 + FastAPI + asyncpg
Domain data	Synthea CSV → Postgres
Eval tracing	Arize (OpenTelemetry)
Frontend	React 18 + TypeScript + Tailwind CSS
Build tooling	Claude Code (Anthropic CLI)

The Broader Purpose

Forge is infrastructure for the long tail of communities that have always been told their needs are too small to justify engineering. Tens of millions of patients rely on safety-net clinics as their primary access to healthcare. Those clinics deserve the same operational software capabilities that well-funded hospital systems take for granted: tools built from their own needs, verified before they touch patient data, permanent so knowledge doesn't walk out the door when staff turns over, and cheap enough to run every month without a second thought.

That is what Forge builds toward. Every capability installed is a piece of infrastructure the clinic permanently owns. Every reuse is a tool running correctly without anyone having to rebuild it. Every month the library grows, the system gets smarter about that clinic's specific data, and the cost of the next question gets lower. The infrastructure compounds because the communities it serves need it to.