Inspiration

  • Ambient AI has genuinely transformed clinical documentation so doctors spend less time typing and more time with patients
  • But over the past few years we observed a gap: the note was accurate, but the coding was incomplete
  • A patient with diabetes and leg swelling could have a perfect transcript yet be missing a kidney disease code worth thousands in risk-adjusted revenue
  • We realized ambient AI solves the transcription problem but leaves the semantic problem unsolved
  • No tool was asking: what do these symptoms mean together, and is that meaning captured in the billing record?
  • That question is what inspired SymptoMap

What It Does

  • SymptoMap is a post-transcript enrichment layer that plugs into any ambient AI clinical documentation tool
  • It receives a structured clinical note after the ambient AI generates it
  • It identifies the medical concepts in the note and maps them to standardized clinical identifiers
  • It then traverses a clinical knowledge graph to find relationships between those concepts that no single symptom lookup would reveal
  • It applies a clinical terminology standardization layer, the kind used by professional medical coders and auditors, to ensure the right language is used for each concept (think: the difference between "high blood sugar" and "Type 2 Diabetes Mellitus with hyperglycemia, uncontrolled")
  • It returns a structured result containing:
    • ✅ Validated billing codes already in the note
    • 🔴 Flagged missing codes with the knowledge graph reasoning path that found them
    • 📋 A complete audit trail showing why each suggestion was made
  • The clinician sees nothing change: enrichment happens invisibly before the note enters the health record
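
The three-part result described above might be modeled roughly as follows. This is a minimal sketch, not the project's actual schema: the class and field names are hypothetical, standard-library dataclasses stand in for the Pydantic models the project lists, and the ICD-10-CM codes are illustrative values for the diabetes and leg swelling example.

```python
from dataclasses import dataclass, field


@dataclass
class SuggestedCode:
    """A flagged missing code plus the graph reasoning that produced it."""
    code: str                  # ICD-10-CM code being suggested
    description: str           # human-readable label for the code
    reasoning_path: list[str]  # knowledge graph edges that led to the suggestion


@dataclass
class EnrichmentResult:
    """Hypothetical shape of the structured result returned for one note."""
    validated_codes: list[str] = field(default_factory=list)
    missing_codes: list[SuggestedCode] = field(default_factory=list)

    def audit_trail(self) -> list[str]:
        """Return one readable line per suggestion explaining why it was made."""
        return [
            f"{s.code} ({s.description}): " + "; ".join(s.reasoning_path)
            for s in self.missing_codes
        ]


# Illustrative result for a note that documents diabetes and edema only.
result = EnrichmentResult(
    validated_codes=["E11.9", "R60.0"],
    missing_codes=[
        SuggestedCode(
            code="N18.9",
            description="Chronic kidney disease, unspecified",
            reasoning_path=[
                "diabetes -[risk_factor_for]-> chronic kidney disease",
                "leg swelling -[symptom_of]-> chronic kidney disease",
            ],
        )
    ],
)

for line in result.audit_trail():
    print(line)
```

Keeping the reasoning path on each suggestion, rather than in a separate log, means the audit trail can always be rebuilt from the result itself.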

How We Built It

  • API layer: FastAPI — lightweight, fast, self-documenting; the Swagger UI at /docs is the demo interface
  • Knowledge graph: NetworkX in-memory graph loaded at startup — approximately 150 nodes and 400 edges across three clinical domains (diabetes, cardiovascular, metabolic), curated from open datasets including PrimeKG (Harvard), SNOMED CT subsets, and ICD-10-CM
  • Entity extraction: A curated medical dictionary mapped to UMLS concept identifiers — no heavy machine learning model downloads, keeping demo latency under 300ms
  • Terminology layer: A standardized clinical terminology database (the kind maintained by professional medical coding organizations) mocked for the demo and designed to activate from a real API with a single environment variable — zero code changes required
  • Testing: 32 red-green TDD test cases written before any implementation — pytest across models, NER, graph traversal, enrichment pipeline, and API endpoints
  • Demo scenarios: Three pre-built fixture notes covering a clean note (no gaps), a gap note (missing HCC codes), and a partial note (mixed result)
  • Architecture decision: Deliberately avoided Neo4j, scispaCy, and any heavy infrastructure — the demo runs on a laptop with no external dependencies
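
To make the dictionary-plus-graph approach concrete, here is a toy sketch of how dictionary-based extraction and a NetworkX traversal could combine to flag a code that no single symptom lookup would find. Everything in it is an assumption made for illustration: the phrase dictionary, the placeholder concept ids (not real UMLS CUIs), the edge relation names, the `min_support` threshold, and the function names. The real graph is curated from the datasets listed above and is far larger.

```python
import networkx as nx

# Hypothetical phrase dictionary. The concept ids are placeholders,
# not real UMLS concept identifiers.
PHRASE_TO_CONCEPT = {
    "diabetes": "concept:diabetes",
    "high blood sugar": "concept:diabetes",
    "leg swelling": "concept:edema",
    "swollen legs": "concept:edema",
}


def build_graph() -> nx.DiGraph:
    """Build a tiny in-memory graph; the edges are illustrative only."""
    graph = nx.DiGraph()
    graph.add_node("concept:ckd", icd10="N18.9")
    graph.add_node("concept:heart_failure", icd10="I50.9")
    graph.add_edge("concept:diabetes", "concept:ckd", relation="risk_factor_for")
    graph.add_edge("concept:edema", "concept:ckd", relation="symptom_of")
    graph.add_edge("concept:edema", "concept:heart_failure", relation="symptom_of")
    return graph


def extract_concepts(note: str) -> set[str]:
    """Dictionary lookup: return concept ids whose phrases appear in the note."""
    text = note.lower()
    return {cid for phrase, cid in PHRASE_TO_CONCEPT.items() if phrase in text}


def find_cluster_gaps(graph, concepts, documented_codes, min_support=2):
    """Flag codes implied by at least `min_support` extracted concepts together."""
    support: dict[str, list[str]] = {}
    for concept in concepts:
        if concept not in graph:
            continue
        for neighbor in graph.successors(concept):
            relation = graph.edges[concept, neighbor]["relation"]
            support.setdefault(neighbor, []).append(
                f"{concept} -[{relation}]-> {neighbor}"
            )
    gaps = []
    for target, paths in support.items():
        code = graph.nodes[target].get("icd10")
        if code and code not in documented_codes and len(paths) >= min_support:
            gaps.append({"code": code, "concept": target, "paths": sorted(paths)})
    return gaps


graph = build_graph()
concepts = extract_concepts("Patient with diabetes reports leg swelling for two weeks.")
gaps = find_cluster_gaps(graph, concepts, documented_codes={"E11.9", "R60.0"})
print(gaps)
```

In this toy graph, edema alone also points at heart failure, but only kidney disease is supported by both extracted concepts, so only that code is flagged. Requiring support from more than one concept is one simple way to express the cluster idea; the project's actual traversal logic may differ.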

Challenges We Ran Into

  • The "1.47% problem": Early in research we conflated transcription hallucination rates with the semantic gap we were solving. These are completely different problems, measured differently; getting the framing right took real work
  • Knowledge graph scope vs. latency: Full PrimeKG has 4 million edges, and loading it into memory would have made the demo unusable; the discipline of curating 150 nodes that still told a compelling clinical story was harder than expected
  • CDI vs. CDSS positioning: Clinical Documentation Improvement and Clinical Decision Support are very different categories, both legally and from a regulatory standpoint; we had to be precise that SymptoMap enriches documentation of what was already discussed and does not recommend treatments or diagnoses
  • Terminology standardization without a live API: Building a mock that was realistic enough to demonstrate the concept while being architected for real API activation required careful interface design
  • Demo data: We needed synthetic clinical notes realistic enough to be credible but containing no real patient information; finding the right sources and constructing three notes that each told a different story took iteration
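
The mock-versus-live terminology design mentioned above could look something like the sketch below. It is one possible shape, not the project's code: the `TERMINOLOGY_API_URL` variable name, the class names, and the single mock entry are all hypothetical, and the live client is deliberately left unimplemented because no real API is described here.

```python
import os
from typing import Mapping, Optional, Protocol


class TerminologyClient(Protocol):
    """Interface shared by the mock and any future live client."""

    def standardize(self, phrase: str) -> str: ...


class MockTerminologyClient:
    """Demo stand-in backed by a small hard-coded lookup table."""

    _TERMS = {
        "high blood sugar": "Type 2 Diabetes Mellitus with hyperglycemia, uncontrolled",
    }

    def standardize(self, phrase: str) -> str:
        # Unknown phrases pass through unchanged.
        return self._TERMS.get(phrase.lower().strip(), phrase)


class LiveTerminologyClient:
    """Placeholder for a real API client; the request logic is not shown."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def standardize(self, phrase: str) -> str:
        raise NotImplementedError("Call the real terminology API here")


def get_terminology_client(env: Optional[Mapping[str, str]] = None) -> TerminologyClient:
    """Pick the live client when the (hypothetical) variable is set, else the mock."""
    env = os.environ if env is None else env
    base_url = env.get("TERMINOLOGY_API_URL")
    return LiveTerminologyClient(base_url) if base_url else MockTerminologyClient()


client = get_terminology_client({})  # no variable set, so the mock is used
print(client.standardize("High blood sugar"))
```

Because callers depend only on `standardize`, switching to a live service in a design like this is a configuration change rather than a code change, which matches the single environment variable goal described in the writeup.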

Accomplishments That We're Proud Of

  • The entity-cluster insight: The core idea, that two symptoms together imply a third concept and that this relationship is traversable in a knowledge graph, turned out to be both technically sound and immediately understandable to non-technical judges
  • 32 tests written before a single line of production code: Full red-green TDD discipline on a hackathon timeline is genuinely difficult; we held the line, and all 32 pass
  • Research grounding: Every number in the presentation traces to a peer-reviewed source or primary data survey, with no interpolated figures and no vendor claims
  • Sub-300ms enrichment on a laptop: The performance constraint felt arbitrary at first; it turned out to be the right forcing function to make architectural choices that will scale
  • The patient safety framing: Realizing that an undocumented kidney disease diagnosis is not just a revenue gap but also a medication contraindication safety issue, and articulating that clearly, felt like the moment the project became genuinely meaningful

What We Learned

  • Transcription accuracy and semantic completeness are orthogonal problems — solving one does not move the needle on the other; this distinction reshapes how you think about the entire ambient AI stack
  • Knowledge graphs are most powerful at the cluster level — individual entity lookup is just search; the value emerges when you traverse relationships between entities together
  • CDI is a CFO problem, not a CMO problem — the sales cycle, the ROI language, and the procurement path are completely different from clinical decision support; positioning matters more than we expected
  • Curated data beats comprehensive data for demos — 150 hand-verified nodes produced more reliable and explainable results than any automated extraction from a full ontology would have
  • The audit trail is also the product: we initially treated the KG traversal path as metadata, but the reasoning transparency is what makes the enrichment trustworthy

What's Next for Clinical Notes Enricher

  • Connect to a real ambient AI webhook: The architecture is already built around the standard notification pattern used by major ambient AI platforms — the next step is early access partnership to run against real note output
  • Replace the terminology mock with a live API: The environment variable pattern means this is one configuration change; the interface is already designed and tested
  • Expand the knowledge graph to five clinical domains: Current coverage is diabetes, cardiovascular, and metabolic — respiratory and mental health are the next highest-impact additions for HCC risk adjustment
  • Build the population-level layer: A health system's own patient population data as a graph overlay — "your diabetic patients over 65 historically miss CKD documentation 34% of the time" — this is the enterprise upsell that no open ontology can provide
  • Pursue Neo4j migration: NetworkX in-memory works for the demo; a 5,000-node graph with full PrimeKG coverage needs a proper graph database; the query patterns are already written in a way that makes migration straightforward

Built With

  • ambient-ai
  • cdi
  • claude-code
  • clinical-ontology
  • fastapi
  • graph-traversal
  • hcc-coding
  • hl7-fhir
  • httpx
  • icd-10-cm
  • imo
  • knowledge-graph
  • loinc
  • mcp
  • neo4j
  • ner
  • networkx
  • primekg
  • pydantic
  • pytest
  • python
  • rest-api
  • snomed-ct
  • umls
  • umls-cui
  • uvicorn
  • webhook