## Inspiration
Every year, preventable medication errors kill an
estimated 100,000+ Americans and injure millions more. A
huge fraction of those errors come from something absurdly
simple: a doctor prescribes a drug without remembering
(or ever being told) that the patient is allergic to it,
already on a conflicting medication, or has a condition
that makes the drug dangerous. Telehealth makes this
worse, because visits are short, doctors see patients
they've never met, and there's no nurse flipping through a
paper chart to catch mistakes.
We wanted to build the thing every doctor wishes existed:
a quiet assistant that listens to the visit, writes the
chart, codes the orders, and yells before anything
dangerous gets prescribed. No extra clicks, no extra
typing.
## What it does
MediCode listens to a live telehealth conversation and,
within seconds of the call ending, transcribes the audio
with speaker diarization so we know which line came from
the doctor and which from the patient. From
there it pulls out structured diagnoses, prescriptions,
lab orders, and referrals, assigns ICD-10 codes to the
diagnoses and LOINC codes to the labs, and writes a full
SOAP note. Then the important part: it cross-references
every new prescription against the patient's entire
encounter history plus NIH drug databases and flags
drug-drug, drug-allergy, and drug-condition conflicts
with severity badges before the doctor signs off. Finally,
it submits everything to a FHIR R4 EHR server with one
click.
Patients persist across visits, so the second time John
comes in, MediCode already knows about the warfarin he was
prescribed last month, and it will scream if a new doctor
tries to hand him aspirin.
## How we built it
The stack is a React + Vite + TypeScript frontend that
captures audio through the browser's MediaRecorder API,
base64-encodes it, and ships it to a Jac backend. Jac
(and the jac-cloud runtime) auto-exposes every walker as
a REST endpoint, which meant we didn't have to hand-write
a single API route. On the backend we use ElevenLabs
Scribe v2 for speech-to-text with diarization, Claude
Sonnet 4.6 (via the byllm plugin) for clinical analysis,
SOAP note generation, and conflict detection, and the NIH
RxNorm, RxNav, and DailyMed APIs for drug-safety
grounding. Everything eventually flows out to a HAPI FHIR
R4 server for real EHR submission.
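To make the "no hand-written routes" claim concrete, here's a
minimal sketch of the pattern; the walker and field names are
illustrative, not our exact schema:

```jac
# Minimal sketch: under jac-cloud this walker is served as roughly
# POST /walker/upload_audio, with its `has` fields as the JSON body.
walker upload_audio {
    has audio_b64: str;     # base64-encoded blob from MediaRecorder
    has patient_id: str;

    can start with `root entry {
        # decode, transcribe, analyze ... (elided)
        report {"status": "received", "chars": len(self.audio_b64)};
    }
}
```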
All application state lives in a Jaseci graph. A Patient
node has edges to all of its Encounter nodes, and each encounter
fans out to diagnoses, prescriptions, labs, SOAP notes,
and persisted conflict records. This is the core reason
historical conflict detection works: when you start a new
encounter, a walker traverses backward through the
patient's prior encounters and collects every active
medication before feeding them to the LLM. Trying to do
this in a relational schema would have been a nightmare of
JOINs.
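Here's a sketch of that shape and the history walk (node and
field names illustrative):

```jac
# Illustrative schema; the real nodes carry more fields.
node Patient { has name: str; }
node Encounter { has date: str; }
node Prescription { has drug: str; has active: bool; }

walker collect_active_meds {
    has meds: list[str] = [];

    can gather with Patient entry {
        # every prior encounter, then every prescription under it
        for enc in [here --> (`?Encounter)] {
            for rx in [enc --> (`?Prescription)] {
                if rx.active { self.meds.append(rx.drug); }
            }
        }
        report self.meds;
    }
}
```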
The LLM layer is where Jac really shines. We used the by
llm() syntax, which lets you declare a typed function
signature and the runtime forces Claude to populate a
structured object instead of dumping freeform text. So
instead of parsing JSON strings out of prompt responses,
we literally just write def detect_conflicts(...) ->
ConflictResult by llm() and the language handles the
rest. Combined with few-shot examples in the docstring,
this is what made the conflict output reliable.
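Schematically, the declaration looks like this; the model id
and object fields are illustrative, and the byllm import path
may differ between versions:

```jac
import from byllm.llm { Model }  # exact import path varies by byllm version

glob llm = Model(model_name="claude-sonnet-4-5");  # illustrative model id

obj ConflictItem {
    has drug_a: str;
    has drug_b: str;
    has severity: str;    # "minor" | "moderate" | "major" | "critical"
    has rationale: str;
}

obj ConflictResult {
    has conflicts: list[ConflictItem];
    has warnings: list[str];
}

"""Cross-check a new prescription against the patient's active meds,
allergies, and conditions; return structured conflicts.
(Few-shot examples go right here in the docstring.)"""
def detect_conflicts(new_rx: str, active_meds: list[str],
                     allergies: list[str]) -> ConflictResult by llm();
```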
For conflict scoring, we model the risk of prescribing
$p_i$ against an existing medication, allergy, or
condition $c_j$ as:
$$R(p_i, c_j) = w_{\text{severity}} \cdot s(p_i, c_j) +
w_{\text{evidence}} \cdot e(p_i, c_j)$$
where $s \in \{\text{minor}, \text{moderate}, \text{major},
\text{critical}\}$ is an ordinal severity grade and $e$ is
an evidence score grounded in NIH RxNav interaction data
plus FDA labeling from DailyMed. The LLM acts as the
scorer, but it's constrained by the structured evidence we
hand it, so it isn't just hallucinating severity from its
training data.
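For intuition, plug in hypothetical weights
$w_{\text{severity}} = 0.7$ and $w_{\text{evidence}} = 0.3$
(illustrative, not the values we shipped): a
warfarin-aspirin pair graded critical ($s = 1.0$ on a
normalized scale) with strong RxNav evidence ($e = 0.9$)
scores

$$R = 0.7 \cdot 1.0 + 0.3 \cdot 0.9 = 0.97,$$

comfortably into critical-badge territory.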
## Challenges we ran into
The first real headache was by llm() returning
placeholder strings. Our first analysis calls came back
with nonsense like "OhbVrpoiVgRV" and we stared at it
for an hour before figuring out the byllm plugin wasn't
actually installed, so the runtime was returning mock
values instead of calling Claude. One pip install byllm
later and we were back in business.
Base64 audio transport was another one. Jac walker
endpoints accept JSON, so we had to base64-encode the
browser's audio blob, declare the walker parameter as
str, and decode it back to bytes server-side before
handing it to ElevenLabs. Simple in hindsight, but we got
every type annotation wrong on the first try.
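The server half of that dance is small once the types line up;
a sketch (walker name illustrative, and note that plain Python
imports work in Jac, though the exact syntax depends on the
jaclang version):

```jac
import base64;  # Python stdlib imports work in Jac

walker transcribe {
    has audio_b64: str;   # arrives as an ordinary JSON string field

    can start with `root entry {
        audio_bytes = base64.b64decode(self.audio_b64);
        # hand audio_bytes to the ElevenLabs Scribe client here (elided)
        report {"decoded_bytes": len(audio_bytes)};
    }
}
```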
Claude kept dumping structured conflicts into the
warnings: list[str] field instead of the conflicts:
list[ConflictItem] field. We tried more verbose
instructions, threats, and pleading; nothing worked until
we dropped a concrete few-shot example into the prompt,
and then it worked on the first try.
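The fix looked roughly like this, reusing the ConflictResult
shape sketched earlier (the example values are invented for
illustration):

```jac
"""Detect drug conflicts for a new prescription.

Example: new_rx="aspirin", active_meds=["warfarin"] should yield
  conflicts = [ConflictItem(drug_a="aspirin", drug_b="warfarin",
                            severity="major",
                            rationale="additive bleeding risk")]
  warnings  = []   # prose caveats only, never structured conflicts
"""
def detect_conflicts(new_rx: str, active_meds: list[str],
                     allergies: list[str]) -> ConflictResult by llm();
```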
There was also a frustrating backend-to-frontend
field-name mismatch: the backend reported conflicts:
{items: [...]} while the frontend was looking for
conflicts: {conflicts: [...]}, which cost us an hour of
confused staring at a working backend and a broken UI.
Stale .jac/cache/ bytecode bit us too. When we added new
node types like Conflict and ConflictSummary, they
wouldn't take effect until we manually nuked the cache
folder, because the compiled .jir files were being
reused with the old schema.
The weirdest bug was the patient list flickering between
26 and 3. It turned out Jaseci has a multi-root model in
which here inside a walker can resolve to different root
nodes across calls, so one call would see patients
attached to one root and another call would see a totally
different set. We fixed it by anchoring every traversal on
root -->, which always resolves to the authenticated
user's root.
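The one-line difference, sketched:

```jac
walker list_patients {
    can start with `root entry {
        # buggy: [here --> (`?Patient)], since `here` could resolve
        # to a different root node on each call; `root` always means
        # the authenticated user's root
        report [root --> (`?Patient)];
    }
}
```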
And finally, conflicts were originally computed in memory
and lost on page reload, which meant opening a past
encounter would show empty conflict data. We fixed that by
adding Conflict and ConflictSummary node types and
writing them to the graph during run_analysis, so
reopening a past encounter now shows the exact same safety
analysis the doctor saw live, with no repeat LLM call and
no extra cost.
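In sketch form, reusing detect_conflicts from earlier (field
names illustrative):

```jac
node Conflict {
    has drug_a: str;
    has drug_b: str;
    has severity: str;
    has rationale: str;
}
node ConflictSummary { has total: int; }

walker run_analysis {
    has new_rx: str;
    has active_meds: list[str] = [];
    has allergies: list[str] = [];

    can persist with Encounter entry {
        result = detect_conflicts(self.new_rx, self.active_meds,
                                  self.allergies);
        for c in result.conflicts {
            # pin each detected conflict to the encounter node
            here ++> Conflict(drug_a=c.drug_a, drug_b=c.drug_b,
                              severity=c.severity, rationale=c.rationale);
        }
        here ++> ConflictSummary(total=len(result.conflicts));
    }
}
```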
## Accomplishments that we're proud of
We're really proud of getting a full end-to-end clinical
pipeline running inside a hackathon. Audio goes in one
end, structured codes and a FHIR submission come out the
other, and everything in between (transcription,
extraction, coding, conflict detection, SOAP generation)
is automated.
The drug-safety grounding is also something we actually
care about. We're not just asking Claude "is this
dangerous?"; we're cross-referencing NIH RxNav, RxNorm,
and DailyMed so the flags are defensible rather than
hallucinated. On top of that, historical conflict
detection actually works, meaning the second visit
automatically knows what was prescribed in the first visit
without anyone typing it in. Doctors can also reopen any
past encounter and see the exact same safety analysis that
was shown at the time of the visit, because we persist
the conflicts as graph nodes.
We also kept HIPAA in mind from day one. Only drug names
get sent to external APIs, never any PHI, and all patient
data stays in the server-side graph. Nothing sensitive
ever touches localStorage or anything client side.
Oh, and we actually send real FHIR resources (Patient,
Condition, MedicationRequest, ServiceRequest,
DocumentReference) to a live HAPI FHIR server, which
felt really cool the first time we saw them show up.
## What we learned
Graph databases are genuinely perfect for medical records.
Patients, encounters, and clinical artifacts form a
natural DAG, and "walk backward from this encounter to
find everything this patient is on" is literally one line
of Jac.
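That line looks roughly like this, using the node names from
our sketches above (give or take an active-status filter):

```jac
walker meds_for_patient {
    can gather with Patient entry {
        # every prescription across all of this patient's encounters
        report [here --> (`?Encounter) --> (`?Prescription)];
    }
}
```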
by llm() is a genuinely different way to build with LLMs.
Instead of writing prompts that return JSON strings you
then parse, you declare a typed function signature and
the language runtime handles the rest. Our entire conflict
detection function is around 30 lines of Jac, which still
feels absurd.
Few-shot prompting beats verbose instructions every time.
We spent hours trying to get Claude to populate a
conflicts list instead of dumping everything into
warnings, and a single concrete example in the prompt
fixed it instantly.
Jaseci's multi-root model is subtle and will bite you if
you're not careful: here inside a walker can refer to
different root nodes across calls, and we only figured
that out after watching patient counts flicker
nonsensically. Switching to root --> fixed it
permanently.
And the biggest lesson: HIPAA is a design constraint, not
an afterthought. Our first instinct for persistence was
localStorage, which was a terrible idea because PHI
can't live in plaintext on the client. The right answer
is always server-side.
## What's next for MediCode
Next up, we want to make this voice-first, with no record
button, just a calendar-aware assistant that joins the
call automatically. We also want to add prescriber-specific
guardrails (per-doctor controlled-substance rules and
formulary compliance), real EHR integration with Epic
and Cerner via SMART on FHIR, and multi-language
transcription for non-English telehealth visits. On the
clinical side, we want to layer pediatric and geriatric
dose checking on top of the current conflict model using
age-weighted safety thresholds. And eventually we'd love
to run a real validation study, where MediCode runs in
shadow mode on real visits and we measure how many
prescribing errors it would have caught.
## Built With
- anthropic-api
- elevenlabs-scribe-v2
- hapi-fhir-r4
- jaseci
- react
- typescript