Inspiration
Earlier this year, one of our teammates volunteered with the Pancreatic Cancer Action Network (PanCAN).
She met survivors, caregivers, and families who had lost loved ones. What stood out was how often the diagnosis came too late: pancreatic cancer is often called a silent killer, with vague early symptoms, no standard early screening, and a five-year survival rate of only 5% when diagnosed late (usually at stage IV).
But here’s the hopeful part: when caught early, survival jumps to 34%. That single number inspired us. If technology could shorten the diagnostic delay, thousands of lives could be saved.
From that inspiration, PancreaX, the Pancreatic Cancer Clinical Copilot, was born. Our mission: build a tool that catches what doctors might miss, reduces administrative delays, and empowers patients with clarity.
What We Learned
This project forced us to learn across medicine, AI, and systems engineering:
Medical insight: We dove into clinical guidelines, survival data, and NCCN references. We learned that new-onset diabetes after age 50 is a red flag that is often dismissed, even though abnormal glucose levels are reported in as many as 85% of people with pancreatic cancer.
AI multi-agent design: Inspired by frameworks like Mastra, we built six specialized AI agents (Diagnostic, Treatment, Productivity, Care Coordination, Lifestyle, Clinical Insights). Coordinating them taught us the challenges of orchestration, query routing, and real-time reasoning.
Productivity in healthcare: Doctors spend 50% of their time on paperwork. By automating referrals, prior authorizations, and note generation, we learned how impactful workflow tools can be.
API mastery: We integrated Google Gemini 2.5 Flash for medical reasoning, OCR for document parsing, and Google Calendar APIs for scheduling. Understanding the quirks of different Google APIs was a crash course in persistence.
System thinking: We discovered how critical it is to build with both clinician and patient perspectives in mind. Our dual-mode UI — switching between professional dashboard and empathetic explanations — was a huge learning point in user-centered design.
How we built it
Our architecture balances speed, modularity, and medical realism:
Frontend: React 18 + TailwindCSS. Features include a drag-and-drop customizable dashboard, provider vs. patient dual views, and real-time AI chat with voice recognition.
Backend: Node.js/Express with Socket.IO for real-time collaboration.
AI Integration:
- Google Gemini 2.5 Flash for evidence synthesis and patient communication.
- OCR pipeline for extracting text from medical PDFs/scans with ~98% accuracy.
- Risk scoring algorithm (age, symptoms, family history, CA19-9 biomarker) implemented in server.js.
Multi-Agent System: A Mastra-inspired framework that routes queries to the right specialist agent. Example: if a provider types “bottleneck”, the query goes to Alex Chen (Productivity Agent); if “risk”, it goes to Dr. Sarah Chen (Diagnostic Agent).
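A keyword router of this kind could be sketched roughly as follows. Only the "bottleneck" and "risk" triggers and the agent names come from this write-up; the other keywords, the `ROUTES` table, the `routeQuery` name, and the default agent are assumptions for illustration, not the project's actual code.

```javascript
// Hypothetical routing table: the first entry with a matching keyword wins.
// Only "bottleneck" and "risk" are taken from the write-up; the rest are assumed.
const ROUTES = [
  { agent: "Alex Chen (Productivity Agent)", keywords: ["bottleneck", "prior auth", "paperwork"] },
  { agent: "Dr. Sarah Chen (Diagnostic Agent)", keywords: ["risk", "symptom", "ca19-9"] },
  { agent: "Dr. Michael Rodriguez (Treatment Agent)", keywords: ["therapy", "chemo", "surgery"] },
  { agent: "Jennifer Thompson (Care Coordinator)", keywords: ["schedule", "appointment"] },
];

// Assumed fallback when no keyword matches.
const DEFAULT_AGENT = "Dr. Sarah Chen (Diagnostic Agent)";

function routeQuery(query) {
  const text = query.toLowerCase();
  for (const { agent, keywords } of ROUTES) {
    // Plain substring match, case-insensitive.
    if (keywords.some((kw) => text.includes(kw))) {
      return agent;
    }
  }
  return DEFAULT_AGENT;
}

console.log(routeQuery("Where is the bottleneck in this referral?"));
console.log(routeQuery("What is this patient's risk?"));
```

Substring matching is deliberately crude here: it is easy to audit, but it will also fire on unrelated words that happen to contain a keyword, so a real router would likely need word-boundary checks or a classifier.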
Collaboration Tools: Providers can join the same patient session and annotate in real time, simulating a tumor board discussion.
Database: JSON patient profiles + mock EMR schema. Example case: Sarah Johnson, 65, new-onset diabetes + weight loss → scored HIGH risk.
Integrations: Google Calendar scheduling, ICD-10/CPT code mapping for billing, trial-matching from open registries.
What it does
One-sentence
PancreaX is an AI multi-agent web app that catches pancreatic cancer earlier by flagging high-risk patients, explains why in plain terms, attaches just-enough evidence, and removes the busywork between suspicion and action.
End-to-end flow (Clinician)
- See the signal. Dashboard lists patients; high-risk cases show a red badge.
- Open the case. The Risk Panel displays:
  - Risk level (LOW / MEDIUM / HIGH)
  - Why-flagged chips (e.g., age \(> 60\), new-onset diabetes, 5 kg weight loss, FHx pancreatic cancer, CA19-9 \(> 37\))
  - Next actions with timing (e.g., order CT A/P w/ contrast now; GI/EUS in \(\leq 48\) h; labs today)
- Get the “why” + evidence. Click Evidence to view 2–3 tailored summaries (clinician mode); switch to Patient View for empathetic explanations and expectations.
- Do the work in one click.
  - Referral draft (GI/EUS) and prior-auth rationale with ICD-10/CPT and estimated costs
  - Progress-note scaffold (copy-paste into EMR)
  - Calendar draft events/reminders for imaging and consults
- Collaborate live. Another provider joins, annotates, and accepts the plan; all actions are timestamped and logged.
End-to-end flow (Patient)
- Patient View converts the clinician plan into clear language: what the risk means, why imaging matters, what to expect, and a checklist (prep, labs, questions to ask).
- The AI chat answers common questions (symptoms, nutrition, enzymes) and can send opt-in reminders for appointments/meds.
How the engine works (Transparent by Design)
Risk Score (Deterministic):
$$ s = w_1[\text{age} > 60] + w_2[\text{new diabetes}] + w_3[\text{weight loss}] + w_4[\text{family history}] + w_5[\text{CA19-9} > 37] $$
$$ \text{LOW: } s < 3, \quad \text{MEDIUM: } 3 \leq s < 6, \quad \text{HIGH: } s \geq 6 $$
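Each bracket \([\cdot]\) is 1 when the condition holds and 0 otherwise, and \(w_1, \dots, w_5\) are the feature weights. We show the contributing features as chips; thresholds are visible and clinician-tunable.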
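To make the scoring concrete, here is a minimal sketch of how such a deterministic score might look in JavaScript. The weights in `WEIGHTS`, the field names on the patient object, and the `scoreRisk` name are all assumptions, since the write-up does not state the real values; only the five features and the LOW/MEDIUM/HIGH cut-offs come from the formula above.

```javascript
// Hypothetical integer weights; the write-up does not state the real values.
const WEIGHTS = {
  ageOver60: 1,
  newDiabetes: 3,
  weightLoss: 2,
  familyHistory: 2,
  ca199Over37: 2,
};

function scoreRisk(patient) {
  // Each flag mirrors one bracketed term of the formula (true counts as 1, false as 0).
  const flags = {
    ageOver60: patient.age > 60,
    newDiabetes: Boolean(patient.newOnsetDiabetes),
    weightLoss: Boolean(patient.weightLoss),
    familyHistory: Boolean(patient.familyHistory),
    ca199Over37: typeof patient.ca199 === "number" && patient.ca199 > 37,
  };
  // The names of the features that fired double as the "why-flagged" chips.
  const reasons = Object.keys(flags).filter((name) => flags[name]);
  const score = reasons.reduce((sum, name) => sum + WEIGHTS[name], 0);
  const level = score >= 6 ? "HIGH" : score >= 3 ? "MEDIUM" : "LOW";
  return { score, level, reasons };
}

// Example profile from the write-up: 65 years old, new-onset diabetes, weight loss.
console.log(scoreRisk({ age: 65, newOnsetDiabetes: true, weightLoss: true }));
```

With these assumed weights the example profile scores 1 + 3 + 2 = 6, which lands in HIGH, matching the example case described earlier. Returning `reasons` alongside the level is what keeps the score explainable.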
Multi-agent orchestration: Router sends diagnostic queries to Dr. Sarah Chen (Diagnostic), therapy questions to Dr. Michael Rodriguez (Treatment), paperwork to Alex Chen (Productivity), scheduling to Jennifer Thompson (Care Coordinator), etc. Outputs are merged into one plan.
OCR to facts: We extract values (e.g., CA19-9), show them for confirmation, and feed them into the engine — closing the loop from document → decision.
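The extraction step could look something like the sketch below, which pulls a CA19-9 value out of OCR text with a regular expression and marks it as unconfirmed. The pattern, the `extractCa199` name, and the shape of the returned object are assumptions about how lab reports are worded, not the project's actual parser.

```javascript
// Matches strings such as "CA19-9 85 U/mL" or "CA 19-9: 85.5 U/mL".
// The pattern is an assumption about report wording and will miss other layouts.
const CA199_PATTERN = /CA\s*19-9\s*[:=]?\s*(\d+(?:\.\d+)?)\s*U\/mL/i;

function extractCa199(ocrText) {
  const match = CA199_PATTERN.exec(ocrText);
  if (!match) {
    return null; // nothing found: the user would enter the value by hand
  }
  // `confirmed` stays false until a clinician approves the extracted value.
  return { value: Number(match[1]), unit: "U/mL", confirmed: false };
}

// Hypothetical OCR output for illustration.
console.log(extractCa199("Tumor markers: CA19-9 85 U/mL"));
```

Keeping a `confirmed` flag on the result reflects the confirmation step described above: the value only feeds the risk engine after a person has checked it.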
Guardrails: Human-in-the-loop for every order; override logging; degraded mode (rules only) if LLM/evidence fails.
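The degraded mode can be illustrated with a small wrapper that tries the LLM path first and falls back to fixed rules on any failure. This is a sketch under assumptions: `callLlm` is a hypothetical async function supplied by the caller, `buildPlan` and `rulesOnlyPlan` are illustrative names, and the action strings are placeholders based on the examples above rather than clinical guidance.

```javascript
// Rules-only guidance used when the LLM path fails. The wording is placeholder text.
function rulesOnlyPlan(riskLevel) {
  const actions = riskLevel === "HIGH"
    ? ["Order CT A/P w/ contrast", "GI/EUS referral"]
    : ["No automated actions; clinician review"];
  return { source: "rules", actions, requiresClinicianApproval: true };
}

// `callLlm` is a hypothetical async function that returns a list of actions.
async function buildPlan(riskLevel, callLlm) {
  try {
    const actions = await callLlm(riskLevel);
    return { source: "llm", actions, requiresClinicianApproval: true };
  } catch (err) {
    // Degraded mode: record the failure and fall back to deterministic rules.
    console.warn("LLM unavailable, using rules only:", err.message);
    return rulesOnlyPlan(riskLevel);
  }
}

// Usage with a stub that always fails, to exercise the fallback path.
buildPlan("HIGH", async () => { throw new Error("timeout"); })
  .then((plan) => console.log(plan.source, plan.actions));
```

Both paths set `requiresClinicianApproval`, so the human-in-the-loop requirement does not depend on which source produced the plan.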
What it produces (Concrete Artifacts)
- Risk banner + why-flagged
- Action list with timing
- Clinician summary (+ codes) and patient summary (plain language)
- Referral & prior-auth drafts (copy-paste ready)
- Calendar events/reminders
- Team annotations & audit log
Why it matters
Instead of a vague “consider pancreas,” PancreaX gives the right information to the right person at the right time, bundled with the work needed to act. This shortens time-to-imaging, reduces cognitive and administrative load, and increases the odds of catching pancreatic cancer while it is still treatable.
Challenges we ran into
API & Framework Conflicts
At first, our Gemini API calls wouldn’t connect properly with the Node backend. We had to rebuild parts of the request pipeline and adjust the token handling to avoid crashes.
Google Gemini APIs and Google Calendar APIs worked very differently — syncing them required creating separate service layers.
Feature Fragility
Every time we introduced a new feature, something else would break. Debugging corrupted files and rebuilding the integration flow were frustrating, but they taught us resilience.
OCR Integration
Balancing speed vs. accuracy was tough. We optimized our OCR pipeline to hit ~98% accuracy while still returning summaries instantly for clinical workflows.
Multi-Agent Coordination
Routing queries correctly between six AI agents was a challenge. We had to design an intelligent query router that could map a provider's wording to the right specialist.
User Experience
Switching between clinician mode and patient mode seamlessly was not trivial. We had to rethink the UI/UX so both audiences got value from the same data without confusion.
Accomplishments that we're proud of
A working end-to-end copilot, not a slideware demo.
- In a single flow we: flag a high-risk patient → show why → surface evidence → auto-generate referral & prior auth text with ICD-10/CPT → push follow-ups to calendar → mirror the plan in a patient-friendly view.
Explainable risk engine.
- An auditable, deterministic score (age, new-onset diabetes, weight loss, family history, CA19-9) with visible “why-flagged” chips and graded thresholds (LOW/MEDIUM/HIGH). No black box.
Six-agent orchestration.
- A Mastra-inspired router that fans out to Diagnostic, Treatment, Productivity, Care Coordinator, Lifestyle, and Clinical Insights agents, then merges outputs into a single, actionable plan.
Admin automation that actually saves time.
- One-click drafts for GI/EUS referrals, prior-auth rationales, and progress-note scaffolds. We instrumented the actions and can show an estimated ~33 minutes saved per patient.
Dual-mode UX (clinician ↔ patient).
- The exact same facts render as technical guidance for providers and plain-language explanations for patients — instantly switchable in the UI.
Real-time collaboration.
- Socket.IO sessions let multiple providers co-review, annotate, and accept plans — “tumor board in a box.”
OCR → structured facts → risk.
- We ingest PDFs/scans, extract key values (e.g., CA19-9 85 U/mL), ask the user to confirm, then feed them into the engine — closing the loop from document to decision.
Resilience & safety rails.
- If LLM/evidence fetch fails, we degrade to rule-based recommendations and static guidance; every suggestion is human-in-the-loop and override-logged.
Hard-won API plumbing.
- We solved auth/CORS quirks between Gemini, Node, and Google Calendar (separate service layers, retry/backoff, token refresh) so the demo flow is reliable.
What we learned
Clinical nuance matters.
- New-onset diabetes after age 50 plus unintentional weight loss is a powerful red-flag pattern, but CA19-9 alone is not reliable for screening. The copilot must weigh combinations and expose that logic to clinicians.
Trust beats raw “AI.”
- Providers want to see why an alert fired and what the next two steps are. Explainability, graded thresholds, and editable outputs build trust; “FYI risk” popups do not.
Agent routing > single omnibot.
- Specialized agents (diagnostic vs. productivity vs. coordination) reduce prompt sprawl, enable clearer guardrails, and make failures easier to recover from.
Prompt engineering is product engineering.
- We built role-conditioned prompts (clinician vs. patient), capped lengths, and required “Key takeaways + confidence notes” to curb drift and keep summaries clinically useful.
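A role-conditioned prompt builder along these lines might look like the sketch below. The `ROLE_STYLES` table, the word limits, the `buildPrompt` name, and the exact instruction wording are assumptions for illustration; the write-up only states that prompts were conditioned on role, length-capped, and required key takeaways plus confidence notes.

```javascript
// Hypothetical role settings; the real prompts and limits are not given in the write-up.
const ROLE_STYLES = {
  clinician: { tone: "concise clinical language with codes where relevant", maxWords: 150 },
  patient: { tone: "plain, empathetic language without jargon", maxWords: 120 },
};

function buildPrompt(role, facts) {
  const style = ROLE_STYLES[role];
  if (!style) {
    throw new Error(`Unknown role: ${role}`);
  }
  // The same facts are used for every role; only tone and length cap change.
  return [
    `Audience: ${role}. Write in ${style.tone}.`,
    `Use at most ${style.maxWords} words.`,
    "End with two sections: 'Key takeaways' and 'Confidence notes'.",
    "Use only these facts:",
    ...facts.map((fact) => `- ${fact}`),
  ].join("\n");
}

console.log(buildPrompt("patient", ["Risk level: HIGH", "New-onset diabetes", "5 kg weight loss"]));
```

Passing the facts in as an explicit list, rather than letting each role prompt restate them, is one way to keep the clinician and patient views anchored to identical data.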
APIs are all different (even within Google).
- Gemini vs. Calendar required different auth models and error semantics; isolating them behind service modules prevented cascading failures.
Design for the Five Rights of CDS.
- Right info, person, format, channel, time. Alerts fire during the decision window, come with codes/referrals, and land in the clinician’s existing workflow (not yet another inbox).
Measure both time and care quality.
- Minutes saved are table stakes; we also defined KPIs for alert acceptance rate, time from flag → order, and (in pilots) % imaged within 72 h and stage at detection.
Dual-audience writing is a skill.
- Translating the same facts for a patient without losing clinical fidelity required iterations and user-testing language patterns (definitions, expectations, next steps).
What's next for PancreaX
Clinical validation (retrospective → prospective).
- Retrospective EMR cohort: compute sensitivity/PPV, calibration, and override rates.
- Prospective pilot in primary care/new-diabetes clinics: measure time-to-imaging and stage at detection.
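For the retrospective cohort, sensitivity and PPV follow directly from the counts of flagged and confirmed cases. The helper below is a sketch with an assumed name (`validationMetrics`) and made-up counts that only show the arithmetic; it says nothing about how PancreaX would actually perform.

```javascript
// Counts come from comparing the tool's flags against confirmed diagnoses in a cohort.
function validationMetrics({ truePositives, falsePositives, falseNegatives }) {
  const flagged = truePositives + falsePositives;
  const diseased = truePositives + falseNegatives;
  return {
    // Sensitivity: share of true cases that the tool flagged.
    sensitivity: diseased === 0 ? null : truePositives / diseased,
    // PPV: share of flagged patients who truly had the disease.
    ppv: flagged === 0 ? null : truePositives / flagged,
  };
}

// Made-up counts purely to show the arithmetic.
console.log(validationMetrics({ truePositives: 8, falsePositives: 32, falseNegatives: 2 }));
```

With these illustrative counts, sensitivity is 8 / 10 = 0.8 and PPV is 8 / 40 = 0.2. A low PPV alongside high sensitivity is the usual trade-off for a rare disease, which is why override rates are worth tracking next to these two numbers.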
FHIR/SMART on FHIR integration.
- Pull problems, meds, labs (A1c, weight trends) and push notes/referrals into real EMRs; add RBAC, audit trails, and PHI encryption at rest/in transit.
Model upgrade + explainability.
- Replace rules with a calibrated ML model (e.g., XGBoost or logistic regression with Platt scaling), expose SHAP-style feature attributions, and keep thresholds clinician-tunable.
Imaging and radiomics.
- Add CT/EUS connectors; experiment with basic radiomic features and pre-trained detectors; keep humans in the loop for any imaging inference.
Live trial matching.
- Integrate clinicaltrials.gov with NLP eligibility parsing, geo-filters, and one-click outreach templates for patient-consented matches.
Equity & fairness.
- Bias checks across age, sex, and race; multi-lingual patient view (ES, HI). Offer low-data modes for rural clinics with limited connectivity.
Security & compliance.
- SOC 2-ready logging, key management (KMS), scoped tokens, and incident response runbooks; HIPAA BAAs for pilot partners.
Productization.
- Mobile companion app, voice UI for hands-busy clinicians, and an “outbox” that packages referral/PA/note into a single push to the EMR.
Generalization.
- Reuse the agent framework for other high-value conditions (lung nodules, ovarian cancer, GI bleed) — same productivity + safety philosophy.
