Pneumanosis — Chest X-ray Triage Co-Pilot

Inspiration

Radiologist burnout is a quiet crisis in healthcare — studies show it affects 44–65% of the specialty. Between 2009 and 2020, radiology workloads surged by 80%, yet there was no matching increase in staffing. Rural hospitals face an average 130-day wait just to fill a single radiology position. Meanwhile, patients with critical findings sit in growing queues, waiting for eyes on their scans.

We asked ourselves: what if AI could act as a second set of eyes — not replacing the radiologist, but making sure they see the right image at the right time? That question became Pneumanosis.

What it does

Pneumanosis is an AI-powered chest X-ray abnormality screening system that helps health professionals prioritize critical cases. Upload a chest X-ray and instantly get:

  • Multi-abnormality detection across 14 conditions (5 active, 9 in development), each with a confidence score.
  • Tier-based triage ranking that surfaces the most critical patients first, using a clinically informed 5-tier severity system (from life-threatening to routine).
  • Grad-CAM heatmap overlays that highlight the regions of the X-ray the model is flagging — so clinicians can see why the AI made its call.
  • Side-by-side comparison for tracking a patient's progression over time or comparing two patients.
  • AI-generated explanations with clinical recommendations for each finding.

The dashboard offers four views: a Patient Overview with floating annotation cards and Bézier-curve connectors to flagged regions, a Triage Queue ranked by severity, a Compare view for longitudinal or cross-patient analysis, and an Upload page with drag-and-drop instant analysis.

Pneumanosis is not a replacement for radiologists — it's a co-pilot. The doctor still makes the call. We just make sure they're looking at the right scan first.

How we built it

We split the project across four teammates, each owning a layer of the stack:

  • ML Model: We fine-tuned a DenseNet-121 on the NIH Chest X-ray dataset (112,120 frontal-view images across 30,805 patients). The model outputs multi-label binary predictions for the CheXpert Competition 5 conditions: Atelectasis, Cardiomegaly, Consolidation, Edema, and Pleural Effusion. Grad-CAM provides the explainability layer. (A minimal sketch of the training setup follows this list.)
  • Backend API: A FastAPI server handles async inference, serving predictions through a clean REST contract (/predict, /health, /conditions). The API auto-detects whether a trained .pth model is present and switches between mock and real inference — no code changes needed.
  • Frontend Dashboard: Built with Next.js 16, React 19, and Tailwind CSS 4. Framer Motion powers smooth UI transitions, and Recharts handles data visualization. The dashboard is organized around four views for clinical workflow.
  • Infrastructure: Docker Compose spins up the full stack (dashboard + API) with a single command. Dev containers ensure a consistent environment for all contributors.
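
To ground the model bullet above, here is a minimal PyTorch sketch of the fine-tuning setup; the transforms and hyperparameters are illustrative assumptions, not our exact training configuration.

```python
# Minimal sketch of the fine-tuning setup: DenseNet-121 with a 5-way
# multi-label head. Transforms and hyperparameters are illustrative.
import torch
import torch.nn as nn
from torchvision import models, transforms

CONDITIONS = ["Atelectasis", "Cardiomegaly", "Consolidation",
              "Edema", "Pleural Effusion"]

# Downsample the 1024x1024 NIH PNGs to the ImageNet input size; the
# source images are single-channel, so replicate to 3 channels.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
model.classifier = nn.Linear(model.classifier.in_features, len(CONDITIONS))

# Multi-label: each condition gets an independent sigmoid, so the loss
# is binary cross-entropy per label rather than softmax cross-entropy.
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def predict(batch: torch.Tensor) -> torch.Tensor:
    """Per-condition probabilities for a batch of preprocessed X-rays."""
    model.eval()
    with torch.no_grad():
        return torch.sigmoid(model(batch))
```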

The tier ranking system was developed with clinical input, mapping each detectable condition to a severity tier informed by medical literature and hospital triage protocols.
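
Illustratively, the mapping can be as simple as a lookup table feeding the triage queue's sort order. The tier values below are placeholders, not the clinically vetted assignments from our documentation:

```python
# Hypothetical sketch of the condition-to-tier lookup and queue sort.
# Tier values are placeholders, not the vetted clinical assignments
# documented in Model_Hospital_Ranking.md.
SEVERITY_TIERS = {          # 1 = life-threatening ... 5 = routine
    "Edema": 2,
    "Consolidation": 2,
    "Pleural Effusion": 3,
    "Cardiomegaly": 3,
    "Atelectasis": 4,
}

def triage_key(condition: str, confidence: float) -> tuple[int, float]:
    """Sort key: most severe tier first, then highest model confidence."""
    return (SEVERITY_TIERS.get(condition, 5), -confidence)

# Example: order a worklist of (condition, confidence) findings.
queue = sorted([("Atelectasis", 0.91), ("Edema", 0.62), ("Cardiomegaly", 0.80)],
               key=lambda item: triage_key(*item))
# -> Edema (tier 2) first, then Cardiomegaly (tier 3), then Atelectasis
```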

Challenges we ran into

  • Label alignment: The NIH dataset and CheXpert use different label sets. Deciding which label scheme to standardize on — and mapping conditions to clinically meaningful tiers — took significant research and debate.
  • Model explainability vs. performance: Grad-CAM heatmaps are essential for clinical trust, but getting them to overlay cleanly on the dashboard while keeping inference fast was a balancing act. (A simplified Grad-CAM sketch follows this list.)
  • Coordinating across the stack: With four teammates each owning a different layer (ML, API, frontend, clinical validation), keeping interfaces aligned — especially the API contract — required constant communication and living documentation.
  • Dataset scale: Working with 112K images (1024×1024) meant we had to be thoughtful about data loading, augmentation pipelines, and training infrastructure within hackathon time constraints.
  • Clinical credibility: We're not doctors. Getting the tier rankings, condition descriptions, and clinical recommendations right meant leaning heavily on published medical sources and having our EMS teammate validate the clinical framing.
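
For flavor, a simplified sketch of the Grad-CAM computation on DenseNet-121's final feature block; this is a minimal illustration, not our exact implementation:

```python
# Simplified Grad-CAM sketch for the DenseNet-121 multi-label model.
# Illustrative only; not our exact implementation.
import torch
import torch.nn.functional as F

def grad_cam(model, image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Heatmap (H x W, values in [0, 1]) for one condition's logit."""
    model.eval()
    activations, gradients = [], []
    layer = model.features  # final conv block of torchvision's DenseNet-121

    h1 = layer.register_forward_hook(lambda m, i, o: activations.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))

    logits = model(image.unsqueeze(0))   # (1, num_conditions)
    model.zero_grad()
    logits[0, class_idx].backward()      # backprop one finding's logit
    h1.remove(); h2.remove()

    acts, grads = activations[0], gradients[0]        # both (1, C, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)    # global-avg-pooled grads
    cam = F.relu((weights * acts).sum(dim=1))         # weighted activation map
    cam = F.interpolate(cam.unsqueeze(0), size=image.shape[-2:],
                        mode="bilinear", align_corners=False)[0, 0]
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```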

Accomplishments that we're proud of

  • End-to-end working system in a hackathon: From model training to a polished, multi-view clinical dashboard — all connected through a clean API — in a single weekend.
  • Clinically informed triage tiers: Our 5-tier severity system isn't arbitrary; it's backed by clinical sources and reviewed by a teammate with EMS experience.
  • Grad-CAM integration: Heatmap overlays with floating annotation cards connected by Bézier curves to flagged regions. Clinicians don't just see what the AI found — they see where and why.
  • Seamless mock-to-real transition: The API automatically detects a trained model and switches from mock data to real inference (sketched below). This let the frontend team build and demo without waiting on model training to finish.
  • Docker one-command deploy: docker-compose up --build gives you the entire stack. No setup headaches.
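
A rough sketch of that mock-to-real switch; the weights path, mock payload, and helper below are illustrative assumptions rather than the exact production contract:

```python
# Sketch of the API's mock-vs-real switch; the weights path, mock
# payload, and helper are illustrative assumptions.
from pathlib import Path
from fastapi import FastAPI, UploadFile

MODEL_PATH = Path("weights/model.pth")    # hypothetical .pth location
app = FastAPI()

MOCK_FINDINGS = [{"condition": "Cardiomegaly", "confidence": 0.42, "tier": 3}]

@app.post("/predict")
async def predict(file: UploadFile):
    image_bytes = await file.read()
    if not MODEL_PATH.exists():
        # No trained weights on disk: serve a stable mock payload so the
        # frontend can build against the same response contract.
        return {"mock": True, "findings": MOCK_FINDINGS}
    return {"mock": False, "findings": run_inference(image_bytes)}

def run_inference(image_bytes: bytes) -> list[dict]:
    """Real path: decode, preprocess, sigmoid over model logits (elided)."""
    ...
```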

What we learned

  • AI in healthcare needs explainability first. A confidence score alone doesn't build clinical trust. Showing where on the image the model is looking (via Grad-CAM) is what makes the difference between a black box and a co-pilot.
  • Triage is a design problem, not just an ML problem. Ranking patients by severity required us to think deeply about clinical workflows — how ERs actually process cases, what "urgent" means in hours vs. days, and how to present that information without overwhelming a busy clinician.
  • Living documentation saves hackathon teams. Our DASHBOARD_DESIGN.md, Model_Hospital_Ranking.md, and PROPOSAL.md files kept everyone aligned even when we were heads-down in different parts of the codebase.
  • Mock-first API design is powerful. Defining the API contract early and building mock responses let the frontend and backend develop in parallel without blocking each other.
  • The radiologist shortage is real and urgent. Digging into the statistics — 80% workload increase, 130-day hiring gaps, missed-case rates of 44.8% — made this project feel less like a hackathon exercise and more like something that genuinely matters.

What's next for Pneumanosis

  • Expand to all 14 conditions: The current v1 model covers 5 CheXpert conditions. Nine more — including Pneumothorax, Pneumonia, Emphysema, and Nodule/Mass — are on the roadmap for the full detection suite.
  • DICOM integration: Real hospitals use DICOM, not PNG uploads. Adding native DICOM support would make Pneumanosis plug directly into existing radiology workflows and PACS systems.
  • Multi-model ensemble: Combine DenseNet-121 with the MobileNetV3-Small architecture already in the codebase for a lightweight option that can run at the edge in low-resource settings. (See the averaging sketch after this list.)
  • Longitudinal patient tracking: Move beyond single-image analysis to track a patient's X-rays over time, automatically flagging progression or improvement.
  • Clinical pilot: Partner with a hospital or clinic to run a real-world validation study, measuring impact on turnaround time and missed-case rates — the metrics that matter most. Studies have shown AI triage can reduce turnaround time by 77% and drop missed-case rates from 44.8% to 2.6%.
  • HIPAA compliance and deployment hardening: Build out the security, audit logging, and access control layers needed for a production healthcare environment.
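
A possible shape for that ensemble, assuming both members output logits for the same condition set (the member weights are placeholders):

```python
# Hypothetical ensemble sketch: weighted average of per-condition
# probabilities from DenseNet-121 and MobileNetV3-Small members.
import torch
import torch.nn as nn

def ensemble_predict(batch: torch.Tensor,
                     members: list[tuple[nn.Module, float]]) -> torch.Tensor:
    """Weighted average of each member's sigmoid outputs per condition."""
    total, weight_sum = None, sum(w for _, w in members)
    with torch.no_grad():
        for model, weight in members:
            model.eval()
            probs = torch.sigmoid(model(batch)) * weight
            total = probs if total is None else total + probs
    return total / weight_sum

# e.g. ensemble_predict(batch, [(densenet121, 0.7), (mobilenetv3_small, 0.3)])
```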

Built With

  • chexpert-v1.0-small
  • claude
  • co-pilot
  • css
  • discord
  • fastapi
  • gemini
  • github
  • next.js-16
  • react-19
  • tailwind
  • vscode