Inspiration

Even today, many hospitals still rely on fax machines to share ECG records. Billions of electrocardiograms exist only as paper printouts or scanned images — records that cannot be searched, compared, or fed into modern diagnostic software. Studies have shown that most faxed ECGs do not meet diagnostic quality standards, and many fail to convey basic parameters such as heart rate or electrical axis.

This inspired us to build CardioScan — a tool that brings ECG data back into a usable digital form.

What it does

CardioScan is a web-based tool that converts scanned or photographed ECG images into structured 12-lead digital signals. Users upload an ECG image and receive calibrated waveforms that can be analyzed, shared, and exported — turning static paper records into computable data.

How we built it

CardioScan runs three neural networks in sequence, each solving a distinct subproblem.

Stage 0 uses a ResNet-18d + UNet to detect printed lead-name labels and classify image rotation. The detected keypoints anchor a homography transform that coarsely aligns the image to a standard coordinate system.
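The alignment step can be sketched as follows. This is a minimal numpy illustration of fitting a homography from detected keypoints via direct linear transform (DLT); the point coordinates and function names here are hypothetical, and a production pipeline would more likely use something like OpenCV's `cv2.findHomography` with RANSAC.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography mapping src -> dst via DLT.
    src, dst: (N, 2) arrays of corresponding points, N >= 4."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # Null-space vector of A (smallest singular value) is the flattened H.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_points(H, pts):
    """Apply homography H to (N, 2) points, with perspective division."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]

# Hypothetical detected lead-label keypoints (pixels) and their
# canonical positions in the standard coordinate system.
detected = np.array([[102.0, 95.0], [890.0, 110.0],
                     [870.0, 640.0], [120.0, 655.0]])
canonical = np.array([[0.0, 0.0], [800.0, 0.0],
                      [800.0, 600.0], [0.0, 600.0]])
H = estimate_homography(detected, canonical)
aligned = warp_points(H, detected)
```

With four exact correspondences the warp reproduces the canonical positions; with more (noisy) keypoints the same SVD gives a least-squares fit.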

Stage 1 uses a ResNet-34 + UNet to reconstruct the ECG paper grid. Each pixel is assigned to a specific horizontal or vertical line index — not just foreground/background — so the full 44×57 intersection map can be read off directly and used for non-linear rectification of paper curvature and lens distortion.
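Once the intersection map is known, the rectification amounts to building a dense resampling map: each cell of the ideal grid is filled by bilinearly interpolating the four detected corner positions. A minimal numpy sketch (function name and cell size hypothetical):

```python
import numpy as np

def build_remap(intersections, cell_px=10):
    """intersections: (rows, cols, 2) detected (x, y) grid crossings.
    Returns a dense (H, W, 2) map: for every pixel of the ideal,
    rectified image, the source location in the distorted photo.
    Interior of each grid cell is filled by bilinear interpolation
    of its four detected corners."""
    rows, cols, _ = intersections.shape
    H, W = (rows - 1) * cell_px, (cols - 1) * cell_px
    remap = np.zeros((H, W, 2))
    for r in range(rows - 1):
        for c in range(cols - 1):
            p00, p01 = intersections[r, c], intersections[r, c + 1]
            p10, p11 = intersections[r + 1, c], intersections[r + 1, c + 1]
            v = np.linspace(0, 1, cell_px, endpoint=False)[:, None, None]
            u = np.linspace(0, 1, cell_px, endpoint=False)[None, :, None]
            cell = ((1 - v) * (1 - u) * p00 + (1 - v) * u * p01
                    + v * (1 - u) * p10 + v * u * p11)
            remap[r * cell_px:(r + 1) * cell_px,
                  c * cell_px:(c + 1) * cell_px] = cell
    return remap

# Sanity check with an undistorted 3x4 grid of crossings 10 px apart:
ideal = np.stack(np.meshgrid(np.arange(4) * 10.0,
                             np.arange(3) * 10.0), axis=-1)
remap = build_remap(ideal, cell_px=10)
```

On an undistorted grid the remap is the identity; on a curved one, sampling the photo at `remap[y, x]` straightens both paper curvature and lens distortion.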

Stage 2 extracts the waveforms using a coordinate-aware UNet decoder, where normalized x/y position maps are injected at each decoder stage (CoordConv). This gives the network spatial context to distinguish signals in different rows. Output probability maps are converted to millivolt time series using the physical calibration constants of the ECG grid.
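The final conversion relies only on the standard ECG calibration (10 mm per mV gain, 25 mm/s paper speed). A minimal sketch of the probability-map-to-signal step, assuming a per-lead map, a known baseline row, and a pixels-per-millimetre scale from the rectified grid (names hypothetical):

```python
import numpy as np

MM_PER_MV = 10.0   # standard ECG gain: 10 mm per mV
MM_PER_SEC = 25.0  # standard paper speed: 25 mm per second

def prob_map_to_signal(prob, baseline_row, px_per_mm):
    """Convert a per-lead probability map (H, W) to a millivolt
    time series. A column-wise expectation over rows gives a
    sub-pixel trace position; vertical offset from the baseline
    converts to mV via the grid calibration, and the column axis
    converts to seconds."""
    rows = np.arange(prob.shape[0])[:, None]
    col_mass = prob.sum(axis=0)
    y = (prob * rows).sum(axis=0) / np.clip(col_mass, 1e-6, None)
    mv = (baseline_row - y) / (px_per_mm * MM_PER_MV)  # image up = positive
    t = np.arange(prob.shape[1]) / (px_per_mm * MM_PER_SEC)  # seconds
    return t, mv

# Toy map: a flat trace 20 px above the baseline at 2 px/mm -> 1.0 mV.
prob = np.zeros((60, 5))
prob[30, :] = 1.0
t, mv = prob_map_to_signal(prob, baseline_row=50, px_per_mm=2.0)
```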

Because real-world ECG images vary dramatically by scanner and lighting conditions, we also trained an EfficientNet-B5 source classifier across 12 visual categories. Each image is routed to a tailored preprocessing branch — CLAHE, white balance, bilateral denoising, or morphological background correction — before entering the main pipeline.
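The routing itself is a simple dispatch from predicted source class to preprocessing function. A sketch with hypothetical class indices, using a numpy gray-world white balance as one concrete branch; the other branches (CLAHE, bilateral denoising, morphological background correction) would typically come from OpenCV and are stubbed here with the identity:

```python
import numpy as np

def gray_world_white_balance(img):
    """Gray-world white balance: scale each channel so its mean
    matches the global mean. img: float array, H x W x 3."""
    means = img.reshape(-1, 3).mean(axis=0)
    return img * (means.mean() / means)

# Hypothetical routing table: source-class index -> preprocessing branch.
BRANCHES = {
    0: lambda im: im,             # clean scanner output: pass through
    1: gray_world_white_balance,  # phone photo with a color cast
}

def preprocess(img, source_class):
    """Route an image to its branch; unknown classes fall back to identity."""
    return BRANCHES.get(source_class, lambda im: im)(img)
```

Keeping the branches as plain functions behind one classifier, rather than separate model weights per source, is what makes this act like a learned domain adaptation layer.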

We trained and deploy all models ourselves; no patient data is sent to external services.

Training used the PhysioNet ECG Image Digitization dataset: real scanned 12-lead ECGs paired with ground-truth digital signals, covering diverse machines and imaging conditions from clinical settings.

Challenges we ran into

Waveform row confusion. Standard UNets are translation-invariant and cannot distinguish signals in different image rows, causing cross-lead mixing. We injected normalized coordinate maps into each decoder block (CoordConv), giving the network position-dependent context.
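The coordinate injection itself is a two-line idea: append normalized x/y position maps as extra channels before each decoder convolution. A minimal numpy sketch of the channel construction (in the actual model this would be a PyTorch module applied at every decoder stage):

```python
import numpy as np

def add_coord_channels(feat):
    """CoordConv-style coordinate injection. feat: (C, H, W) feature
    map -> (C + 2, H, W), appending x and y maps that each span
    [-1, 1] across the spatial extent, so convolutions downstream
    can condition on absolute row/column position."""
    _, H, W = feat.shape
    ys = np.linspace(-1, 1, H)[:, None].repeat(W, axis=1)
    xs = np.linspace(-1, 1, W)[None, :].repeat(H, axis=0)
    return np.concatenate([feat, xs[None], ys[None]], axis=0)
```

The y channel is what lets the decoder tell lead rows apart: two identical QRS shapes in different rows now arrive with different coordinate values.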

Peak and trough extraction errors. Naively taking the highest-probability pixel per column distorts amplitude for broad or low-amplitude deflections. We implemented a bidirectional scan — top-down above the baseline, bottom-up below it — followed by Savitzky-Golay smoothing and Einthoven triangle correction.
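The bidirectional scan can be sketched as below: for each column, pixels above the baseline are scanned top-down (keeping the tip of an R peak) and pixels below are scanned bottom-up (keeping the tip of an S or Q trough). Function name and threshold are hypothetical, and the real pipeline follows this with Savitzky-Golay smoothing (e.g. `scipy.signal.savgol_filter`):

```python
import numpy as np

def bidirectional_trace(prob, baseline_row, thresh=0.5):
    """Per-column trace extraction from a probability map (H, W).
    Above the baseline: take the first super-threshold pixel scanning
    top-down. Below: take the last one, i.e. scanning bottom-up.
    Columns with no confident pixel fall back to the baseline."""
    H, W = prob.shape
    ys = np.full(W, float(baseline_row))
    for c in range(W):
        col = prob[:, c]
        above = np.nonzero(col[:baseline_row] >= thresh)[0]
        below = np.nonzero(col[baseline_row:] >= thresh)[0]
        if above.size:
            ys[c] = above[0]                    # top-down: peak tip
        elif below.size:
            ys[c] = baseline_row + below[-1]    # bottom-up: trough tip
    return ys
```

Compared with a per-column argmax, this keeps the extreme of a broad deflection instead of its brightest interior pixel, which is what was flattening amplitudes.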

Visual diversity of real-world images. A single preprocessing pipeline degrades on out-of-distribution inputs. Our source classifier routes each image to a dedicated preprocessing branch, acting as a learned domain adaptation layer without separate model weights per source.

Accomplishments that we're proud of

We built a complete, deployable end-to-end system — three networks, a source classifier, and a web interface — that works on realistic, noisy ECG images and produces standardized digital signals, with all inference running locally to protect patient privacy.

What we learned

Building CardioScan taught us that domain gap is the real enemy in medical imaging — preprocessing decisions that seem trivial in a clean dataset become critical when inputs come from phone cameras, aging scanners, and uneven lighting. We also learned that translation-invariant convolutions have a fundamental blind spot for spatially structured outputs, and that most of the hard engineering work happens at the seams: coordinate system mismatches, GPU memory fragmentation, and latency constraints that don't show up until you build the full serving layer.

What's next for CardioScan

Three directions: adding uncertainty estimation so the system flags low-confidence outputs rather than returning noisy signals silently; extending preprocessing to detect and correct multiple degradation factors in parallel rather than one per image; and deploying on a stable cloud GPU service with EHR integration, exporting signals in standard clinical formats such as HL7 aECG — so that decades of paper ECG archives become truly computable and clinically useful.
