SPN-LSM for Chest X-ray Interpretability

Overview

This project explores interpretable deep learning for chest X-ray classification by re-implementing and extending the SPN-LSM framework. Our goal is to bridge the gap between predictive performance and interpretability by generating counterfactual explanations in a structured latent space. The system combines a deep generative model (a Variational Autoencoder) with a tractable probabilistic model (a Sum-Product Network) to show how a prediction would change under realistic variations of the input.

The Problem

Deep learning models, particularly convolutional neural networks, have achieved strong performance on medical imaging tasks such as chest X-ray classification. However, their predictions are often opaque, making it difficult for practitioners to understand or trust model decisions in clinical settings. Many existing interpretability methods operate directly in pixel space, which can produce unrealistic or hard-to-interpret explanations. This project addresses the challenge of generating meaningful, plausible counterfactual explanations by operating in a learned latent space instead.

Training

To build the system, we re-implemented the SPN-LSM pipeline in PyTorch and trained its components on a chest X-ray dataset. A Variational Autoencoder (VAE) was trained to learn compact latent representations of input images, balancing the reconstruction term against the KL regularization term to keep training stable. On top of the learned latent space, a Sum-Product Network (SPN) was trained to model the class-conditional distributions of the latent codes, enabling tractable probabilistic reasoning over the latent variables.
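The two training objectives can be sketched in PyTorch as follows. This is a minimal illustration under assumptions the text above does not state: flattened 64x64 grayscale inputs scaled to [0, 1], a fully connected encoder and decoder, a Bernoulli (binary cross-entropy) reconstruction term, a `beta` weight on the KL term as the balancing knob, and a deliberately shallow SPN (one sum node per class over product nodes of univariate Gaussian leaves) with a uniform class prior. All names (`VAE`, `vae_loss`, `ShallowSPN`, `spn_nll`) are hypothetical, and the architecture may differ from the original SPN-LSM.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class VAE(nn.Module):
    """Minimal fully connected VAE (hypothetical sizes; X-ray models often use convolutions)."""

    def __init__(self, in_dim=64 * 64, hidden=256, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, in_dim)
        )  # outputs per-pixel logits

    def forward(self, x):
        h = self.enc(x.flatten(1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar


def vae_loss(logits, x, mu, logvar, beta=1.0):
    """Negative ELBO per image: reconstruction term + beta * KL(q(z|x) || N(0, I))."""
    n = x.size(0)
    recon = F.binary_cross_entropy_with_logits(logits, x.flatten(1), reduction="sum") / n
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / n
    return recon + beta * kl, recon, kl


class ShallowSPN(nn.Module):
    """Per class: one sum node over K product nodes, each a product of univariate
    Gaussian leaves covering all latent dimensions (complete and decomposable)."""

    def __init__(self, n_classes, latent_dim, n_components=4):
        super().__init__()
        self.loc = nn.Parameter(torch.randn(n_classes, n_components, latent_dim))
        self.log_scale = nn.Parameter(torch.zeros(n_classes, n_components, latent_dim))
        self.logit_w = nn.Parameter(torch.zeros(n_classes, n_components))
        # Assumed uniform class prior p(y); empirical class frequencies could be used instead.
        self.register_buffer("log_prior", torch.full((n_classes,), -math.log(n_classes)))

    def class_log_likelihood(self, z):
        """log p(z | y) for every class: (B, D) -> (B, C)."""
        leaves = torch.distributions.Normal(self.loc, self.log_scale.exp())
        log_leaf = leaves.log_prob(z[:, None, None, :])  # (B, C, K, D)
        log_prod = log_leaf.sum(-1)  # product nodes: multiply leaves = add log-densities
        log_w = torch.log_softmax(self.logit_w, dim=-1)  # normalized sum-node weights
        return torch.logsumexp(log_prod + log_w, dim=-1)  # sum nodes: (B, C)

    def class_log_posterior(self, z):
        """log p(y | z) by Bayes' rule; exact because the SPN likelihood is tractable."""
        joint = self.class_log_likelihood(z) + self.log_prior
        return joint - torch.logsumexp(joint, dim=-1, keepdim=True)


def spn_nll(spn, z, y):
    """Mean negative joint log-likelihood -log p(z, y), used to fit the SPN on frozen latents."""
    joint = spn.class_log_likelihood(z) + spn.log_prior
    return -joint.gather(1, y[:, None]).mean()
```

In this sketch the VAE is trained first by minimizing `vae_loss`; the encoder means `mu` are then used as frozen latent codes, and the SPN is fitted by minimizing `spn_nll` with the image labels. Because the SPN yields exact log-likelihoods, `class_log_posterior` gives the class posterior in closed form, which is what makes gradient-based search in the latent space straightforward.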

Evaluation

We evaluated the system at both the component and pipeline levels. The VAE was assessed on reconstruction quality and training stability, while the SPN was checked for consistent probabilistic inference over the latent embeddings. For end-to-end evaluation, we generated counterfactual images via SPN-guided latent optimization and assessed them quantitatively with metrics such as the Fréchet Inception Distance (FID) and qualitatively by visual inspection. Although our numerical results did not exactly match those reported in the original paper, the system behaved qualitatively similarly.
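SPN-guided latent optimization can be sketched as gradient ascent on the target class's log-posterior, with a proximity penalty that keeps the counterfactual close to the original encoding. The penalty form (squared L2 distance weighted by `lam`), the Adam optimizer, and the names `latent_counterfactual` and `frechet_distance` are assumptions for illustration, not necessarily the objective used in the original paper. The second function computes the Fréchet distance that FID is built on: for feature means mu and covariances Sigma of two image sets, ||mu_a - mu_b||^2 + Tr(Sigma_a + Sigma_b - 2 (Sigma_a Sigma_b)^(1/2)).

```python
import torch


def latent_counterfactual(z0, log_posterior, target, steps=200, lr=0.05, lam=0.1):
    """Search for a latent z near z0 that the class model assigns to `target`.

    Maximizes log p(target | z) - lam * ||z - z0||^2 with Adam. `log_posterior`
    is any differentiable callable mapping (B, D) latents to (B, C) class
    log-posteriors, e.g. a trained SPN's exact posterior (assumed interface).
    """
    z0 = z0.detach()
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        objective = log_posterior(z)[:, target] - lam * (z - z0).pow(2).sum(-1)
        (-objective.sum()).backward()  # Adam minimizes, so negate the objective
        opt.step()
    return z.detach()


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (N, F) feature sets.

    FID applies this formula to Inception-v3 pool features of real and
    generated images; extracting those features is outside this sketch.
    """
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a, cov_b = torch.cov(feats_a.T), torch.cov(feats_b.T)
    # Tr(sqrt(cov_a @ cov_b)) equals the sum of square roots of the eigenvalues
    # of the symmetric PSD matrix sqrt(cov_a) @ cov_b @ sqrt(cov_a).
    evals, evecs = torch.linalg.eigh(cov_a)
    sqrt_a = evecs @ torch.diag(evals.clamp(min=0).sqrt()) @ evecs.T
    m = sqrt_a @ cov_b @ sqrt_a
    m = (m + m.T) / 2  # symmetrize to remove round-off asymmetry
    tr_covmean = torch.linalg.eigvalsh(m).clamp(min=0).sqrt().sum()
    return (mu_a - mu_b).pow(2).sum() + torch.trace(cov_a) + torch.trace(cov_b) - 2 * tr_covmean
```

In the full pipeline, `log_posterior` would be the trained SPN's class posterior over VAE latents, and the optimized latent would be passed through the VAE decoder to produce the counterfactual image. Larger `lam` values trade a weaker class flip for counterfactuals that stay closer to the original image. For reported scores, the standard Inception-v3 feature extractor is needed so that FID values are comparable with published numbers.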