🫀 EYE HEART CONNECTION — Hackathon Story

What if a simple eye exam could reveal the state of your heart?


Inspiration

The idea for EYE HEART CONNECTION was born from a striking insight in medical research: the retina is the only place in the human body where microvasculature can be directly observed non-invasively. Studies by Cheung et al. (2012), Poplin et al. (2018), and Rim et al. (2020) have demonstrated that retinal fundus photographs contain telltale signs of systemic cardiovascular disease — hypertensive retinopathy, diabetic microaneurysms, arteriolar narrowing, and atherosclerotic changes.

Despite these findings, the clinical workflow remains disconnected. Ophthalmologists look at eyes. Cardiologists look at hearts. There is no accessible, automated bridge between the two.

We were inspired to build that bridge. Our goal was to create an end-to-end system where anyone — a doctor in a rural clinic, a researcher, or even a curious individual — can upload a pair of retinal fundus photographs, enter their age, and receive an instant, explainable cardiovascular risk assessment powered by deep learning.


What It Does

EYE HEART CONNECTION is a full-stack, multimodal AI system that:

  1. Accepts bilateral retinal fundus images (left and right eye) along with the patient's age through an interactive web interface.

  2. Runs a shared image encoder (EfficientNet-B4) on both images to extract rich visual features from the retinal microvasculature.

  3. Fuses image features with normalized age metadata through a dedicated metadata MLP branch and a multi-layer fusion head.

  4. Predicts 8 ophthalmic conditions — Normal (N), Diabetes (D), Glaucoma (G), Cataract (C), Age-related Macular Degeneration (A), Hypertension (H), Myopia (M), and Other (O) — as a multilabel classification task.

  5. Converts predicted probabilities into cardiovascular risk proxy scores using clinically informed weighting (a minimal code sketch of the full scoring follows this list):

$$ \text{hypertension\_proxy} = \frac{0.7 \cdot P(H) + 0.2 \cdot P(D) + 0.1 \cdot P(A)}{0.7 + 0.2 + 0.1} $$

$$ \text{diabetes\_proxy} = \frac{0.8 \cdot P(D) + 0.2 \cdot P(H)}{0.8 + 0.2} $$

$$ \text{atherosclerotic\_proxy} = \frac{0.6 \cdot P(A) + 0.2 \cdot P(H) + 0.2 \cdot P(D)}{0.6 + 0.2 + 0.2} $$

  6. Aggregates these into an overall cardiovascular proxy score:

$$ \text{overall\_cv\_proxy} = 0.4 \cdot \text{hypertension\_proxy} + 0.35 \cdot \text{diabetes\_proxy} + 0.25 \cdot \text{atherosclerotic\_proxy} $$

  7. Assigns a risk band: Low ($\leq 0.33$), Medium ($0.34$–$0.66$), or High ($> 0.66$).

  8. Presents everything through a polished Reflex web UI with interactive bar charts, clinical explanations, and sample case loading.
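
To make the scoring concrete, here is a minimal Python sketch of the proxy computation defined above. The function name and the probability-dictionary input are our own illustration, not the project's actual API.

```python
# Minimal sketch of the CV proxy scoring defined above.
# `cv_proxy_scores` and its input format are illustrative, not the project's API.

def cv_proxy_scores(p):
    """p maps condition codes ("N", "D", ..., "O") to predicted probabilities."""
    hypertension = (0.7 * p["H"] + 0.2 * p["D"] + 0.1 * p["A"]) / (0.7 + 0.2 + 0.1)
    diabetes = (0.8 * p["D"] + 0.2 * p["H"]) / (0.8 + 0.2)
    atherosclerotic = (0.6 * p["A"] + 0.2 * p["H"] + 0.2 * p["D"]) / (0.6 + 0.2 + 0.2)
    overall = 0.4 * hypertension + 0.35 * diabetes + 0.25 * atherosclerotic

    # Risk band thresholds from the writeup: Low <= 0.33 < Medium <= 0.66 < High.
    band = "Low" if overall <= 0.33 else "Medium" if overall <= 0.66 else "High"
    return {
        "hypertension_proxy": hypertension,
        "diabetes_proxy": diabetes,
        "atherosclerotic_proxy": atherosclerotic,
        "overall_cv_proxy": overall,
        "risk_band": band,
    }
```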


How We Built It

1. Data Pipeline

We built a patient-level data module that:

  • Reads the Ocular Disease Intelligent Recognition (ODIR) dataset with thousands of fundus images
  • Constructs bilateral patient records pairing left and right eye images
  • Generates stratified train/val/test splits at the patient level (to prevent data leakage between eyes of the same patient)
  • Computes dataset-wide metadata statistics (age mean $\mu$ and standard deviation $\sigma$) for z-score normalization (a sketch of the split and normalization logic follows):

$$ \text{age}_{\text{norm}} = \frac{\text{age} - \mu}{\sigma} $$
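
As a sketch of the split logic, assuming a pandas dataframe with one row per patient; the column names and single-label stratification are illustrative simplifications, not the project's exact code:

```python
# Patient-level stratified splitting: because each row already pairs both eyes
# of one patient, splitting rows cannot leak one patient's eyes across splits.
# Column names ("age", "D") are illustrative, not the actual schema.
from sklearn.model_selection import train_test_split

def patient_level_splits(df, strat_col="D", seed=42):
    # Stratifying on a single dominant label is a simplification of
    # multilabel stratification, used here only for illustration.
    train_df, rest = train_test_split(
        df, test_size=0.3, stratify=df[strat_col], random_state=seed)
    val_df, test_df = train_test_split(
        rest, test_size=0.5, stratify=rest[strat_col], random_state=seed)

    mu, sigma = df["age"].mean(), df["age"].std()  # dataset-wide age stats
    return train_df, val_df, test_df, mu, sigma
```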

2. Model Architecture

We designed a multimodal fusion model (MultimodalRiskModel) with a shared image encoder branch, a metadata branch, and a fusion head (sketched in code after the branch descriptions):

Image Encoder Branch (shared weights):

  • Uses EfficientNet-B4 pretrained on ImageNet as the backbone
  • The final classifier head is replaced with nn.Identity() to extract a 1792-dimensional feature vector
  • The same encoder processes both left and right fundus images, enabling bilateral feature learning

Metadata Branch:

  • A 2-layer MLP that takes normalized age as input
  • Outputs a 32-dimensional embedding

Fusion Head:

  • Concatenates the left-eye, right-eye, and metadata embeddings into a single $1792 + 1792 + 32 = 3616$-dimensional vector
  • Passes the fused vector through a 3-layer MLP
  • Outputs 8 raw logits, one per ophthalmic condition
  • Sigmoid activation converts logits to independent probabilities
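
The following condensed PyTorch sketch captures the architecture described above; hidden-layer widths not stated in this writeup are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b4

class MultimodalRiskModel(nn.Module):
    """Sketch of the fusion model; hidden sizes are illustrative."""

    def __init__(self, num_labels: int = 8):
        super().__init__()
        self.encoder = efficientnet_b4(weights="IMAGENET1K_V1")
        self.encoder.classifier = nn.Identity()   # expose 1792-D features

        self.meta_mlp = nn.Sequential(            # 2-layer metadata branch
            nn.Linear(1, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),         # -> 32-D age embedding
        )

        fused_dim = 1792 * 2 + 32                 # left + right + age = 3616
        self.fusion = nn.Sequential(              # 3-layer fusion head
            nn.Linear(fused_dim, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, num_labels),           # 8 raw logits
        )

    def forward(self, left, right, age_norm):
        z_left = self.encoder(left)               # shared weights: the same
        z_right = self.encoder(right)             # encoder processes both eyes
        z_meta = self.meta_mlp(age_norm)          # age_norm shape: (B, 1)
        return self.fusion(torch.cat([z_left, z_right, z_meta], dim=1))

# Independent per-condition probabilities:
# probs = torch.sigmoid(model(left, right, age_norm))
```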

3. Training Strategy

We implemented a gradual unfreezing policy:

  • Epochs 1–5: The image encoder is completely frozen; only the metadata branch and fusion head train. This lets the randomly initialized layers learn meaningful representations without destructive gradients from the large backbone.
  • After epoch 5: The entire network is unfrozen for end-to-end fine-tuning.

Loss function: Binary Cross-Entropy with Logits (BCEWithLogitsLoss) for multilabel classification.
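
A minimal sketch of a training loop with this policy (the model, optimizer, and loader are assumed to be defined elsewhere; this mirrors the schedule rather than our exact code):

```python
import torch.nn as nn

FREEZE_EPOCHS = 5  # backbone frozen for the first five epochs, per the policy

def train(model, optimizer, train_loader, num_epochs):
    criterion = nn.BCEWithLogitsLoss()  # multilabel BCE on the 8 raw logits
    for epoch in range(1, num_epochs + 1):
        # Freeze/unfreeze the pretrained encoder according to the schedule.
        # (A production version might also rebuild the optimizer on unfreeze.)
        for p in model.encoder.parameters():
            p.requires_grad = epoch > FREEZE_EPOCHS
        for left, right, age, targets in train_loader:
            logits = model(left, right, age)
            loss = criterion(logits, targets.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```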

4. API Layer

We built a production-ready FastAPI backend (api/main.py) that:

  • Loads the trained checkpoint on startup
  • Exposes POST /predict accepting multipart form data (left image, right image, age), as sketched after this list
  • Returns structured JSON with labels, probabilities, and full CV summary
  • Includes /health and /docs endpoints
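
A minimal sketch of the endpoint shape (run_inference is a hypothetical helper standing in for preprocessing plus the model forward pass; the real api/main.py wires in the loaded checkpoint):

```python
from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI(title="EYE HEART CONNECTION")  # /docs is generated automatically

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/predict")
async def predict(
    left_image: UploadFile = File(...),   # multipart form fields
    right_image: UploadFile = File(...),
    age: int = Form(...),
):
    left_bytes = await left_image.read()
    right_bytes = await right_image.read()
    # run_inference: hypothetical helper wrapping preprocessing + inference
    labels, probs, cv_summary = run_inference(left_bytes, right_bytes, age)
    return {"labels": labels, "probabilities": probs, "cv_summary": cv_summary}
```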

5. Frontend

We built the interactive UI using Reflex (a Python-based web framework) with:

  • Drag-and-drop bilateral image upload panels with live preview
  • An age slider (1–120) and numeric input
  • A sample case dropdown that auto-discovers paired images from assets/sample_cases/
  • A 3-step flow: Input → Processing (with spinner animation) → Results
  • Recharts bar chart showing disease probabilities sorted by severity
  • Color-coded risk band badge (green/orange/red)
  • Clinical explanation text generated from the top predicted conditions
  • A left sidebar with project info, ophthalmic indicator descriptions, a medical glossary, and research references
  • A polished, medical-themed design with green gradients, glass-morphism cards, and smooth animations

6. Deployment

We containerized everything with Docker — a single Dockerfile that:

  • Installs all dependencies including Reflex
  • Copies model artifacts, configs, API, inference, models, and frontend code
  • Uses a startup script that launches both FastAPI (port 8000) and Reflex (port 7860) in a single container
  • Supports deployment to Hugging Face Spaces with dedicated config files

Challenges We Ran Into

🔴 Bilateral Data Alignment

The ODIR dataset labels diseases at the patient level, but images exist per-eye. We had to carefully construct a patient-level dataframe that correctly pairs left and right fundus images while preserving per-eye annotations, then aggregate them into patient-level multilabel targets without introducing data leakage.
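
A pandas sketch of the pairing step, assuming a per-eye table whose column names are our own illustration rather than the actual ODIR schema:

```python
import pandas as pd

LABELS = ["N", "D", "G", "C", "A", "H", "M", "O"]

def pair_eyes(per_eye: pd.DataFrame) -> pd.DataFrame:
    """per_eye: one row per image with columns patient_id, eye ('left'/'right'),
    image_path, age, and one 0/1 column per condition (names illustrative)."""
    # One row per patient, with left/right image paths side by side.
    paths = per_eye.pivot(index="patient_id", columns="eye", values="image_path")
    # Patient-level multilabel target: positive if either eye is positive.
    labels = per_eye.groupby("patient_id")[LABELS].max()
    age = per_eye.groupby("patient_id")["age"].first()
    return paths.join(labels).join(age)
```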

🔴 Class Imbalance

The 8 ophthalmic conditions are heavily imbalanced — "Normal" and "Diabetes" dominate, while "Glaucoma" and "Other" are rare. Training with standard BCE loss led to poor minority-class recall. We had to experiment with threshold tuning and evaluation metrics that account for class imbalance.
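
For example, per-label decision thresholds can be tuned on a validation set; the sketch below mirrors the kind of experiment we ran rather than the exact code:

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(val_probs: np.ndarray, val_targets: np.ndarray) -> np.ndarray:
    """val_probs, val_targets: arrays of shape (num_samples, 8)."""
    grid = np.linspace(0.05, 0.95, 19)
    best = np.full(val_probs.shape[1], 0.5)
    for k in range(val_probs.shape[1]):
        # Pick the threshold that maximizes per-class F1 on validation data.
        scores = [f1_score(val_targets[:, k], val_probs[:, k] >= t,
                           zero_division=0) for t in grid]
        best[k] = grid[int(np.argmax(scores))]
    return best
```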

🔴 Single-Port Deployment for Reflex

Reflex typically uses separate ports for its frontend and backend. For containerized deployment (Docker, Hugging Face Spaces), we needed everything on a single port. We solved this using Reflex's --single-port flag and carefully routing FastAPI through a subprocess managed by our startup script.
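
In spirit, the startup script does something like the following; treat the exact commands and flags as assumptions about our actual script:

```python
import subprocess

# FastAPI runs internally on port 8000 (not exposed outside the container).
api = subprocess.Popen(
    ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"])
try:
    # Reflex serves both its frontend and backend on the single public port.
    subprocess.run(
        ["reflex", "run", "--env", "prod", "--single-port",
         "--frontend-port", "7860"],
        check=True)
finally:
    api.terminate()
```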

🔴 Cross-Platform Compatibility

The project needed to work on Windows (development), Linux (Docker/Spaces), and across different PyTorch versions. We handled edge cases like torch.load's weights_only parameter changing defaults in PyTorch ≥2.6, and path handling differences between operating systems.
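
The torch.load fix, for instance, reduces to passing weights_only explicitly when the running PyTorch supports it (our checkpoints are trusted, locally produced files):

```python
import inspect
import torch

def load_checkpoint(path: str):
    # PyTorch >= 2.6 flipped torch.load's weights_only default to True,
    # which rejects older checkpoints containing pickled Python objects.
    if "weights_only" in inspect.signature(torch.load).parameters:
        return torch.load(path, map_location="cpu", weights_only=False)
    return torch.load(path, map_location="cpu")  # older PyTorch versions
```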

🔴 Explaining AI Predictions Clinically

Raw probabilities are meaningless to end users. We designed a clinically informed cardiovascular proxy scoring system with literature-backed weights and threshold-based risk bands. Translating model outputs into human-readable explanations ("You have a 64.0% predicted risk of cardiovascular disease") required careful UX iteration.


Accomplishments That We're Proud Of

  • End-to-end working system — from raw data preparation to a polished, deployed web app with real predictions
  • Shared bilateral encoder — the same EfficientNet-B4 processes both eyes, enabling the model to learn cross-eye retinal patterns that a single-image approach would miss
  • Clinically grounded CV proxy — our weighted scoring system is informed by established medical literature linking ophthalmic findings to cardiovascular risk
  • Production-grade engineering — FastAPI with Pydantic schemas, Docker containerization, YAML-driven configuration, comprehensive pytest test suite, and clean module separation
  • Beautiful, accessible UI — a medical-themed Reflex interface with glass-morphism cards, animated transitions, interactive charts, and a clinical glossary sidebar that educates users while they wait
  • Real sample cases — users can instantly try the system with pre-loaded fundus image pairs, no data upload required
  • Deployed and publicly accessible on Hugging Face Spaces for anyone to try

What We Learned

  1. Retinal imaging is a goldmine for systemic health prediction. The blood vessels visible in a fundus photograph are direct proxies for the body's broader vascular health. This project reinforced our appreciation for ophthalmology as a window into cardiovascular medicine.

  2. Multimodal fusion is powerful but tricky. Combining high-dimensional image features (two 1792-D eye embeddings) with a tiny metadata signal (a single age value) required careful normalization and architecture design. The metadata branch needed its own MLP to "amplify" the age signal to a comparable representational scale.

  3. Deployment is half the battle. Building a model in a notebook is one thing. Serving it behind a REST API, wrapping it in a container, and making it accessible through a polished web UI requires a completely different engineering skillset. We learned to navigate Docker networking, Reflex's single-port mode, and Hugging Face Spaces' build system.

  4. Explainability matters more than accuracy. A 90% accurate model that outputs raw logits is less useful than an 80% accurate model that tells a doctor "This patient has a 67% hypertension proxy score driven primarily by retinal vessel narrowing signals." Designing the CV proxy system taught us that how you communicate predictions is as important as the predictions themselves.

  5. Gradual unfreezing stabilizes training. Jumping straight into end-to-end fine-tuning of a massive EfficientNet backbone with a randomly initialized fusion head led to unstable gradients. Freezing the encoder for the first few epochs allowed the fusion layers to stabilize first.


What's Next for EYE HEART CONNECTION

  • Temporal modeling — Incorporate longitudinal fundus image sequences to track cardiovascular risk changes over time
  • Additional metadata — Include blood pressure, BMI, and smoking history alongside age for richer risk estimation
  • GradCAM and attention visualization — Show users which regions of their retina the model focused on, making predictions even more interpretable
  • Clinical validation study — Partner with ophthalmologists and cardiologists to validate the CV proxy scores against actual cardiovascular outcomes
  • Multi-dataset training — Train on additional retinal datasets (e.g., EyePACS, Messidor) to improve generalization across demographics and imaging equipment
  • Mobile-friendly interface — Optimize the Reflex UI for mobile devices to enable point-of-care use in clinics

Built with ❤️ by Ayush Saini
