🫀 EYE HEART CONNECTION — Hackathon Story

What if a simple eye exam could reveal the state of your heart?


Inspiration

The idea for EYE HEART CONNECTION was born from a striking insight in medical research: the retina is the only place in the human body where microvasculature can be directly observed non-invasively. Studies by Cheung et al. (2012), Poplin et al. (2018), and Rim et al. (2020) have demonstrated that retinal fundus photographs contain telltale signs of systemic cardiovascular disease — hypertensive retinopathy, diabetic microaneurysms, arteriolar narrowing, and atherosclerotic changes.

Despite these findings, the clinical workflow remains disconnected. Ophthalmologists look at eyes. Cardiologists look at hearts. There is no accessible, automated bridge between the two.

We were inspired to build that bridge. Our goal was to create an end-to-end system where anyone — a doctor in a rural clinic, a researcher, or even a curious individual — can upload a pair of retinal fundus photographs, enter their age, and receive an instant, explainable cardiovascular risk assessment powered by deep learning.


What It Does

EYE HEART CONNECTION is a full-stack, multimodal AI system that:

  1. Accepts bilateral retinal fundus images (left and right eye) along with the patient's age through an interactive web interface.

  2. Runs a shared image encoder (EfficientNet-B4) on both images to extract rich visual features from the retinal microvasculature.

  3. Fuses image features with normalized age metadata through a dedicated metadata MLP branch and a multi-layer fusion head.

  4. Predicts 8 ophthalmic conditions — Normal (N), Diabetes (D), Glaucoma (G), Cataract (C), Age-related Macular Degeneration (A), Hypertension (H), Myopia (M), and Other (O) — as a multilabel classification task.

  5. Converts predicted probabilities into cardiovascular risk proxy scores using clinically informed weighting (a minimal code sketch of the full scoring follows this list):

$$ \text{hypertension\_proxy} = \frac{0.7 \cdot P(H) + 0.2 \cdot P(D) + 0.1 \cdot P(A)}{0.7 + 0.2 + 0.1} $$

$$ \text{diabetes\_proxy} = \frac{0.8 \cdot P(D) + 0.2 \cdot P(H)}{0.8 + 0.2} $$

$$ \text{atherosclerotic\_proxy} = \frac{0.6 \cdot P(A) + 0.2 \cdot P(H) + 0.2 \cdot P(D)}{0.6 + 0.2 + 0.2} $$

  6. Aggregates these into an overall cardiovascular proxy score:

$$ \text{overall\_cv\_proxy} = 0.4 \cdot \text{hypertension\_proxy} + 0.35 \cdot \text{diabetes\_proxy} + 0.25 \cdot \text{atherosclerotic\_proxy} $$

  7. Assigns a risk band: Low ($\leq 0.33$), Medium ($0.34$–$0.66$), or High ($> 0.66$).

  8. Presents everything through a polished Reflex web UI with interactive bar charts, clinical explanations, and sample case loading.
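
To make the scoring concrete, here is a minimal Python sketch of the proxy computation defined above. The function name and the probability-dictionary input are our own illustration, not the project's actual API.

```python
# Minimal sketch of the CV proxy scoring defined above.
# `cv_proxy_scores` and its input format are illustrative, not the project's API.

def cv_proxy_scores(p):
    """p maps condition codes ("N", "D", ..., "O") to predicted probabilities."""
    hypertension = (0.7 * p["H"] + 0.2 * p["D"] + 0.1 * p["A"]) / (0.7 + 0.2 + 0.1)
    diabetes = (0.8 * p["D"] + 0.2 * p["H"]) / (0.8 + 0.2)
    atherosclerotic = (0.6 * p["A"] + 0.2 * p["H"] + 0.2 * p["D"]) / (0.6 + 0.2 + 0.2)
    overall = 0.4 * hypertension + 0.35 * diabetes + 0.25 * atherosclerotic

    # Risk band thresholds from the writeup: Low <= 0.33 < Medium <= 0.66 < High.
    band = "Low" if overall <= 0.33 else "Medium" if overall <= 0.66 else "High"
    return {
        "hypertension_proxy": hypertension,
        "diabetes_proxy": diabetes,
        "atherosclerotic_proxy": atherosclerotic,
        "overall_cv_proxy": overall,
        "risk_band": band,
    }
```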


How We Built It

1. Data Pipeline

We built a patient-level data module that:

  • Reads the Ocular Disease Intelligent Recognition (ODIR) dataset with thousands of fundus images
  • Constructs bilateral patient records pairing left and right eye images
  • Generates stratified train/val/test splits at the patient level (to prevent data leakage between eyes of the same patient)
  • Computes dataset-wide metadata statistics (age mean $\mu$ and standard deviation $\sigma$) for z-score normalization (a sketch of the split and normalization logic follows):

$$ \text{age}_{\text{norm}} = \frac{\text{age} - \mu}{\sigma} $$
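
As a sketch of the split logic, assuming a pandas dataframe with one row per patient; the column names and single-label stratification are illustrative simplifications, not the project's exact code:

```python
# Patient-level stratified splitting: because each row already pairs both eyes
# of one patient, splitting rows cannot leak one patient's eyes across splits.
# Column names ("age", "D") are illustrative, not the actual schema.
from sklearn.model_selection import train_test_split

def patient_level_splits(df, strat_col="D", seed=42):
    # Stratifying on a single dominant label is a simplification of
    # multilabel stratification, used here only for illustration.
    train_df, rest = train_test_split(
        df, test_size=0.3, stratify=df[strat_col], random_state=seed)
    val_df, test_df = train_test_split(
        rest, test_size=0.5, stratify=rest[strat_col], random_state=seed)

    mu, sigma = df["age"].mean(), df["age"].std()  # dataset-wide age stats
    return train_df, val_df, test_df, mu, sigma
```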

2. Model Architecture

We designed a multimodal fusion model (MultimodalRiskModel) with a shared image encoder branch, a metadata branch, and a fusion head (sketched in code after the branch descriptions):

Image Encoder Branch (shared weights):

  • Uses EfficientNet-B4 pretrained on ImageNet as the backbone
  • The final classifier head is replaced with nn.Identity() to extract a 1792-dimensional feature vector
  • The same encoder processes both left and right fundus images, enabling bilateral feature learning

Metadata Branch:

  • A 2-layer MLP that takes normalized age as input
  • Outputs a 32-dimensional embedding

Fusion Head:

  • Concatenates the left-eye, right-eye, and metadata embeddings into a single $1792 + 1792 + 32 = 3616$-dimensional vector
  • Passes the fused vector through a 3-layer MLP
  • Outputs 8 raw logits, one per ophthalmic condition
  • Sigmoid activation converts logits to independent probabilities
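
The following condensed PyTorch sketch captures the architecture described above; hidden-layer widths not stated in this writeup are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b4

class MultimodalRiskModel(nn.Module):
    """Sketch of the fusion model; hidden sizes are illustrative."""

    def __init__(self, num_labels: int = 8):
        super().__init__()
        self.encoder = efficientnet_b4(weights="IMAGENET1K_V1")
        self.encoder.classifier = nn.Identity()   # expose 1792-D features

        self.meta_mlp = nn.Sequential(            # 2-layer metadata branch
            nn.Linear(1, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),         # -> 32-D age embedding
        )

        fused_dim = 1792 * 2 + 32                 # left + right + age = 3616
        self.fusion = nn.Sequential(              # 3-layer fusion head
            nn.Linear(fused_dim, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, num_labels),           # 8 raw logits
        )

    def forward(self, left, right, age_norm):
        z_left = self.encoder(left)               # shared weights: the same
        z_right = self.encoder(right)             # encoder processes both eyes
        z_meta = self.meta_mlp(age_norm)          # age_norm shape: (B, 1)
        return self.fusion(torch.cat([z_left, z_right, z_meta], dim=1))

# Independent per-condition probabilities:
# probs = torch.sigmoid(model(left, right, age_norm))
```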

3. Training Strategy

We implemented a gradual unfreezing policy:

  • Epochs 1–5: The image encoder is completely frozen; only the metadata branch and fusion head train. This lets the randomly initialized layers learn meaningful representations without destructive gradients from the large backbone.
  • After epoch 5: The entire network is unfrozen for end-to-end fine-tuning.

Loss function: Binary Cross-Entropy with Logits (BCEWithLogitsLoss) for multilabel classification.
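
A minimal sketch of a training loop with this policy (the model, optimizer, and loader are assumed to be defined elsewhere; this mirrors the schedule rather than our exact code):

```python
import torch.nn as nn

FREEZE_EPOCHS = 5  # backbone frozen for the first five epochs, per the policy

def train(model, optimizer, train_loader, num_epochs):
    criterion = nn.BCEWithLogitsLoss()  # multilabel BCE on the 8 raw logits
    for epoch in range(1, num_epochs + 1):
        # Freeze/unfreeze the pretrained encoder according to the schedule.
        # (A production version might also rebuild the optimizer on unfreeze.)
        for p in model.encoder.parameters():
            p.requires_grad = epoch > FREEZE_EPOCHS
        for left, right, age, targets in train_loader:
            logits = model(left, right, age)
            loss = criterion(logits, targets.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```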

4. API Layer

We built a production-ready FastAPI backend (api/main.py) that:

  • Loads the trained checkpoint on startup
  • Exposes POST /predict accepting multipart form data (left image, right image, age), as sketched after this list
  • Returns structured JSON with labels, probabilities, and full CV summary
  • Includes /health and /docs endpoints
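
A minimal sketch of the endpoint shape (run_inference is a hypothetical helper standing in for preprocessing plus the model forward pass; the real api/main.py wires in the loaded checkpoint):

```python
from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI(title="EYE HEART CONNECTION")  # /docs is generated automatically

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/predict")
async def predict(
    left_image: UploadFile = File(...),   # multipart form fields
    right_image: UploadFile = File(...),
    age: int = Form(...),
):
    left_bytes = await left_image.read()
    right_bytes = await right_image.read()
    # run_inference: hypothetical helper wrapping preprocessing + inference
    labels, probs, cv_summary = run_inference(left_bytes, right_bytes, age)
    return {"labels": labels, "probabilities": probs, "cv_summary": cv_summary}
```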

5. Frontend

We built the interactive UI using Reflex (a Python-based web framework) with:

  • Drag-and-drop bilateral image upload panels with live preview
  • An age slider (1–120) and numeric input
  • A sample case dropdown that auto-discovers paired images from assets/sample_cases/
  • A 3-step flow: Input → Processing (with spinner animation) → Results
  • Recharts bar chart showing disease probabilities sorted by severity
  • Color-coded risk band badge (green/orange/red)
  • Clinical explanation text generated from the top predicted conditions
  • A left sidebar with project info, ophthalmic indicator descriptions, a medical glossary, and research references
  • A polished, medical-themed design with green gradients, glass-morphism cards, and smooth animations

6. Deployment

We containerized everything with Docker — a single Dockerfile that:

  • Installs all dependencies including Reflex
  • Copies model artifacts, configs, API, inference, models, and frontend code
  • Uses a startup script that launches both FastAPI (port 8000) and Reflex (port 7860) in a single container
  • Supports deployment to Hugging Face Spaces with dedicated config files

Challenges We Ran Into

🔴 Bilateral Data Alignment

The ODIR dataset labels diseases at the patient level, but images exist per-eye. We had to carefully construct a patient-level dataframe that correctly pairs left and right fundus images while preserving per-eye annotations, then aggregate them into patient-level multilabel targets without introducing data leakage.
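
A pandas sketch of the pairing step, assuming a per-eye table whose column names are our own illustration rather than the actual ODIR schema:

```python
import pandas as pd

LABELS = ["N", "D", "G", "C", "A", "H", "M", "O"]

def pair_eyes(per_eye: pd.DataFrame) -> pd.DataFrame:
    """per_eye: one row per image with columns patient_id, eye ('left'/'right'),
    image_path, age, and one 0/1 column per condition (names illustrative)."""
    # One row per patient, with left/right image paths side by side.
    paths = per_eye.pivot(index="patient_id", columns="eye", values="image_path")
    # Patient-level multilabel target: positive if either eye is positive.
    labels = per_eye.groupby("patient_id")[LABELS].max()
    age = per_eye.groupby("patient_id")["age"].first()
    return paths.join(labels).join(age)
```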

🔴 Class Imbalance

The 8 ophthalmic conditions are heavily imbalanced — "Normal" and "Diabetes" dominate, while "Glaucoma" and "Other" are rare. Training with standard BCE loss led to poor minority-class recall. We had to experiment with threshold tuning and evaluation metrics that account for class imbalance.
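
For example, per-label decision thresholds can be tuned on a validation set; the sketch below mirrors the kind of experiment we ran rather than the exact code:

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_thresholds(val_probs: np.ndarray, val_targets: np.ndarray) -> np.ndarray:
    """val_probs, val_targets: arrays of shape (num_samples, 8)."""
    grid = np.linspace(0.05, 0.95, 19)
    best = np.full(val_probs.shape[1], 0.5)
    for k in range(val_probs.shape[1]):
        # Pick the threshold that maximizes per-class F1 on validation data.
        scores = [f1_score(val_targets[:, k], val_probs[:, k] >= t,
                           zero_division=0) for t in grid]
        best[k] = grid[int(np.argmax(scores))]
    return best
```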

🔴 Single-Port Deployment for Reflex

Reflex typically uses separate ports for its frontend and backend. For containerized deployment (Docker, Hugging Face Spaces), we needed everything on a single port. We solved this using Reflex's --single-port flag and carefully routing FastAPI through a subprocess managed by our startup script.
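
In spirit, the startup script does something like the following; treat the exact commands and flags as assumptions about our actual script:

```python
import subprocess

# FastAPI runs internally on port 8000 (not exposed outside the container).
api = subprocess.Popen(
    ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"])
try:
    # Reflex serves both its frontend and backend on the single public port.
    subprocess.run(
        ["reflex", "run", "--env", "prod", "--single-port",
         "--frontend-port", "7860"],
        check=True)
finally:
    api.terminate()
```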

🔴 Cross-Platform Compatibility

The project needed to work on Windows (development), Linux (Docker/Spaces), and across different PyTorch versions. We handled edge cases like torch.load's weights_only parameter changing defaults in PyTorch ≥2.6, and path handling differences between operating systems.
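
The torch.load fix, for instance, reduces to passing weights_only explicitly when the running PyTorch supports it (our checkpoints are trusted, locally produced files):

```python
import inspect
import torch

def load_checkpoint(path: str):
    # PyTorch >= 2.6 flipped torch.load's weights_only default to True,
    # which rejects older checkpoints containing pickled Python objects.
    if "weights_only" in inspect.signature(torch.load).parameters:
        return torch.load(path, map_location="cpu", weights_only=False)
    return torch.load(path, map_location="cpu")  # older PyTorch versions
```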

🔴 Explaining AI Predictions Clinically

Raw probabilities are meaningless to end users. We designed a clinically informed cardiovascular proxy scoring system with literature-backed weights and threshold-based risk bands. Translating model outputs into human-readable explanations ("You have a 64.0% predicted risk of cardiovascular disease") required careful UX iteration.


Accomplishments That We're Proud Of

  • End-to-end working system — from raw data preparation to a polished, deployed web app with real predictions
  • Shared bilateral encoder — the same EfficientNet-B4 processes both eyes, enabling the model to learn cross-eye retinal patterns that a single-image approach would miss
  • Clinically grounded CV proxy — our weighted scoring system is informed by established medical literature linking ophthalmic findings to cardiovascular risk
  • Production-grade engineering — FastAPI with Pydantic schemas, Docker containerization, YAML-driven configuration, comprehensive pytest test suite, and clean module separation
  • Beautiful, accessible UI — a medical-themed Reflex interface with glass-morphism cards, animated transitions, interactive charts, and a clinical glossary sidebar that educates users while they wait
  • Real sample cases — users can instantly try the system with pre-loaded fundus image pairs, no data upload required
  • Deployed and publicly accessible on Hugging Face Spaces for anyone to try

What We Learned

  1. Retinal imaging is a goldmine for systemic health prediction. The blood vessels visible in a fundus photograph are direct proxies for the body's broader vascular health. This project reinforced our appreciation for ophthalmology as a window into cardiovascular medicine.

  2. Multimodal fusion is powerful but tricky. Combining high-dimensional image features (two 1792-D eye embeddings) with a tiny metadata signal (a single age value) required careful normalization and architecture design. The metadata branch needed its own MLP to "amplify" the age signal to a comparable representational scale.

  3. Deployment is half the battle. Building a model in a notebook is one thing. Serving it behind a REST API, wrapping it in a container, and making it accessible through a polished web UI requires a completely different engineering skillset. We learned to navigate Docker networking, Reflex's single-port mode, and Hugging Face Spaces' build system.

  4. Explainability matters more than accuracy. A 90% accurate model that outputs raw logits is less useful than an 80% accurate model that tells a doctor "This patient has a 67% hypertension proxy score driven primarily by retinal vessel narrowing signals." Designing the CV proxy system taught us that how you communicate predictions is as important as the predictions themselves.

  5. Gradual unfreezing stabilizes training. Jumping straight into end-to-end fine-tuning of a massive EfficientNet backbone with a randomly initialized fusion head led to unstable gradients. Freezing the encoder for the first few epochs allowed the fusion layers to stabilize first.


What's Next for EYE HEART CONNECTION

  • Temporal modeling — Incorporate longitudinal fundus image sequences to track cardiovascular risk changes over time
  • Additional metadata — Include blood pressure, BMI, and smoking history alongside age for richer risk estimation
  • GradCAM and attention visualization — Show users which regions of their retina the model focused on, making predictions even more interpretable
  • Clinical validation study — Partner with ophthalmologists and cardiologists to validate the CV proxy scores against actual cardiovascular outcomes
  • Multi-dataset training — Train on additional retinal datasets (e.g., EyePACS, Messidor) to improve generalization across demographics and imaging equipment
  • Mobile-friendly interface — Optimize the Reflex UI for mobile devices to enable point-of-care use in clinics

Built with ❤️ by Ayush Saini
