Myo AI: A Multimodal Fusion Framework for Cardiovascular Risk Stratification
Inspiration
Cardiovascular disease (CVD) remains the leading cause of global mortality, necessitating diagnostic tools that transcend static risk scoring. While working with fragmented medical datasets, I identified a critical limitation in current predictive models: they operate in silos, analyzing either tabular clinical vitals or raw physiological signals, but rarely both.
I realized that precision medicine requires a holistic view. Myo AI was born from the desire to bridge this gap—a multimodal fusion system that orchestrates a "Battle Royale" tournament among five distinct machine learning architectures to identify the optimal strategy for cardiac risk prediction.
What it does
Myo AI is an advanced diagnostic engine that synthesizes heterogeneous clinical data with high-frequency physiological signals.
- Multimodal Fusion: Ingests and harmonizes standard clinical records (Age, BP, Cholesterol) with statistical features extracted from over 600MB of raw ECG waveforms.
- Tournament Architecture: Instead of relying on a single model, it instantiates five independent pipelines—ranging from Probabilistic Naive Bayes to 1D-Convolutional Neural Networks (CNNs)—and evaluates them in a stratified "Tournament" to crown a champion.
- Explainable Intelligence: The winning model powers the Oracle Layer, utilizing SHAP (SHapley Additive exPlanations) to provide clinicians with granular, patient-specific "force plots" that detail exactly why a risk prediction was made.
- Time-Travel Simulation: The Chronos Engine projects a patient's risk trajectory over the next 20 years based on aging and lifestyle modifications.
How we built it
The system is engineered as a modular, three-layer stack using a robust Python data science ecosystem.
Layer 1: The Foundation (Data Engineering)
We built three specialized engines to handle data integrity:
- Synapse Ingestion Engine: Harmonizes four disparate CSV sources into a unified schema, processing a massive cohort of 140,918 patient records.
- Pulse Harmonization Engine: Implements a memory-efficient, chunk-based streaming algorithm to process gigabytes of ECG waveforms without crashing RAM. It extracts statistical moments to create a "physiological fingerprint" for each patient:
$$\text{Kurtosis} = \frac{\frac{1}{N}\sum (x_i - \bar{x})^4}{(\frac{1}{N}\sum (x_i - \bar{x})^2)^2}$$
- Catalyst Feature Synthesizer: Fuses clinical and signal modalities while engineering critical biomarkers like Pulse Pressure and BMI, transparently handling sensor missingness.
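A minimal sketch of the Pulse Harmonization pass, assuming a long-format CSV with illustrative `patient_id` and `signal` columns (the real schema and engine internals aren't shown in this writeup). This version buffers each patient's samples for clarity; a fully streaming implementation would accumulate running moments instead of concatenating arrays.

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis, skew

def physiological_fingerprint(csv_path, chunksize=100_000):
    """Stream an ECG CSV in chunks, then reduce each patient's signal
    to a statistical "fingerprint" (mean, std, skewness, kurtosis).
    Column names (`patient_id`, `signal`) are illustrative assumptions."""
    samples = {}
    for chunk in pd.read_csv(csv_path, chunksize=chunksize):
        for pid, grp in chunk.groupby("patient_id"):
            samples.setdefault(pid, []).append(grp["signal"].to_numpy())
    rows = []
    for pid, parts in samples.items():
        x = np.concatenate(parts)
        rows.append({
            "patient_id": pid,
            "ecg_mean": x.mean(),
            "ecg_std": x.std(),
            "ecg_skew": skew(x),
            # fisher=False gives m4 / m2**2, matching the kurtosis formula above
            "ecg_kurtosis": kurtosis(x, fisher=False),
        })
    return pd.DataFrame(rows)
```

Because `read_csv(chunksize=...)` yields batches lazily, only one chunk of the raw waveform file is decoded at a time, which is what keeps gigabyte-scale inputs from exhausting RAM.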
Layer 2: The Tournament (Model Selection)
We rejected the "one-size-fits-all" heuristic. Myo AI instantiates five fully isolated Knowledge Discovery in Databases (KDD) pipelines:
- Aegis Protocol (Random Forest)
- Myo-Core Engine (Histogram Gradient Boosting)
- Sentinel Node (Gaussian Naive Bayes)
- Vanguard System (Logistic Regression)
- Pulse-Sync Architecture (1D-CNN Deep Learning)
Each model competes in a stratified validation environment, ensuring objective, data-driven selection.
Layer 3: The Intelligence (Analysis & UI)
- Oracle Layer: Uses SHAP's `TreeExplainer` to visualize feature importance and directionality.
- Zenith Map: A PCA + K-Means clustering module that identifies unsupervised "Risk Phenotypes" within the patient population.
- Myo-Sim Bio-Deck: An interactive "Digital Twin" dashboard built with `ipywidgets` that allows clinicians to simulate interventions (e.g., "What if the patient stops smoking?") in real time.
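Stripped of the `ipywidgets` UI, the Bio-Deck's "what-if" logic reduces to: copy the patient's feature vector, apply the intervention, and re-score. A minimal sketch, assuming any fitted classifier with `predict_proba` and illustrative column names:

```python
import pandas as pd

def simulate_intervention(model, patient, feature_names, **changes):
    """Return (baseline_risk, simulated_risk) after applying the given
    feature overrides, e.g. smoking=0 for "What if the patient stops
    smoking?". Column names are illustrative assumptions."""
    baseline = pd.DataFrame([patient], columns=feature_names)
    modified = baseline.copy()
    for feature, value in changes.items():
        modified[feature] = value
    p0 = model.predict_proba(baseline)[0, 1]
    p1 = model.predict_proba(modified)[0, 1]
    return p0, p1
```

In the dashboard, each widget callback would call a function like this and redraw the risk gauge with the new probability.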
Challenges we ran into
- The "Big Data" Bottleneck: Processing over 600MB of ECG time-series data was computationally expensive. We solved this by implementing a chunked streaming algorithm that processes signals in batches of 100,000 rows, reducing memory overhead by 90%.
- Data Leakage in Ensembles: Training multiple models side-by-side risked data leakage. We enforced strict Architectural Independence, ensuring each pipeline had its own isolated feature selection, scaling, and imputation steps.
- Deep Learning Overfitting: Our 1D-CNN initially overfitted on the tabular data. We addressed this by reshaping the input into 3D tensors of shape `(Samples, Features, 1)` and adding Dropout layers to improve generalization.
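The "Architectural Independence" fix above amounts to bundling each contender's imputation and scaling into a single pipeline object, so every preprocessing step is re-fit only on the training fold of each split. A hedged sketch with scikit-learn (the real pipelines also include per-model feature selection, not shown here):

```python
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def isolated_pipeline(model):
    """Wrap imputation and scaling with the model so preprocessing is
    re-fit inside each training fold, preventing test-fold statistics
    from leaking into training."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("model", model),
    ])

def leak_free_auc(model, X, y, seed=42):
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(isolated_pipeline(model), X, y, cv=cv, scoring="roc_auc").mean()
```

The key point is that `cross_val_score` clones and fits the entire pipeline per fold; fitting the imputer or scaler once on the full dataset before splitting is exactly the leakage this design avoids.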
Accomplishments that we're proud of
- Tournament Victory: The system successfully identified the Aegis Protocol (Random Forest) as the optimal model with a ROC-AUC of 0.8082, empirically proving that ensemble methods can outperform deep learning for structured clinical data.
- Unsupervised Discovery: The Zenith Map successfully clustered patients into distinct "Low", "Moderate", and "High" risk phenotypes without ever seeing the target labels, validating the presence of genuine physiological patterns.
- Interactive Simulation: Building the Myo-Sim Bio-Deck transformed a static script into a dynamic clinical tool, allowing for real-time "what-if" analysis.
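The Zenith Map's unsupervised pass can be sketched in a few lines, assuming standardized clinical features as input; the three-cluster choice mirrors the Low/Moderate/High split described above:

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def zenith_map(X, n_components=2, n_clusters=3, seed=42):
    """Project standardized features onto principal components, then
    K-Means the projection into candidate risk phenotypes. Returns the
    2D coordinates (for plotting) and a cluster label per patient."""
    embed = make_pipeline(StandardScaler(), PCA(n_components=n_components))
    coords = embed.fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(coords)
    return coords, labels
```

Note that the target label never enters this function, which is what makes the alignment between discovered clusters and actual risk tiers meaningful.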
What we learned
- Model Selection over Complexity: While we devoted significant resources to the Pulse-Sync CNN, the simpler Random Forest won the tournament. This reinforced the lesson that for tabular data, complex deep learning architectures are not always superior to robust ensembles.
- The Power of Feature Engineering: Permutation Importance analysis revealed that engineered features like Systolic Blood Pressure and Pulse Pressure were far more predictive than raw metrics alone.
- Ethical AI Design: Implementing the `sensor_signal_available` flag taught us the importance of transparently handling missing data rather than relying on silent imputation, which can introduce bias.
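The permutation-importance analysis mentioned above can be reproduced with scikit-learn's built-in implementation; this sketch uses a held-out split and ROC-AUC as the degradation metric (the project's exact split and repeat counts aren't stated here):

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

def rank_features(model, X, y, feature_names, seed=42):
    """Fit on a training split, then measure how much shuffling each
    column degrades held-out ROC-AUC; a bigger drop means the feature
    carries more predictive signal."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=seed)
    model.fit(X_tr, y_tr)
    result = permutation_importance(model, X_te, y_te, scoring="roc_auc",
                                    n_repeats=10, random_state=seed)
    order = np.argsort(result.importances_mean)[::-1]
    return [(feature_names[i], result.importances_mean[i]) for i in order]
```

Unlike impurity-based importances, this measure is computed on held-out data, so it reflects generalization rather than training-set fit.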
What's next for Myo AI
- Wearable Integration: We plan to connect the Pulse Harmonization Engine directly to API streams from Apple Watch or Fitbit for continuous, at-home monitoring.
- Federated Learning: Implementing a decentralized training protocol to allow hospitals to train the Aegis Protocol on their private data without sharing sensitive patient records.
- Clinical Deployment: Containerizing the winning pipeline (Layer 4 Archive) into a Dockerized REST API for seamless integration with Electronic Health Records (EHR) systems.
Built With
- deep-learning
- explainable-ai
- gdown
- google-colab
- ipywidgets
- joblib
- jupyter-notebook
- keras
- machine-learning
- matplotlib
- numpy
- pandas
- python
- scikit-learn
- scipy
- seaborn
- shap
- signal-processing
- tensorflow