Inspiration

Healthcare is still mostly reactive. You feel symptoms, you go to a doctor, they run tests. By the time something shows up on a report, the problem has often been developing for weeks or months.

I kept thinking about this after reading that nearly 80% of strokes and cardiac events have detectable precursors — changes in heart rhythm, signal patterns, variability — that show up well before the actual event. The data exists. The signals are there. But nobody is watching them continuously, and most people do not have access to the kind of monitoring that would catch these patterns early.

That gap felt like exactly the kind of problem worth working on. Not building another wellness app, but something that could actually sit between a person and a serious health event and say: something looks different here, pay attention.

What it does

Health Guardian is an AI-assisted health monitoring system that processes biosignal data — primarily ECG — to detect anomalies and flag early indicators of cardiovascular risk, including patterns associated with stroke precursors and arrhythmia. It is not a diagnostic tool. It is a monitoring layer. The goal is to surface signals that warrant clinical attention before they become emergencies.

Core features:

- ECG signal ingestion and preprocessing: accepts raw biosignal input, cleans noise, and segments it into analysable windows
- Anomaly detection pipeline: identifies deviations from a baseline using an ML model trained on labelled cardiac data
- Risk scoring: assigns a risk level per reading window (low / moderate / elevated) based on detected pattern features
- Alert generation: flags readings that cross defined thresholds for follow-up review
- Visualisation dashboard: displays the processed signal, detected anomalies, and risk trend over time in a readable interface
- Report summary: generates a plain-language summary of findings that a non-specialist can understand

The system is designed so that a patient or caregiver can feed it continuous or periodic ECG data and receive an ongoing picture of cardiac health, not just a snapshot from one clinic visit.

How we built it

Signal processing: Raw ECG data comes in noisy. The first step was building a preprocessing pipeline using NeuroKit2 and SciPy — applying bandpass filters to remove baseline wander and high-frequency noise, then using Pan-Tompkins peak detection to identify R-peaks and segment individual heartbeats into fixed-length windows.

Feature extraction: From each segmented beat and inter-beat interval, we extracted time-domain and frequency-domain HRV (heart rate variability) features — SDNN, RMSSD, LF/HF ratio — alongside morphological features from the P, QRS, and T wave components. These became the input to the model.

Model: We trained binary and multi-class classifiers using a combination of a lightweight CNN for raw waveform classification and a gradient-boosted tree model (XGBoost) on the extracted HRV features. The CNN handles morphological pattern recognition; the XGBoost model handles the statistical rhythm features. Outputs from both are combined through a simple ensemble layer. Training data came from the PhysioNet MIT-BIH Arrhythmia Database and the PTB Diagnostic ECG Database — both publicly available, labelled datasets with clinical annotations.

Backend and pipeline: Python throughout. Pandas and NumPy for data handling, TensorFlow/Keras for the CNN, XGBoost for the tree model, Matplotlib and Plotly for visualisation. A Flask API serves the model and handles data submission from the frontend.

Frontend: A simple dashboard built with React that lets users upload ECG recordings, view the processed signal with anomaly markers, and read the risk summary.

GitLab: We used GitLab for version control and ran a basic CI pipeline that automatically runs the preprocessing tests and model evaluation script on every push to main. It was genuinely useful — it caught two instances where a preprocessing change silently broke the feature extraction output.
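A minimal sketch of the filtering and peak-detection steps, using scipy.signal directly. The real pipeline leans on NeuroKit2 and Pan-Tompkins; this is a simplified stand-in (a plain bandpass plus `find_peaks`) with illustrative cut-offs and a placeholder minimum peak spacing:

```python
# Simplified stand-in for the preprocessing stage: bandpass filtering to
# remove baseline wander and high-frequency noise, then naive R-peak
# picking. Cut-offs (0.5-40 Hz) and the 300 ms refractory spacing are
# illustrative, not the tuned pipeline values.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def preprocess_ecg(signal, fs, low=0.5, high=40.0):
    """Bandpass-filter an ECG trace (zero-phase, 3rd-order Butterworth)."""
    b, a = butter(3, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, signal)

def detect_r_peaks(filtered, fs):
    """Find R-peaks: prominent maxima at physiologically plausible spacing."""
    min_distance = int(0.3 * fs)  # ~300 ms refractory period between beats
    peaks, _ = find_peaks(filtered,
                          distance=min_distance,
                          height=0.5 * np.max(filtered))
    return peaks
```

From the detected R-peak indices, fixed-length windows around each beat and the inter-beat (RR) intervals fall out directly.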
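Two of the time-domain HRV features named above, SDNN and RMSSD, are short enough to show directly. This is a generic implementation of the standard definitions (standard deviation of RR intervals, and root mean square of successive RR differences), not our exact feature code:

```python
# Time-domain HRV features from a series of RR intervals (milliseconds),
# as derived from detected R-peaks. Standard textbook definitions.
import numpy as np

def hrv_time_domain(rr_ms):
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = float(np.std(rr, ddof=1))            # overall variability
    diff = np.diff(rr)
    rmssd = float(np.sqrt(np.mean(diff ** 2)))  # beat-to-beat variability
    return {"sdnn": sdnn, "rmssd": rmssd}
```

The frequency-domain features (LF/HF ratio) need a spectral estimate of the RR series and are omitted here.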
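The ensemble layer can be as simple as a weighted average of the two models' class probabilities. A sketch of that idea, with placeholder 0.5/0.5 weights rather than our tuned ones:

```python
# Simple ensemble layer: blend the CNN's waveform-based class
# probabilities with XGBoost's HRV-feature-based probabilities.
# The equal weighting is a placeholder, not the tuned value.
import numpy as np

def ensemble_proba(cnn_proba, xgb_proba, w_cnn=0.5):
    """Combine two per-class probability arrays of identical shape."""
    p = w_cnn * np.asarray(cnn_proba) + (1.0 - w_cnn) * np.asarray(xgb_proba)
    return p / p.sum(axis=-1, keepdims=True)  # renormalise per sample
```

Because the two models see different views of the data (morphology vs. rhythm statistics), even this naive blend can beat either model alone, which matches what we saw on the validation set.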

Challenges we ran into

Data quality was harder than expected. Real-world ECG data is messier than the cleaned benchmark datasets. Even within PhysioNet, some recordings had labelling inconsistencies that affected training. We spent more time on data validation than we initially planned.

Class imbalance. Normal sinus rhythm vastly outnumbers anomalous readings in any real dataset. Getting the model to not just predict "normal" for everything required careful oversampling with SMOTE and threshold tuning. We are still not fully happy with the recall on rare arrhythmia classes.

Defining "risk" without being a clinician. The line between "flag this for review" and "this is a diagnosis" is important and genuinely difficult to get right. We spent a lot of time making sure the output language is careful — the system surfaces patterns, it does not tell anyone they are sick.

Time. A hackathon is not enough time to do this problem full justice. The prototype works, the pipeline is real, but there are rough edges we would clean up with more runway.
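The core move in SMOTE is interpolating between a minority-class sample and one of its nearest minority-class neighbours. A stripped-down NumPy illustration of that idea (not the library implementation we actually ran; the function name and parameters are ours):

```python
# Stripped-down sketch of SMOTE's core idea: synthesise a new minority
# sample on the line segment between a real minority sample and one of
# its k nearest minority-class neighbours.
import numpy as np

def smote_like(X_min, n_new, k=3, rng=None):
    """Generate n_new synthetic samples from minority matrix X_min (n, d)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        u = rng.random()                          # interpolation factor
        out.append(X_min[i] + u * (X_min[j] - X_min[i]))
    return np.array(out)
```

Synthetic samples always land inside the minority region's local geometry, which is why this tends to help recall more than naive duplication of existing rows.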

Accomplishments that we're proud of

Getting the end-to-end pipeline working — from raw ECG input to a risk summary that a non-specialist can read — within the hackathon window is something we are genuinely pleased with.

The ensemble approach outperformed either model alone on the validation set. The CNN on its own hit about 87% accuracy on arrhythmia classification. The XGBoost model on HRV features alone hit around 83%. Combined, we reached 91% on the held-out test split from MIT-BIH. That was a real result, not a tuned number.

The GitLab CI pipeline catching silent regressions twice during development was a small thing, but it validated the decision to set it up properly from the start.

And honestly — building something in the healthcare space that takes its responsibility seriously. The framing, the language, the deliberate choice not to overclaim. That felt important to get right.

What we learned

Technical: HRV feature engineering is genuinely deep. We only scratched the surface of what is possible with frequency-domain analysis. The Pan-Tompkins algorithm is old but still robust — there is a reason it is still widely used. Ensemble methods on heterogeneous feature types work well when the individual models capture different aspects of the signal.

About the problem: Working closely with the data made us more careful. The closer we got to real clinical labels, the more we understood how much context a clinician brings to a reading that a model cannot easily replicate. That humility changed how we framed the output.

About building things fast: Scoping ruthlessly matters. We cut three features in the first four hours because they would have taken time away from making the core pipeline solid. The working demo is better for those cuts.

What's next for Health Guardian

Better models. We want to train on larger, more diverse datasets — including wearable-grade ECG data from devices like the Apple Watch or AliveCor KardiaMobile, which are what most people actually have access to.

Real-time streaming. The current pipeline handles uploaded recordings. The next version should handle a continuous data stream from a wearable and flag anomalies as they happen.

Stroke-specific features. Atrial fibrillation is one of the strongest modifiable risk factors for stroke. We want to build a dedicated AF detection module with validated clinical sensitivity thresholds.

Clinical validation. The model works on benchmark data. Whether it generalises to a real patient population requires a proper clinical study. That is the honest next step before this goes anywhere near a real user.

Integration with existing health records. Connecting detected patterns to longitudinal health data — not just a single ECG session — would make the risk picture far more meaningful.
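For the real-time streaming direction, the minimal plumbing is a sliding window over an incoming sample stream, with each window handed to the existing pipeline. A sketch with illustrative window and step sizes (2,500 samples is 10 seconds at 250 Hz; none of these numbers are decided yet):

```python
# Sketch of streaming ingestion: buffer an incoming sample stream and
# emit overlapping fixed-length windows for the existing analysis
# pipeline. Window and step sizes are illustrative placeholders.

def stream_windows(samples, window=2500, step=1250):
    """Yield overlapping fixed-length windows from an iterable of samples."""
    buf = []
    for s in samples:
        buf.append(s)
        if len(buf) == window:
            yield list(buf)   # hand a copy of the full window downstream
            del buf[:step]    # slide forward by `step` samples
```

The overlap matters: an anomaly that straddles a window boundary still appears whole in the next window.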
