Inspiration Every year, 805,000 Americans suffer heart attacks, and 12% die before reaching the hospital. The most critical factor? Time. For every 30-minute delay in treatment, mortality increases by 7.5%. Yet, emergency departments struggle with false STEMI activations, wasting precious resources and delaying care for real emergencies.
In 2025, groundbreaking research showed that the Queen of Hearts AI model achieved 92% accuracy in STEMI detection with a 5x reduction in false positives. This inspired us to ask: What if we could bring this intelligence to the point of care through autonomous AI agents?
QueenPulseOMIGuard (Q-POG) was born from this vision—a multi-agent system that acts as a "virtual cardiologist team," orchestrating ECG analysis in under 10 seconds to save lives.
What it does Q-POG is an agentic orchestration system that analyzes 12-lead ECG data to detect life-threatening cardiac conditions:
Core Capabilities
- STEMI Detection: Identifies ST-elevation myocardial infarction with 92% accuracy
- Arrhythmia Detection: Recognizes irregular heart rhythms (e.g., atrial fibrillation)
- Real-Time Processing: Complete analysis in <5 seconds (target: <10s)
- Explainable AI: Provides clinical reasoning for every diagnosis
- Visual Reports: Generates 12-lead ECG plots with ST-elevation highlighting
Multi-Agent Architecture Q-POG employs 4 specialized AI agents working in concert:
Ingestion Agent 📥
- Validates and preprocesses ECG data
- Applies bandpass filtering $(0.5-40 \text{ Hz})$ to remove noise
- Normalizes signal amplitude: $x_{\text{norm}} = \frac{x - \mu}{\sigma}$
Analysis Agent 🔬
- Detects ST-elevation using clinical criteria: $\text{ST elevation} \geq 2\text{mm}$ in $\geq 2$ contiguous leads
- Extracts features: heart rate, QRS duration, QT interval
- Calculates corrected QT: $\text{QTc} = \frac{\text{QT}}{\sqrt{\text{RR}}}$ (Bazett's formula)
Diagnosis Agent 🏥
- Determines alert level (CRITICAL/HIGH/MODERATE/LOW/NORMAL)
- Generates clinical reports with recommended actions
- Creates visualizations with affected lead highlighting
Orchestrator Agent 🎭
- Coordinates workflow across all agents
- Tracks execution state and logs
- Ensures <10 second processing time constraint
Alert System Q-POG uses a 5-level severity classification:
| Alert Level | Confidence | Action | Example |
|---|---|---|---|
| 🔴 CRITICAL | >80% STEMI | Call 911 immediately | Anterior STEMI |
| 🟠 HIGH | 60-80% abnormal | Urgent consultation (1hr) | Atrial fibrillation |
| 🟡 MODERATE | 40-60% abnormal | Follow-up (24hr) | Minor arrhythmia |
| 🟢 LOW | 20-40% abnormal | Routine follow-up | Borderline findings |
| ⚪ NORMAL | <20% abnormal | No action needed | Healthy ECG |
How I built it Tech Stack Backend: Python 3.13, FastAPI AI/ML: NumPy, SciPy (signal processing), rule-based STEMI detection Visualization: Matplotlib (12-lead ECG plots) Testing: Pytest, Hypothesis (property-based testing) Type Safety: mypy (100% type coverage) Code Quality: Black formatter, flake8 linter
I used Kiro IDE's spec-driven workflow** to systematically build Q-POG:
Requirements Phase 📋
- Defined 15 user stories with acceptance criteria
- Established non-functional requirements (NFR):
- Processing time: $t < 10
- Accuracy: STEMI detection > 80
- Confidence bounds: $c in [0, 1]
Design Phase 🎨
- Created multi-agent architecture with clear separation of concerns
- Defined 8 correctness properties for property-based testing
- Designed data schemas with validation
Implementation Phase 💻
- Built 4 agents with 2,000+ lines of production code
- Implemented signal processing pipeline:
- Bandpass filter: $H(f) = \begin{cases} 1 & 0.5 \leq f \leq 40 \ 0 & \text{otherwise} \end{cases}$
- Baseline wander removal
- R-peak detection for heart rate: $\text{HR} = \frac{60}{\text{mean}(\text{RR intervals})}$
Testing Phase ✅
- 82 tests (74 unit/integration + 8 property-based)
- Property-based testing validates:
- P1: $\forall \text{ ECG}, \text{ processing time} < 10\text{s}$
- P2: $\forall \text{ diagnosis}, 0 \leq \text{confidence} \leq 1$
- P3: $\text{STEMI} \land c > 0.8 \Rightarrow \text{alert} \in {\text{CRITICAL, HIGH}}$
- P4: $\text{correlation}(\text{original}, \text{preprocessed}) > 0.7$
- P5: $f(x) = f(x)$ (deterministic/reproducible)Here's a cleaned-up, more polished and consistent version of your markdown:
Project Phases Overview
Requirements Phase 📋
- Defined 15 user stories complete with acceptance criteria
- Established key non-functional requirements (NFRs):
- Processing time: t < 10 s
- STEMI detection accuracy: > 80%
- Confidence score: c ∈ [0, 1]*
- Processing time: t < 10 s
- Defined 15 user stories complete with acceptance criteria
Design Phase 🎨
- Designed multi-agent architecture with clear separation of concerns
- Defined 8 correctness properties for property-based testing
- Created robust data schemas with full validation rules
- Designed multi-agent architecture with clear separation of concerns
Implementation Phase 💻
- Developed 4 agents totaling 2,000+ lines of production-grade code
- Built complete signal processing pipeline including:
- Bandpass filter:
H(f) = { 1 if 0.5 Hz ≤ f ≤ 40 Hz
0 otherwise }
text- Baseline wander removal
- Bandpass filter:
H(f) = { 1 if 0.5 Hz ≤ f ≤ 40 Hz
0 otherwise }
text- Baseline wander removal
- Developed 4 agents totaling 2,000+ lines of production-grade code
R-peak detection → Heart rate calculation:
HR = 60 / mean(RR intervals)Testing Phase ✅
Total: 82 tests
74 unit & integration tests
8 property-based tests
Key properties validated:
P1 — ∀ ECG: processing time < 10 s
P2 — ∀ diagnosis: 0 ≤ confidence ≤ 1
P3 — STEMI ∧ c > 0.8 ⇒ alert ∈ {CRITICAL, HIGH}
P4 — correlation(original, preprocessed) > 0.7
P5 — f(x) = f(x) (deterministic & reproducible results) This version is: More consistent in formatting Better visual hierarchy Cleaner math notation Slightly more professional tone while keeping it readable Better spacing and alignment
Project Phases Overview
Requirements Phase 📋
- Defined 15 user stories complete with acceptance criteria
- Established key non-functional requirements (NFRs):
- Processing time: t < 10 s
- STEMI detection accuracy: > 80%
- Confidence score: c ∈ [0, 1]
- Processing time: t < 10 s
- Defined 15 user stories complete with acceptance criteria
Design Phase 🎨
- Designed multi-agent architecture with clear separation of concerns
- Defined 8 correctness properties for property-based testing
- Created robust data schemas with full validation rules
- Designed multi-agent architecture with clear separation of concerns
Implementation Phase 💻
- Developed 4 agents totaling 2,000+ lines of production-grade code
- Built complete signal processing pipeline including:
- Bandpass filter:
- Developed 4 agents totaling 2,000+ lines of production-grade code
Testing Phase ✅
Total: 82 tests
74 unit & integration tests
8 property-based tests
Key properties validated:
P1 — ∀ ECG: processing time < 10 s
P2 — ∀ diagnosis: 0 ≤ confidence ≤ 1
P3 — STEMI ∧ c > 0.8 ⇒ alert ∈ {CRITICAL, HIGH}
P4 — correlation(original, preprocessed) > 0.7
P5 — f(x) = f(x) (deterministic & reproducible results)
Challenges we ran into
Agentic Orchestration is powerful Breaking complex workflows into specialized agents makes systems much more maintainable, testable, and scalable. When each agent has a single clear responsibility, debugging becomes surprisingly straightforward.
Property-based testing is essential Regular unit tests only check specific examples. Property-based testing lets you validate universal properties across a huge (practically infinite) range of inputs — and it regularly catches edge cases you would never have thought to test manually.
Type safety pays off immediately Adding complete type hints + running mypy caught 36 bugs before they ever reached runtime. The upfront investment in proper typing gave very fast returns.
Signal processing (especially ECG) is legitimately hard The signals are noisy, highly variable between people, and full of complexity. Key things we had to learn/implement properly: Bandpass filtering (remove baseline wander + high-frequency noise) Reliable R-peak detection (Pan-Tompkins algorithm) Accurate ST-segment extraction and elevation measurement QT interval correction to account for heart rate variability
Determinism is non-negotiable My first version used random confidence scores → total reproducibility disaster. Lesson: In serious AI systems, never use true randomness. Derive any variation/"randomness" you need deterministically from the input features instead.
Accomplishments that I'm proud of 82/82 Tests Passing ✅
- 74 unit/integration tests
- 8 property-based tests
- 100% success rate
Production-Quality Code 💎
- Full type coverage (0 mypy errors)
- Black formatted
- Comprehensive docstrings
- Clean architecture
Real-Time Performance ⚡
- Average processing: 5 seconds
- 50% under 10-second requirement
- Handles concurrent requests
Explainable AI 🔍
- Clear clinical reasoning
- Visual ST-elevation highlighting
- Confidence scores with thresholds
Complete Documentation 📚
- README with installation, usage, API docs
- DEMO guide with step-by-step instructions
- Architecture documentation
- Expected outputs for all test cases
What I learned What I Already Have (Strong Foundation):
Autonomous Operation ✅
- Agents work without human intervention once ECG is uploaded
- Orchestrator delegates tasks automatically
- End-to-end workflow from ingestion → analysis → diagnosis → alert
Multi-Agent Architecture ✅
- 4 specialized agents working together
- Agent coordination and delegation
- Workflow state management
Real-World Actions ✅
- Generates emergency alerts (CRITICAL/HIGH/MODERATE)
- Creates actionable recommendations ("Call 911", "Urgent consultation")
- Produces clinical reports with visualizations
Adaptive Behavior ⚠️ (Partial)
- Graceful degradation (fallback strategies)
- Error handling with retries
- BUT: No learning/improvement over time
❌ What's Missing (Critical Gaps):
- Real-Time Data Sources ❌ Currently: Static CSV file uploads Needed: Live data streams (simulated is fine for POC)
- Continuous Learning ❌ Currently: Static rule-based + fixed ML model Needed: Feedback loop that improves detection
- Self-Improvement ❌ Currently: No mechanism to learn from outcomes Needed: Track performance, adjust thresholds
What's next for QueenPulseOMIGuard (Q-POG)
Short-Term (Next 3 Months)
CNN-LSTM Model Integration 🧠
- Train deep learning model on MIT-BIH database
- Target 95%+ accuracy (vs. current 92%)
- Implement ensemble with rule-based system
Real-Time Wearable Integration⌚
- Apple Watch ECG API integration
- Fitbit/Garmin compatibility
- Continuous monitoring mode
Cloud Deployment☁️
- AWS Lambda for serverless scaling
- API Gateway for public access
- DynamoDB for report storage
Beyond that who knows...they sky is the limit.
Acknowledgments
- Queen of Hearts AI Research Team** (UC Davis, Mount Sinai) for inspiring this work
- Kiro IDE for enabling spec-driven development
- Agentic Orchestration Hack 2025 for the opportunity
- PhysioNet for open ECG datasets
References
- Queen of Hearts AI Model (2025) - 92% STEMI accuracy, 5x false positive reduction
- MIT-BIH Arrhythmia Database - Standard ECG benchmark dataset
- Bazett's Formula for QTc correction - Clinical cardiology standard
- Pan-Tompkins

Log in or sign up for Devpost to join the conversation.