Inspiration

Cardiovascular diseases remain a leading global health challenge that requires faster, more accessible screening tools. We were inspired to bridge the gap between complex acoustic heart data (Phonocardiograms) and actionable medical insights by leveraging the latest advancements in Vision-Language Models to interpret sound as a visual medium.

What it does

Cardio AI Analytics is a multimodal diagnostic dashboard that:Analyzes Signal Data: Automatically generates Time-Domain PCG Waveforms and Frequency-Domain Mel-Spectrograms from raw audio uploads. Performs Visual AI Reasoning: Feeds visual spectrogram data to the Llama 4 Scout Vision model to detect patterns and anomalies. Generates Multimodal Reports: Provides a clinical text summary, including potential diagnoses and lifestyle advice, alongside a voice-synthesized audio report. Offers a Clinical-Grade UI: Uses a responsive, color-coded interface to guide users through signal acquisition and diagnostic results.

How we built it

The project is built on a high-performance Python stack: Frontend: Developed with Gradio, utilizing custom CSS for a multi-window design based on color psychology (Clinical Blue for inputs, Crisp White for visuals, and Diagnostic Mint for results). Signal Processing: Powered by librosa and numpy to convert audio into Mel-Spectrograms focused on the 20-2000Hz range, which is optimal for heart sound analysis. AI Reasoning: Integrated the meta-llama/llama-4-scout-17b-16e-instruct model via the Groq API for ultra-fast visual inference. Speech Synthesis: Utilized gTTS to convert diagnostic text into playable clinical reports. Infrastructure: Fully containerized using Docker with a python:3.10-slim base image for consistent deployment.

Challenges we ran into

Optimizing Visuals for AI: We had to precisely tune our Mel-Spectrograms to focus on low-frequency heart murmurs (capped at 2000Hz) to ensure the Llama Vision model received the most relevant data. Payload Constraints: To bypass API payload limits, we implemented aggressive image compression (72 DPI) while maintaining enough visual clarity for accurate pattern recognition. Memory Management: We had to ensure matplotlib figures were closed after rendering in the Gradio backend to prevent memory leaks during heavy use.

Accomplishments that we're proud of

Latency: Achieving an end-to-end diagnostic cycle—from raw audio upload to a voice-narrated reportin under 2 seconds. Innovation: Successfully implementing a "visual reasoning" pipeline for audio data rather than using traditional black-box classification models. User Experience: Designing a professional-grade medical dashboard that is both aesthetically pleasing and functional for clinical settings.

What we learned

I gained deep technical insights into Digital Signal Processing (DSP) and how to effectively represent acoustic energy distribution as a visual prompt for Large Language Models. We also mastered the orchestration of complex multimodal pipelines involving audio, images, text, and speech synthesis.

What's next for Cardio-AI Assistant

We plan to integrate real-time streaming capabilities to allow clinicians to receive instant feedback during live physical examinations. Additionally, we aim to expand the model's training data to include a broader range of rare congenital heart defects and valvular disorders.

Built With

Share this project:

Updates

posted an update

Cardio-AI Assistant has evolved into a sophisticated "Visual AI Reasoning" engine that uses Llama 4 Scout to interpret heart sound spectrograms. Recent updates introduced multimodal features like gTTS-powered voice reports and high-resolution visualizations tuned for low-frequency diagnostic accuracy. The interface has transitioned into a clinical-grade dashboard with a modular three-panel design to streamline the medical screening workflow. Additionally, the project is now fully containerized via Docker, making the system highly scalable and ready for seamless cross-platform deployment.

Log in or sign up for Devpost to join the conversation.