Inspiration

Auscultation is one of the oldest tools in medicine. It’s also one of the most subjective.

Fine crackles can be early signs of pneumonia, pulmonary fibrosis, and other serious lung conditions. They’re subtle and easy to miss, and much of their energy sits in higher frequency ranges where hearing naturally declines with age. That decline doesn’t mean a physician is less skilled; it just means biology changes over time.

At the same time, medical software is often hard to access. Installing new tools can require IT approval, device management enrollment, and app store delays. In some clinics, even trying something new takes weeks.

So we asked a simple question:

What if every stethoscope had superhuman hearing, and all you needed was a URL?

That became Respi.

What it does

Respi is an AI-powered lung auscultation assistant that runs directly in the browser.

A clinician records lung sounds using a digital stethoscope or uploads an audio file. Respi then:

• Classifies the sound as Normal, Crackles, Wheezes, or Both
• Estimates disease probabilities such as COPD, pneumonia, or bronchitis
• Identifies whether sounds occur during inspiration or expiration
• Generates a severity score for tracking changes over time
• Shows SHAP-based explanations so clinicians can understand what influenced the prediction

There’s no download and no installation.

You open a link, connect a stethoscope, and start analyzing.

How we built it

We built Respi as a Progressive Web App using React and Vite so it could work instantly across devices.

On the frontend, we use the Web Audio API to capture high-quality audio directly in the browser. We render live waveforms so clinicians can see what’s being recorded in real time. Service workers allow the app to function even if connectivity drops.

The backend runs on FastAPI, with Supabase handling authentication, storage, and real-time updates. Audio is processed with Librosa and SciPy, and inference runs through ONNX Runtime for speed.

For the model, we used a CNN14 backbone pretrained on AudioSet and fine-tuned it on the ICBHI 2017 respiratory sound dataset. We combine mel-spectrograms with structured acoustic features in a multi-task architecture. The primary output is a four-class lung sound classification, with auxiliary disease probability predictions.
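To make the feature side concrete, here is a minimal numpy sketch of the kind of log-mel spectrogram the pipeline feeds into the CNN. The exact parameters Respi uses aren't stated, so the sample rate (4 kHz), FFT size, hop, and mel count below are illustrative assumptions, and in practice Librosa's built-in routines would replace this hand-rolled version.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=4000, n_fft=512, n_mels=64):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(y, sr=4000, n_fft=512, hop=128, n_mels=64):
    # Frame the signal, window each frame, take the power spectrum,
    # project through the mel filterbank, and compress with a log.
    window = np.hanning(n_fft)
    frames = [
        np.abs(np.fft.rfft(y[s:s + n_fft] * window)) ** 2
        for s in range(0, len(y) - n_fft + 1, hop)
    ]
    power = np.array(frames).T                      # (n_fft//2+1, n_frames)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power
    return np.log(mel + 1e-10)                      # (n_mels, n_frames)
```

The resulting (mels × frames) image is what a CNN14-style backbone consumes, while the structured acoustic features travel down a parallel branch before fusion.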

Explainability is built in from the start. We use SHAP-based feature attribution to highlight which acoustic features influenced each result in a way clinicians can understand.

From stethoscope capture to interpretable output, everything flows through the browser.

Challenges we ran into

Class imbalance was a major issue. Some classes, especially Both and Wheezes, were underrepresented. We used focal loss, oversampling, and class-aware mixup to improve minority-class performance.

Getting reliable, high-quality clinical audio in the browser required careful handling of sample rates, buffering, and filtering.
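One part of that handling is rate conversion: browsers deliver audio at device-dependent rates (typically 44.1 or 48 kHz), while the model expects a fixed rate. The sketch below shows the idea with simple linear interpolation in numpy; the target rate is an assumption, and a production path would use a proper anti-aliasing resampler such as the ones in Librosa or SciPy.

```python
import numpy as np

def resample_linear(y, sr_in, sr_out):
    """Resample a 1-D signal from sr_in to sr_out by linear interpolation.

    Illustrative only: lacks the low-pass filtering a real resampler
    applies before downsampling to avoid aliasing.
    """
    n_out = int(round(len(y) * sr_out / sr_in))
    # Positions of the output samples expressed in input-sample units.
    t_out = np.arange(n_out) * (sr_in / sr_out)
    return np.interp(t_out, np.arange(len(y)), y)
```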

Explainability added computational overhead. Running SHAP on high-dimensional spectrogram data is expensive, so we optimized to balance speed and clarity.
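One common way to cut that cost, which we sketch here as an illustration rather than Respi's exact optimization, is to aggregate the spectrogram into a coarse grid of time-frequency patches before attribution: explaining 64 patches is far cheaper than explaining every spectrogram cell, and a patch-level heatmap is also easier for a clinician to read. The grid size below is an assumption.

```python
import numpy as np

def pool_patches(spec, f_bins=8, t_bins=8):
    """Mean-pool a (freq, time) spectrogram into an f_bins x t_bins grid.

    Attribution (e.g. SHAP) then runs over the pooled patches instead of
    the full-resolution spectrogram, reducing the feature dimension from
    freq*time to f_bins*t_bins.
    """
    f, t = spec.shape
    f_groups = np.array_split(np.arange(f), f_bins)
    t_groups = np.array_split(np.arange(t), t_bins)
    out = np.empty((f_bins, t_bins))
    for i, fi in enumerate(f_groups):
        for j, tj in enumerate(t_groups):
            out[i, j] = spec[np.ix_(fi, tj)].mean()
    return out
```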

Browser hardware support was inconsistent. Web Bluetooth works well in some environments but not all, so we built fallback paths for wired microphones and file uploads.

Accomplishments that we're proud of

We delivered lung sound analysis in a fully browser-based application with no installation required.

We built a complete pipeline from audio capture to interpretable AI output.

We leveraged audio-specific pretraining with CNN14 instead of relying on generic vision models.

We designed a hybrid architecture that fuses spectrogram inputs with structured acoustic features.

Most importantly, we built something that feels usable, not just experimental.

What we learned

Audio-specific pretraining significantly improves performance for medical sound analysis.

Class imbalance must be handled intentionally in respiratory datasets.

Signal quality and preprocessing matter just as much as model design.

Clinicians want to understand why a model made a decision, not just what it predicted.

And sometimes accessibility is the innovation. A browser-based tool can reach clinicians immediately when traditional distribution slows everything down.

What's next for Respi

We plan to improve model performance with stronger augmentation and refined training strategies.

We’re working toward streaming, real-time analysis instead of post-recording inference.

We want to integrate with EHR systems to fit directly into clinical workflows.

We also aim to validate Respi prospectively with pulmonologists and primary care physicians and move toward formal regulatory pathways.

Built With

  • cnn14-(audioset)
  • react
  • fastapi
  • framer-motion
  • icbhi
  • librosa
  • onnx
  • pwa-(vite-pwa)
  • python
  • pytorch
  • react-router
  • recharts
  • shap
  • supabase
  • tailwind-css
  • typescript
  • vite
  • web-audio-api
  • web-bluetooth-api
  • zustand