ElephantVoices

Inspiration

Elephants communicate through low-frequency rumbles with fundamentals around 10-40 Hz, much of that energy at or below the lower limit of human hearing. Decades of field recordings captured these calls, but airplane engines, vehicle motors, and generators contaminate the same frequency band. Biologists can't measure what they can't isolate. Dr. Mickey Pardo's workshop showed us spectrograms where rumbles and noise literally overlap in both time and frequency, rendering hundreds of recordings unusable. Every off-the-shelf denoiser we tested (veed.io, lalal.ai, Audacity noise reduction) applies high-pass filters that destroy the fundamental frequencies elephants actually use. We realized this isn't a generic noise-removal problem; it's a domain-specific signal-preservation problem where the standard tools fail by design.

What it does

A noise-aware DSP pipeline that processes all 212 annotated elephant calls across 44 field recordings, removing mechanical noise while explicitly preserving the 0-40 Hz infrasound band.

The pipeline auto-detects the dominant noise family (generator, vehicle, or airplane) and routes each recording through specialized processing:

Generator noise: Notch filters at 60 Hz harmonics + spectral subtraction
Vehicle noise: Band-split Wiener filtering + temporal median smoothing on the 20-80 Hz engine band
Airplane noise: Adaptive spectral subtraction, applied more gently in recordings where several elephants call at once
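
The notch stage of the generator branch can be sketched with scipy's `iirnotch` and `filtfilt`. This is a minimal illustration, not the project's code: the function name `notch_generator_hum`, the quality factor Q=30, and the five-harmonic default are assumptions. Because every 60 Hz harmonic sits above 20 Hz, the cascade leaves the infrasound band alone.

```python
import numpy as np
from scipy.signal import filtfilt, iirnotch


def notch_generator_hum(x, sr, base_hz=60.0, n_harmonics=5, q=30.0):
    """Hypothetical helper: cascade zero-phase notches at base_hz, 2*base_hz, ..."""
    y = np.asarray(x, dtype=float)
    nyquist = sr / 2.0
    for k in range(1, n_harmonics + 1):
        f0 = k * base_hz
        if f0 >= nyquist:
            break  # harmonics at or above Nyquist cannot be notched
        b, a = iirnotch(f0, q, fs=sr)  # second-order IIR notch centred on f0 (Hz)
        y = filtfilt(b, a, y)  # forward-backward pass: no phase distortion
    return y
```

`base_hz` is a parameter so the hum frequency can be changed if a generator runs at a different rate; the spectral-subtraction step would follow on the notched signal.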

A do-no-harm gate automatically reverts any call where denoising would reduce SNR by more than 2 dB, so a processed clip never ships more than 2 dB worse than the original.
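
The gate itself is only a few lines. The write-up does not say how SNR is measured, so this sketch assumes a simple estimate: power in the call window minus the power of a noise-only window, divided by that noise power. The helper names `estimate_snr_db` and `do_no_harm` and the slice-based windows are hypothetical.

```python
import numpy as np


def estimate_snr_db(call, noise, eps=1e-12):
    """Hypothetical SNR estimate: (call-window power - noise power) / noise power, in dB."""
    p_call = np.mean(np.square(call))
    p_noise = np.mean(np.square(noise)) + eps
    p_signal = max(p_call - p_noise, eps)  # remove the noise floor from the call window
    return 10.0 * np.log10(p_signal / p_noise)


def do_no_harm(original, denoised, call_idx, noise_idx, max_drop_db=2.0):
    """Hypothetical gate: keep the denoised clip unless its SNR falls by more than max_drop_db."""
    snr_before = estimate_snr_db(original[call_idx], original[noise_idx])
    snr_after = estimate_snr_db(denoised[call_idx], denoised[noise_idx])
    if snr_after < snr_before - max_drop_db:
        return original, "reverted"
    return denoised, "kept"
```

Because the comparison is relative (before vs. after on the same windows), the gate needs no clean reference, which is what makes it usable on field recordings.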

Results on the full dataset:

95.3% of calls preserved or improved (202/212)
75 calls actively recovered with measurable SNR improvement
10 calls flagged for manual review (irrecoverable low-SNR conditions)
0 vehicle calls degraded (safety-first design)

Deliverables ready for biologist handoff:

212 individual WAV clips with metadata
Raven Pro label files (tab-delimited, directly importable)
Before/after spectrograms with an F0 (fundamental frequency) tracking overlay
A Streamlit app with dataset exploration, live upload denoising, and benchmark metrics
An experimental overlap separation demo using harmonic masking
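
A selection-table writer can be sketched with the standard `csv` module. This assumes Raven's standard selection-table columns (`Selection`, `View`, `Channel`, `Begin Time (s)`, `End Time (s)`, `Low Freq (Hz)`, `High Freq (Hz)`); the extra `Annotation` column, the function name, and the input dict keys (`start`, `end`, `low_hz`, `high_hz`, `label`) are hypothetical.

```python
import csv
import io

# Standard Raven selection-table columns plus a hypothetical "Annotation" column
RAVEN_COLUMNS = ["Selection", "View", "Channel", "Begin Time (s)",
                 "End Time (s)", "Low Freq (Hz)", "High Freq (Hz)", "Annotation"]


def raven_selection_table(calls):
    """Hypothetical helper: render call dicts as a tab-delimited selection table."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(RAVEN_COLUMNS)
    for i, call in enumerate(calls, start=1):
        writer.writerow([
            i,                       # selection IDs are 1-based
            "Spectrogram 1",
            1,                       # mono recordings: channel 1
            f"{call['start']:.3f}",
            f"{call['end']:.3f}",
            f"{call['low_hz']:.1f}",
            f"{call['high_hz']:.1f}",
            call.get("label", ""),
        ])
    return buf.getvalue()
```

Writing the returned string to a `.txt` file should give a file Raven can open as a selection table.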

How we built it

Core DSP (Python, librosa, scipy): STFT-based spectral subtraction and Wiener filtering with explicit band-splitting to protect infrasound. The key design constraint: never apply processing below 20 Hz, where elephant fundamentals live. For vehicle noise, we added a temporal median filter on the 20-80 Hz band: engine noise fluctuates faster than elephant rumbles, so median smoothing across ±3 STFT frames suppresses motor modulation while preserving the slow-changing call structure.
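
A minimal sketch of band-protected STFT denoising, written with scipy's `stft`/`istft` and `scipy.ndimage.median_filter` rather than librosa, and assuming a noise-only clip is available to build a per-bin noise profile. The function name, the mean noise profile, the 0.1 spectral floor, and the `mode` switch are assumptions; the 20 Hz guard, the 20-80 Hz median band, and the ±3-frame window follow the description above.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import istft, stft


def band_protected_denoise(x, noise_clip, sr, mode="subtract", protect_below_hz=20.0,
                           floor=0.1, nperseg=1024, median_band=None, median_frames=3):
    """Hypothetical helper: STFT denoising that never modifies bins below protect_below_hz."""
    f, _, X = stft(x, fs=sr, nperseg=nperseg)
    _, _, N = stft(noise_clip, fs=sr, nperseg=nperseg)
    mag, phase = np.abs(X), np.angle(X)
    eps = 1e-12
    if mode == "subtract":
        # Magnitude spectral subtraction against a per-bin mean noise profile
        noise_mag = np.mean(np.abs(N), axis=1, keepdims=True)
        gain = 1.0 - noise_mag / (mag + eps)
    elif mode == "wiener":
        # Wiener-style gain SNR / (1 + SNR), with SNR estimated by power subtraction
        noise_pow = np.mean(np.abs(N) ** 2, axis=1, keepdims=True)
        snr = np.maximum(mag ** 2 / (noise_pow + eps) - 1.0, 0.0)
        gain = snr / (1.0 + snr)
    else:
        raise ValueError(f"unknown mode: {mode}")
    gain = np.maximum(gain, floor)        # spectral floor: never zero a bin outright
    gain[f < protect_below_hz, :] = 1.0   # infrasound guard: bins below the cutoff pass untouched
    out_mag = mag * gain
    if median_band is not None:
        # Temporal median over ±median_frames STFT frames, only inside the engine band
        band = (f >= max(median_band[0], protect_below_hz)) & (f <= median_band[1])
        out_mag[band, :] = median_filter(out_mag[band, :],
                                         size=(1, 2 * median_frames + 1), mode="nearest")
    _, y = istft(out_mag * np.exp(1j * phase), fs=sr, nperseg=nperseg)
    y = np.pad(y, (0, max(0, len(x) - len(y))))  # guard against a short inverse
    return y[: len(x)]
```

For a vehicle recording, `band_protected_denoise(x, noise_clip, sr, mode="wiener", median_band=(20, 80))` would combine the Wiener gain with the 20-80 Hz median smoothing while leaving everything under 20 Hz exactly as recorded.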

Noise classification (scikit-learn): A Random Forest classifier trained on 19 spectral features (including 13 MFCCs) and evaluated with leave-one-out cross-validation on 44 files. Three-tier detection: filename heuristic → ML classifier → signal-threshold fallback. The classifier reached 72.7% accuracy (32/44 files) with SMOTE oversampling for minority classes.
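
The three-tier cascade can be sketched in plain Python. Everything concrete here is a hypothetical stand-in: the keyword sets, the `classifier` callable returning `(label, confidence)` (in the real pipeline, a scikit-learn Random Forest), the 0.6 confidence cutoff, and the fallback features `hum_60hz_ratio` and `engine_band_modulation`.

```python
import re
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical filename keywords per noise family
NOISE_KEYWORDS = {
    "generator": {"generator", "genset"},
    "vehicle": {"vehicle", "car", "truck"},
    "airplane": {"airplane", "plane", "aircraft", "jet"},
}


def detect_noise_family(
    filename: str,
    features: Mapping[str, float],
    classifier: Optional[Callable[[Mapping[str, float]], Tuple[str, float]]] = None,
    min_confidence: float = 0.6,
    hum_threshold: float = 3.0,
    modulation_threshold: float = 0.5,
) -> Tuple[str, str]:
    """Hypothetical cascade: return (noise_family, tier_that_decided)."""
    # Tier 1: filename heuristic on whole tokens, so "car" does not match "carry"
    tokens = set(re.split(r"[^a-z0-9]+", filename.lower()))
    for family, keywords in NOISE_KEYWORDS.items():
        if tokens & keywords:
            return family, "filename"
    # Tier 2: ML classifier, trusted only above a confidence cutoff
    if classifier is not None:
        label, confidence = classifier(features)
        if confidence >= min_confidence:
            return label, "classifier"
    # Tier 3: signal-threshold fallback on hypothetical summary features
    if features.get("hum_60hz_ratio", 0.0) >= hum_threshold:
        return "generator", "threshold"
    if features.get("engine_band_modulation", 0.0) >= modulation_threshold:
        return "vehicle", "threshold"
    return "airplane", "threshold"  # illustrative default only
```

Returning the deciding tier alongside the label makes it easy to audit how often each tier fires.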

Validation stack:

Synthetic benchmark: 150 mixes of realistic modulated rumbles (including frequency sweeps) embedded in real noise segments extracted from the dataset, evaluated at -3, 0, and +3 dB SNR
ML baseline: Demucs (htdemucs) comparison showing domain-specific DSP outperforms generic music source separation by 2.7 dB in the 10-40 Hz band
Ablation study: 5 pipeline variants proving infrasound protection is the critical safety component
Peer validation: Self-blind review of 10 representative calls (6/10 agreement)
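
Embedding a rumble in a real noise segment at a target SNR comes down to rescaling the noise. A minimal numpy sketch, assuming SNR is defined over whole-clip mean power; the function name `mix_at_snr` is hypothetical.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Hypothetical helper: return (mixture, scaled_noise) with 10*log10(P_clean / P_noise) == snr_db."""
    if len(noise) < len(clean):
        raise ValueError("noise segment must be at least as long as the clean rumble")
    noise = np.asarray(noise[: len(clean)], dtype=float)
    p_clean = np.mean(np.square(clean))
    p_noise = np.mean(np.square(noise))
    # Pick gain g so that P_clean / (g**2 * P_noise) = 10**(snr_db / 10)
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = gain * noise
    return clean + scaled, scaled
```

Sweeping `snr_db` over -3, 0, and +3 gives the three benchmark conditions listed above.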

App (Streamlit): A five-tab interface covering overview metrics, benchmark analysis, a dataset explorer with per-call spectrograms and audio, the overlap separation demo, and live file upload with real-time denoising.

Pitch deck (pptxgenjs): Auto-generated from the pipeline's output JSONs, so every number has a single source of truth and nothing goes stale.

Challenges we ran into

Vehicle noise is fundamentally entangled with elephant rumbles. Vehicle engines produce broadband low-frequency energy that occupies the exact same 10-40 Hz band as elephant fundamentals. We attempted three different DSP approaches: aggressive spectral subtraction (reverted: too many degraded calls), spectral gating (insufficient impact), and temporal median filtering (marginal improvement). After three failed attempts, we concluded that vehicle-band separation requires learned priors from a trained model, not classical DSP. We documented this honestly: 5/96 vehicle calls improved, 0 degraded.

The infrasound protection trade-off. Our ablation study revealed a tension: removing infrasound protection recovers 38 additional calls but degrades 8 more. On a 9-file sample this looked like a free win, since the degraded count did not change; only on the full 212-call dataset did the 8 extra degraded calls appear, showing that the protection matters at scale. We chose safety: it's better to leave a call "neutral" than to destroy an irreplaceable fundamental frequency.

Synthetic benchmarks don't transfer cleanly to real data. Our initial benchmark used pure sinusoidal rumbles that were trivially easy to denoise, which inflated SDR. We hardened it with FM modulation, amplitude jitter, frequency sweeps, onset/offset envelopes, and non-stationary noise mixing. SDR dropped but became believable. The remaining gap between synthetic and real performance comes from the co-location of noise and signal in the same frequency band, something synthetic mixes built from clean rumbles can't model.
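
A hardened synthetic rumble of this kind can be sketched in numpy: a linearly swept fundamental with slow sinusoidal FM, decaying harmonics, smoothed amplitude jitter, and raised-cosine onset/offset ramps. The function name and every default value (18 to 14 Hz sweep, 0.8 Hz FM depth, four harmonics, 10% jitter) are illustrative guesses, not the benchmark's actual settings.

```python
import numpy as np


def synthetic_rumble(sr, duration_s, f0_start=18.0, f0_end=14.0, fm_depth_hz=0.8,
                     fm_rate_hz=0.5, n_harmonics=4, harmonic_decay=0.6,
                     am_jitter=0.1, jitter_smooth_s=0.25, ramp_s=0.5, seed=0):
    """Hypothetical generator: swept + FM fundamental, harmonics, AM jitter, smooth ramps."""
    rng = np.random.default_rng(seed)
    n = int(round(sr * duration_s))
    t = np.arange(n) / sr
    # Instantaneous F0: linear sweep plus slow sinusoidal frequency modulation
    f_inst = np.linspace(f0_start, f0_end, n) + fm_depth_hz * np.sin(2 * np.pi * fm_rate_hz * t)
    phase = 2 * np.pi * np.cumsum(f_inst) / sr  # integrate frequency to get phase
    y = sum(harmonic_decay ** (k - 1) * np.sin(k * phase) for k in range(1, n_harmonics + 1))
    # Amplitude jitter: moving-average-smoothed Gaussian noise, normalised to unit std
    win = max(1, int(jitter_smooth_s * sr))
    wobble = np.convolve(rng.standard_normal(n), np.ones(win) / win, mode="same")
    wobble /= np.std(wobble) + 1e-12
    y = y * (1.0 + am_jitter * wobble)
    # Raised-cosine onset and offset envelopes
    n_ramp = min(int(ramp_s * sr), n // 2)
    if n_ramp > 0:
        ramp = 0.5 - 0.5 * np.cos(np.pi * np.arange(n_ramp) / n_ramp)
        y[:n_ramp] *= ramp
        y[-n_ramp:] *= ramp[::-1]
    return y / (np.max(np.abs(y)) + 1e-12)  # peak-normalise
```

Each added effect removes a regularity that a denoiser could otherwise exploit; a pure sinusoid, by contrast, occupies a single stable bin and is trivial to isolate.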

Self-evaluation bias. Our quality labels ("improved", "marginal", "neutral", "degraded") are computed by the pipeline itself. Peer validation revealed that we over-classify: 4/10 calls we labeled "improved" or "marginal" were rated "same" by the reviewer. The pipeline errs toward optimism, never pessimism, which means our count of 75 improved calls is likely closer to 60 by human standards.

Accomplishments that we're proud of

The do-no-harm gate. It sounds simple (revert if SNR drops), but it's the design choice that makes the 95.3% preservation rate real. Most denoisers optimize for the average case and silently destroy edge cases. Ours refuses to ship a worse result.
Full-dataset infrasound ablation with empirical evidence. Not "we think protection helps" but "removing it causes 8 additional degraded calls on 212 real recordings." Data, not philosophy.
The overlap separation demo. Two elephant rumbles at 24.11 Hz and 11.92 Hz, overlapping for 3 seconds in a real vehicle recording, separated using soft harmonic masks. It's experimental, but it proves the concept on real data.
Honest failure documentation. Three vehicle denoising approaches attempted and documented with results. A noise classifier at 72.7% instead of claiming 100%. Peer validation at 6/10 instead of hiding disagreements. A negative SI-SDR explained by SAR, not hidden behind a footnote.
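
The write-up doesn't spell out the mask shape, so this sketch of soft harmonic masking assumes Gaussian bumps of width `width_hz` around each harmonic k·F0 of each caller's F0 track, normalized so the two masks share each time-frequency bin. It uses scipy's `stft`/`istft`; the function names, the 1.5 Hz width, and the five-harmonic default are hypothetical.

```python
import numpy as np
from scipy.signal import istft, stft


def harmonic_weights(freqs, f0_track, n_harmonics=5, width_hz=1.5):
    """Hypothetical helper: (n_freqs, n_frames) Gaussian bumps centred on k * F0(t)."""
    w = np.zeros((freqs.size, f0_track.size))
    for k in range(1, n_harmonics + 1):
        w += np.exp(-0.5 * ((freqs[:, None] - k * f0_track[None, :]) / width_hz) ** 2)
    return w


def separate_two_callers(x, sr, f0_a, f0_b, nperseg=1024, n_harmonics=5, width_hz=1.5):
    """Hypothetical helper: split x with soft harmonic masks; f0_a/f0_b in Hz, scalar or per frame."""
    f, _, X = stft(x, fs=sr, nperseg=nperseg)
    n_frames = X.shape[1]
    track_a = np.broadcast_to(np.asarray(f0_a, dtype=float), (n_frames,))
    track_b = np.broadcast_to(np.asarray(f0_b, dtype=float), (n_frames,))
    w_a = harmonic_weights(f, track_a, n_harmonics, width_hz)
    w_b = harmonic_weights(f, track_b, n_harmonics, width_hz)
    total = w_a + w_b + 1e-12
    outputs = []
    for mask in (w_a / total, w_b / total):  # soft masks in [0, 1)
        _, y = istft(X * mask, fs=sr, nperseg=nperseg)
        y = np.pad(y, (0, max(0, len(x) - len(y))))
        outputs.append(y[: len(x)])
    return outputs[0], outputs[1]
```

With F0s close to an octave apart, as in the 24.11 Hz / 11.92 Hz demo, the lower caller's second harmonic (23.84 Hz) lands within 0.27 Hz of the higher caller's fundamental, so masks like these would split that region roughly evenly; that is one reason this kind of separation stays experimental.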

What we learned

Domain constraints beat generic algorithms. The single most impactful design decision was "never process below 20 Hz." Every off-the-shelf tool we tested violates this constraint. One if statement outperforms a 300M-parameter model.
Validation is harder than building. Writing the denoiser took 6 hours. Building the benchmark, ablation study, classifier evaluation, peer validation, and cross-checking every number across 5 output surfaces took longer. But without it, the results are just claims.
Negative results are results. Three vehicle denoising failures taught us more about the problem than the successes. Vehicle noise at 10-40 Hz is a fundamentally harder problem than generator or airplane noise because it's non-harmonic, non-stationary, AND co-located with the signal. That's a publishable finding, not a failure.
SDR is not enough. Defined as $\mathrm{SDR} = 10 \log_{10}\left(\frac{\|s\|^2}{\|s - \hat{s}\|^2}\right)$, where $s$ is the reference and $\hat{s}$ the estimate, it penalizes any amplitude change. SI-SDR is scale-invariant but still punishes conservative processing. SAR (Signal-to-Artifact Ratio) was the metric that actually validated our approach: a positive SAR shows that artifact energy stays below signal energy, even when SI-SDR is negative.
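
The scale sensitivity of SDR versus SI-SDR is easy to see numerically. This minimal numpy sketch uses the SDR formula above and the standard SI-SDR definition (project the estimate onto the reference with a least-squares gain, then compare target and residual energy); the function names are hypothetical. SAR needs the full BSS-eval decomposition (for example `mir_eval.separation.bss_eval_sources`), so it is left out here.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-12):
    """Hypothetical helper: SDR = 10 log10(||s||^2 / ||s - s_hat||^2)."""
    s = np.asarray(reference, dtype=float)
    s_hat = np.asarray(estimate, dtype=float)
    return 10 * np.log10(np.sum(s ** 2) / (np.sum((s - s_hat) ** 2) + eps))


def si_sdr_db(reference, estimate, eps=1e-12):
    """Hypothetical helper: SI-SDR, which rescales the reference optimally before comparing."""
    s = np.asarray(reference, dtype=float)
    s_hat = np.asarray(estimate, dtype=float)
    alpha = np.dot(s_hat, s) / (np.dot(s, s) + eps)  # least-squares gain onto the reference
    target = alpha * s
    residual = s_hat - target
    return 10 * np.log10(np.sum(target ** 2) / (np.sum(residual ** 2) + eps))
```

Halving an otherwise perfect estimate gives `sdr_db(s, 0.5 * s)` of about 6.02 dB (10·log10 4) even though the waveform shape is untouched, while `si_sdr_db(s, 0.5 * s)` is effectively unbounded.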

What's next for ElephantVoices

Learned vehicle denoiser. Train a lightweight U-Net or Wave-U-Net on the 96 annotated vehicle calls using our DSP pipeline outputs as weak labels. The DSP pipeline provides the training signal; the model learns the non-linear separation that DSP can't.
Expert validation at scale. Partner with Dr. Pardo's team to validate all 212 calls against biologist ground truth. Calibrate quality thresholds based on expert feedback; our self-blind review suggests we need stricter "improved" criteria.
Overlap separation as a first-class feature. Scale the harmonic masking prototype from one demo to all overlapping calls. Integrate F0 tracking into the denoising pipeline so the system can adapt in real time when two elephants call simultaneously.
Deploy as a biologist tool. Package the Streamlit app as a standalone tool that any field researcher can use: drag a WAV, get a cleaned clip + Raven labels + spectrogram. No Python knowledge required.

Built With

python

Updates

Sebastian NAPURI started this project — Apr 11, 2026 10:36 PM EDT
