ElePhones

Inspiration

What first drew us to elephant harmonics was our shared love of live music. Just last week we were at a concert together, so when we saw the option to analyze frequencies, we jumped on it.

What it does

It takes a noisy field recording of elephant calls and removes the mechanical noise from generators, cars, and planes, leaving behind a clean audio file that biologists can actually measure and analyze. It does this not by generic noise removal but by detecting the mathematical fingerprint of elephant vocalizations specifically: their strict harmonic series at integer multiples of a fundamental frequency between 10 and 25 Hz. Anything that doesn't fit that pattern gets removed.
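
Here is a minimal sketch of that fingerprint. The helper name, the 1 Hz tolerance, and the peak values are illustrative assumptions, not ElePhones code. A set of spectral peaks fits the pattern when each peak lies close to an integer multiple of a single fundamental:

```python
import numpy as np

def fits_harmonic_series(peak_freqs_hz, f0_hz, tol_hz=1.0):
    """True if every spectral peak lies within tol_hz of an integer multiple of f0_hz (hypothetical helper)."""
    peaks = np.asarray(peak_freqs_hz, dtype=float)
    k = np.maximum(np.rint(peaks / f0_hz), 1)   # nearest harmonic number, at least the fundamental
    return bool(np.all(np.abs(peaks - k * f0_hz) <= tol_hz))

# Peaks of a rumble with f0 = 18 Hz sit on 18, 36, 54, 72 Hz (within measurement jitter).
print(fits_harmonic_series([18.0, 36.2, 53.9, 72.1], f0_hz=18.0))   # True
# A 50 Hz generator-style tone is 4 Hz from the nearest harmonic (54 Hz), so it does not fit.
print(fits_harmonic_series([18.0, 36.2, 50.0, 72.1], f0_hz=18.0))   # False
```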

How we built it

We built the pipeline around one observation: frequency resolution is the problem that general-purpose tools tend to ignore. At a common setting of n_fft=1024 and a 44.1 kHz sample rate, each frequency bin is about 43 Hz wide, so the entire infrasonic range of elephant calls becomes a smear of indistinguishable energy. Setting n_fft=8192 brings that down to about 5.4 Hz per bin, which is the minimum needed to resolve the harmonic structure we planned to exploit.

From there we apply Harmonic-Percussive Source Separation (HPSS). It median-filters the spectrogram along the time axis to enhance the horizontal harmonic contours that elephant rumbles produce. It also median-filters along the frequency axis to separate out the transient, vertical bursts that characterize car noise.

On the cleaner spectrogram we run Normalized Subharmonic Summation, adapted from cetacean bioacoustics research. It sweeps candidate f₀ values from 8 to 25 Hz and, for each one, sums the spectral energy at every harmonic up to 1000 Hz. It then picks the fundamental whose entire harmonic series best accounts for the energy in the signal. The 2nd harmonic of an elephant rumble is often stronger than the fundamental, so naive peak detection tends to lock onto it and report an f₀ one octave too high. To handle this, we added an octave check that detects the case and corrects the estimate.

With f₀ tracked per frame, we build a time-varying harmonic comb mask centered on each detected harmonic and zero out everything between the teeth. We apply the mask to the full complex spectrogram so that phase is preserved through reconstruction. A final noisereduce pass handles residual noise the comb couldn't resolve. It uses a stationary noise profile for generators and the adaptive non-stationary mode for cars and planes. The result is a clean WAV file compatible with Raven Pro.
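
Below is a minimal sketch of subharmonic summation with an octave check on a single spectrum frame. It rests on a few assumptions:

- "Normalized" is taken to mean dividing each candidate's summed energy by its number of harmonics.
- The octave rule is one possible choice: test whether the odd harmonics of half the winning candidate carry energy.
- The function name, the 0.1 Hz step, and the 0.2 threshold are illustrative, not ElePhones' actual code.

A synthetic spectrum stands in for a real STFT frame.

```python
import numpy as np

def shs_f0(mag, freqs, f0_min=8.0, f0_max=25.0, f_max=1000.0, step=0.1, octave_ratio=0.2):
    """Estimate f0 (Hz) of one spectrum frame by subharmonic summation plus an octave check.

    mag: magnitude spectrum of one frame; freqs: matching bin frequencies in Hz (increasing).
    Hypothetical helper; the parameter defaults are illustrative assumptions.
    """
    def mean_at(f_hz):
        # Sample the spectrum at arbitrary frequencies by linear interpolation, then average.
        return float(np.mean(np.interp(f_hz, freqs, mag)))

    candidates = np.arange(f0_min, f0_max + step / 2, step)
    # Score each candidate by the mean energy at its harmonics k*f0 <= f_max.
    # Dividing by the harmonic count is our assumed form of "normalized".
    scores = np.array([mean_at(np.arange(1, int(f_max // f0) + 1) * f0) for f0 in candidates])
    best = candidates[np.argmax(scores)]

    # Octave check: a strong 2nd harmonic can make 2*f0 win. If the true fundamental is
    # best/2, its odd harmonics (best/2, 3*best/2, ...) carry real energy; if best is
    # correct, those frequencies fall between harmonics and are close to empty.
    half = best / 2
    if half >= f0_min:
        odd = np.arange(1, int(f_max // half) + 1, 2) * half
        if mean_at(odd) >= octave_ratio * scores.max():
            return float(half)
    return float(best)

# Synthetic frame: f0 = 12 Hz, odd harmonics (incl. the fundamental) weaker than even ones.
freqs = np.arange(0.0, 1100.0, 0.25)
mag = sum((1.0 if k % 2 == 0 else 0.4) * np.exp(-0.5 * (freqs - 12.0 * k) ** 2)
          for k in range(1, 84))
print(round(shs_f0(mag, freqs), 1))   # 12.0: the raw argmax is 24 Hz, the octave check halves it
```

Scoring by mean energy per harmonic keeps low candidates from winning simply because more harmonics fit below 1000 Hz. The trade-off is that a candidate at twice the true f₀ can tie or win, which is exactly the failure the octave check targets.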

Challenges we ran into

We evaluated every major denoising approach before writing a line of code: spectral gating, bandpass filtering, wavelet denoising, neural source separation, and commercial AI tools. Most failed for the same reason: they treat every frequency bin independently and have no concept of what an elephant call actually looks like. Neural approaches like Demucs and Conv-TasNet were tempting but ruled out immediately. They are trained on speech and music, and we had only 44 recordings, nowhere near enough to fine-tune a model without severe overfitting. What we kept coming back to was the one property that separates elephant vocalizations from every mechanical noise source: the strict integer-multiple harmonic series. That structure is not statistical or approximate; it follows from the physics of a periodically vibrating source. This meant we could build a pipeline that exploits it directly rather than learning to approximate it from data. That is what led us to the architecture we committed to: subharmonic summation combined with harmonic comb masking, stacked on top of HPSS preprocessing and residual spectral gating.
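
The comb-masking stage can be sketched with SciPy's STFT. This is a minimal illustration under stated assumptions:

- `harmonic_comb_mask` and `comb_denoise` are hypothetical names.
- Each comb tooth is assumed to be ±1 bin wide.
- The f₀ track is passed in directly. In the described pipeline it would come from the subharmonic-summation stage.

```python
import numpy as np
from scipy.signal import stft, istft

def harmonic_comb_mask(f0_track, freqs, half_width_hz=None, f_max=1000.0):
    """Binary (n_bins, n_frames) mask keeping bins near k*f0[t]; hypothetical helper."""
    freqs = np.asarray(freqs, dtype=float)
    if half_width_hz is None:
        half_width_hz = freqs[1] - freqs[0]          # assumption: teeth are +/- one bin wide
    mask = np.zeros((freqs.size, len(f0_track)))
    for t, f0 in enumerate(f0_track):
        if not np.isfinite(f0) or f0 <= 0:
            continue                                 # no call detected: the whole frame is zeroed
        k = np.maximum(np.rint(freqs / f0), 1)       # nearest harmonic number for every bin
        mask[:, t] = (np.abs(freqs - k * f0) <= half_width_hz) & (freqs <= f_max)
    return mask

def comb_denoise(x, sr, f0_track, n_fft=8192):
    """Mask the complex STFT with a time-varying comb and resynthesize (hypothetical helper)."""
    freqs, _, Z = stft(x, fs=sr, nperseg=n_fft)      # Hann window, 50% overlap (SciPy defaults)
    f0_track = np.broadcast_to(np.asarray(f0_track, dtype=float), (Z.shape[1],))
    mask = harmonic_comb_mask(f0_track, freqs)
    _, y = istft(mask * Z, fs=sr, nperseg=n_fft)     # real-valued mask leaves kept bins' phase intact
    return y[: len(x)]

# Synthetic test: a 20 Hz "rumble" with 10 harmonics plus a loud off-comb 30 Hz tone.
sr = 44100
t = np.arange(3 * sr) / sr
rumble = sum(np.sin(2 * np.pi * 20.0 * k * t) for k in range(1, 11))
tone = 3.0 * np.sin(2 * np.pi * 30.0 * t)
y = comb_denoise(rumble + tone, sr, f0_track=20.0)
n = min(len(y), len(rumble))
print("error energy before:", np.sum(tone[:n] ** 2), "after:", np.sum((y[:n] - rumble[:n]) ** 2))
```

Because the mask is real and multiplies the complex STFT, the bins it keeps retain their original phase. The inverse STFT can then rebuild the waveform without a separate phase-estimation step.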

Accomplishments that we're proud of

What we are most proud of is that we built two completely independent denoising pipelines and let the end user choose which one to run, similar to choosing between AI models.

- The first is a classical DSP pipeline: a high-resolution STFT, HPSS preprocessing, subharmonic summation for f₀ detection, and a time-varying harmonic comb mask. It is derived entirely from wave physics, with no training data required.
- The second is a separate machine learning pipeline built on noisereduce, a spectral gating method published in Scientific Reports in 2025. It builds a statistical noise profile from the silent gaps between calls and suppresses residual energy adaptively.

Both pipelines take the same input and produce the same output format, but they get there by fundamentally different means. One is deterministic and grounded in the mathematics of resonating systems; the other is statistical and learned from the noise characteristics of each individual recording. Letting researchers run both and compare results on their own recordings was a deliberate design decision. We think it makes the tool genuinely useful beyond the hackathon.
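
The second pipeline relies on noisereduce itself, for example `noisereduce.reduce_noise(y=audio, sr=sr, stationary=True)`, or `stationary=False` for the non-stationary mode. To show the underlying idea rather than the library's exact algorithm, here is a simplified stationary spectral gate in NumPy/SciPy. It learns a per-frequency threshold from a noise-only clip and mutes bins that do not rise above it. The name `spectral_gate`, the 1.5-standard-deviation threshold, and the unsmoothed binary mask are our simplifying assumptions; noisereduce adds refinements such as mask smoothing. The demo uses a low sample rate and n_fft=1024 only to keep it fast.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_gate(x, sr, noise_clip, n_fft=8192, n_std=1.5):
    """Simplified stationary spectral gate (hypothetical helper, not noisereduce's code)."""
    _, _, N = stft(noise_clip, fs=sr, nperseg=n_fft)
    noise_db = 20 * np.log10(np.abs(N) + 1e-10)
    # One threshold per frequency bin: mean + n_std * std of the noise level in dB.
    thresh = noise_db.mean(axis=1) + n_std * noise_db.std(axis=1)
    _, _, Z = stft(x, fs=sr, nperseg=n_fft)
    keep = 20 * np.log10(np.abs(Z) + 1e-10) > thresh[:, None]   # binary gate, no smoothing
    _, y = istft(Z * keep, fs=sr, nperseg=n_fft)
    return y[: len(x)]

# Synthetic check: a 100 Hz tone buried in white noise, with a separate noise-only clip
# standing in for the silent gaps between calls.
rng = np.random.default_rng(0)
sr = 8000
t = np.arange(4 * sr) / sr
tone = np.sin(2 * np.pi * 100.0 * t)
noise = 0.1 * rng.standard_normal(t.size)
y = spectral_gate(tone + noise, sr, noise_clip=0.1 * rng.standard_normal(2 * sr), n_fft=1024)
n = min(len(y), len(tone))
print("error energy before:", np.sum(noise[:n] ** 2), "after:", np.sum((y[:n] - tone[:n]) ** 2))
```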

What we learned

Building this taught us that the right abstraction matters more than the right tool. We spent the first part of the hackathon evaluating existing solutions and kept hitting the same wall: every mainstream denoising tool is built around human hearing. The frequency ranges, the training data, the default parameters: all of it is calibrated for speech and music. The moment we stopped asking which existing tool to use and started asking what is mathematically unique about elephant vocalizations, the architecture became obvious. That shift, from tool selection to first-principles reasoning, was the most important thing we did. On the technical side, we learned that classical DSP is still the right choice when you have strong domain priors and limited data. Machine learning works best when you have enough examples to learn from rather than enough physics to reason from. We also learned that decisions that look trivial, like FFT window size, often determine whether the entire approach is even possible. Moving from n_fft=1024 to n_fft=8192 is a single parameter change, yet it separates a working pipeline from one that cannot see the signal at all.
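
The arithmetic behind that contrast is shown below. It assumes the 44.1 kHz sample rate implied by the 43 Hz and 5.4 Hz figures. The last column shows the spacing between adjacent harmonics of a 10 Hz fundamental (the bottom of the stated range), measured in bins. The window-length column shows the cost of the larger FFT, which is coarser time resolution.

```latex
\Delta f = \frac{f_s}{N_{\mathrm{FFT}}}, \qquad T_{\mathrm{win}} = \frac{N_{\mathrm{FFT}}}{f_s}, \qquad f_s = 44100\ \mathrm{Hz}

\begin{array}{lccc}
N_{\mathrm{FFT}} & \Delta f & T_{\mathrm{win}} & 10\ \mathrm{Hz} / \Delta f \\
1024 & \approx 43.1\ \mathrm{Hz} & \approx 23.2\ \mathrm{ms} & \approx 0.23\ \text{bins} \\
8192 & \approx 5.38\ \mathrm{Hz} & \approx 185.8\ \mathrm{ms} & \approx 1.86\ \text{bins}
\end{array}
```

At n_fft=1024, roughly four consecutive harmonics of a 10 Hz fundamental fall into a single bin. At n_fft=8192, adjacent harmonics land almost two bins apart and can be told apart.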

What's next for ElePhones

ElephantVoices maintains a database of roughly 7,000 annotated calls from known individuals in known social contexts, with another 5,000 still to be added. Every call ElePhones cleans is one more data point that can feed directly into that database and into the machine learning models being developed to decode elephant communication. The immediate next step is validating the pipeline on the full set of 212 annotated calls across all three noise categories. We also plan to tune the f₀ detection thresholds on real field data rather than synthetic test cases. Beyond that, the harmonic comb approach is not specific to elephants. Any vocalization with a strict harmonic series, such as those of whales, dolphins, and certain bird species, could be extracted using the same architecture with different f₀ search ranges and harmonic counts. The longer-term vision for ElePhones is a generalized bioacoustic denoising platform where researchers specify the species and the tool configures itself accordingly. That would make clean recordings accessible to field biologists who currently have to discard a significant portion of what they collect.
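
One way the "specify the species and the tool configures itself" idea could look is a small per-species profile. This is a hypothetical sketch:

- `SpeciesProfile`, `PROFILES`, and the FFT-size rule are our assumptions. The rule picks the smallest power of two whose bin width is no wider than the lowest f₀, i.e. the tightest harmonic spacing.
- Only the elephant numbers come from this write-up: an 8 to 25 Hz search range and harmonics up to 1000 Hz.

At 44.1 kHz the rule reproduces the n_fft=8192 choice.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class SpeciesProfile:
    """Hypothetical per-species settings for the harmonic-comb pipeline."""
    name: str
    f0_min_hz: float   # lower edge of the f0 search range
    f0_max_hz: float   # upper edge of the f0 search range
    f_max_hz: float    # highest frequency the comb covers

    def n_harmonics(self, f0_hz: float) -> int:
        """Number of comb teeth k*f0 that fit at or below f_max_hz."""
        return int(self.f_max_hz // f0_hz)

    def n_fft(self, sr: int) -> int:
        """Smallest power-of-two FFT size whose bin width (sr / n_fft) is <= f0_min_hz (assumed rule)."""
        return 2 ** math.ceil(math.log2(sr / self.f0_min_hz))

# Only the elephant values come from this write-up; other species would need their own ranges.
PROFILES = {
    "elephant": SpeciesProfile("elephant", f0_min_hz=8.0, f0_max_hz=25.0, f_max_hz=1000.0),
}

profile = PROFILES["elephant"]
print(profile.n_fft(44100), profile.n_harmonics(8.0), profile.n_harmonics(25.0))   # 8192 125 40
```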