Inspiration

Elephants communicate in infrasound — rumbles as low as 10 Hz that travel kilometers through the earth — but the field recordings that researchers at ElephantVoices rely on are constantly contaminated by the modern world: airplane flyovers, diesel generators, motor vehicles, wind. These aren't minor annoyances. They can mask the very calls scientists are trying to study, making recordings unusable and potentially erasing behavioral data that took months to collect. We partnered with ElephantVoices to ask: what if researchers could upload a ruined recording and get back a clean one, scientifically annotated and ready for publication?

What it does

EchoSave is an end-to-end elephant vocalization research platform. Researchers upload a WAV, MP3, or FLAC field recording (up to 500 MB), and EchoSave runs a 7-stage processing pipeline that classifies the noise type (airplane, vehicle, generator, or wind), strips it using an ensemble of denoising algorithms, and extracts 12 acoustic metrics per detected call. The platform outputs denoised audio, before/after spectrograms, call-sequence analysis, and research-grade CSV/JSON/ZIP export bundles, with pipeline progress streamed in real time over WebSocket. A Gemini-powered AI sidebar lets scientists ask natural-language questions about their recordings, and an infrasound reveal tool pitch-shifts sub-20 Hz rumbles into the audible range so researchers can hear what the elephants actually said.
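To give a feel for the infrasound reveal idea, here is a minimal sketch of the simplest way to make a sub-20 Hz tone audible: keep the samples and declare a faster playback rate, which multiplies every frequency by the same factor. This is a swapped-in illustration, not EchoSave's actual module. A true pitch shift would also preserve duration, whereas this shortcut shortens the clip. The function names, the synthetic 15 Hz tone, and the 10x factor are all assumptions made for the example.

```python
import io
import math
import struct
import wave


def sine(freq_hz, sample_rate, seconds):
    """Synthetic test tone standing in for a real rumble recording."""
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def estimate_freq(samples, sample_rate):
    """Rough frequency estimate from the number of zero crossings."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings * sample_rate / (2 * len(samples))


def speed_up(samples, sample_rate, factor):
    """Same samples, faster declared rate: all frequencies scale by `factor`."""
    return samples, int(sample_rate * factor)


def to_wav_bytes(samples, sample_rate):
    """Encode mono float samples in [-1, 1] as 16-bit PCM WAV bytes."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


# Hypothetical 15 Hz rumble sampled at 1 kHz, replayed 10x faster (about 150 Hz).
rumble = sine(15.0, 1000, 2.0)
audible, new_rate = speed_up(rumble, 1000, 10)
wav_bytes = to_wav_bytes(audible, new_rate)
```

The trade-off is that a 2-second rumble plays back in 0.2 seconds, which is one reason a dedicated pitch-shifting module is preferable for research listening.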

How we built it

The backend is Python 3.11 + FastAPI, exposing ~80 REST endpoints and a WebSocket progress stream. The audio DSP layer is built on librosa, scipy, soundfile, and noisereduce, and segments each recording into overlapping 60-second chunks. Our denoising ensemble runs four backends in parallel threads (spectral gating, Wiener filtering, a tiny U-Net, and Facebook's Demucs with the htdemucs model) and scores each candidate on a weighted composite of SNR improvement, energy preservation, harmonic preservation, spectral distortion, and artifact level before selecting the winner. Call detection, feature extraction, and individual elephant ID (via HDBSCAN clustering on 64-dimensional acoustic fingerprints) run downstream. The frontend is Next.js 14 + TypeScript + Tailwind, with wavesurfer.js for waveform playback, Plotly for spectrograms, and Three.js/R3F for 3D visualizations.
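The parallel scoring step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our production code: the weights are hypothetical (the five criteria are real, their weighting here is invented), each stub backend returns pre-computed metrics instead of actually denoising audio, and the metric values are made up so the example runs without any audio libraries.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical weights. Distortion and artifacts are penalties, so they are negative.
WEIGHTS = {
    "snr_improvement": 0.30,
    "energy_preservation": 0.20,
    "harmonic_preservation": 0.25,
    "spectral_distortion": -0.15,
    "artifact_level": -0.10,
}


def composite_score(metrics):
    """Weighted sum over the five criteria (metrics assumed normalized to 0..1)."""
    return sum(weight * metrics[name] for name, weight in WEIGHTS.items())


def run_backend(name, backend, chunk):
    """Run one backend; a failure yields None instead of crashing the ensemble."""
    try:
        return name, backend(chunk)
    except Exception:
        return name, None


def select_winner(backends, chunk):
    """Run all backends in parallel threads and pick the best-scoring survivor."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = [pool.submit(run_backend, n, fn, chunk) for n, fn in backends.items()]
        results = [f.result() for f in futures]
    scores = {n: composite_score(m) for n, m in results if m is not None}
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores


# Stub backends with made-up metrics, purely for illustration.
def spectral_gating_stub(chunk):
    return {"snr_improvement": 0.5, "energy_preservation": 0.9,
            "harmonic_preservation": 0.8, "spectral_distortion": 0.2,
            "artifact_level": 0.1}


def wiener_stub(chunk):
    return {"snr_improvement": 0.6, "energy_preservation": 0.8,
            "harmonic_preservation": 0.7, "spectral_distortion": 0.3,
            "artifact_level": 0.2}


def unet_stub(chunk):
    raise RuntimeError("model weights not available")


demo_backends = {"spectral_gating": spectral_gating_stub,
                 "wiener": wiener_stub, "unet": unet_stub}
winner, scores = select_winner(demo_backends, chunk=[0.0] * 8)
```

In this toy run the failing U-Net stub simply drops out of the candidate set and the winner is chosen among the backends that returned a result.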

Challenges we ran into

The hardest problem was noise classification accuracy. Our heuristic classifier reached only 43.2% overall accuracy on our 44-sample test set. Airplane recall was solid at 90.5%, but vehicle and generator recall were both 0% because their spectral centroids (1300–1700 Hz) are nearly indistinguishable to a rule-based system. Infrasound posed its own challenge: elephant rumbles sit at 10–22 Hz, below the range most audio tooling is designed for, so we needed custom HPF-aware (high-pass-filter-aware) gating and a dedicated pitch-shifting module just to make the calls audible. Keeping the pipeline strictly non-generative was both a scientific and an engineering constraint we had to enforce throughout: every denoising operation only masks, subtracts, or re-weights existing STFT bins and never synthesizes new spectrum.
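One way to picture the non-generative constraint is as a check that no output bin ever carries more magnitude than the corresponding input bin. The sketch below is a simplified illustration, not our enforcement code: it assumes masks are real gains clamped to [0, 1], and the names `apply_mask` and `is_non_generative` are hypothetical.

```python
def apply_mask(stft_bins, gains):
    """Scale each complex STFT bin by a real gain clamped to [0, 1].

    Clamping means a bin can be attenuated or kept, never amplified,
    and a zero bin always stays zero.
    """
    if len(stft_bins) != len(gains):
        raise ValueError("one gain per STFT bin is required")
    return [b * min(1.0, max(0.0, g)) for b, g in zip(stft_bins, gains)]


def is_non_generative(original, processed, tol=1e-12):
    """True if no processed bin has more magnitude than its original bin."""
    return len(original) == len(processed) and all(
        abs(p) <= abs(o) + tol for o, p in zip(original, processed)
    )


# Toy bins: the out-of-range gains 2.0 and 1.5 are clamped to 1.0.
bins = [1 + 1j, 0j, 3 - 4j]
masked = apply_mask(bins, [0.5, 2.0, 1.5])

# A hypothetical "generative" output that invents energy in an empty bin.
synthesized = [0.5 + 0.5j, 0.1 + 0j, 3 - 4j]
```

Here `is_non_generative(bins, masked)` holds, while the `synthesized` output fails the check because it puts energy into a bin that was empty in the recording.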

Accomplishments that we're proud of

We're proud of the ensemble denoising architecture — the circuit-breaker-protected, parallel multi-backend scoring system that gracefully degrades when ML models aren't available and still returns a scientifically honest result. We're also proud of the scientific integrity constraints baked into the product: nothing is fabricated. Every metric is measured, every low-confidence call is labeled "Unclassified — insufficient acoustic match," and every Gemini response is flagged as "AI-assisted interpretation." Building a research tool that a scientist could actually trust felt more important than shipping flashy numbers.
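For readers unfamiliar with the circuit-breaker pattern mentioned above, here is a minimal sketch of the general idea: after a few consecutive failures a backend is skipped for a cooldown period, then given one trial call. The class, thresholds, and helper are assumptions written for illustration and do not mirror our actual implementation.

```python
import time


class CircuitBreaker:
    """Skip a failing backend for `cooldown_s` after `max_failures` failures in a row."""

    def __init__(self, max_failures=3, cooldown_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.failures = 0
        self.opened_at = None

    def allow(self):
        """Return True if a call may be attempted right now."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Half-open: permit a trial call; one more failure re-opens the breaker.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return True
        return False

    def record_success(self):
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def call_with_breaker(breaker, fn, *args):
    """Return fn(*args), or None if the breaker is open or the call fails."""
    if not breaker.allow():
        return None
    try:
        result = fn(*args)
    except Exception:
        breaker.record_failure()
        return None
    breaker.record_success()
    return result
```

A caller that receives `None` can fall back to the remaining backends, which is the kind of graceful degradation described above.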

What we learned

We learned that bioacoustics is a domain where honest uncertainty is a feature, not a bug. The hardest design decisions weren't technical — they were epistemological: when do you show a result vs. withhold it? We also learned that spectral overlap between noise types is a genuinely hard ML problem that heuristics alone can't solve, and that a scikit-learn Random Forest trained on even a small labeled dataset would dramatically outperform hand-crafted frequency rules. Lastly, building for real researchers (vs. a demo) forced us to think about export formats, reproducibility, and data provenance from day one.

What's next for EchoSave

The immediate next step is replacing the heuristic noise classifier with a scikit-learn Random Forest to push recall above 70% across all noise types. Beyond that: a larger labeled training corpus in partnership with ElephantVoices, PESQ-based perceptual quality scoring, a mobile-friendly upload flow for in-field use, and deeper individual identification by training the fingerprint embeddings on known elephant IDs from the ElephantVoices database. Longer term, we want EchoSave to become a community platform where researchers across Africa and Asia can contribute recordings, building the largest open dataset of annotated elephant vocalizations in existence.
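To make the "above 70% across all noise types" target concrete, per-class recall can be checked with a few lines of Python. This is only a sketch of the acceptance check: the labels below are invented toy data, not EchoSave results, and the function names are hypothetical.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall per label: correct predictions divided by true occurrences."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def meets_target(recalls, target=0.70):
    """True only if every class clears the recall target."""
    return all(r > target for r in recalls.values())


# Invented toy labels, for illustration only.
y_true = ["airplane"] * 4 + ["vehicle"] * 4 + ["generator"] * 2
y_pred = ["airplane", "airplane", "airplane", "vehicle",
          "vehicle", "vehicle", "vehicle", "generator",
          "generator", "vehicle"]

recalls = per_class_recall(y_true, y_pred)
ok = meets_target(recalls)
```

Checking every class separately matters because a strong overall number can hide a class that is never recognized, which is the failure mode we hit with the heuristic classifier.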
