Inspiration
Acoustic bird identification systems are powerful, but they are inherently limited by microphone quality, overlapping environmental noise, and a lack of real-world context. A neural network might mistake a background artifact for a rare bird that has never been recorded on that continent. We were inspired to build Birdly because we realized bird monitoring could be drastically improved if AI cross-referenced what it hears with localized, seasonal occurrence data.
What it does
Birdly is an intelligent, multi-turn bird sound recognition assistant. When a user uploads an audio recording, Birdly executes a chained tool-workflow: it uses acoustic analysis to find candidate species, queries the regional Swedish species observation framework (Artportalen) for recent real-world sightings within the area, and fuses these signals together. The result is a geographically calibrated identification delivered alongside an interactive table and audio widget.
How we built it
We engineered Birdly using a decoupled multi-agent architecture composed of a Planner Mode and a Writer Mode. The Planner acts as the orchestrator—analyzing the message history, extracting attachments, setting fallback spatial parameters (defaulting to Huddinge, Sweden), and designing a sequential plan of execution. The Writer serves as the interface layer, synthesizing complex tool logs into clean, structured responses.
The Core Algorithm
To resolve acoustic ambiguities, we implemented a custom scoring mechanism that combines acoustic confidence with regional observation density using logarithmic smoothing:
$$ S_{final} = w_1 \cdot S_{audio} + w_2 \cdot S_{geo} $$
Where $S_{audio}$ is the raw confidence from the acoustic model, and $S_{geo}$ is the geographic score calculated as:
$$ S_{geo} = \min \left( 1.0, \frac{\ln(1 + \text{count})}{\ln(1 + \text{max\_expected\_count})} \right) $$
This fusion re-ranks the candidate list: acoustic anomalies with few or no local sightings are down-ranked, and species with many recent observations inside the search radius are promoted.
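The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not Birdly's actual code: the function names, the weights (0.6 and 0.4), the `max_expected_count` of 1000, and the placeholder species are all assumptions chosen for the example.

```python
import math


def geo_score(count: int, max_expected_count: int) -> float:
    """Log-smoothed observation density, capped at 1.0."""
    if count <= 0 or max_expected_count <= 0:
        return 0.0
    return min(1.0, math.log1p(count) / math.log1p(max_expected_count))


def fuse(candidates, w_audio=0.6, w_geo=0.4, max_expected_count=1000):
    """Rank (name, audio_confidence, observation_count) tuples by fused score.

    The default weights and max_expected_count are illustrative assumptions.
    """
    ranked = []
    for name, s_audio, count in candidates:
        s_final = w_audio * s_audio + w_geo * geo_score(count, max_expected_count)
        ranked.append((name, s_final))
    # Highest fused score first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Placeholder candidates: species_a leads on audio alone but has no local
# sightings; species_b is second on audio but heavily observed nearby.
ranking = fuse([("species_a", 0.80, 0), ("species_b", 0.70, 11000)])
```

Under these assumed weights, the heavily observed candidate overtakes the one that led on audio alone, which mirrors the promotion behaviour described above. The logarithm keeps a very common species from drowning out the audio signal entirely, since the geographic term saturates at 1.0.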
Challenges we ran into
Our journey was defined by complex data integration hurdles:
- Taxonomical Translation Barriers: The acoustic analyzer returns English vernacular names, while the Swedish Dyntaxa database requires strict scientific Latin nomenclature, causing initial queries to return null values.
- API Payload Overloads: When conversions failed, empty query blocks caused the Artportalen API to default to dumping over 11,000 regional records, overwhelming the LLM's context window.
- Conversational Chaining Pitfalls: We faced data format disparities where tool arguments shifted between string lists and structured dictionaries across execution loops.
- Follow-up Redundancy: Preventing the Planner agent from re-running the entire expensive execution pipeline when a user simply asked a follow-up question ("Why this judgment?") required fine-grained context rules.
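One way to absorb the string-list versus dictionary mismatch is to coerce every tool argument into a single shape before it reaches the next tool. The sketch below is a hypothetical helper, not Birdly's code: the function name and the `"name"` key are assumptions made for illustration.

```python
import json


def normalize_species_args(raw):
    """Coerce a tool argument into a list of {"name": str} dicts.

    Accepts a plain string, a JSON-encoded string, a dict, or a list
    mixing strings and dicts. Anything else yields an empty list.
    """
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)  # the argument may arrive JSON-encoded
        except json.JSONDecodeError:
            pass  # not JSON: treat it as a single species name
    if isinstance(raw, (str, dict)):
        raw = [raw]
    if not isinstance(raw, list):
        return []

    normalized = []
    for item in raw:
        if isinstance(item, str):
            name = item.strip()
        elif isinstance(item, dict):
            name = str(item.get("name", "")).strip()
        else:
            continue  # skip unsupported entries instead of failing
        if name:
            normalized.append({"name": name})
    return normalized
```

With a helper like this at the boundary of each tool call, downstream code only ever sees one format, regardless of which shape the previous execution loop produced.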
Accomplishments that we're proud of
- Defensive Intercept Architecture: Built an elegant "Early Return" validation mechanism that halts data-fetching pipelines gracefully if no matching taxonomical IDs are resolved.
- Flawless Score Fusion Proof of Concept: Successfully demonstrated a live data correction where a bird species originally ranked second by audio alone was promoted to number one due to over 11,000 localized observations.
- Resilient Stream Trimming: Implemented a robust parsing engine that safely isolates clean JSON payloads by anchoring on structural bracket boundaries, completely neutralizing extraneous Chain-of-Thought logs or token leaks.
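The early-return guard and the JSON trimming can be illustrated roughly as follows. Both functions are hypothetical sketches: `fetch_observations`, its `search` callable, and the returned dictionary shape are assumptions, and `extract_json` is a variation that scans for an opening bracket and lets Python's `json.JSONDecoder.raw_decode` find the matching end, which may differ from the bracket anchoring Birdly uses.

```python
import json


def fetch_observations(taxon_ids, search):
    """Skip the observation query when no taxon IDs were resolved.

    `search` is a hypothetical callable wrapping the observation API.
    """
    if not taxon_ids:
        # Early return: never send an empty filter that could match everything.
        return {"status": "no_match", "observations": []}
    return {"status": "ok", "observations": search(taxon_ids)}


def extract_json(text):
    """Return the first decodable JSON object or array in text, else None."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "{[":
            try:
                value, _end = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                continue  # bracket was part of surrounding prose; keep scanning
    return None
```

For example, `extract_json('reasoning [step 1] {"plan": ["analyze"]} done')` skips the bracketed prose and returns the dictionary, while `fetch_observations([], search)` returns without calling `search` at all.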
What we learned
We gained deep expertise in the constraints of deterministic AI Agent development. We mastered the art of state preservation through multi-turn user sessions, learned why strict parameter validation must adapt to both string and object data types, and realized that an LLM's reasoning capability is best utilized when its prompt boundaries are treated with strict engineering constraints.
What's next for Birdly
We aim to evolve Birdly from a local prototype into a scalable ecosystem. Our roadmap includes converting our static taxonomical mapping into a dynamic translation microservice, adding support for live audio streaming via WebSockets, and implementing adaptive weights ($w_1, w_2$) that automatically recalibrate based on regional avian migratory calendars.
Built With
- artportalen
- birdnet
- chainlit
- python