Inspiration
We noticed that image classification models trained on camera trap datasets to detect and identify wildlife are restricted to the classes considered at training time. As a result, they perform poorly on new species and on unfamiliar breeds of species they already cover.
TL;DR
- Inspired by AgentWatch architecture
- Custom drift detection algorithms
- Vision-Language Model verification
What it does
This project extends an image classification pipeline with a VLM judge (Qwen) that verifies the labels of images the classifier assigns with low confidence. The verified labels are intended as the basis for retraining the classification model on newer classes, so that future inference becomes more robust.
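The routing idea above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `Prediction`, `Verdict`, `verify`, and the 0.6 threshold are hypothetical, and the judge is passed in as a plain callable so the sketch stays independent of any particular VLM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Prediction:
    image_path: str
    label: str
    confidence: float


@dataclass
class Verdict:
    image_path: str
    classifier_label: str
    final_label: str
    verified_by_vlm: bool
    candidate_for_retraining: bool


def verify(
    pred: Prediction,
    judge: Callable[[str, str], str],
    threshold: float = 0.6,  # assumed value; tune per model and dataset
) -> Verdict:
    """Keep confident predictions as-is; send the rest to the VLM judge."""
    if pred.confidence >= threshold:
        return Verdict(pred.image_path, pred.label, pred.label, False, False)

    # The judge receives the image path and the classifier's proposed label
    # and returns the label it believes is correct.
    judged = judge(pred.image_path, pred.label).strip().lower()
    disagrees = judged != pred.label.lower()
    # Disagreements are the interesting samples for a later retraining set.
    return Verdict(pred.image_path, pred.label, judged, True, disagrees)
```

Only low-confidence images reach the judge, so the comparatively expensive VLM call is paid for a small share of the frames.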
How we built it
The solution is set up as a simple ambient agent around the classification model, which is linked to a VLM served via Ollama on a PC and on edge devices (ROCK 5C and Jetson AGX Orin).
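As a rough sketch of how the agent might talk to the VLM, the snippet below builds a request for Ollama's `/api/generate` endpoint using only the standard library. The model tag `qwen2.5vl`, the prompt wording, and the helper names `build_judge_request` and `ask_judge` are assumptions for illustration; the project may use a different Qwen variant or the Ollama Python client instead.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint; adjust the host for a remote edge device.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_judge_request(
    image_bytes: bytes,
    proposed_label: str,
    model: str = "qwen2.5vl",  # assumed model tag
) -> dict:
    """Build the JSON payload asking the VLM to confirm or correct a label."""
    prompt = (
        f"A wildlife classifier labelled this camera trap image as "
        f"'{proposed_label}'. Reply with only the name of the animal "
        f"you see in the image."
    )
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of a stream
    }


def ask_judge(payload: dict, url: str = OLLAMA_URL, timeout: float = 60.0) -> str:
    """Send the payload to a running Ollama server and return its answer."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"].strip()
```

Because the same HTTP interface is exposed wherever Ollama runs, only the URL would need to change between the PC and the edge devices.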
Challenges we ran into
- Integrating the different parts of the project
- Selecting and working with different VLMs
- Setup on ROCK 5C, Jetson Orin Nano and Jetson AGX Orin (hence the use of a Mac for the demo)
Accomplishments that we're proud of
This project is the first MVP of a proper active learning setup for wildlife monitoring.
What we learned
We learned that frame selection is as important as model accuracy: intelligent, motion-based frame sampling reduced computation by 98% while maintaining accuracy. We also discovered that data drift detection, which catches model failures autonomously, is more valuable than perfect initial accuracy. Class imbalance had no silver bullet; simple dataset balancing outperformed complex uncertainty sampling. Most importantly, we learned that edge constraints force innovation: working within 8 GB of RAM led to elegant solutions (MOG2 motion detection, strategic quantization) rather than brute-force approaches. The key insight is that autonomous systems need self-monitoring, not just inference.
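To make the self-monitoring idea concrete, here is one very simple way a drift check could look. It is a hypothetical stand-in, not the project's custom drift detection algorithm: it flags drift when the mean classifier confidence over a sliding window falls well below a baseline measured on validation data. The class name, window size, and `max_drop` value are all assumptions.

```python
from collections import deque
from statistics import mean


class ConfidenceDriftMonitor:
    """Flag drift when recent mean confidence drops far below a baseline."""

    def __init__(self, baseline_mean: float, window: int = 50, max_drop: float = 0.15):
        self.baseline_mean = baseline_mean  # e.g. mean confidence on validation data
        self.max_drop = max_drop            # tolerated drop before raising the flag
        self.recent = deque(maxlen=window)  # sliding window of latest confidences

    def update(self, confidence: float) -> bool:
        """Record one prediction's confidence; return True if drift is suspected."""
        self.recent.append(confidence)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough evidence yet
        return self.baseline_mean - mean(self.recent) > self.max_drop


# Usage: feed the confidence of each classified frame into the monitor.
monitor = ConfidenceDriftMonitor(baseline_mean=0.9, window=5)
flags = [monitor.update(c) for c in [0.92, 0.88, 0.91, 0.45, 0.40, 0.38, 0.35, 0.42]]
```

A check like this only looks at confidences, so it can run on the edge device alongside inference at negligible cost; a drift flag could then trigger VLM verification or a retraining request.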
ARM CPUs are Surprisingly Capable
- Discovery: All three platforms use ARM processors (Jetson uses ARM + GPU; ROCK 5B uses ARM + NPU)
- Jetson AGX: 12-core ARM CPU + 275 TOPS GPU
- Jetson Orin: 8-core ARM CPU + 40 TOPS GPU
- ROCK 5B: 8-core ARM Cortex-A76/A55 + 6 TOPS NPU
Finding: ARM isn't just for mobile; enterprise-grade ARM CPUs (A76/A55) are production-ready.
Why It Matters: ARM architecture enables lower power consumption at scale while maintaining performance for sequential tasks.
What's next for Ambient Agent for Wildlife Monitoring on Edge
- Explore ExecuTorch for Ultralytics
- Human-in-the-loop labelling
- Animal activity tracker
Built With
- fastapi
- ollama
- python
- qwen
- yolo
