- Nvidia / AWS Agentic AI Hackathon!
- Jetson Orin equipment for AI agent portability to the Quinta Mazatlan Birding Center
- Dual-cam setup with motion sensors for bird activity. Prime delivers birdseed on time!
- Backyard Bird Soccer match awaits!
- Backyard feeders setup
- Our 1st customers! Not 1 but 2 Green Jays with beautiful blue hoodies!
AI Bird Watch: Agentic AI for Interactive Bird Observation
Agentic AI with NVIDIA NIMs on AWS SageMaker
Inspiration
In the sunny Lower Rio Grande Valley (RGV) of Texas, a biodiversity hotspot with over 500 bird species and 300 migratory birds, I found my spark. Keyboard life had pulled me away from the feeders outside my window: hummingbirds sipping nectar from crepe myrtle, ducks paddling the canals. Apps like Merlin glitched, and their bland UIs failed to draw me out. What if AI whispered back, turning detection into fables? Inspired by agentic systems (sense-decide-act) and the RGV's flyway magic, I built AI Bird Watch: Fauna, Flora, and Fables for the NVIDIA/AWS Hackathon. It is AI's call to the wild, reconnecting mind, body, and soul to nature. Bonus: I learned a lot about our fine-feathered friends along the way!
What if an AI could not only detect and classify birds, but also attempt to communicate with them? Sometimes I hear the chirps of a different bird but can't see it. That is a common frustration for backyard bird watchers, who go to work and miss the wonders of their own backyard. And how did this bird get here? How far did it fly? What adventures did it have along the way?
How I Built It
I bootstrapped this journey solo, then my friend Max helped out with Python polish.
Hardware: a Xorlink 4K Solar Cam and a Blink Outdoor 4 for motion snaps; an Nvidia Jetson Orin Nano as the portable edge device, with an onboard Yahboom camera, speakers and mic, a VK-162 GPS for mapping, and an Anker power bank for 6 hours of AI Bird Watch in remote bird locations.
Corpora: real data from eBird and Xeno-Canto, synthetic Bird Tales generated with Gemini, 11,000+ bird images from Caltech, and 1,000 royalty-free images of 50 local RGV species (hummers, ducks, and other birds).
Models, fine-tuned on SageMaker: YOLOv8n for vision (3 classes, 50 epochs, mAP@50 = 0.87), BirdNET for calls, Llama NIM for reasoning, and NeMo for RAG embeddings.
Pipeline: OpenCV streams RTSP, motion triggers YOLO, PyAudio records, and RIVA narrates fables and facts. A Streamlit UI (aibirdwatch.com) handles uploads, wikis, and live bird cams.
Carefully crafting the prompts for Gemini's batch synthetic corpora was a great experience. The result: wonderful tales, such as the one about a boy who gets lost at a ranch and encounters a Turkey Vulture. Instead of being afraid, the boy follows the vulture through the air and is guided to a watering hole, saving his life. Amazing stories to narrate.
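The motion-triggered part of the pipeline can be sketched roughly as follows. This is a minimal illustration using frame differencing, not the project's actual code: the function names, the 64x48 downscale, and both thresholds are assumptions made up for this example. Frames are treated as flat lists of grayscale values so the gating logic can be tried without a camera, and OpenCV is only imported when a real RTSP stream is read.

```python
from typing import Iterable, Iterator, List, Sequence


def changed_fraction(prev: Sequence[int], curr: Sequence[int],
                     pixel_delta: int = 25) -> float:
    """Fraction of grayscale pixels whose intensity moved by more than pixel_delta."""
    if len(prev) != len(curr) or not prev:
        raise ValueError("frames must be non-empty and the same size")
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_delta)
    return changed / len(prev)


def motion_frames(frames: Iterable[Sequence[int]],
                  min_fraction: float = 0.02) -> Iterator[int]:
    """Yield the index of every frame that differs enough from the one before it.

    These are the frames a detector such as YOLO would then be run on.
    """
    prev = None
    for index, frame in enumerate(frames):
        if prev is not None and changed_fraction(prev, frame) >= min_fraction:
            yield index
        prev = frame


def rtsp_gray_frames(url: str) -> Iterator[List[int]]:
    """Read an RTSP stream with OpenCV and yield small flattened grayscale frames.

    The URL is supplied by the caller; the 64x48 size is an arbitrary choice
    that keeps the pure-Python comparison above cheap.
    """
    import cv2  # imported lazily so the gating logic runs without OpenCV

    capture = cv2.VideoCapture(url)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            small = cv2.resize(gray, (64, 48))
            yield small.flatten().tolist()
    finally:
        capture.release()
```

With a live camera, `motion_frames(rtsp_gray_frames(url))` would produce the frame indices worth sending to the detector; everything else is skipped, which matters on a battery-powered Jetson.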
Regional fine-tuning scales it: models are attuned per area (e.g., Texas RGV vs. the California coast), gathering all U.S. bird info with monthly and seasonal corpora for migrations (e.g., spring warblers in thornscrub). Each AI Bird Watcher contributes fresh images and video to its local bird corpora, transforming Backyard Bird Watchers into Backyard Ornithologists! The kicker: giveaways and prizes for rare bird captures. Top submitters win Blink cams or a spotlight on the Flock Gallery leaderboard, fueling viral growth.
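One way the per-region, per-season corpora could be organized is a simple key scheme. The sketch below is purely illustrative: the `corpora/` prefix, the slug format, and the month-to-season mapping (Northern Hemisphere meteorological seasons) are all assumptions, not the layout the project actually uses.

```python
from datetime import date

# Hypothetical mapping: meteorological seasons for the Northern Hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def corpus_prefix(region: str, day: date) -> str:
    """Build a storage prefix such as 'corpora/texas-rgv/spring/04'.

    The region is turned into a lowercase, hyphenated slug; the season and the
    zero-padded month select the seasonal and monthly corpus for that date.
    """
    slug = "-".join(region.strip().lower().split())
    season = SEASON_BY_MONTH[day.month]
    return f"corpora/{slug}/{season}/{day.month:02d}"
```

A sighting uploaded from the RGV in April would then land under the spring corpus for that region, and a California watcher's uploads would never mix into the Texas fine-tuning set.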
What I Learned
Agentic AI's power: Llama NIM chains decisions like "unseen exotic? Attract call", grounded by RAG to cut hallucinations by 30%. NVIDIA NGC NIMs are plug-and-play gold; AWS SageMaker democratizes GPUs. Corpora curation? Synthetic data augments real data for a 20% accuracy boost. Deeper: AI as a bridge to nature. Fables pull us outside, with the mind mapping migrations, the body chasing calls, and the soul in the whispers. The hack grind taught resilience: ship a subset of 50 species rather than chase perfection, and vibe code with tools like Grok for flow. I am also still learning about all the possibilities of the Jetson Orin Nano.
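To make the "unseen exotic? Attract call" decision concrete, here is a rule-based stand-in. It is not the Llama NIM call itself: in the project the model does the reasoning, while this sketch hard-codes one plausible policy. The `Observation` fields, the action names, and the 0.6 confidence threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Observation:
    seen_species: Optional[str]   # top vision label, or None if nothing in frame
    seen_conf: float              # vision confidence in [0, 1]
    heard_species: Optional[str]  # top audio label, or None if nothing heard
    heard_conf: float             # audio confidence in [0, 1]


def decide(obs: Observation, local_species: Set[str], min_conf: float = 0.6) -> str:
    """Pick one action: 'narrate', 'attract_call', or 'wait'.

    - A confident sighting is narrated.
    - A confident call from a species outside the local list, with no confident
      sighting, triggers an attract call to draw the bird into view.
    - Anything else waits for more evidence.
    """
    seen = obs.seen_species if obs.seen_conf >= min_conf else None
    heard = obs.heard_species if obs.heard_conf >= min_conf else None
    if seen is not None:
        return "narrate"
    if heard is not None and heard not in local_species:
        return "attract_call"
    return "wait"
```

Keeping a deterministic fallback like this next to the LLM is also a cheap way to sanity-check its decisions on the edge device.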
96 million people in the USA are bird watchers. That's 1 in 3 people who love birds. It occurred to me, as I was buying backyard cameras, new feeders, and birdseed, that Amazon could really fund this project. Hey Amazon 👋 why not host a Cloud for the Bird Lovers?! ☁️🐦 91 million of those 96 million are backyard bird watchers. Blink Bird Sanctuaries: we help migrating birds on their journey and boost sales of cameras and birdseed, which in turn helps the farmers who grow the birdseed. It's a win-win-Wing situation!
For anonymous live bird feeds, Amazon Kinesis Video Streams enables secure, encrypted streaming from IoT cameras like Blink, so users can contribute fables without revealing their identities, with permissions scoped to device shadows via AWS IoT Core. The Bird Network could grow overnight if Amazon deployed its marketing team to recruit Bird Lovers with Blink cameras to join the new Bird wiki. Think of a chain of Blinks documenting bird migrations across America. AWS Location Service pairs with the Jetson's GPS for mapping, tagging fables along real-time flyways.
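Before a fable can be tagged on a map, the GPS fix has to become decimal latitude and longitude. The sketch below assumes the receiver emits standard NMEA 0183 GGA sentences, as USB GPS dongles typically do; it does not verify the checksum, and the function names are made up for this example. Reading the serial port and sending the position to a mapping service are left out.

```python
from typing import Optional, Tuple


def _to_decimal(value: str, hemisphere: str, degree_digits: int) -> float:
    """Convert NMEA (d)ddmm.mmmm plus a hemisphere letter to signed decimal degrees."""
    degrees = int(value[:degree_digits])
    minutes = float(value[degree_digits:])
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_gga(sentence: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from a GGA sentence, or None if there is no fix."""
    body = sentence.strip().split("*")[0]  # drop the checksum suffix if present
    fields = body.split(",")
    if len(fields) < 7 or not fields[0].endswith("GGA"):
        return None
    if fields[6] in ("", "0"):  # fix quality 0 means the receiver has no position
        return None
    latitude = _to_decimal(fields[2], fields[3], 2)    # latitude is ddmm.mmmm
    longitude = _to_decimal(fields[4], fields[5], 3)   # longitude is dddmm.mmmm
    return latitude, longitude
```

The resulting pair can be stored alongside each sighting so the wiki map (Folium, in this stack) can place it on the flyway.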
Model Update: I decided to use the Amazon Polly API instead of RIVA for TTS narration, for simplicity and speed. RIVA will stay onboard the Jetson Orin Nano Explorer, which will be taken to various remote bird locations. It works autonomously on motion detection: it identifies each bird it encounters, produces a bird wiki, and narrates the encounter on site!
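A rough sketch of the Polly narration path, under a few assumptions: Polly limits how much text one request may carry, so longer fables are split on sentence boundaries first (the 2,500-character default is a conservative guess, not a documented figure); the voice `Joanna` and the helper names are placeholders; and AWS credentials are expected to be configured already. `boto3` is imported lazily so the chunking logic can be tried on its own.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 2500) -> List[str]:
    """Split text into chunks of whole sentences, each at most max_chars long.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def narrate(text: str, out_path: str, voice_id: str = "Joanna") -> None:
    """Synthesize each chunk with Amazon Polly and append the MP3 bytes to one file."""
    import boto3  # lazy import: only needed when actually calling AWS

    polly = boto3.client("polly")
    with open(out_path, "wb") as audio_file:
        for chunk in chunk_text(text):
            response = polly.synthesize_speech(
                Text=chunk, OutputFormat="mp3", VoiceId=voice_id
            )
            audio_file.write(response["AudioStream"].read())
```

Appending MP3 segments back to back usually plays fine in browsers, though a production version might stitch the audio more carefully.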
Bird update: 1 in 3 people like birds? I decided to test this stat on the first Lyft driver I met. "Do you like birds?" "Oh, I love birds," she responded. She said she had recently rescued a lost and hurt duckling and nursed it back to health; it is now 2 months old. She is anxious about returning the duckling to its natural habitat. I told her about AI Bird Watch and recommended that she talk to our local bird expert of 20 years at QuintaMazatlan.com. Our first rescue-assist! She agreed and will also bring Mac the Duckling to my backyard to record. This will be an interesting test of whether YOLO can recognize a small duck. If not, I should run another fine-tune with duckling images and corpora.
I also asked a friend if he likes birds. He sent me pictures of his Texas ranch, where a Red Cardinal and a Red Robin are visiting his water pail. I'm heading out there with the Xorlink Solar Camera and a Starlink to provide a live feed!
Built with What?
- Python 3.12
- Ultralytics YOLOv8
- BirdNET (Hugging Face)
- Streamlit
- PyAudio/Librosa
- NVIDIA Jetson Orin Nano
- NVIDIA AI Workbench/NGC NIMs (Llama-3.1-Nemotron-Nano-8B-v1, NeMo Retriever, RIVA)
- OpenCV
- Folium
- AWS SageMaker/IoT Core/S3/Amplify/Kinesis Video Streams/Location Service
- Roboflow (labeling)
- eBird/Xeno-Canto APIs
Testing Instructions for the Application
AI Bird Watch lives at aibirdwatch.com (Streamlit on Amplify), a portal for agentic birdwatching. No installs; it is browser-only (Chrome/Firefox). Test on desktop for streams and on mobile for the UI.
- Visit aibirdwatch.com.
- Sidebar: "Live Window" > live bird stream cameras. Motion triggers a YOLO box plus BirdNET verification.
- Success? Polly/RIVA narrates facts and an adventure tale, and a Bird Wiki page is generated (facts/map).
- "Submit Fable": Upload pic/audio/video + blurb > "Weave". RAG generates a fable and adds it to the gallery.
- "Flock Gallery": Browse Bird Wiki samples, vote on favorites.
Agentic App Architecture
Modular pipeline: Sense (OpenCV RTSP from multiple cams) → Decide (Llama NIM reasons: "Match conf? Attract?") → Retrieve (NeMo NIM embeddings query the corpora) → Act (RIVA narrates, PyAudio interacts). SageMaker handles fine-tuning and endpoints; the Jetson handles the portable edge. The Llama-3.1-Nemotron-Nano-8B-v1 NIM makes chain-of-thought (CoT) decisions (fine-tuned on RGV fables, 25% fewer hallucinations). I selected the NeMo Retriever NIM for embedding: dense vectors (512-dim) over 1k facts, with precision@5 = 0.88 for grounded queries.
ASCII Diagram:
Cams (RTSP/API) → YOLO/BirdNET → Llama NIM (Reason)
                                      ↓
                                 NeMo Retriever (RAG Corpora) → RIVA Act (Narrate/Wiki)
                                                                      ↓
                                                                S3 Sync (Flock Gallery)
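The Retrieve step and the precision@5 figure quoted above can be illustrated with a toy example. In the project the vectors come from the NeMo Retriever NIM; here tiny hand-written vectors stand in for them, and the function names and the in-memory dictionary are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Sequence[float], corpus: Dict[str, Sequence[float]],
          k: int = 5) -> List[str]:
    """Return the ids of the k corpus entries most similar to the query vector."""
    ranked = sorted(corpus, key=lambda doc_id: cosine(query, corpus[doc_id]),
                    reverse=True)
    return ranked[:k]


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Share of the first k retrieved ids that are actually relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k
```

Averaging `precision_at_k` over a set of test questions with hand-labeled relevant facts is one common way to arrive at a single precision@5 number for a retriever.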