Inspiration

It's 2 AM in the Chitwan district of Nepal. A farmer named Bishnu has been awake for three hours, sitting in the dark at the edge of his field with a flashlight. He does this every night — because the week before, an elephant herd crossed his perimeter and destroyed six months of crops in forty minutes.

He is not alone. Across South Asia, Sub-Saharan Africa, and Latin America, 750 million smallholder farmers face the same impossible choice: stay awake and guard, or sleep and risk losing everything. Electric fences cost $800. Watchtowers need people. Manual patrols are dangerous — over 100 farmers are killed annually in human-wildlife encounters they never saw coming.

I asked one question: what if any cheap IP camera could become an intelligent perimeter guard that detects, analyzes, and alerts in seconds — and gets smarter with every encounter?

That question became PAWS.


What it does

PAWS gives every farm an AI watchman powered by Amazon Nova — one that never sleeps, speaks every language, and learns from every detection.

Connect any camera feed — an IP camera zip-tied to a fence post, a Raspberry Pi tucked under a roof, even a zoo's public livestream for testing. Press Start Inference. From that moment, PAWS watches.

When a dangerous animal enters the frame:

In the first second: YOLO-World running on a cloud GPU identifies the species, draws bounding boxes color-coded by threat level, and sends a 5-frame snapshot to the backend. For Asian farms, a fine-tuned RT-DETR model runs simultaneously as a confidence booster for the 9 highest-risk species.

In the next two seconds: Amazon Nova 2 Lite receives those frames and, in a single API call, does five things at once. It confirms whether the animal is actually threatening (charging versus grazing is the distinction that determines whether a farmer loses sleep or loses crops), scores severity on a 1–10 scale, classifies the exact behavior, recommends the right deterrent for that specific species, and composes alert messages in six languages. One call. Five outputs. Under two seconds.

In parallel: Amazon Nova Embed converts the detection into a 384-dimensional behavior vector and searches past incidents for similar patterns — recognizing when an elephant is simply passing through versus actively threatening the perimeter, reducing false positives over time.
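The pattern search described above can be illustrated with a plain cosine-similarity lookup. This is a minimal sketch, not the PAWS implementation: the function names, the incident labels, and the toy 3-dimensional vectors are assumptions made for illustration (the real vectors are 384-dimensional and come from Nova Embed).

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    history: List[Tuple[str, Sequence[float]]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank stored incidents by similarity to the new detection, best first."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in history]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical past incidents, with toy vectors standing in for real embeddings.
history = [
    ("passing-through", [1.0, 0.0, 0.0]),
    ("fence-push", [0.0, 1.0, 0.0]),
]
best_match = most_similar([0.9, 0.1, 0.0], history, top_k=1)
```

A new detection whose nearest neighbours are mostly "passing-through" incidents could then be down-weighted before an alert fires, which is one way the false-positive reduction might work.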

Before five seconds are up: The farmer's phone buzzes. A Telegram message arrives with a photo of the detected animal, the severity score, and two buttons: Real threat and False alarm. An ntfy.sh push notification fires simultaneously. A voice alert speaks in the farmer's own language. Every registered farm within 15km receives a cascade alert automatically. If edge hardware is connected, the deterrent fires — ultrasonic burst for elephants, predator audio for big cats, strobe for most mammals. Non-harmful, species-specific, immediate.
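The 15km cascade amounts to a distance filter over registered farms. The sketch below assumes farms are stored as latitude/longitude pairs and uses the haversine great-circle formula; the farm names and coordinates are invented for illustration, and the real system may select neighbours differently.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0
CASCADE_RADIUS_KM = 15.0  # radius taken from the write-up


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def farms_to_alert(
    origin: Tuple[float, float],
    farms: Dict[str, Tuple[float, float]],
    radius_km: float = CASCADE_RADIUS_KM,
) -> List[str]:
    """Names of farms within radius_km of the detection, sorted alphabetically."""
    return sorted(
        name
        for name, (lat, lon) in farms.items()
        if haversine_km(origin[0], origin[1], lat, lon) <= radius_km
    )


# Hypothetical coordinates: one neighbour a few km away, one about 55 km north.
detection_site = (27.50, 84.40)
registered = {"near-farm": (27.55, 84.45), "far-farm": (28.00, 84.40)}
recipients = farms_to_alert(detection_site, registered)
```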

Bishnu taps Real threat on his phone. That label, human-verified and ground-truth positive, is automatically sorted into the training dataset. The model gets smarter. The next farmer benefits.


How we built it

PAWS is a solo-built end-to-end system spanning GPU inference, multi-model AI orchestration, a 14-step backend pipeline, IoT hardware integration, and a full SvelteKit dashboard — designed, trained, and deployed by one person over the course of this hackathon.

Custom fine-tuned RT-DETR (Asia Region Pack v1). Before writing a single line of the backend, I fine-tuned an RT-DETR (Real-Time Detection Transformer) model on the 9 most dangerous wildlife species encountered on Asian farms: elephant, tiger, leopard, sloth bear, wild boar, wolf, gaur, king cobra, and clouded leopard. RT-DETR's transformer-based architecture significantly outperforms anchor-based detectors on partially occluded animals, which is critical when a tiger is moving through tall grass or an elephant is partially hidden by perimeter vegetation at night. This custom model runs in parallel with YOLO-World on Modal's GPU: when both models agree on the same species, confidence is boosted by 15%, reducing false negatives for the most dangerous Asian species. The fine-tuning dataset was assembled from wildlife conservation imagery and augmented with farm perimeter viewpoints to match real deployment conditions.
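The agreement boost can be sketched as a small fusion step. Two assumptions are made here and are not confirmed by the text: "boosted by 15%" is read as a relative (multiplicative) boost capped at 1.0, and each model's output is reduced to one best confidence per species, ignoring bounding-box overlap. The function name `fuse_confidences` is hypothetical.

```python
from typing import Dict

AGREEMENT_BOOST = 0.15  # assumed to be relative: 0.60 becomes 0.60 * 1.15


def fuse_confidences(
    yolo_world: Dict[str, float],
    rt_detr: Dict[str, float],
    boost: float = AGREEMENT_BOOST,
) -> Dict[str, float]:
    """Merge per-species confidences; boost species that both models report."""
    fused: Dict[str, float] = {}
    for species in set(yolo_world) | set(rt_detr):
        best = max(yolo_world.get(species, 0.0), rt_detr.get(species, 0.0))
        if species in yolo_world and species in rt_detr:
            # Both detectors agree on the species, so raise confidence (capped at 1.0).
            best = min(1.0, best * (1.0 + boost))
        fused[species] = round(best, 4)
    return fused


fused = fuse_confidences(
    {"elephant": 0.60, "wild boar": 0.40},
    {"elephant": 0.50},
)
```

A production version would more likely match detections by box overlap (IoU) before treating them as the same animal.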

The camera layer accepts any stream format: RTSP, HLS, direct IP camera, or YouTube live. Each stream is independent — farmers can add or remove cameras from the dashboard without restarting anything. The dynamic camera grid auto-adjusts layout from 1 to 9+ cameras.

The inference layer runs YOLOv8s-WorldV2 on Modal.com's serverless T4 GPU containers. YOLO-World's open-vocabulary design means I can detect any animal by passing region-specific text prompts at runtime — a farm in Kenya watches for elephants and hyenas, a farm in Montana watches for bears and wolves — with zero retraining required. The fine-tuned RT-DETR Asia model runs alongside it as a precision layer for the highest-risk species.
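Runtime prompt selection could look like the lookup below. The two region entries mirror the examples in the text; the dictionary layout, the fallback list, and the function name are assumptions for illustration.

```python
from typing import Dict, List

# Region examples from the write-up; the real prompt lists are likely longer.
REGION_PROMPTS: Dict[str, List[str]] = {
    "kenya": ["elephant", "hyena"],
    "montana": ["bear", "wolf"],
}

# Hypothetical fallback when a farm's region has no dedicated prompt list.
DEFAULT_PROMPTS: List[str] = ["elephant", "hyena", "bear", "wolf"]


def prompts_for(region: str) -> List[str]:
    """Return the open-vocabulary class prompts for a region (case-insensitive)."""
    return REGION_PROMPTS.get(region.strip().lower(), DEFAULT_PROMPTS)


kenya_classes = prompts_for("Kenya")
```

With the Ultralytics package, the returned list would typically be passed to the model's `set_classes(...)` method before inference, so switching regions changes only the prompt list and not the weights.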

The pipeline is a 14-step FastAPI orchestration running in a background thread, publishing Server-Sent Events to the dashboard at every step. Farmers and judges can watch every decision happen in real time as a live terminal trace on the dashboard: gate check, debounce, five model calls, dataset archiving, alert dispatch, community mesh cascade, deterrent trigger, and authority notification.
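A step trace like this maps naturally onto the Server-Sent Events wire format, where each event is a few `field: value` lines followed by a blank line. The sketch below lists only the step names mentioned in the text (the full 14 are not enumerated there); the event name `step` and the payload fields are assumptions.

```python
import json
from typing import Iterable, Iterator, Optional

# Step names taken from the description; not the complete 14-step list.
NAMED_STEPS = [
    "gate check",
    "debounce",
    "dataset archiving",
    "alert dispatch",
    "community mesh cascade",
    "deterrent trigger",
    "authority notification",
]


def sse_event(event: str, data: dict, event_id: Optional[int] = None) -> str:
    """Serialize one Server-Sent Event; the trailing blank line ends the event."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")  # compact JSON stays on one line
    return "\n".join(lines) + "\n\n"


def trace(steps: Iterable[str]) -> Iterator[str]:
    """Yield one SSE message per completed pipeline step."""
    for number, name in enumerate(steps, start=1):
        yield sse_event("step", {"step": number, "name": name, "status": "done"}, number)


events = list(trace(NAMED_STEPS))
```

In FastAPI, a generator like `trace` is commonly wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, and the browser consumes it with `EventSource`.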

The Nova integration is the brain of the system. Each confirmed threat triggers five model calls, four to Nova and one to Amazon Polly:

  • Nova 2 Lite — threat analysis (behavior, severity, deterrent, 6-language alerts)
  • Nova Embed — behavior pattern matching against historical incidents
  • Nova 2 Lite — incident report narrative generation
  • Amazon Polly — voice alert synthesis in 6 languages (Nova Sonic on the roadmap)
  • Nova Act — automated wildlife authority report filing
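Because the threat-analysis call returns structured JSON, the backend can validate the response before acting on it. The schema below is a guess for illustration only: the field names (`is_threat`, `severity`, `behavior`, `deterrent`, `alerts`) and the language codes are assumptions, not the actual PAWS contract.

```python
import json
from dataclasses import dataclass
from typing import Dict

# Assumed codes for English, Hindi, Swahili, Spanish, Polish, and Arabic.
LANGUAGES = ("en", "hi", "sw", "es", "pl", "ar")


@dataclass
class ThreatAnalysis:
    is_threat: bool
    severity: int  # 1 to 10, as described in the write-up
    behavior: str
    deterrent: str
    alerts: Dict[str, str]  # language code -> alert text


def parse_threat_analysis(raw: str) -> ThreatAnalysis:
    """Parse and validate the model's JSON reply; raise ValueError if malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data.get("is_threat"), bool):
        raise ValueError("is_threat must be a boolean")
    severity = data.get("severity")
    if not isinstance(severity, int) or not 1 <= severity <= 10:
        raise ValueError("severity must be an integer from 1 to 10")
    alerts = data.get("alerts", {})
    missing = [code for code in LANGUAGES if code not in alerts]
    if missing:
        raise ValueError(f"missing alert languages: {missing}")
    return ThreatAnalysis(
        is_threat=data["is_threat"],
        severity=severity,
        behavior=str(data.get("behavior", "")),
        deterrent=str(data.get("deterrent", "")),
        alerts={code: str(alerts[code]) for code in LANGUAGES},
    )


# Hypothetical reply, shaped the way the assumed schema expects.
sample_reply = json.dumps({
    "is_threat": True,
    "severity": 8,
    "behavior": "pushing against fence post",
    "deterrent": "ultrasonic burst",
    "alerts": {code: f"alert text ({code})" for code in LANGUAGES},
})
analysis = parse_threat_analysis(sample_reply)
```

Rejecting malformed replies early keeps a bad model response from triggering alerts or deterrents downstream.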

The alert stack uses only free services by design. ntfy.sh push notifications work on any smartphone without an app. Telegram's Bot API delivers photos with inline feedback buttons. A farmer in rural Nepal pays zero per alert.
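Both alert channels are plain HTTP requests. The sketch below only builds the requests, without sending them, using the public ntfy.sh publish endpoint and the Telegram Bot API `sendPhoto` method. The topic name, token, chat ID, photo URL, and the `real:`/`false:` callback format are placeholders and assumptions, and it assumes the photo is passed by URL rather than uploaded.

```python
import json
import urllib.parse
import urllib.request


def ntfy_request(topic: str, title: str, message: str) -> urllib.request.Request:
    """Build a push notification: the POST body is the message, headers carry metadata."""
    return urllib.request.Request(
        f"https://ntfy.sh/{topic}",
        data=message.encode("utf-8"),
        headers={"Title": title, "Priority": "high"},
        method="POST",
    )


def telegram_photo_request(
    bot_token: str, chat_id: str, photo_url: str, caption: str, detection_id: str
) -> urllib.request.Request:
    """Build a sendPhoto call with two inline feedback buttons under the photo."""
    keyboard = {
        "inline_keyboard": [[
            {"text": "Real threat", "callback_data": f"real:{detection_id}"},
            {"text": "False alarm", "callback_data": f"false:{detection_id}"},
        ]]
    }
    form = urllib.parse.urlencode({
        "chat_id": chat_id,
        "photo": photo_url,
        "caption": caption,
        "reply_markup": json.dumps(keyboard),  # Telegram expects JSON-serialized markup
    }).encode("utf-8")
    return urllib.request.Request(
        f"https://api.telegram.org/bot{bot_token}/sendPhoto",
        data=form,
        method="POST",
    )


# Placeholder values; nothing is sent until urllib.request.urlopen(...) is called.
push = ntfy_request("paws-demo-farm", "Elephant at north fence", "Severity 8/10")
photo = telegram_photo_request(
    "TOKEN", "12345", "https://example.com/frame.jpg", "Elephant, severity 8/10", "det-42"
)
```

When a farmer presses a button, Telegram delivers the `callback_data` string back to the bot, which is how a press can be tied to a specific saved detection.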

The feedback loop is the long game. Every Telegram button press — ✅ or ❌ — automatically relabels the saved frame in YOLO format. Confirmed threats become training positives. False positives become hard negatives — the most valuable samples for reducing false alarm rates in future fine-tuning runs. The dataset grows with every farm that runs PAWS.
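In YOLO label format, each object is one line of `class x_center y_center width height`, with coordinates normalized to the image size, and an image with an empty label file is treated as background. The relabeling step could therefore be sketched as follows; the verdict strings and function names are assumptions for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def yolo_label_line(class_id: int, box: Box, img_w: int, img_h: int) -> str:
    """Convert a pixel-space box to a normalized YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def label_file_contents(
    verdict: str, class_id: int, box: Box, img_w: int, img_h: int
) -> str:
    """Text to write into the frame's .txt label file after farmer feedback."""
    if verdict == "real":
        # Confirmed threat: keep the detection as a training positive.
        return yolo_label_line(class_id, box, img_w, img_h) + "\n"
    if verdict == "false":
        # False alarm: an empty label file marks the frame as a hard negative.
        return ""
    raise ValueError(f"unknown verdict: {verdict!r}")


positive = label_file_contents("real", 0, (100, 50, 300, 250), 400, 500)
negative = label_file_contents("false", 0, (100, 50, 300, 250), 400, 500)
```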


Challenges we ran into

YouTube IP blocking on Modal. I initially supported YouTube live streams as demo sources. yt-dlp resolved the stream URL perfectly locally, but Google's CDN segments are IP-locked — Modal's datacenter IPs received 403s. The fix was resolving the URL on the local backend before passing the direct segment URL to Modal.

MJPEG rendering inconsistency across browsers. Modal's inference container outputs multipart/x-mixed-replace frames. Chrome and Edge handle this differently across security contexts. I went through three architectures — direct <img> to Modal, a backend proxy, and a canvas snapshot approach — before landing on the configuration that balanced latency, reliability, and CORS compliance.
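For context, a multipart/x-mixed-replace stream is a sequence of JPEG parts separated by a boundary string, which is the format a backend proxy would need to relay unchanged. The sketch below shows that framing only; the boundary name is an arbitrary choice, and this is not the configuration PAWS ended up with.

```python
BOUNDARY = "frame"  # arbitrary; it must match the boundary in the Content-Type header


def mjpeg_content_type(boundary: str = BOUNDARY) -> str:
    """Value for the HTTP Content-Type header of the streaming response."""
    return f"multipart/x-mixed-replace; boundary={boundary}"


def mjpeg_part(jpeg_bytes: bytes, boundary: str = BOUNDARY) -> bytes:
    """Wrap one encoded JPEG frame as a multipart section."""
    header = (
        f"--{boundary}\r\n"
        "Content-Type: image/jpeg\r\n"
        f"Content-Length: {len(jpeg_bytes)}\r\n"
        "\r\n"
    ).encode("ascii")
    return header + jpeg_bytes + b"\r\n"


# Stand-in bytes; a real frame would come from an encoder such as cv2.imencode.
fake_jpeg = b"\xff\xd8demo\xff\xd9"
part = mjpeg_part(fake_jpeg)
```

A browser rendering such a stream in an `<img>` element replaces the displayed image each time a new part arrives, which is where the cross-browser differences described above show up.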

Making Nova the brain, not a bolt-on. The temptation is to call the model once and treat it as a classifier. The breakthrough was realizing that Nova 2 Lite could simultaneously analyze a threat, reason about deterrents, compose multilingual messages, and generate a professional incident report — all in a single system prompt with structured JSON output. That realization changed the architecture from "camera app with AI" to "AI system with cameras."

Building for unreliable connectivity. Farms near wildlife reserves often have degraded or no internet. A safety system that fails when connectivity drops is worse than no system — it creates false confidence. The two-tier architecture (edge buzzer fires immediately from local RT-DETR/YOLO-nano, cloud Nova layer enriches when available) was harder to build but essential for the real-world use case.
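The two-tier idea is that the local deterrent never waits on the network, and cloud enrichment is attempted afterwards and allowed to fail. A minimal sketch under those assumptions follows; the species-to-deterrent mapping comes from the description earlier in the write-up, while the function names and callback signatures are hypothetical.

```python
from typing import Callable, Dict, Optional

# Mapping from the write-up: ultrasonic for elephants, predator audio for big cats.
DETERRENTS: Dict[str, str] = {
    "elephant": "ultrasonic burst",
    "tiger": "predator audio",
    "leopard": "predator audio",
}
DEFAULT_DETERRENT = "strobe"  # "strobe for most mammals"


def handle_detection(
    species: str,
    fire_local: Callable[[str], None],
    enrich_cloud: Callable[[str], dict],
) -> dict:
    """Fire the edge deterrent first, then try cloud analysis if it is reachable."""
    deterrent = DETERRENTS.get(species, DEFAULT_DETERRENT)
    fire_local(deterrent)  # tier 1: local hardware, no network dependency

    analysis: Optional[dict] = None
    try:
        analysis = enrich_cloud(species)  # tier 2: optional enrichment
    except OSError:
        # ConnectionError and TimeoutError are OSError subclasses; stay degraded.
        analysis = None

    return {
        "species": species,
        "deterrent": deterrent,
        "cloud_analysis": analysis,
        "degraded": analysis is None,
    }


# Simulated offline run: the deterrent still fires although the cloud call fails.
fired = []


def offline_cloud(species: str) -> dict:
    raise ConnectionError("no uplink")


offline_result = handle_detection("elephant", fired.append, offline_cloud)
```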

RT-DETR fine-tuning for Asian wildlife. Assembling a training dataset that reflects actual farm perimeter viewpoints — not zoo photography or conservation images shot at eye level — required significant augmentation. Animals at night, partially occluded, at distance, in monsoon rain. The model had to learn to detect a tiger in tall grass from a camera mounted at 2 meters, not a tiger posed in a clearing.


Accomplishments that we're proud of

End-to-end under 5 seconds. From camera frame to Telegram alert with detection photo, the median pipeline time on warm Modal containers ranged from 2.3 to 2.9 seconds, measured against live elephant and baboon camera feeds.

RT-DETR fine-tuned on 9 Asian threat species. Training a custom Real-Time Detection Transformer — elephant, tiger, leopard, sloth bear, wild boar, wolf, gaur, king cobra, clouded leopard — and deploying it as a confidence-boosting layer alongside YOLO-World. This hybrid gives global coverage with regional precision.

Real detections on real cameras. Tested against live elephant enclosures, baboon troops, and bear cameras. Nova's behavioral analysis correctly distinguished a grazing elephant from one pushing against a fence post — a distinction that determines whether a farmer loses sleep or loses crops.

Six languages in one Nova call. English, Hindi, Swahili, Spanish, Polish, and Arabic alert messages composed simultaneously in a single API call. No translation service. No hardcoded strings. A farmer in Kenya and a farmer in Poland receive alerts in their own language from the same pipeline run.

$0 per alert, permanently. ntfy.sh and Telegram Bot API are free forever. A farming cooperative with 50 members runs PAWS for roughly $0.003 per detection in Nova API costs, zero for alerts.

The farmer is part of the model. The ✅/❌ Telegram feedback loop makes the farmer a contributor to the system, not just a consumer. Their ground-truth labels make every farm that joins PAWS better for every farm that comes after.

Built solo, end-to-end. GPU inference, transformer fine-tuning, 14-step pipeline, IoT integration, multi-model Nova orchestration, SvelteKit dashboard — one person, one hackathon.


What we learned

Amazon Nova's real power is the compression of intelligence. A problem that would normally require a vision model, a classification model, a translation service, a threat scoring algorithm, and a report generator resolves to a single API call with the right system prompt. That compression is what makes serious AI accessible to a farmer in rural Nepal at $0.003 per detection.

Fine-tuning RT-DETR for Asian wildlife taught me that the domain gap between "wildlife photography" and "farm perimeter camera at 2am" is enormous. A model trained on conservation images fails silently on real deployment conditions — the viewpoint, lighting, and occlusion patterns are completely different. Ground-truth data from actual farm cameras is irreplaceable.

The hardest engineering problem in a safety system is not accuracy — it is trust. A model that is 95% accurate but generates five false alarms in one night will be ignored by morning. The behavioral analysis, severity scoring, debounce layer, and farmer feedback loop all exist to earn and maintain trust. Technical excellence in service of human behavior is a different discipline from technical excellence alone.


What's next for PAWS — Perimeter AI Wildlife Surveillance

Region packs via RT-DETR fine-tuning. Asia Region Pack v1 (9 species) is trained. Africa Pack (elephant, lion, leopard, hyena, crocodile) and Americas Pack (jaguar, puma, bear, coyote) are next — trained on the labeled dataset growing with every farm deployment. Each regional model becomes a precision layer on top of YOLO-World's global coverage.

Edge-first inference. Port RT-DETR to NVIDIA Jetson or Raspberry Pi 5 so that detection runs locally with no network round trip. The fine-tuned weights are already the right size for edge deployment.

Nova Sonic voice calls. Replace Polly with Nova Sonic's speech-to-speech for natural, conversational voice alerts — and eventually two-way farmer communication where Nova answers questions about the detected threat in the farmer's language.

Pattern intelligence. After three elephant incidents at the north fence in one week, Nova proactively advises: "Beehive fences show 80% effectiveness against elephant incursion in this corridor. Here is how to build one." Reactive protection becomes proactive farm management.

Conservation data layer. Anonymized detection data — species, GPS, time, behavior — shared with WWF and national wildlife authorities. Every farm running PAWS contributes to global wildlife movement mapping. The system that protects farmers also protects the animals by giving conservationists real-time corridor data they have never had before.

Hardware kit. A plug-and-play PAWS box: weatherproof Raspberry Pi, camera, ultrasonic emitter, strobe — pre-configured and field-ready. Target price under $40. Electric fences start at $800.

The goal was never a hackathon submission. The goal was Bishnu sleeping through the night.

Built With

  • amazon-bedrock
  • amazon-nova-2-lite
  • amazon-nova-embed
  • amazon-polly
  • fastapi
  • hls-streaming
  • modal.com
  • ntfy.sh
  • numpy
  • opencv
  • python
  • render
  • server-sent-events
  • sqlalchemy
  • sqlite
  • sveltekit
  • tailwind-css
  • telegram-bot-api
  • typescript
  • vercel
  • yolo-world
  • yt-dlp