The human eye is the result of millions of years of evolution, finely tuned to perform one essential task with remarkable reliability: recognizing people in complex, ever-changing environments. With the advent of AI, deep learning, and computer vision, substantial progress has been made in replicating these biological capabilities across many visual tasks. Yet one of the most deceptively challenging problems remains: can we reliably detect whether an image contains a person? With more than 8 billion people worldwide, countless backgrounds, diverse clothing styles, and highly dynamic environments, building a robust human-presence detector is far from trivial.

Inspired by the Unix design principle—“do one thing and do it well”—we created FirstEye, a model dedicated to a single, powerful capability: detecting whether a person is present in an image under real-world, in-the-wild conditions. Unlike controlled lab settings, real-world scenes introduce variable lighting, occlusion, motion blur, cluttered backgrounds, and unpredictable human poses. Most existing approaches struggle under these constraints or require cloud resources and large compute budgets. This motivated us to design FirstEye, a lightweight, specialized model built to perform this one task reliably on even the smallest edge devices.

FirstEye was motivated by the need for reliable, privacy-preserving person detection on devices operating in real-world, unpredictable environments. Many modern vision models require large compute resources or continuous cloud connectivity, limiting their deployment on low-power embedded systems. We wanted a model that could run anywhere, from smartphones to ultra-low-power microcontrollers, enabling practical use cases such as safety monitoring, occupancy-driven automation, and intelligent on-device presence alerts—even when fully offline. Our model earned 1st place in the Edge AI Foundations Challenge – EDGE: Wake Vision, validating its performance and efficiency. (https://edgeai.modelnova.ai/challenges/details/challenge-edge:-wake-vision)

To achieve this, we designed and optimized an extremely compact neural architecture trained on 1.2 million in-the-wild images from WakeVision.ai. The resulting model uses an 80×80×3 input and requires only 73.5 kB of RAM and 34.55 kB of flash, making it deployable across a wide range of ARM-based hardware, including Arduino Nano boards, Raspberry Pi devices, and smartphones. During development, we learned how to balance model efficiency with accuracy, apply hardware-aware optimizations, and maintain robustness under varying lighting, motion, and environmental conditions. One of the biggest challenges we faced was ensuring consistent performance in uncontrolled "wild" scenarios while staying within strict memory and compute budgets. Through iterative refinement of preprocessing pipelines, profiling techniques, and model compression strategies, we built a solution demonstrating that high-quality person detection can be performed fully on-device. FirstEye unlocks practical applications in edge safety systems, automation, and low-power IoT deployments—without ever relying on the cloud.
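To give a feel for the kind of preprocessing such a model needs, here is a minimal, pure-Python sketch of resizing a camera frame to the 80×80×3 input and quantizing it to int8. The nearest-neighbor resize and the common `q = v − 128` int8 input mapping are illustrative assumptions, not FirstEye's actual pipeline; note that the resulting input buffer is only 80 × 80 × 3 = 19,200 bytes, comfortably inside the 73.5 kB RAM budget quoted above.

```python
# Illustrative preprocessing sketch (NOT FirstEye's actual pipeline).
# Assumptions: nearest-neighbor resizing, and the common TFLite-style
# int8 input quantization q = v - 128 for uint8 pixels in [0, 255].

INPUT_H, INPUT_W, INPUT_C = 80, 80, 3  # model input shape from the text


def resize_nearest(pixels, src_h, src_w, dst_h, dst_w):
    """Nearest-neighbor resize of an H x W image stored as nested lists
    of (r, g, b) tuples."""
    out = []
    for y in range(dst_h):
        sy = y * src_h // dst_h  # nearest source row
        row = [pixels[sy][x * src_w // dst_w] for x in range(dst_w)]
        out.append(row)
    return out


def quantize_int8(pixels):
    """Shift uint8 pixel values [0, 255] to int8 [-128, 127]."""
    return [[tuple(c - 128 for c in px) for px in row] for row in pixels]


if __name__ == "__main__":
    # Fake 120x160 RGB frame standing in for a camera capture.
    frame = [[(y % 256, x % 256, 128) for x in range(160)] for y in range(120)]
    resized = resize_nearest(frame, 120, 160, INPUT_H, INPUT_W)
    quantized = quantize_int8(resized)
    # The int8 input tensor occupies just 19,200 bytes.
    print(INPUT_H * INPUT_W * INPUT_C)
```

On a microcontroller this step would typically run in C against the camera's raw buffer, but the arithmetic is the same: index math for the resize and a single subtraction per channel for the quantization.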
