Inspiration
We wanted to build an AI that doesn't wait for you to open an app. It just watches and remembers. The kind of assistant that knows where you left your phone, notices when you've been working for three hours straight, and nudges you to take a break. The name comes from the little lizards that hang out on walls, quietly observing everything.
How We Built It
The biggest challenge was keeping the camera feed smooth while running expensive object detection. Our solution was a dual-machine architecture: a workstation with an RTX 5080 runs YOLO-World and Depth Anything V2 and publishes annotations over the network, while a laptop handles the display and API. We ran inference at low resolution for speed, then mapped the bounding boxes back to the original resolution, so users see a crisp, full-res feed with correctly placed overlays. Every 30 seconds, we feed Sonnet 4.5 a frame enriched with the perception pipeline's accumulated outputs: bounding boxes, labels, and depth estimates. A VLM single-shotting a raw image can't describe a scene this precisely; the curated context is what makes its descriptions meaningful. When you ask Gekko a question, it reasons over your recent history. No state machines, just letting the model generalize. Sketches of the publisher, the box rescaling, and the enrichment call follow below.
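The writeup doesn't name the transport between the two machines, so here is a minimal sketch of the workstation-side publisher assuming ZeroMQ PUB/SUB and JSON payloads; both choices, the port, and the field names are illustrative assumptions:

```python
import time
import zmq  # pip install pyzmq; ZeroMQ is an assumed transport, not confirmed

def publish_annotations(pub, frame_id, boxes, labels, depths):
    """Send one frame's perception outputs to the laptop as a JSON message."""
    pub.send_json({
        "frame_id": frame_id,
        "timestamp": time.time(),
        "boxes": boxes,    # [x1, y1, x2, y2] in low-res inference coordinates
        "labels": labels,  # YOLO-World class names
        "depths": depths,  # per-box depth estimates from Depth Anything V2
    })

ctx = zmq.Context()
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5555")  # hypothetical port; the laptop SUBscribes here

# Example: publish a single annotated detection for one frame
publish_annotations(pub, 42, boxes=[[120, 80, 310, 420]], labels=["person"], depths=[1.8])
```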
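Mapping detections from the inference frame back onto the full-res feed is a per-axis scale. A minimal sketch, with made-up resolutions:

```python
def rescale_box(box, infer_size, native_size):
    """Map a box from the low-res inference frame back to the full-res display frame.

    box: (x1, y1, x2, y2) in inference-frame pixels.
    infer_size, native_size: (width, height) tuples.
    """
    sx = native_size[0] / infer_size[0]
    sy = native_size[1] / infer_size[1]
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

# e.g. a detection at 640x384 inference resolution drawn on a 1920x1080 feed
print(rescale_box((100, 50, 200, 150), (640, 384), (1920, 1080)))
```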
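The periodic enrichment call could look like the sketch below, using the Anthropic Messages API; the model ID, prompt wording, and summary format are assumptions, not Gekko's exact code:

```python
import base64
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def describe_scene(jpeg_bytes: bytes, perception_summary: str) -> str:
    """Send one frame plus accumulated perception context to the VLM."""
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model ID for Sonnet 4.5
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": base64.b64encode(jpeg_bytes).decode(),
                }},
                {"type": "text", "text":
                    "Describe what is happening in this frame. Grounded "
                    "detections from the perception pipeline:\n" + perception_summary},
            ],
        }],
    )
    return response.content[0].text

# Called on a 30-second timer in the real pipeline; the summary line is illustrative:
# describe_scene(frame_jpeg, "person @ (120, 80, 310, 420), depth ~1.8 m")
```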