DeepEyes

Inspiration

DeepEyes grew out of a need to enhance traditional surveillance systems with state-of-the-art AI running directly on edge devices, especially for safeguarding offices, apartments, and hotels. I aimed to create a solution that not only monitors in real time but also anticipates potential risks. My goal was to integrate AI models that can detect unauthorized access, suspicious activity, crowds, guns, and fire/smoke, so that a building can be monitored with precision and efficiency.

What it does

DeepEyes is a Windows desktop application that provides real-time, on-device AI-powered security monitoring. It integrates with IP cameras and leverages multiple AI models to detect:

Unauthorized Access & Suspicious Activity: Quickly identify individuals or behaviors that pose a security risk.

Crowd Detection: Recognize unusual gatherings that might indicate emergencies or unauthorized events, while also providing valuable data for planning, capacity management, and occupancy analytics.

Gun Detection: Immediately alert security personnel upon detecting firearms in monitored areas.

Fire/Smoke Risks: Proactively detect early signs of fire or smoke, enabling prompt emergency responses.

Voice Command Interaction: Process speech commands and notifications via Whisper for seamless, hands-free control.

Interactive Visual Assistance: Utilize a dedicated chat model for a visual agent that provides real-time, context-aware guidance and alert details.

By running entirely on edge devices, DeepEyes keeps all data processing local, which minimizes latency, preserves privacy, and eliminates cloud dependencies.

How we built it

My development process centered on harnessing robust AI models from the Qualcomm AI Hub. The backbone of DeepEyes includes:

OpenAI-CLIP: For advanced image classification, similarity analysis, and efficient visual search capabilities.

Whisper: For real-time speech recognition, enabling voice command processing and audio alert integration.

YOLOv11 BYOM (Bring Your Own Model): For rapid, accurate object detection in live video streams.

Chat for Visual Agent: An interactive model that generates dynamic text-based insights and guidance, enhancing user interaction with the system.

I integrated these models into a cohesive Windows desktop application with Tauri. Emphasis was placed on a modular design and adherence to best coding practices, ensuring that each component works seamlessly together while achieving high-performance on-device processing.
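
To illustrate the visual search role described for OpenAI-CLIP above, the sketch below ranks stored frame embeddings against a query embedding by cosine similarity. It is a minimal illustration under stated assumptions, not the DeepEyes implementation: the three-dimensional vectors are hand-written stand-ins for real CLIP embeddings, and the names `cosine_similarity`, `search`, and the frame IDs are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(
    query: List[float],
    index: Dict[str, List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (frame_id, score) pairs, best match first."""
    scored = [(fid, cosine_similarity(query, emb)) for fid, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical stand-ins: real embeddings would come from the CLIP image
# encoder, and the query from the CLIP text or image encoder.
frame_index = {
    "lobby_0012": [0.9, 0.1, 0.0],
    "garage_0450": [0.1, 0.9, 0.1],
    "stairwell_0101": [0.7, 0.3, 0.1],
}
query_embedding = [1.0, 0.0, 0.0]

results = search(query_embedding, frame_index, top_k=2)
for frame_id, score in results:
    print(f"{frame_id}: {score:.3f}")
```

At larger scale, a vector store such as the chroma-db listed under Built With would typically replace the linear scan in `search`; how DeepEyes actually wires this up is not described here.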

Challenges we ran into

Real-Time Processing: Achieving rapid, on-device inference while simultaneously running multiple threat detection models required intensive optimization and resource management.

Resource Limitations: Edge devices have inherent constraints compared to cloud servers, making it crucial to balance model complexity with real-time performance.

Multi-Modal Integration: Coordinating outputs from object detection, speech recognition, and interactive chat models into a unified alert system proved to be complex.

Data Privacy: Ensuring all processing remains local to maintain robust data privacy without sacrificing performance.
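
One way to approach the multi-modal integration challenge above is to normalize every model's output into a common record and pass it through a single aggregator that filters weak detections and suppresses repeats. The sketch below is a hypothetical illustration, not the DeepEyes code: the `Detection` and `AlertAggregator` names, the 0.6 confidence threshold, and the 10-second cooldown are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Detection:
    """A model output normalized to a common shape (hypothetical schema)."""
    camera_id: str
    label: str         # e.g. "gun", "fire", "crowd"
    confidence: float  # model score in [0, 1]
    timestamp: float   # seconds since the stream started


@dataclass
class AlertAggregator:
    """Turns raw detections into alerts, dropping weak or repeated ones."""
    min_confidence: float = 0.6  # assumed threshold
    cooldown_s: float = 10.0     # assumed per-camera, per-label cooldown
    _last_alert: Dict[Tuple[str, str], float] = field(
        default_factory=dict, init=False
    )

    def process(self, det: Detection) -> Optional[str]:
        """Return an alert message, or None if the detection is suppressed."""
        if det.confidence < self.min_confidence:
            return None
        key = (det.camera_id, det.label)
        last = self._last_alert.get(key)
        if last is not None and det.timestamp - last < self.cooldown_s:
            return None  # same event reported again within the cooldown
        self._last_alert[key] = det.timestamp
        return (
            f"[{det.camera_id}] {det.label} detected "
            f"(confidence {det.confidence:.2f})"
        )


aggregator = AlertAggregator()
detections: List[Detection] = [
    Detection("cam-1", "fire", 0.91, 0.0),
    Detection("cam-1", "fire", 0.93, 2.0),   # suppressed: within cooldown
    Detection("cam-2", "gun", 0.40, 3.0),    # suppressed: below threshold
    Detection("cam-1", "fire", 0.88, 12.5),  # cooldown elapsed, alerts again
]
alerts = [msg for msg in map(aggregator.process, detections) if msg is not None]
for msg in alerts:
    print(msg)
```

Keeping the cooldown keyed on both camera and label means a fire alert on one camera does not mask a gun alert on the same camera, or a fire alert on another.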

Accomplishments that we're proud of

On-Device Mastery: Successfully deploying multiple advanced AI models on local devices, ensuring low latency and maximum data privacy.

Comprehensive Threat Detection: Integrating object detection, speech recognition, interactive chat, and image search into one unified, intelligent security platform.

User-Centric Design: Crafting an intuitive Windows desktop interface that allows security personnel to manage and monitor multiple camera feeds effortlessly, with real-time alerts and voice command capabilities.

What we learned

Interdisciplinary Collaboration: Merging expertise in computer vision, speech recognition, and natural language processing drove innovative solutions that exceeded traditional security measures.

User Feedback Drives Improvement: Early and continuous engagement with security professionals helped us refine features and optimize the overall usability of DeepEyes.

Optimization is Crucial: Fine-tuning diverse AI models for real-time, on-device performance required a deep understanding of hardware limitations and algorithm efficiencies.

What's next for DeepEyes

Enhanced Analytics: We plan to integrate advanced analytics and reporting tools to provide deeper insights into security events and system performance.

Broader Hardware Compatibility: Expanding support to a wider range of edge devices and platforms to increase adoption across diverse security environments.

Continuous AI Evolution: Keeping pace with the latest advancements from the Qualcomm AI Hub to further optimize threat detection, voice interaction, and user guidance capabilities.

Customization and Scalability: Introducing more customizable features tailored to different building types and scaling the platform for larger, multi-site deployments.

Built With

chroma-db
fastapi
llama-3
microsoft-camera
next-js
node-express-js
openai-clip
opencv
postgres-db
python
tauri
typescript
whisper
windows-surface-laptop-with-snapdragon-x-elite

Updates

Private user started this project — Feb 25, 2025 01:30 AM EST
