Falantir: AI Agents for the Physical World
Inspiration
Small and medium businesses are under siege.
Retail theft and robberies have nearly doubled since 2019, costing billions globally, including £2.2 billion in the UK last year alone.
In the U.S., small retailers lose an average of $12k–$24k per year to theft, while tens of thousands of workplace-violence incidents are reported annually.
It's not just lost revenue - it's lost safety, staff morale, and community trust.
Traditional security cameras only record crimes. They don't prevent them.
We created Falantir to demonstrate the power of AI Agents for the physical world - autonomous systems that perceive, decide, and act in real-time to protect physical spaces. Every small business deserves an AI agent that actively guards their physical world.
What it does
Falantir is an autonomous AI agent that operates in physical spaces, transforming passive surveillance into an active protection system.
Unlike traditional AI that exists only in digital spaces, Falantir is an embodied AI agent that:
- Perceives the physical world through video feeds, using multi-modal AI (YOLO object detection + Gemini semantic understanding)
- Decides autonomously when threats are detected, analyzing context and severity in real-time
- Acts immediately by triggering alerts, notifying authorities, and documenting evidence
When an incident begins, the AI agent:
- Detects threats within seconds using parallel AI model processing
- Makes autonomous decisions about threat severity and appropriate responses
- Automatically executes actions: alerts staff, contacts authorities, captures evidence
- Operates continuously, monitoring physical spaces 24/7 without human intervention
This is AI for the physical world - an agent that doesn't just analyze data, but actively intervenes in physical reality to protect people and property.
How we built it
AI Agent Architecture:
- Perception Layer: Parallel processing with YOLO (object detection) and Gemini AI (semantic understanding) to create a comprehensive understanding of physical scenes
- Decision Engine: Autonomous threat assessment that evaluates context, severity, and appropriate response actions
- Action Layer: Direct integration with notification services (EmailJS for email, Twilio for SMS and phone calls) to trigger real-world alerts
- Agent Memory: Hash-based caching system that lets the agent reuse results from previously analyzed videos instead of reprocessing them
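As a rough illustration of how the Decision Engine could sit between the Perception and Action layers, here is a minimal rule-based sketch. The `Perception` class, the rule tables, the severity levels, and the action names are all hypothetical and chosen for this example; they are not Falantir's actual rules.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    NONE = 0
    LOW = 1
    HIGH = 2


@dataclass
class Perception:
    """Hypothetical merged output of the perception layer."""
    objects: list[str]   # labels from the object detector
    scene_summary: str   # free-text description from the semantic model


# Hypothetical rule tables; a real deployment would tune or replace these.
HIGH_RISK_OBJECTS = {"knife", "baseball bat"}
SUSPICIOUS_PHRASES = ("concealing", "forcing", "threatening")


def assess(perception: Perception) -> Severity:
    """Map a perception to a severity level using simple rules."""
    if HIGH_RISK_OBJECTS & set(perception.objects):
        return Severity.HIGH
    summary = perception.scene_summary.lower()
    if any(phrase in summary for phrase in SUSPICIOUS_PHRASES):
        return Severity.LOW
    return Severity.NONE


def plan_actions(severity: Severity) -> list[str]:
    """Choose action names for the action layer to execute."""
    if severity is Severity.HIGH:
        return ["sms_staff", "call_authorities", "save_evidence"]
    if severity is Severity.LOW:
        return ["email_staff", "save_evidence"]
    return []
```

Keeping assessment and action planning as separate functions means the thresholds can change without touching the code that sends alerts.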
Backend (Python/Flask):
- Python 3.14 with Flask 3.0+ REST API serving as the agent's control system
- Flask-CORS 4.0+ for cross-origin resource sharing
- Ultralytics YOLOv8 (nano model) for real-time object detection with bounding boxes
- Google Gemini API (google-genai 0.2.0+) for semantic understanding and threat analysis
- OpenCV 4.8+ for video processing, frame extraction, and video encoding
- Parallel AI Processing: YOLO object detection and Gemini AI analysis run simultaneously using Python threading for faster perception
- Hash-based caching (MD5 file hashing) for agent memory and performance optimization, preventing redundant processing
- Twilio 8.0+ for SMS and phone call alerts to staff and authorities
- EmailJS integration for email notifications
- Gunicorn 21.2+ as the production WSGI server
- python-dotenv for secure environment variable management
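The parallel perception step listed above can be sketched as follows. This is a minimal illustration, not the project's code: it uses `concurrent.futures.ThreadPoolExecutor` (a standard-library wrapper around threads) rather than the `threading` module directly, and `run_object_detection` and `run_semantic_analysis` are hypothetical stand-ins for the real YOLO and Gemini calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_object_detection(video_path: str) -> list[str]:
    """Stand-in for a YOLO pass; returns detected labels."""
    time.sleep(0.2)  # simulate inference time
    return ["person", "backpack"]


def run_semantic_analysis(video_path: str) -> str:
    """Stand-in for a Gemini API call; returns a scene summary."""
    time.sleep(0.2)  # simulate network latency
    return "One person near the register."


def perceive(video_path: str, timeout_s: float = 30.0) -> dict:
    """Run both analyses concurrently and merge their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        detection = pool.submit(run_object_detection, video_path)
        semantics = pool.submit(run_semantic_analysis, video_path)
        # result() raises concurrent.futures.TimeoutError if a task overruns.
        return {
            "objects": detection.result(timeout=timeout_s),
            "summary": semantics.result(timeout=timeout_s),
        }
```

Threads suit this workload because the API call spends most of its time waiting on the network. One caveat of this sketch: leaving the `with` block still waits for any running task to finish, so the timeout bounds how long the caller waits for a result, not how long the worker thread runs.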
Frontend (React/TypeScript):
- React 18.3.1 with TypeScript for type-safe component development
- Vite 6.3.5 as the build tool and development server for fast hot module replacement
- Tailwind CSS for modern, responsive styling with a custom terminal/hacker aesthetic
- Motion (Framer Motion) for smooth animations and transitions
- Radix UI components for accessible, customizable UI primitives (dialogs, tabs, accordions, etc.)
- Lucide React for iconography
- React Hook Form for form management in the workflow builder
- Agent dashboard for monitoring AI agent activity with real-time updates
- Video upload and analysis interface with drag-and-drop support
- Custom workflow builder to define agent behaviors and decision rules
- Real-time auto-detection that shows the agent's continuous monitoring
- Settings dashboard for configuring agent actions and alert services
Technical Implementation Details:
- Video Processing: OpenCV handles video decoding, frame-by-frame processing, and re-encoding with YOLO bounding boxes
- Parallel Processing: Python's threading module enables simultaneous YOLO detection and Gemini API calls, achieving ~40% faster processing
- Caching Strategy: MD5 hash-based cache prevents reprocessing identical videos, stored in JSON format for persistence
- API Architecture: RESTful Flask endpoints with proper error handling and timeout management
- State Management: React hooks (useState, useRef, useEffect) for complex real-time state across multiple workflows
- File Handling: FormData for multipart file uploads, base64 encoding for video transfer, Blob API for client-side video handling
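The caching strategy can be illustrated with a short sketch, assuming a single JSON file as the store. The `AnalysisCache` class, its method names, and the `analyze` callback are hypothetical and written for this example; the MD5 hashing and JSON persistence follow the description above.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file contents in chunks so large videos are not loaded whole."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class AnalysisCache:
    """JSON-backed cache keyed by content hash (hypothetical helper)."""

    def __init__(self, cache_file: Path):
        self.cache_file = cache_file
        # Load earlier results if the cache file already exists.
        self.entries = json.loads(cache_file.read_text()) if cache_file.exists() else {}

    def get_or_compute(self, video: Path, analyze) -> dict:
        """Return the cached result for this content, or compute and store it."""
        key = file_md5(video)
        if key not in self.entries:
            self.entries[key] = analyze(video)  # result must be JSON-serializable
            self.cache_file.write_text(json.dumps(self.entries))
        return self.entries[key]
```

Because the key is derived from the file contents rather than the file name, the same video uploaded twice to different temporary paths still hits the cache. MD5 is used here only to identify identical content, not for security.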
Challenges we ran into
Real-Time Agent Response: Building an AI agent that can perceive and act fast enough to prevent crimes required parallel processing, which introduced thread management and timeout challenges
Physical-World Accuracy: Implementing an agent that accurately identifies threats in physical spaces without false positives - critical when the agent takes autonomous actions
Action Execution: Integrating third-party services for server-side usage required adapting APIs designed for browser environments, enabling the agent to act in the physical world
Agent Performance: Building a caching system that works with temporary file uploads to reduce processing time and costs, allowing the agent to operate efficiently
State Management: Handling complex React state for real-time agent monitoring across multiple workflows and physical locations
Accomplishments that we're proud of
Built an autonomous AI agent that detects threats and triggers alerts automatically, demonstrating true AI agents for the physical world
Achieved ~40% faster processing through parallel AI model execution, enabling real-time agent responses
Created an intuitive workflow system that allows business owners to configure agent behaviors without technical knowledge
Successfully integrated multiple AI models and alert services into a cohesive agent that operates autonomously in physical spaces
Developed a solution that proves AI agents can actively protect physical spaces, not just analyze digital data
What we learned
Physical-World AI Agents: Building AI systems that operate in physical spaces requires different architectures than digital-only AI - perception, decision, and action must happen in real-time
Autonomous Decision-Making: Creating agents that make decisions about physical-world interventions requires careful balance between autonomy and safety
Real-Time Systems: Building agents that need to respond quickly and reliably in physical spaces requires careful thread management and error handling
Embodied AI: Understanding how AI agents interact with physical systems (cameras, alerts, notifications) versus purely digital interactions
Agent Architecture: Designing systems where AI perceives, decides, and acts as a continuous loop in physical environments
What's next for Falantir
Live Agent Deployment: Real-time analysis of security camera feeds for continuous agent monitoring of physical spaces
Advanced Agent Perception: Specialized models for retail theft patterns, suspicious behavior, and workplace violence indicators - expanding the agent's understanding of physical threats
Multi-Agent Systems: Deploy multiple AI agents across different locations, coordinating responses through a central agent network
Physical-World Actions: Integration with physical security systems (locks, alarms, lighting) for direct agent intervention in physical spaces
Agent Learning: Continuous improvement through reinforcement learning, allowing agents to learn from incidents and optimize responses
Mobile Agent Interface: On-the-go monitoring and instant alerts, enabling business owners to interact with their physical-world AI agents from anywhere
Enhanced Agent Actions: Integration with local authorities and security services for faster coordinated responses between AI agents and human responders
Agent Analytics: Track agent performance, decision accuracy, and intervention effectiveness to continuously improve physical-world protection
Affordable Agent Deployment: Scale pricing specifically for small businesses to make enterprise-level AI agent protection accessible
Falantir represents a new category of AI: agents that don't just exist in digital spaces, but actively protect and intervene in the physical world.



