Falantir: AI Agents for the Physical World

Inspiration

Small and medium businesses are under siege.

Retail theft and robberies have nearly doubled since 2019, costing billions globally, including £2.2 billion in the UK last year alone.

In the U.S., small retailers lose an average of $12k–$24k per year to theft, while tens of thousands of workplace-violence incidents are reported annually.

It's not just lost revenue - it's lost safety, staff morale, and community trust.

Traditional security cameras only record crimes. They don't prevent them.

We created Falantir to demonstrate the power of AI Agents for the physical world - autonomous systems that perceive, decide, and act in real time to protect physical spaces. Every small business deserves an AI agent that actively guards their physical world.

What it does

Falantir is an autonomous AI agent that operates in physical spaces, transforming passive surveillance into an active protection system.

Unlike traditional AI that exists only in digital spaces, Falantir is an embodied AI agent that:

Perceives the physical world through video feeds, using multi-modal AI (YOLO object detection + Gemini semantic understanding)
Decides autonomously when threats are detected, analyzing context and severity in real time
Acts immediately by triggering alerts, notifying authorities, and documenting evidence

When an incident begins, the AI agent:

Detects threats within seconds using parallel AI model processing
Makes autonomous decisions about threat severity and appropriate responses
Automatically executes actions: alerts staff, contacts authorities, captures evidence
Operates continuously, monitoring physical spaces 24/7 without human intervention

This is AI for the physical world - an agent that doesn't just analyze data, but actively intervenes in physical reality to protect people and property.

How we built it

AI Agent Architecture:

Perception Layer: Parallel processing with YOLO (object detection) and Gemini AI (semantic understanding) to create a comprehensive understanding of physical scenes
Decision Engine: Autonomous threat assessment that evaluates context, severity, and appropriate response actions
Action Layer: Direct integration with notification services (EmailJS for email, Twilio for SMS and phone calls) so the agent's decisions turn into real-world alerts
Agent Memory: Hash-based caching system that stores the results of previously analyzed videos so the agent can reuse them instead of reprocessing the same footage
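
To make the hand-off from perception to decision to action concrete, here is a minimal sketch of what a rule-based decision step could look like. It is an illustration under stated assumptions, not Falantir's actual logic: the `Perception` fields, the severity levels, the label set, and the action names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical label set and severity-to-action mapping; the write-up does
# not specify the real rules, so these values are placeholders.
DANGEROUS_LABELS = {"knife", "baseball bat"}
SEVERITY_ACTIONS = {
    "low": ["log_event"],
    "medium": ["log_event", "alert_staff"],
    "high": ["log_event", "alert_staff", "notify_authorities"],
}


@dataclass
class Perception:
    """Combined output of the perception layer for one clip (assumed shape)."""
    objects: list[str]      # labels reported by the object detector
    scene_summary: str      # free-text description from the semantic model
    threat_flagged: bool    # whether the semantic model flagged a threat


def assess_severity(p: Perception) -> str:
    """Combine both signals: agreement between the models raises severity."""
    dangerous_object = any(label in DANGEROUS_LABELS for label in p.objects)
    if dangerous_object and p.threat_flagged:
        return "high"
    if dangerous_object or p.threat_flagged:
        return "medium"
    return "low"


def decide(p: Perception) -> dict:
    """Return the severity and the list of actions the action layer should run."""
    severity = assess_severity(p)
    return {"severity": severity, "actions": SEVERITY_ACTIONS[severity]}
```

Keeping the decision step as a pure function of the perception output makes it easy to test without video files or network calls, and keeps the action layer free of threat-assessment logic.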

Backend (Python/Flask):

Python 3.14 with Flask 3.0+ REST API serving as the agent's control system
Flask-CORS 4.0+ for cross-origin resource sharing
Ultralytics YOLOv8 (nano model) for real-time object detection with bounding boxes
Google Gemini API (google-genai 0.2.0+) for semantic understanding and threat analysis
OpenCV 4.8+ for video processing, frame extraction, and video encoding
Parallel AI Processing: YOLO object detection and Gemini AI analysis run simultaneously using Python threading for faster perception
Hash-based caching (MD5 file hashing) for agent memory and performance optimization, preventing redundant processing
Twilio 8.0+ for SMS and phone call alerts to staff and authorities
EmailJS integration for email notifications
Gunicorn 21.2+ as the WSGI server for production deployment
python-dotenv for loading configuration from a .env file into environment variables, keeping secrets out of the source code
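
The parallel perception step listed above can be sketched with the standard `threading` module. This is only an illustration: `run_object_detection` and `run_semantic_analysis` are hypothetical stand-ins that sleep instead of calling YOLO or the Gemini API, and the shape of the result dictionary is an assumption.

```python
import threading
import time


def run_object_detection(video_path: str) -> list[str]:
    """Stand-in for the YOLO pass; sleeps to simulate inference time."""
    time.sleep(0.3)
    return ["person"]


def run_semantic_analysis(video_path: str) -> str:
    """Stand-in for the Gemini call; sleeps to simulate network latency."""
    time.sleep(0.3)
    return "one person standing near the counter"


def perceive(video_path: str) -> dict:
    """Run both perception steps at the same time and collect their results."""
    results: dict = {}

    def worker(key, fn):
        # Each thread writes to its own keys, so no lock is needed here.
        try:
            results[key] = fn(video_path)
        except Exception as exc:  # one failed model should not lose the other
            results[key] = None
            results[f"{key}_error"] = str(exc)

    threads = [
        threading.Thread(target=worker, args=("objects", run_object_detection)),
        threading.Thread(target=worker, args=("summary", run_semantic_analysis)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for both before handing results to the decision step
    return results
```

With these stand-ins, the total wait is roughly that of the slower step rather than the sum of both. Threads are a reasonable fit when one of the steps spends most of its time waiting on a network response.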

Frontend (React/TypeScript):

React 18.3.1 with TypeScript for type-safe component development
Vite 6.3.5 as the build tool and development server for fast hot module replacement
Tailwind CSS for modern, responsive styling with a custom terminal/hacker aesthetic
Motion (formerly Framer Motion) for smooth animations and transitions
Radix UI components for accessible, customizable UI primitives (dialogs, tabs, accordions, etc.)
Lucide React for iconography
React Hook Form for form management in the workflow builder
Agent dashboard for monitoring AI agent activity with real-time updates
Video upload and analysis interface with drag-and-drop support
Custom workflow builder to define agent behaviors and decision rules
Real-time auto-detection that shows the agent's continuous monitoring
Settings dashboard for configuring agent actions and alert services

Technical Implementation Details:

Video Processing: OpenCV handles video decoding, frame-by-frame processing, and re-encoding with YOLO bounding boxes
Parallel Processing: Python's threading module enables simultaneous YOLO detection and Gemini API calls, achieving ~40% faster processing than running the two steps one after the other
Caching Strategy: MD5 hash-based cache prevents reprocessing identical videos, stored in JSON format for persistence
API Architecture: RESTful Flask endpoints with proper error handling and timeout management
State Management: React hooks (useState, useRef, useEffect) for complex real-time state across multiple workflows
File Handling: FormData for multipart file uploads, base64 encoding for video transfer, Blob API for client-side video handling
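
The caching strategy described above can be sketched as follows. The class name `AnalysisCache`, the helper names, and the layout of the JSON file are assumptions made for illustration; only the use of MD5 file hashing and JSON persistence comes from the description.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path, chunk_size=1 << 20) -> str:
    """Hash the file contents in chunks so large videos are not loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class AnalysisCache:
    """Maps a video's content hash to its analysis result, persisted as JSON."""

    def __init__(self, cache_file):
        self.cache_file = Path(cache_file)
        self._entries = {}
        if self.cache_file.exists():
            self._entries = json.loads(self.cache_file.read_text())

    def get(self, video_path):
        return self._entries.get(file_md5(video_path))

    def put(self, video_path, result):
        self._entries[file_md5(video_path)] = result
        self.cache_file.write_text(json.dumps(self._entries, indent=2))


def analyze_with_cache(video_path, cache, analyze):
    """Return the cached result if this exact video was seen, else analyze it."""
    cached = cache.get(video_path)
    if cached is not None:
        return cached
    result = analyze(video_path)  # `analyze` is any callable returning JSON-safe data
    cache.put(video_path, result)
    return result
```

Because the key is derived from the file contents rather than the file name, the same video uploaded twice under different temporary paths still produces a cache hit. MD5 serves here only as a content fingerprint, not as a security measure.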

Challenges we ran into

Real-Time Agent Response: Building an AI agent that can perceive and act fast enough to prevent crimes required parallel processing, which introduced thread management and timeout challenges
Physical-World Accuracy: Implementing an agent that accurately identifies threats in physical spaces while keeping false positives to a minimum - critical when the agent takes autonomous actions
Action Execution: Integrating third-party alert services on the server side required adapting APIs designed for browser environments before the agent could act on its decisions
Agent Performance: Building a caching system that works with temporary file uploads to reduce processing time and costs, allowing the agent to operate efficiently
State Management: Handling complex React state for real-time agent monitoring across multiple workflows and physical locations
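
One common way to handle the timeout side of the first challenge is to run a slow call in a worker thread and stop waiting after a deadline. The sketch below uses the standard `concurrent.futures` module; the helper name `call_with_timeout` and the fallback value are hypothetical, and this is not necessarily how Falantir handles timeouts.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def call_with_timeout(fn, *args, timeout_s=5.0, fallback=None):
    """Run fn(*args) in a worker thread; return `fallback` if it takes too long.

    Only timeouts are converted into the fallback. Exceptions raised by `fn`
    still propagate to the caller.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback
    finally:
        # Do not block on the worker. Note that Python cannot kill the thread,
        # so a timed-out call keeps running in the background until it ends.
        pool.shutdown(wait=False)
```

A caller could, for example, pass the semantic-analysis function with a fallback such as `{"status": "timed_out"}` so that the decision step can still proceed using object detections alone.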

Accomplishments that we're proud of

Built an autonomous AI agent that detects threats and triggers alerts automatically, demonstrating true AI agents for the physical world
Achieved ~40% faster processing than sequential execution by running the AI models in parallel, enabling real-time agent responses
Created an intuitive workflow system that allows business owners to configure agent behaviors without technical knowledge
Successfully integrated multiple AI models and alert services into a cohesive agent that operates autonomously in physical spaces
Developed a solution that proves AI agents can actively protect physical spaces, not just analyze digital data

What we learned

Physical-World AI Agents: Building AI systems that operate in physical spaces requires different architectures than digital-only AI - perception, decision, and action must happen in real time
Autonomous Decision-Making: Creating agents that make decisions about physical-world interventions requires careful balance between autonomy and safety
Real-Time Systems: Building agents that need to respond quickly and reliably in physical spaces requires careful thread management and error handling
Embodied AI: Understanding how AI agents interact with physical systems (cameras, alerts, notifications) versus purely digital interactions
Agent Architecture: Designing systems where AI perceives, decides, and acts as a continuous loop in physical environments

What's next for Falantir

Live Agent Deployment: Real-time analysis of security camera feeds for continuous agent monitoring of physical spaces
Advanced Agent Perception: Specialized models for retail theft patterns, suspicious behavior, and workplace violence indicators - expanding the agent's understanding of physical threats
Multi-Agent Systems: Deploying multiple AI agents across different locations, coordinating responses through a central agent network
Physical-World Actions: Integration with physical security systems (locks, alarms, lighting) for direct agent intervention in physical spaces
Agent Learning: Continuous improvement through reinforcement learning, allowing agents to learn from incidents and optimize responses
Mobile Agent Interface: On-the-go monitoring and instant alerts, enabling business owners to interact with their physical-world AI agents from anywhere
Enhanced Agent Actions: Integration with local authorities and security services for faster coordinated responses between AI agents and human responders
Agent Analytics: Track agent performance, decision accuracy, and intervention effectiveness to continuously improve physical-world protection
Affordable Agent Deployment: Scale pricing specifically for small businesses to make enterprise-level AI agent protection accessible

Falantir represents a new category of AI: agents that don't just exist in digital spaces, but actively protect and intervene in the physical world.