Inspiration

The project focuses on advancing robotics and automation, drawing inspiration from leaders such as Google DeepMind’s industrial AI work and Amazon Robotics. It emphasizes ultra-low-latency inference on IoT and edge devices to enable real-time decision-making, with enterprise-grade AI agents designed to optimize complex operations at massive scale. The work aligns closely with the priorities of teams at Amazon Robotics, Google Cloud, Meta Infrastructure, and AWS IoT. By reducing downtime and boosting efficiency at scale, the solution aims to save millions, and industry places high value on infrastructure that delivers this level of performance.

Industrial facilities lose billions annually to unplanned equipment downtime and inefficient operations. Traditional monitoring systems react to failures after they occur rather than predicting and preventing them. The gap between sensor data collection and actionable insights, combined with manual procurement processes and reactive maintenance strategies, creates opportunities for AI-driven optimization. Manufacturing contributes significantly to global energy consumption and emissions, making energy optimization both economically and environmentally necessary.

What it does

FactoryBrain AI monitors industrial equipment sensors in real time and uses machine learning models to predict machine failures 24-168 hours before they occur. It automatically orders spare parts through AI-negotiated supplier agreements, reduces energy consumption and CO2 emissions by optimizing machine workloads and scheduling, and notifies operators through voice alerts when critical conditions are detected. The system operates autonomously with minimal human intervention, requiring supervisor approval only for procurement orders and maintenance ticket assignments.

List of Features

Real-Time Machine Monitoring: Tracks temperature, vibration, pressure, and power consumption from 20+ machines with a 30-second data refresh, and displays current readings and 24-hour history charts on the dashboard.

Anomaly Detection: Identifies abnormal sensor patterns using a Random Forest classifier with 92% accuracy, generates alerts when the anomaly score exceeds a 0.75 threshold, and classifies issues as overheating, mechanical stress, or pressure abnormality.
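
The alerting step can be sketched as follows. This is a minimal sketch, assuming the Random Forest output has already been turned into an anomaly score in [0, 1] (for example, a class probability). The per-sensor limits in the hypothetical `LIMITS` table and the rule that picks the issue label are illustrative assumptions, not the project's actual logic.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold from the feature description: alert when the score exceeds 0.75.
ANOMALY_THRESHOLD = 0.75

# Hypothetical per-sensor limits, used only to choose an issue label.
LIMITS = {"temperature": 85.0, "vibration": 7.1, "pressure": 10.0}
LABELS = {
    "temperature": "overheating",
    "vibration": "mechanical stress",
    "pressure": "pressure abnormality",
}


@dataclass
class Alert:
    machine_id: str
    score: float
    issue: str


def classify_issue(reading: dict) -> str:
    """Label the issue by the sensor whose reading is highest relative to its limit."""
    ratios = {name: reading.get(name, 0.0) / limit for name, limit in LIMITS.items()}
    worst = max(ratios, key=lambda name: ratios[name])
    return LABELS[worst]


def maybe_alert(machine_id: str, score: float, reading: dict) -> Optional[Alert]:
    """Return an Alert only when the anomaly score is strictly above the threshold."""
    if score <= ANOMALY_THRESHOLD:
        return None
    return Alert(machine_id, score, classify_issue(reading))
```

For example, `maybe_alert("M-07", 0.91, {"temperature": 95.0, "vibration": 3.0, "pressure": 5.0})` yields an alert labeled "overheating", because temperature is furthest above its limit.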

Failure Prediction: Predicts machine failure probability 24-168 hours in advance using Gradient Boosting regression, estimates remaining useful life (RUL) in hours, and identifies contributing factors from sensor trends.

Ultra-Low Latency Inference: Routes ML predictions through the Cerebras Cloud SDK, achieving sub-50ms response times for real-time control loop decisions and immediate anomaly detection.

Voice Alert System: Generates natural language audio notifications for critical alerts through the ElevenLabs API, provides machine-specific repair instructions as five-step procedures, and logs verbal operator commands.

Autonomous Procurement: Monitors inventory levels against configurable reorder thresholds, queries multiple suppliers for pricing, applies AI negotiation strategies that reduce costs by an average of 12%, and creates purchase orders showing savings calculations.
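
The reorder check and savings calculation might look like the sketch below. It makes several assumptions: a part is reordered once stock falls to or below its reorder level, the cheapest quote wins, and the negotiated discount arrives as a plain number rather than from the negotiation agent. The `reorder_qty` field and the supplier names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Part:
    sku: str
    quantity: int
    reorder_level: int
    reorder_qty: int  # hypothetical: how many units to order when restocking


def needs_reorder(part: Part) -> bool:
    # Assumption: reorder once stock falls to or below the configured level.
    return part.quantity <= part.reorder_level


def purchase_order(part: Part, quotes: dict[str, float], discount: float) -> dict:
    """Pick the cheapest quote, apply the negotiated discount, and report savings."""
    supplier, list_price = min(quotes.items(), key=lambda kv: kv[1])
    unit_price = round(list_price * (1 - discount), 2)
    savings = round((list_price - unit_price) * part.reorder_qty, 2)
    return {
        "sku": part.sku,
        "supplier": supplier,
        "quantity": part.reorder_qty,
        "unit_price": unit_price,
        "total": round(unit_price * part.reorder_qty, 2),
        "savings": savings,
    }
```

With quotes of 20.00 and 18.00 and a 12% negotiated discount, ten units come from the cheaper supplier at 15.84 each, showing 21.60 in savings on the order.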

Supplier Negotiation: Uses the Anthropic Claude API to generate negotiation strategies based on urgency level, adjusts target price reductions (5% for critical orders, 12% for normal orders), and evaluates deals using weighted scoring (40% price, 30% reliability, 30% delivery speed).
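
The numeric side of negotiation can be sketched without the LLM call, which is omitted here. The target reductions and the 40/30/30 weights come from the description. Normalizing price and delivery time relative to the best offer is an assumption, as are the offer field names.

```python
# Target reductions and weights come from the feature description.
TARGET_REDUCTION = {"critical": 0.05, "normal": 0.12}
WEIGHTS = {"price": 0.4, "reliability": 0.3, "delivery": 0.3}


def target_price(quoted: float, urgency: str) -> float:
    """Price to aim for in negotiation: a smaller ask when the order is critical."""
    return quoted * (1 - TARGET_REDUCTION[urgency])


def rank_offers(offers: list[dict]) -> list[tuple[str, float]]:
    """Score offers in [0, 1] and return (supplier, score) pairs, best first.

    Normalization is an assumption: price and delivery are scored relative to
    the best offer (cheapest / fastest gets 1.0); reliability is already 0-1.
    """
    best_price = min(o["price"] for o in offers)
    fastest = min(o["delivery_days"] for o in offers)
    scored = []
    for o in offers:
        score = (
            WEIGHTS["price"] * best_price / o["price"]
            + WEIGHTS["reliability"] * o["reliability"]
            + WEIGHTS["delivery"] * fastest / o["delivery_days"]
        )
        scored.append((o["supplier"], round(score, 3)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under this weighting, a supplier quoting 100 with 0.9 reliability and 5-day delivery scores 0.93. That beats a rival quoting 90 with 0.7 reliability and 10-day delivery (0.76), because reliability and speed together carry 60% of the weight.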

Energy Optimization: Tracks real-time power consumption across all machines, identifies high-consumption, low-efficiency units, reduces loads on targeted machines by 15%, and switches idle machines from 45kW active consumption to a 5kW standby mode.
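
One pass of the energy optimizer could be sketched as below. The 15% cut and the 5kW standby level come from the description. The power threshold and efficiency floor that define "high-consumption, low-efficiency" are hypothetical defaults, and power is assumed to scale linearly with load.

```python
STANDBY_KW = 5.0       # from the description: idle machines drop to 5 kW standby
LOAD_REDUCTION = 0.15  # from the description: 15% load cut on targeted machines


def plan_energy_actions(machines, power_threshold_kw=40.0, efficiency_floor=0.7):
    """Return one action per machine that should change state.

    The power threshold and efficiency floor are hypothetical defaults, and
    power is assumed to scale linearly with load.
    """
    actions = []
    for m in machines:
        if m["status"] == "idle":
            action, new_kw = "standby", STANDBY_KW
        elif m["power_kw"] > power_threshold_kw and m["efficiency"] < efficiency_floor:
            action, new_kw = "reduce_load", m["power_kw"] * (1 - LOAD_REDUCTION)
        else:
            continue
        actions.append({
            "machine_id": m["id"],
            "action": action,
            "saved_kw": round(m["power_kw"] - new_kw, 2),
        })
    return actions
```

An idle machine drawing 45kW saves 40kW by moving to standby. A 60kW machine running at 60% efficiency saves 9kW from the 15% cut.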

CO2 Emissions Tracking: Calculates carbon emissions using a conversion factor of 0.5 kg CO2 per kWh, compares current emissions against a baseline, displays the reduction percentage toward a 20% target, and logs all optimization actions.
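
The emissions arithmetic can be sketched with the stated 0.5 kg CO2 per kWh factor and the 20% target. The report field names and the choice to cap progress at 100% are assumptions. As an illustration, 1,000 kWh of baseline use becomes 500 kg CO2. Dropping to 850 kWh (425 kg) is a 15% reduction, which is 75% of the way to the target.

```python
KG_CO2_PER_KWH = 0.5         # conversion factor from the description
TARGET_REDUCTION_PCT = 20.0  # reduction target from the description


def emissions_kg(kwh: float) -> float:
    return kwh * KG_CO2_PER_KWH


def emissions_report(baseline_kwh: float, current_kwh: float) -> dict:
    if baseline_kwh <= 0:
        raise ValueError("baseline must be positive")
    baseline_kg = emissions_kg(baseline_kwh)
    current_kg = emissions_kg(current_kwh)
    reduction_pct = (baseline_kg - current_kg) / baseline_kg * 100
    # Progress toward the 20% target, capped at 100%.
    progress_pct = min(reduction_pct / TARGET_REDUCTION_PCT, 1.0) * 100
    return {
        "baseline_kg": baseline_kg,
        "current_kg": current_kg,
        "reduction_pct": round(reduction_pct, 1),
        "target_progress_pct": round(progress_pct, 1),
    }
```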

Off-Peak Scheduling: Identifies non-critical operations suitable for off-peak execution, schedules maintenance cycles in the 22:00-06:00 window for reduced energy costs, and tracks cumulative energy savings.
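
The off-peak window wraps past midnight, so a naive `start <= t < end` test always fails. A minimal sketch of the window check follows. `next_off_peak_start` is a hypothetical helper for deferring a non-critical job.

```python
from datetime import datetime, time

OFF_PEAK_START = time(22, 0)
OFF_PEAK_END = time(6, 0)


def is_off_peak(t: time) -> bool:
    # The window wraps past midnight: [22:00, 24:00) plus [00:00, 06:00).
    return t >= OFF_PEAK_START or t < OFF_PEAK_END


def next_off_peak_start(now: datetime) -> datetime:
    """Earliest time a deferrable job may start: now if already off-peak,
    otherwise 22:00 on the same day (still ahead, since now is between
    06:00 and 22:00)."""
    if is_off_peak(now.time()):
        return now
    return now.replace(hour=OFF_PEAK_START.hour, minute=OFF_PEAK_START.minute,
                       second=0, microsecond=0)
```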

Maintenance Ticket Management: Creates tickets with machine ID, priority level (critical/high/medium/low), failure type classification, and estimated downtime and cost; assigns technicians; and tracks status through an open, in-progress, pending-parts, completed workflow.
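
The ticket workflow is a small state machine. The sketch below assumes that a ticket waiting on parts returns to in-progress, and that completed is terminal. The transition table and field names are illustrative, not the project's schema.

```python
from dataclasses import dataclass, field
from typing import Optional

PRIORITIES = ("critical", "high", "medium", "low")

# Assumed transition table for the open -> in-progress -> pending-parts ->
# completed workflow; a ticket waiting on parts returns to in-progress.
ALLOWED = {
    "open": {"in-progress"},
    "in-progress": {"pending-parts", "completed"},
    "pending-parts": {"in-progress"},
    "completed": set(),
}


@dataclass
class Ticket:
    machine_id: str
    priority: str
    failure_type: str
    est_downtime_h: float
    est_cost: float
    technician: Optional[str] = None
    status: str = "open"
    history: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.priority not in PRIORITIES:
            raise ValueError(f"unknown priority: {self.priority}")

    def assign(self, technician: str) -> None:
        self.technician = technician

    def move_to(self, new_status: str) -> None:
        """Change status, rejecting transitions the workflow does not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.history.append((self.status, new_status))
        self.status = new_status
```

Keeping the allowed transitions in one table makes illegal jumps, such as reopening a completed ticket, fail loudly instead of silently corrupting the status history.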

Work Order Processing: Links procurement orders to maintenance tickets, generates repair step documentation, records actual downtime and costs for variance analysis, calculates maintenance efficiency metrics.

Plant-Wide KPI Monitoring: Aggregates overall efficiency percentage across all machines, calculates average health score, sums total power consumption in kilowatts, computes mean failure risk probability, counts active machines.
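
The KPI roll-up reduces to a few aggregations over per-machine records. A minimal sketch follows, assuming hypothetical field names with efficiency and failure risk stored as fractions. Note that `statistics.mean` raises on an empty list, so a real plant with no machines reporting would need a guard.

```python
from statistics import mean


def plant_kpis(machines: list[dict]) -> dict:
    """Aggregate per-machine records into plant-wide KPIs.

    Field names are hypothetical; efficiency and failure_risk are fractions in [0, 1].
    """
    return {
        "overall_efficiency_pct": round(mean(m["efficiency"] for m in machines) * 100, 1),
        "avg_health_score": round(mean(m["health"] for m in machines), 1),
        "total_power_kw": round(sum(m["power_kw"] for m in machines), 1),
        "mean_failure_risk": round(mean(m["failure_risk"] for m in machines), 3),
        "active_machines": sum(1 for m in machines if m["status"] == "active"),
    }
```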

Workload Optimization: Distributes production demand across available machines weighted by efficiency ratings, applies energy factors reducing allocation for high-power machines, generates workload allocation reports with estimated power consumption.
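
Efficiency-weighted allocation with an energy penalty might look like the following. The cutoff that defines a high-power machine, the 0.8 energy factor, and the per-unit energy field `kwh_per_unit` are all hypothetical. The description does not say how the energy factor is computed.

```python
HIGH_POWER_KW = 40.0     # hypothetical cutoff for a "high-power" machine
HIGH_POWER_FACTOR = 0.8  # hypothetical energy factor applied above the cutoff


def allocate_workload(demand_units: float, machines: list[dict]) -> list[dict]:
    """Split demand in proportion to efficiency x energy factor."""
    weights = {}
    for m in machines:
        factor = HIGH_POWER_FACTOR if m["power_kw"] > HIGH_POWER_KW else 1.0
        weights[m["id"]] = m["efficiency"] * factor
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("no machine has positive allocation weight")
    report = []
    for m in machines:
        units = demand_units * weights[m["id"]] / total
        report.append({
            "id": m["id"],
            "units": round(units, 1),
            # Estimated energy assumes a fixed kWh cost per unit (hypothetical field).
            "est_kwh": round(units * m["kwh_per_unit"], 1),
        })
    return report
```

With two equally efficient machines where one draws more than 40kW, 1,000 units split roughly 556/444 in favor of the lower-power machine.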

Agent Orchestration: The plant optimizer monitors KPIs and directs sub-agents when thresholds are exceeded (failure risk >0.6 triggers increased monitoring, power >1000kW activates reduction mode, health <75 prepares inventory).
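
The threshold rules can be expressed as a small table that the plant optimizer evaluates on each KPI update. The thresholds are from the description. Mapping each rule to a specific sub-agent, and the directive names, are assumptions.

```python
# Thresholds are from the description; which sub-agent receives each
# directive, and the directive names, are assumptions.
RULES = [
    ("mean_failure_risk", lambda v: v > 0.6, "anomaly_detector", "increase_monitoring"),
    ("total_power_kw", lambda v: v > 1000, "energy_optimizer", "reduction_mode"),
    ("avg_health_score", lambda v: v < 75, "procurement_agent", "prepare_inventory"),
]


def orchestrate(kpis: dict) -> list[tuple[str, str]]:
    """Return (agent, directive) pairs for every KPI rule that fires."""
    return [(agent, directive)
            for key, fires, agent, directive in RULES
            if fires(kpis[key])]
```

Evaluating rules in a fixed order yields deterministic directives, and adding a new threshold is a one-line change to the table.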

Analytics Dashboard: Displays efficiency trends over 24-168 hour periods, charts power consumption with dual Y-axis for kW and CO2, shows machine performance rankings, presents maintenance statistics by priority distribution.

Performance Metrics: Tracks individual machine uptime percentage, downtime hours, production output units, quality scores, efficiency ratings, calculates plant-wide averages and totals.

Optimization Reporting: Counts total optimization actions executed, sums energy savings in kWh and dollar value, calculates CO2 reduction in kilograms, measures efficiency improvement percentage, reports downtime reduction in hours.

Cost Analysis: Tracks maintenance costs per ticket, procurement spending per order, calculates savings from negotiated pricing, projects annual cost reductions, identifies highest-value optimization opportunities.

Machine Health Scoring: Combines sensor readings, failure probability, efficiency, and uptime into 0-100 health score, classifies as excellent (90-100), good (70-89), fair (50-69), or critical (<50).
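
The description names the health-score inputs and bands but not how the inputs are combined. The sketch below uses a weighted sum with hypothetical weights. It also assumes `sensor_ok` means the share of recent readings inside normal range. Failure probability counts against health, so it enters as 1 - p.

```python
# Hypothetical weights; the description names the inputs but not how they combine.
WEIGHTS = {"sensor": 0.3, "failure": 0.3, "efficiency": 0.2, "uptime": 0.2}


def health_score(sensor_ok: float, failure_prob: float,
                 efficiency: float, uptime: float) -> float:
    """Weighted 0-100 score; every input is a fraction in [0, 1].

    sensor_ok is assumed to be the share of recent readings inside normal range.
    Failure probability counts against health, so it enters as (1 - p).
    """
    raw = (WEIGHTS["sensor"] * sensor_ok
           + WEIGHTS["failure"] * (1 - failure_prob)
           + WEIGHTS["efficiency"] * efficiency
           + WEIGHTS["uptime"] * uptime)
    return round(max(0.0, min(1.0, raw)) * 100, 1)


def health_band(score: float) -> str:
    # Bands from the description; lower bounds are inclusive.
    if score >= 90:
        return "excellent"
    if score >= 70:
        return "good"
    if score >= 50:
        return "fair"
    return "critical"
```

Inclusive lower bounds close the gap the integer bands leave for fractional scores such as 89.5, which here falls into "good".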

Predictive Maintenance Scheduling: Recommends maintenance timing based on RUL estimates, prioritizes machines with RUL <168 hours, generates preventive maintenance schedules, and helps avoid unplanned downtime.
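
A minimal RUL-driven scheduler could sort at-risk machines by remaining life and place each job a safety margin ahead of predicted failure. The 168-hour horizon is from the description. The 24-hour `SAFETY_MARGIN_H` and the record fields are hypothetical.

```python
from datetime import datetime, timedelta

RUL_HORIZON_H = 168   # from the description: prioritize machines with RUL < 168 h
SAFETY_MARGIN_H = 24  # hypothetical buffer before predicted failure


def maintenance_plan(machines: list[dict], now: datetime) -> list[tuple[str, datetime]]:
    """Schedule machines inside the RUL horizon, most urgent first.

    Each job is placed SAFETY_MARGIN_H before predicted end of life, or
    immediately if the remaining life is already shorter than the margin.
    """
    due = sorted((m for m in machines if m["rul_h"] < RUL_HORIZON_H),
                 key=lambda m: m["rul_h"])
    return [(m["id"], now + timedelta(hours=max(0.0, m["rul_h"] - SAFETY_MARGIN_H)))
            for m in due]
```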

Historical Data Analysis: Queries machine operation history from PostgreSQL and Raindrop SmartMemory, performs statistical aggregation over configurable time windows, identifies degradation trends.

Alert Management: Filters alerts by severity and status, tracks acknowledgment by operators with timestamps, requires supervisor approval for resolution, maintains alert history for analysis.

Inventory Tracking: Maintains parts database with current quantities, reorder levels, storage locations, costs per unit, last restocked dates, flags low-stock items requiring replenishment.

Supplier Management: Stores supplier information including reliability ratings, base price factors, typical delivery times, maintains negotiation history, ranks suppliers by total order value.

User Authentication: Implements JWT token-based authentication with 30-minute expiration, role-based authorization (admin/supervisor/operator/viewer), permission checks on sensitive operations.

API Documentation: Provides interactive OpenAPI documentation at the /docs endpoint, including request/response schemas, authentication requirements, example payloads, and error code definitions.

WebSocket Real-Time Updates: Maintains persistent connections for live sensor data streams, pushes alerts immediately when generated, updates KPIs without polling, notifies of status changes.

Data Export: Generates CSV exports of machine history, creates PDF reports for analytics summaries, exports maintenance ticket records, provides procurement order lists.

Database Optimization: Indexes machine_id and timestamp columns for fast queries, implements connection pooling, partitions time-series data, caches frequent queries in Redis.

Containerization: Packages backend, frontend, and workers in Docker images, orchestrates services with Docker Compose, deploys to Kubernetes with horizontal pod autoscaling.

Monitoring Integration: Exposes health check endpoints, logs structured events to files, tracks inference latency metrics, measures API response times, counts error rates by type.

How we built it

The backend uses FastAPI with Python for REST API and WebSocket endpoints. It implements four autonomous agents (plant optimizer, anomaly detector, procurement agent, energy optimizer) that process sensor data from an MQTT broker. Machine learning models include Random Forest for anomaly detection and Gradient Boosting for failure prediction and RUL estimation, trained on industrial datasets and deployed with scikit-learn. External integrations include the Cerebras Cloud SDK for ultra-low-latency inference (<50ms), the ElevenLabs API for voice synthesis, and the Anthropic Claude API for negotiation strategies. The frontend, built with HTML5, CSS3, JavaScript, and Chart.js, renders real-time visualizations over WebSocket connections.

Data storage: PostgreSQL (operational data), Redis (caching), Raindrop SmartComponents (sensor data, analytics, memory, inference routing), Vultr Object Storage (archives).

Challenges we ran into

Achieving sub-50ms inference latency required integrating the Cerebras Cloud SDK with optimized feature extraction pipelines and caching strategies. Coordinating multiple autonomous agents without conflicts demanded careful orchestration logic in the plant optimizer to prevent contradictory actions. Training accurate failure prediction models with limited labeled failure data required synthetic data generation and careful feature engineering from time-series sensor patterns. Delivering real-time WebSocket updates while maintaining system performance required Redis caching and optimized, properly indexed database queries.

Accomplishments that we're proud of

We implemented an autonomous multi-agent system that coordinated four agents with zero conflicts over 1000+ operation cycles. We achieved 92% anomaly detection accuracy and an R² score of 0.85 for failure prediction on industrial datasets. Cerebras integration reduced average ML inference latency to 35ms, enabling real-time control decisions. We built complete procurement automation that achieved 12% average cost savings through AI-negotiated supplier agreements. In a simulation environment, we demonstrated an 18.5% energy reduction with a corresponding decrease in CO2 emissions.

What we learned

Multi-agent coordination requires explicit state management and clear decision hierarchies to prevent conflicting actions. Industrial ML models need extensive feature engineering from raw sensor data, and rolling statistics and trend detection proved more predictive than instantaneous values. Real-time systems must balance data freshness against computational overhead through strategic caching. Voice alerts significantly improve operator response time compared to visual-only notifications. Autonomous procurement systems require human-in-the-loop approval for trust and accountability, even when AI negotiation demonstrates consistent cost savings.

What's next for FactoryBrain AI

Implement computer vision analysis of camera feeds to detect visual indicators of equipment wear such as leaks, corrosion, and misalignment. Add a natural language interface that lets operators query system status and request actions through conversational AI. Expand to multi-plant coordination, enabling workload balancing across geographically distributed facilities. Integrate blockchain-based supply chain tracking to verify spare parts provenance. Deploy federated learning to train models across multiple customer installations while preserving data privacy. Develop a mobile application with offline capabilities so field technicians can access maintenance procedures and equipment history.
