Guardian-Bot: Edge-AI Autonomous Inspection Robot
Inspiration
The inspiration for Guardian-Bot came from the realization that industrial safety often relies on high-stakes manual inspections. In many factories, maintenance crews walk miles every shift to read analog gauges, check safety tags, and ensure compliance. A single misread of a pressure gauge or an overlooked "Out of Service" tag can lead to catastrophic equipment failure or workplace injury.
We wanted to build a solution that doesn't just "see" but actually understands the industrial environment. By combining the precision of PaddleOCR-VL with the reasoning power of ERNIE 4.5, we envisioned a robot that could serve as a proactive guardian, operating entirely on the edge so that safety monitoring never waits on a network round trip.
What it does
Guardian-Bot is an autonomous inspection platform powered by the D-Robotics RDK X5. It navigates industrial floors and performs three key functions:
- Visual Perception: Uses PaddleOCR-VL to extract text, table data, and gauge readings from the environment (e.g., PSI levels on a boiler or dates on a fire extinguisher tag).
- Contextual Reasoning: A fine-tuned ERNIE 4.5 model analyzes the extracted data against safety protocols. For example, if it reads $85 \text{ PSI}$ on a tank labeled with an $80 \text{ PSI}$ limit, it flags the reading as a hazard.
- Real-time Edge Alerting: Because the processing happens locally on the RDK X5, the robot can instantly trigger alarms or stop machinery via industrial protocols like CAN FD if a critical safety violation is detected.
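The gauge example above boils down to comparing an extracted value with the limit printed on the equipment. The sketch below shows only that comparison as a fixed rule. It is not the project's ERNIE-based reasoning, and the `GaugeReading` type and `is_violation` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class GaugeReading:
    """One OCR-extracted gauge value plus the limit printed on the label.

    Hypothetical structure; the project's real data format is not documented.
    """
    equipment: str
    value_psi: float
    limit_psi: float


def is_violation(reading: GaugeReading) -> bool:
    """Return True when the measured pressure exceeds the labeled limit."""
    return reading.value_psi > reading.limit_psi


# The case from the text: 85 PSI read on a tank labeled with an 80 PSI limit.
tank = GaugeReading(equipment="tank", value_psi=85.0, limit_psi=80.0)
if is_violation(tank):
    print(f"HAZARD: {tank.equipment} at {tank.value_psi} PSI "
          f"(limit {tank.limit_psi} PSI)")
```

A hard-coded rule like this only covers cases someone anticipated in advance, which is presumably why the project routes the extracted text through a language model instead.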
How we built it
The project was built on a multi-layered stack:
- Hardware: The D-Robotics RDK X5 served as the brain, utilizing its 10 TOPS NPU to handle intensive AI workloads.
- Perception (PaddleOCR-VL): We deployed the PaddleOCR-VL-0.9B model. To fit it on the edge, we used Paddle Lite for INT8 quantization, significantly reducing memory footprint while maintaining high accuracy for handwritten and printed text.
- Intelligence (ERNIE 4.5): We used Unsloth to fine-tune the ERNIE 4.5 Open-Source (A3B) model on a specialized dataset of industrial safety manuals and "Lockout-Tagout" (LOTO) procedures. This allows the model to understand the significance of the text it reads.
- Integration: The system was integrated using the CAMEL-AI framework, where a "Perception Agent" (OCR) passes structured Markdown data to a "Safety Inspector Agent" (ERNIE) for final decision-making.
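To make the hand-off between the two agents more concrete, here is a minimal sketch of how structured Markdown from the perception step might be parsed and turned into a prompt for the inspector. It does not use the CAMEL-AI API. The table layout, the field names, and the functions `parse_markdown_table` and `build_inspector_prompt` are all assumptions made for illustration.

```python
def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Parse a simple pipe-delimited Markdown table into a list of row dicts.

    Assumes a header row, a separator row, then data rows.
    """
    lines = [ln.strip() for ln in markdown.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        return []
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # lines[1] is the "| --- | --- |" separator
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def build_inspector_prompt(rows: list[dict[str, str]]) -> str:
    """Format parsed OCR fields as a plain-text prompt for the safety model."""
    facts = "\n".join(f"- {row['field']}: {row['value']}" for row in rows)
    return ("You are a safety inspector. Review these readings and "
            "report any violation:\n" + facts)


# Hypothetical perception output for the tank example.
OCR_OUTPUT = """
| field | value |
| --- | --- |
| gauge_reading_psi | 85 |
| labeled_limit_psi | 80 |
"""

rows = parse_markdown_table(OCR_OUTPUT)
print(build_inspector_prompt(rows))
```

Passing Markdown rather than raw text keeps table structure intact between the agents, so the inspector can tell which number is the reading and which is the limit.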
Challenges we ran into
The primary challenge was computational constraints. Running a Vision-Language Model (VLM) and an LLM simultaneously on an edge device required extreme optimization.
- Latency: Initially, a full reasoning loop took too long for a moving robot. We addressed this with model distillation, in which the larger ERNIE 4.5 model guided the training of a smaller "Expert" model for the specific safety domain.
- Environmental Noise: Industrial lighting is often inconsistent. We had to augment our PaddleOCR-VL training data with low-light and high-glare samples to ensure the robot could read reflective metal nameplates and dim digital screens.
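The write-up does not say which distillation objective was used. As one common possibility, the sketch below computes the classic soft-target loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. The function names and the default temperature are assumptions, and a real training run would use a deep learning framework rather than plain Python lists.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: list[float],
                      student_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, times temperature**2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Illustrative logits over three classes (made-up numbers, not project data).
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.5]
print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge. A higher temperature spreads probability onto the non-top classes, which exposes more of what the teacher has learned about how classes relate.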
Accomplishments that we're proud of
- Edge Autonomy: The full reasoning loop runs entirely on-device, so the robot remains functional even in "dead zones" without Wi-Fi.
- High Precision: Our fine-tuned PaddleOCR-VL achieved a Character Error Rate (CER) of less than $2\%$ on specialized industrial fonts and handwritten maintenance logs.
- Zero-Shot Safety Reasoning: The ERNIE 4.5 backbone allows the robot to handle safety signs it has never seen before by reasoning through the semantic meaning of the words (e.g., understanding that "Corrosive" and "Acid" require similar safety distances).
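For readers unfamiliar with the metric, Character Error Rate is usually defined as the character-level edit distance between the OCR output and the reference text, divided by the reference length. The sketch below follows that standard definition. The sample strings are made up for illustration and are not taken from the project's evaluation set.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_char == hyp_char else 1)
            current.append(min(previous[j] + 1,      # delete from reference
                               current[j - 1] + 1,   # insert into reference
                               substitution))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative tag text where the OCR confuses the letter "O" with a zero.
print(character_error_rate("OUT OF SERVICE", "OUT 0F SERVICE"))
```

Here a single substituted character in a 14-character reference gives a CER of 1/14, about 7%, so a CER below 2% corresponds to fewer than one wrong character in fifty.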
What we learned
We learned that optimization is as important as architecture. Using tools like Unsloth and LLaMA-Factory showed us that we can bring "heavyweight" intelligence to "lightweight" hardware. We also gained deep experience in the PaddlePaddle ecosystem, specifically how Paddle Lite bridges the gap between high-level research models and real-world deployment on ARM-based robotics.
What's next for Edge-AI Autonomous Inspection Robot
The next phase for Guardian-Bot is Collaborative Multi-Agent Inspections. We plan to use the CAMEL-AI framework to let a fleet of robots coordinate. For instance, if one robot detects a leak, it can signal a second robot to navigate to the nearest shut-off valve while the ERNIE-powered brain drafts a maintenance ticket in real time. We also aim to implement 3D Visual Grounding, allowing the robot to point specifically to the part of a machine that requires attention.