Inspiration

Data centers are mission-critical infrastructure, yet even a single overheating or failed server can trigger downtime, expensive repairs, and slow incident response. We were inspired by the Microsoft AI & Automation challenge to build a scaled-down autonomous system that can do what an on-site technician would: detect a fault, identify the correct rack slot, and physically remove the failing server board. That became RackMedic.

What it does

RackMedic is an autonomous server-rack maintenance prototype. It simulates a small data-center rack with multiple server blades, monitors thermal/sensor data from each slot, and triggers an intervention workflow when one blade overheats.

The system:

  • Ingests telemetry from sensor nodes (ESP32-based setup with temperature monitoring).
  • Uses AI logic to determine which server is in a fault state.
  • Commands a robotic arm to align with the target slot.
  • Grasps and pulls out the affected server tray (cardboard prototype blade) for maintenance.
  • Supports manual override/teleoperation for safety and debugging.

How we built it

We built RackMedic as a full-stack robotics + AI prototype with parallel work across hardware, software, and controls:

Hardware and mechanical:

  • Raspberry Pi 5 + AI HAT for edge compute/control.
  • Viam/servo-based robotic arm with custom motion scripts.
  • Custom rack-and-blade physical mockup for repeatable testing.
  • Ultrasonic and thermal/sensor inputs for alignment and fault simulation.

Robotics control software:

  • Low-level serial control for 6-DOF servos (custom packet protocol).
  • Servo ID discovery and calibration tooling.
  • Keyboard teleoperation mode for manual test control.
  • Forward-motion and return trajectories with leveling compensation.
  • Kinematics-assisted movement and fallback logic for reliable reach behavior.

AI and orchestration:

  • AI-assisted fault diagnosis logic for “which blade is failing.”
  • Event-driven action pipeline: detect fault -> select target -> command arm -> extract.
  • Real-time status logging for demos and judging walkthroughs.
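The event-driven pipeline above can be sketched as one orchestration function that wires the stages together. The handler names and logging format here are illustrative; in the real system the `command_arm` and `extract` callbacks drive actual hardware.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("rackmedic")

def run_pipeline(readings, select, command_arm, extract):
    """Detect fault -> select target -> command arm -> extract, with status logging."""
    slot = select(readings)          # AI-assisted fault diagnosis step
    if slot is None:
        log.info("all blades healthy, no action taken")
        return None
    log.info("fault detected in slot %d", slot)
    command_arm(slot)                # align the arm with the target slot
    extract(slot)                    # grasp and pull the blade for maintenance
    log.info("extraction complete for slot %d", slot)
    return slot
```

Keeping the stages as injectable callbacks made it easy to swap real actuation for stubs during debugging and teleop testing.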

Challenges we ran into

  • Hardware constraints and stock limitations forced design pivots.
  • The fast-built cardboard rack offered limited mechanical precision, which left us very tight alignment tolerances.
  • Servo calibration and mapping were nontrivial; minor offsets caused major end-effector errors.
  • Inverse kinematics did not always produce practical motion in real-world calibration, so we implemented deterministic fallback movement deltas.
  • Integrating sensing, AI decision logic, and actuation into a dependable autonomous loop under hackathon time pressure required aggressive debugging and simplification.
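The deterministic fallback mentioned above boiled down to stepping each joint by a bounded, pre-calibrated delta toward its target instead of trusting an IK solution. A minimal sketch, with the step size and function names assumed for illustration:

```python
FALLBACK_STEP_DEG = 2.0  # assumed per-tick joint increment, in degrees

def fallback_step(current: list[float], target: list[float]) -> list[float]:
    """Move each joint at most FALLBACK_STEP_DEG toward its target angle."""
    out = []
    for cur, tgt in zip(current, target):
        # Clamp the per-joint delta so motion stays slow and predictable.
        delta = max(-FALLBACK_STEP_DEG, min(FALLBACK_STEP_DEG, tgt - cur))
        out.append(cur + delta)
    return out
```

Repeating a step like this in a loop until `current` converges gave slower but far more repeatable reach behavior than raw IK on our hardware.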

Accomplishments that we're proud of

  • Delivered an end-to-end autonomous maintenance demo, not just isolated subsystems.
  • Built robust robot-control utilities (ID sweep, teleop, trajectory scripts) that made rapid debugging possible.
  • Created a practical physical prototype environment aligned with sponsor challenge expectations.
  • Executed as a cross-functional team across embedded, mechanical, AI, and systems integration.

What we learned

  • In robotics hackathons, reliability beats complexity: a constrained, repeatable setup wins.
  • Good calibration tooling is as important as the core robot algorithm.
  • Simulation and kinematics are useful, but real hardware behavior always needs empirical fallback logic.
  • Early architecture decisions on interfaces between sensing, AI, and actuation dramatically reduce integration risk.
  • Fast team communication and clear task ownership are critical in a 36-hour build.

What's next for RackMedic

  • Replace coarse alignment with stronger vision-based rack-slot localization.
  • Add force-feedback/grasp validation for safer extraction and reduced jamming risk.
  • Improve autonomy with closed-loop correction during pull-out.
  • Build a cleaner operator dashboard with live telemetry, fault history, and manual override controls.
  • Move from cardboard rack prototype toward a more realistic modular testbed.
  • Expand from single-fault response to scheduling/queuing multiple incidents and predictive maintenance scoring.
