Inspiration
Data centers are mission-critical infrastructure, but even a single overheating or failed server can trigger downtime, expensive repairs, and slow incident response. We were inspired by the Microsoft AI & Automation challenge to build a scaled autonomous system that can do what an on-site technician would: detect a fault, identify the correct rack slot, and physically remove the failing server board. That became RackMedic.
What it does
RackMedic is an autonomous server-rack maintenance prototype. It simulates a small data-center rack with multiple server blades, monitors thermal/sensor data from each slot, and triggers an intervention workflow when one blade overheats.
The system:
- Ingests telemetry from sensor nodes (ESP32-based setup with temperature monitoring).
- Uses AI logic to determine which server is in a fault state.
- Commands a robotic arm to align with the target slot.
- Grasps and pulls out the affected server tray (cardboard prototype blade) for maintenance.
- Supports manual override/teleoperation for safety and debugging.
How we built it
We built RackMedic as a full-stack robotics + AI prototype with parallel work across hardware, software, and controls:
Hardware and mechanical:
- Raspberry Pi 5 + AI HAT for edge compute/control.
- Viam/servo-based robotic arm with custom motion scripts.
- Custom rack-and-blade physical mockup for repeatable testing.
- Ultrasonic and thermal/sensor inputs for alignment and fault simulation.
Robotics control software:
- Low-level serial control for 6-DOF servos (custom packet protocol).
- Servo ID discovery and calibration tooling.
- Keyboard teleoperation mode for manual test control.
- Forward-motion and return trajectories with leveling compensation.
- Kinematics-assisted movement and fallback logic for reliable reach behavior.
AI and orchestration:
- AI-assisted fault diagnosis logic for “which blade is failing.”
- Event-driven action pipeline: detect fault -> select target -> command arm -> extract.
- Real-time status logging for demos and judging walkthroughs.
Challenges we ran into
- Hardware constraints and stock limitations forced design pivots.
- Mechanical precision in a fast-built cardboard rack meant alignment tolerance was tight.
- Servo calibration and mapping were nontrivial; minor offsets caused major end-effector errors.
- Inverse kinematics did not always produce practical motion in real-world calibration, so we implemented deterministic fallback movement deltas.
- Integrating sensing, AI decision logic, and actuation into a dependable autonomous loop under hackathon time pressure required aggressive debugging and simplification.
Accomplishments that we're proud of
- Delivered an end-to-end autonomous maintenance demo, not just isolated subsystems.
- Built robust robot-control utilities (ID sweep, teleop, trajectory scripts) that made rapid debugging possible.
- Created a practical physical prototype environment aligned with sponsor challenge expectations.
- Executed as a cross-functional team across embedded, mechanical, AI, and systems integration.
What we learned
- In robotics hackathons, reliability beats complexity: a constrained, repeatable setup wins.
- Good calibration tooling is as important as the core robot algorithm.
- Simulation and kinematics are useful, but real hardware behavior always needs empirical fallback logic.
- Early architecture decisions on interfaces between sensing, AI, and actuation dramatically reduce integration risk.
- Fast team communication and clear task ownership are critical in a 36-hour build.
What's next for RackMedic
- Replace coarse alignment with stronger vision-based rack-slot localization.
- Add force-feedback/grasp validation for safer extraction and reduced jamming risk.
- Improve autonomy with closed-loop correction during pull-out.
- Build a cleaner operator dashboard with live telemetry, fault history, and manual override controls.
- Move from cardboard rack prototype toward a more realistic modular testbed.
- Expand from single-fault response to scheduling/queuing multiple incidents and predictive maintenance scoring.
Built With
- elevenlabs
- esp32
- gemini
- python
- raspberry-pi
- streamlit
- viam
Log in or sign up for Devpost to join the conversation.