Inspiration

As a Robotics and Autonomous Systems student, I noticed a significant gap between high-level AI diagnostics and the physical execution of medical tasks. While many models can identify a tear in an MRI scan, very few systems translate that "insight" into a precise, physical robotic movement that assists a surgeon in real-time. I wanted to build a "system-of-systems" that treats the AI as the brain and the robot as the hand, connected by a human-in-the-loop voice interface.

How I Built It: The Multimodal Pipeline

The project is an end-to-end integration of Computer Vision, Natural Language Processing, and Robotic Kinematics:

  1. Deep Learning Core: I implemented a ResNet18 architecture to classify knee pathology into three states: Healthy, Partial Tear, or Complete Rupture.
  2. Explainable AI (XAI): To ensure clinical transparency, I used Grad-CAM to generate spatial heatmaps. This localizes the tear's center of mass $(x, y)$ within the 2D MRI slice.
  3. HRI & Voice Control: I integrated the Vosk library to interpret vocal commands such as "Wait," "Resume," or "Go for it." This acts as a safety interlock, ensuring the robot only moves upon human confirmation.
  4. Robotics Stack: Using ROS 2, I developed a node that publishes JointState messages to a Franka Emika Panda manipulator within NVIDIA Isaac Sim.
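
As a rough illustration of step 2, the sketch below computes an activation-weighted center of mass $(x, y)$ from a 2D heatmap. It is a minimal example, not the project's actual code: the function name `heatmap_center_of_mass`, the `threshold` parameter, and the toy 4x4 heatmap are all assumptions made for illustration, and a real Grad-CAM map would come from the trained ResNet18 rather than a hand-written list.

```python
from typing import Sequence, Tuple


def heatmap_center_of_mass(
    heatmap: Sequence[Sequence[float]], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return the activation-weighted centroid (x, y) of a 2D heatmap.

    x is the column index and y is the row index, in pixel units.
    Cells below `threshold` * peak are ignored so that weak background
    activation does not pull the centroid away from the hot region.
    `threshold` is a hypothetical tuning knob and must lie in [0, 1].
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")

    peak = max(max(row) for row in heatmap)
    if peak <= 0:
        raise ValueError("heatmap has no positive activation")

    cutoff = threshold * peak
    total = x_sum = y_sum = 0.0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value >= cutoff:
                total += value
                x_sum += x * value
                y_sum += y * value
    return x_sum / total, y_sum / total


# Toy 4x4 "heatmap" (made-up values) with a hot region in the lower right.
toy = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
cx, cy = heatmap_center_of_mass(toy)
print(cx, cy)  # 2.5 2.5
```

Thresholding before averaging is one design choice among several; taking the single peak pixel or the centroid of the largest connected region would be equally reasonable alternatives.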

Challenges Faced

  1. Sim-to-Real Synchronization: Coordinating the ROS 2 bridge with Isaac Sim required precise timing to ensure the robot's joint trajectories matched the intended viewing angles without noticeable latency.
  2. Interpretability vs. Automation: Balancing full autonomy with human safety was a hurdle. I had to design a robust state machine to handle "Wait" and "Resume" commands so the doctor maintains 100% authority over the hardware.
  3. Voice Homophones: Dealing with similar-sounding words (e.g., "right" vs. "write") required a custom command interpreter to ensure the robot didn't move to the wrong anatomical region.

What I Learned

This project reinforced the importance of Explainable AI in high-stakes environments. I learned that in medical robotics, a correct diagnosis is only half the battle; the other half is building trust through transparency (Grad-CAM) and reliable control (ROS 2). I also gained deeper experience in joint-space manipulation and multimodal data fusion, skills I look forward to applying in my upcoming role as an ML Engineer.

What's next for KneeSight: XAI-Driven Diagnostic Robotics

Looking ahead, the next phase of this project involves transitioning from the NVIDIA Isaac Sim environment to physical hardware deployment on a Franka Emika Panda research arm. To enhance the system's clinical utility, I plan to upgrade the current ResNet18 2D slice analysis to a 3D volumetric model, such as VideoMAE or another Transformer-based architecture, capable of processing an entire MRI volume in a single pass. I also aim to refine the Human-Robot Interaction (HRI) by integrating YOLO11 or MediaPipe for hand gesture tracking, creating a robust hybrid control scheme that combines voice and physical cues. Furthermore, as I approach my graduation in May 2026, I intend to implement haptic feedback and real-time XAI updates to ensure the robotic camera provides continuous, justified visualization for surgeons in high-pressure operating environments.
