1. Hardware Layer:

  • Arduino UNO: The low-level controller for the servo motors.
  • Servo Motors & 3D-Printed Hand: The physical actuation system.
  • Webcam: For visual input.

2. Software & Intelligence Layer:

  • Python/OpenCV: Handles the initial hand tracking and gesture recognition using the cvzone library.
  • gpt-oss-20b Model (The Brain): This is the critical upgrade. The model runs locally on a machine powerful enough to handle its 20 billion parameters. It receives a structured prompt that includes:

    • The user's vocal intent.
    • The current raw gesture data from the camera.
    • A list of possible actuator commands and their meanings.
    • An instruction to reason about the best action and output only a machine-readable command.

    Example Prompt Engineering:

    system_prompt = """
    You are the control system for an intelligent prosthetic hand. Your task is to interpret the user's goal and translate it into precise commands for the hand's servos.
    
    USER INTENT: {user_intent}
    CURRENT GESTURE: {finger_data}
    
    Available Commands:
    - MIMIC: Directly use the gesture data. Output format: MIMIC,{finger_data}
    - GRIP_STRENGTH: Set servo power (0-180). Output format: GRIP,{thumb},{index},{middle},{ring},{pinky}
    - GENTLE_GRIP: A preset for delicate objects. Output: GENTLE_GRIP
    
    Reason step-by-step about the user's request. Consider the object they are interacting with (e.g., 'egg' vs 'hammer') and their stated desire ('without crushing it'). Output only the final command.
    """
    
  • Serial Communication: The Python script sends the final command from gpt-oss to the Arduino over USB using the pySerial library. An end-to-end sketch of this pipeline follows below.
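
    Example Pipeline Sketch (host side):

    The following is a minimal sketch of the full path from intent to actuation, reusing the system_prompt template above. It assumes gpt-oss-20b is served behind a local OpenAI-compatible chat-completions endpoint and that the Arduino enumerates as /dev/ttyUSB0; the endpoint URL, model name, serial port, and baud rate are placeholders rather than the project's exact configuration, and finger_data is the comma-joined string of the five 0/1 finger flags reported by the CV tracker.

    import requests
    import serial

    LLM_URL = "http://localhost:8000/v1/chat/completions"      # local OpenAI-compatible server (assumed)
    arduino = serial.Serial("/dev/ttyUSB0", 115200, timeout=1)  # port and baud rate are placeholders

    def build_prompt(user_intent, finger_data):
        # The template also contains literal {thumb}..{pinky} placeholders that the
        # model itself is meant to fill in, so str.replace is safer than str.format here.
        return (system_prompt
                .replace("{user_intent}", user_intent)
                .replace("{finger_data}", finger_data))

    def query_gpt_oss(user_intent, finger_data):
        """Ask the locally served gpt-oss-20b for a single machine-readable command."""
        payload = {
            "model": "gpt-oss-20b",  # whatever name the local server registers
            "messages": [{"role": "system", "content": build_prompt(user_intent, finger_data)}],
            "temperature": 0.0,      # deterministic output for control commands
        }
        response = requests.post(LLM_URL, json=payload, timeout=10)
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"].strip()

    def send_to_arduino(command):
        """Forward the final command string over serial, newline-terminated."""
        arduino.write((command + "\n").encode("utf-8"))

    # Example: the user says "pick up the egg without crushing it" while the
    # camera reports all five fingers extended.
    command = query_gpt_oss("pick up the egg without crushing it", "1,1,1,1,1")
    send_to_arduino(command)   # e.g. "GENTLE_GRIP"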

3. Arduino Code: The microcontroller code was modified to parse not just simple 0s and 1s but also the new, reasoned commands such as GENTLE_GRIP, which map to specific servo angles and delays for a soft close. A host-side mirror of this parsing logic is sketched below.
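
The sketch below is a hypothetical Python mirror of that firmware parsing, useful for unit-testing the serial command grammar without the hardware attached. The GENTLE_GRIP angles and the open/closed servo orientation are illustrative assumptions, not the firmware's actual constants.

    # Hypothetical host-side mirror of the firmware's command parser.
    GENTLE_GRIP_ANGLES = [40, 50, 50, 50, 50]   # thumb..pinky; placeholder values for a soft partial close

    def parse_command(line):
        """Turn one serial line into five servo angles (0-180), thumb to pinky."""
        parts = line.strip().split(",")
        name = parts[0]

        if name == "MIMIC":
            # Five 0/1 flags from the camera: 1 = finger extended, 0 = curled.
            flags = [int(v) for v in parts[1:]]
            # Assumed orientation: 0 degrees = fully open, 180 degrees = fully closed.
            return [0 if f else 180 for f in flags]

        if name == "GRIP":
            # Five explicit servo angles reasoned out by the model.
            angles = [int(v) for v in parts[1:]]
            if len(angles) != 5 or any(not 0 <= a <= 180 for a in angles):
                raise ValueError(f"bad GRIP command: {line!r}")
            return angles

        if name == "GENTLE_GRIP":
            return GENTLE_GRIP_ANGLES

        raise ValueError(f"unknown command: {line!r}")

    print(parse_command("GENTLE_GRIP"))          # [40, 50, 50, 50, 50]
    print(parse_command("GRIP,30,90,90,90,90"))  # [30, 90, 90, 90, 90]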

Challenges I Faced

  • Latency vs. Reasoning: The biggest challenge was balancing the real-time requirement of a prosthetic with the slower, reasoned output of the LLM. I mitigated this by using the fast CV mimicry as the default state and triggering the gpt-oss model only on an explicit vocal command, which keeps the hand responsive.
  • Prompt Engineering for Reliability: Getting gpt-oss to consistently output a clean, machine-parsable command was crucial. It took iterative testing of the system prompt to constrain its responses and avoid creative but unusable text output. (Both mitigations are combined in the sketch after this list.)
  • Hardware Integration: Synchronizing the data flow between the webcam, the Python script running the model, and the Arduino required careful management of serial buffer overflows and command parsing on the microcontroller side.
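
A rough sketch of how the first two mitigations fit together: the loop defaults to fast CV mimicry and only calls the model when a vocal command is pending, and the model's reply is checked against the allowed grammar before it ever reaches the serial port. The helper callables (get_finger_data, pop_voice_command, query_gpt_oss, send_to_arduino) are placeholders for the project's actual functions.

    import re

    # Accept only the three command shapes defined in the system prompt; anything
    # else falls back to plain mimicry so a chatty reply never reaches the servos.
    COMMAND_RE = re.compile(
        r"^(MIMIC,\d,\d,\d,\d,\d"
        r"|GRIP,\d{1,3},\d{1,3},\d{1,3},\d{1,3},\d{1,3}"
        r"|GENTLE_GRIP)$"
    )

    def control_step(get_finger_data, pop_voice_command, query_gpt_oss, send_to_arduino):
        """One iteration of the control loop; the helper callables are placeholders."""
        finger_data = get_finger_data()      # e.g. "1,1,0,0,0" from the CV tracker
        intent = pop_voice_command()         # None unless the user just issued a vocal command

        if intent is None:
            # Default path: low-latency mimicry, no LLM call.
            command = f"MIMIC,{finger_data}"
        else:
            # Reasoning path: slower, only triggered by an explicit vocal intent.
            reply = query_gpt_oss(intent, finger_data).strip()
            command = reply if COMMAND_RE.match(reply) else f"MIMIC,{finger_data}"

        send_to_arduino(command)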

What's Next for the Prometheus Hand

  • On-Device Optimization: Exploring quantization and distillation techniques to run a smaller, optimized version of gpt-oss on an edge device like a Jetson Nano, moving the entire system off a desktop PC.
  • Fine-Tuning (Most Useful Fine-Tune Category): The perfect next step is to fine-tune gpt-oss-20b on a dataset of everyday tasks and objects (e.g., "holding a key," "turning a page," "using a touchscreen") to make its reasoning even more precise and efficient for this specific domain.
  • Haptic Feedback: Integrating sensors in the fingertips to provide pressure data back to the model, creating a closed-loop system where the AI can adjust its grip in real-time based on tactile input.

This project demonstrates that the future of assistive technology isn't just about stronger materials or smaller motors—it's about embedding reasoning and understanding directly into the devices that help people live their lives. The OpenAI gpt-oss model made this leap from a simple tool to an intelligent agent possible.
