About the Project

Inspiration

We come from a robotics and embedded systems background, and most of our experience is in hardware design, motor control, sensing, and low-level software, not artificial intelligence. In electronics labs and maker spaces, one of the most persistent problems is the lack of a reliable extra set of hands—especially during soldering or precision assembly, where stability matters more than automation.

Traditional “helping hands” are static, passive tools. Our inspiration was to explore whether a robotic system could provide context-aware, hands-free assistance that adapts to what the user is doing, rather than forcing the user to adapt to the tool.

The goal was not to build an AI chatbot, but a physical workshop assistant that can see the workspace, listen to spoken requests, and act in the real world.


What it does

Helping Hands is an automated helping hands system for electronics workbenches. A user can issue natural language commands such as:

  • “Can you hold this for me?”
  • “Can you find me a Raspberry Pi in my workspace?”
  • “What part is this?”

The system combines:

  • Dual cameras to observe the workspace
  • Audio input to capture user intent
  • AI models to interpret requests and scene context
  • A motorized end-effector that physically assists the user

Based on the request, the system either:

  • Responds verbally with information (e.g., identifying a component or explaining its specifications), or
  • Moves a robotic helping hand to a computed position and holds an object steady.

This creates a closed-loop system where language, vision, and actuation work together.
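
The sketch below illustrates how such a request might be routed once the AI layer has interpreted it. It is a simplified outline rather than our actual implementation: the `Intent` schema and the helpers (`identify_component`, `locate_object`, `move_to_hold`, `speak`) are illustrative placeholders for the vision, motion, and speech layers described above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Intent:
    """Parsed result of a spoken request (illustrative schema, not the real one)."""
    action: str                   # e.g. "hold", "find", "identify"
    target: Optional[str] = None  # e.g. "Raspberry Pi"

# Placeholder back-ends standing in for the speech, vision, and motion layers.
def speak(text: str) -> None:
    print(f"[TTS] {text}")

def identify_component() -> str:
    return "Raspberry Pi 4"                    # would come from the vision model

def locate_object(name: str) -> Tuple[float, float]:
    return (120.0, 85.0)                       # mm in the workspace frame, from triangulation

def move_to_hold(xy: Tuple[float, float]) -> None:
    print(f"[MOTION] moving the gripper to {xy} and clamping")

def handle_intent(intent: Intent) -> None:
    """Route an interpreted request to either a spoken answer or a physical action."""
    if intent.action == "identify":
        speak(f"This looks like a {identify_component()}.")
    elif intent.action == "find":
        x, y = locate_object(intent.target or "object")
        speak(f"I can see the {intent.target} near ({x:.0f} mm, {y:.0f} mm).")
    elif intent.action == "hold":
        move_to_hold(locate_object(intent.target or "object"))
        speak("Holding it steady.")
    else:
        speak("Sorry, I did not understand that request.")

handle_intent(Intent(action="hold", target="PCB"))
```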


How we built it

This project was built from the hardware up, which reflects our team’s strengths and experience.

On the hardware and robotics side:

  • We designed and implemented motor control for NEMA 17 stepper motors, enabling precise and repeatable positioning of the helping hands (a simplified positioning sketch follows this list).
  • We integrated an IMU to stabilize motion and estimate end-effector state.
  • We established wireless communication with the actuator platform, including Bluetooth access for configuration and control.
  • We built a dual-camera vision setup using ESP32-CAM modules, enabling geometric triangulation of objects in the workspace.
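
As a rough illustration of the stepper-positioning logic in the first bullet, the sketch below converts a target position into step pulses. It is a minimal outline under assumed numbers: 200 steps/rev matches a standard 1.8° NEMA 17, but the drive ratio, microstepping, and pulse timing are placeholders, and the real control runs in firmware against a step/dir driver rather than in Python.

```python
import time

STEPS_PER_REV = 200           # standard 1.8-degree NEMA 17, full-stepping
MM_PER_REV = 8.0              # assumed drive ratio -- adjust for the real mechanism
STEPS_PER_MM = STEPS_PER_REV / MM_PER_REV

def move_axis(current_mm: float, target_mm: float, pulse_fn, step_delay_s: float = 0.001) -> float:
    """Move one axis from current_mm to target_mm by issuing discrete step pulses.

    pulse_fn(direction) stands in for toggling the DIR/STEP pins of a stepper
    driver; here it simply records each pulse so the sketch runs anywhere.
    """
    delta_steps = round((target_mm - current_mm) * STEPS_PER_MM)
    direction = 1 if delta_steps >= 0 else -1
    for _ in range(abs(delta_steps)):
        pulse_fn(direction)
        time.sleep(step_delay_s)   # fixed pulse rate; real firmware would ramp (accel/decel)
    # Position actually reached, quantised to whole steps.
    return current_mm + delta_steps / STEPS_PER_MM

# Example: log pulses instead of driving hardware.
pulses = []
reached = move_axis(0.0, 12.5, lambda d: pulses.append(d))
print(f"reached {reached:.3f} mm after {len(pulses)} pulses")
```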

On the software side:

  • We designed a modular pipeline that treats AI as a perception and decision layer, not a replacement for robotics fundamentals.
  • Vision outputs are converted into metric workspace coordinates using classical camera calibration and geometric triangulation (see the triangulation sketch after this list).
  • Motion planning and control are handled using deterministic logic rather than learned models.
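
The triangulation step in the second bullet can be summarised with the classic linear (DLT) formulation below. The projection matrices and pixel coordinates are made-up illustrative numbers; the real system uses the calibration of the two ESP32-CAM modules.

```python
import numpy as np

def triangulate(P1: np.ndarray, P2: np.ndarray, uv1, uv2) -> np.ndarray:
    """Linear (DLT) triangulation of one point seen by two calibrated cameras.

    P1 and P2 are 3x4 projection matrices (intrinsics @ [R | t]) from calibration;
    uv1 and uv2 are pixel coordinates of the same object in each view.
    Returns the 3D point in the reference camera's frame.
    """
    (u1, v1), (u2, v2) = uv1, uv2
    A = np.vstack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)      # least-squares solution of A @ X = 0
    X = Vt[-1]
    return X[:3] / X[3]              # dehomogenise

# Toy setup: two identical cameras 100 mm apart along x, both looking down +z.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P_left  = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_right = K @ np.hstack([np.eye(3), np.array([[-100.0], [0.0], [0.0]])])

print(triangulate(P_left, P_right, (420.0, 300.0), (260.0, 300.0)))
# -> roughly [62.5, 37.5, 500.0] (mm, in the left camera frame)
```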

AI systems (LiveKit, Overshoot, and Gemini) are used to interpret intent and context, while all geometry, control, and safety logic remains grounded in classical robotics.
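
One small example of keeping that boundary deterministic: before any motion is commanded, an AI-suggested target can be clamped to a known-safe envelope. The limits below are illustrative values, not our measured workspace.

```python
import numpy as np

# Illustrative workspace limits in mm -- not the real measured envelope.
WORKSPACE_MIN = np.array([0.0, 0.0, 0.0])
WORKSPACE_MAX = np.array([250.0, 180.0, 120.0])

def safe_target(requested_xyz) -> np.ndarray:
    """Clamp an AI-suggested target into the known-safe workspace before motion.

    The AI layer may propose any coordinate; the control layer only ever
    executes positions that pass this deterministic check.
    """
    return np.clip(np.asarray(requested_xyz, dtype=float), WORKSPACE_MIN, WORKSPACE_MAX)

print(safe_target([300.0, 50.0, -10.0]))   # -> [250.  50.   0.]
```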


Challenges we ran into

The most significant challenges came from AI SDK integration, not from robotics or hardware development.

Our team is not AI-focused, and we do not primarily work in machine learning or cloud AI ecosystems. As a result:

  • Integrating LiveKit AI for audio input and conversational control required navigating unfamiliar tooling and runtime constraints.
  • Integrating Overshoot AI for vision processing introduced challenges related to SDK compatibility, streaming assumptions, and environment setup.
  • Debugging AI pipelines was more difficult than debugging embedded systems due to reduced transparency and limited low-level control.

In contrast, motor control, sensing, wireless communication, and embedded integration aligned closely with our existing expertise.


Accomplishments that we're proud of

  • Designed and implemented precise motor control for an automated helping hands system using NEMA 17 stepper motors.
  • Achieved wireless communication and control of a micro-vehicle and actuator platform, demonstrating reliable remote command and coordination.
  • Successfully applied Overshoot AI for object detection of engineering materials, grounding AI vision results into physically meaningful workspace coordinates.
  • Integrated Bluetooth-based access for system configuration and control.
  • Built a dual ESP32-CAM vision system capable of supporting geometric triangulation.
  • Developed a modular architecture that cleanly separates perception, reasoning, and control, enabling safe and explainable physical interaction.
  • Delivered a system that performs real, physical interaction with the environment, not just a software or simulation demo.

What we learned

  • AI is most effective when used as a high-level reasoning and perception tool, not as a substitute for physics, geometry, or control theory.
  • Hardware-first design simplifies safety analysis and failure handling.
  • Integrating AI SDKs can be more challenging than implementing the underlying robotics system.
  • Clear interfaces and structured data are essential when combining AI with real-world actuation.
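
As a concrete example of the last point, a typed record like the one sketched below makes the hand-off between perception and control explicit. The field names and units are assumptions for illustration, not our actual message format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DetectedObject:
    """One structured record handed from the perception layer to the control layer."""
    label: str          # e.g. "raspberry_pi"
    x_mm: float         # workspace coordinates from triangulation
    y_mm: float
    z_mm: float
    confidence: float   # detector confidence in [0, 1]

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")

# The control side acts only on well-formed, sufficiently confident detections.
det = DetectedObject("raspberry_pi", 120.0, 85.0, 14.0, 0.92)
if det.confidence > 0.8:
    print(f"hold target: ({det.x_mm}, {det.y_mm}, {det.z_mm}) mm")
```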

This project reinforced the importance of classical engineering fundamentals even in AI-assisted systems.


What's next for Helping Hands

Next steps include:

  • Improving vision robustness and reducing end-to-end latency
  • Refining calibration and workspace coordinate estimation
  • Adding additional safety constraints and collision awareness
  • Expanding tool attachments and supported tasks
  • Improving conversational confirmation and feedback

Ultimately, we aim to turn Helping Hands into a reliable, intuitive workshop assistant that augments human capability rather than replacing it.

Built With

  • ESP32-CAM
  • NEMA 17 stepper motors
  • Bluetooth
  • LiveKit
  • Overshoot AI
  • Gemini