Inspiration

Interested in computer vision, I saw its viability in disability assistance and wanted to implement features to improve OM1, as described in this feature request: https://github.com/OpenmindAGI/OM1/issues/180.

What it does

The TurtleBot takes the camera feed from the laptop and sends the video to the VILA VLM, which describes the hand gesture it sees as text. The robot then maps that gesture description to the corresponding movement command.
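To make that last step concrete, here is a minimal sketch of the gesture-to-movement mapping, assuming a small fixed gesture vocabulary. The gesture names, the command strings, and the keyword matching are my own illustration, not OM1's actual action interface.

```python
# Hypothetical sketch of mapping VLM text output to movement commands.
# Gesture names and commands are illustrative, not OM1's real interface.

GESTURE_TO_ACTION = {
    "thumbs up": "move forward",
    "open palm": "stop",
    "point left": "turn left",
    "point right": "turn right",
}

def map_vlm_description(description: str) -> str:
    """Map the VLM's free-text gesture description to a movement command."""
    text = description.lower()
    for gesture, action in GESTURE_TO_ACTION.items():
        if gesture in text:
            return action
    return "stop"  # default to a safe action when no known gesture appears

if __name__ == "__main__":
    print(map_vlm_description("A person giving a thumbs up to the camera."))
    # -> "move forward"
```

Simple substring matching like this is tolerant of the VLM wrapping the gesture in extra narration, which is why constraining the prompt to a fixed vocabulary (see the next section) matters so much.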

How I built it

I researched the OM1 codebase and altered a JSON5 config file, using extensive prompt engineering to get accurate gesture descriptions out of the VLM.
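For illustration, a config change in this spirit might look like the sketch below. The field names and prompt text are hypothetical, not OM1's actual schema, but they show the prompt-engineering idea: restrict the VLM to a fixed gesture vocabulary so its output maps cleanly onto movement commands.

```json5
// Hypothetical sketch — field names are illustrative, not OM1's real schema.
{
  // Vision input: the laptop camera feed routed to the VILA VLM
  agent_inputs: [
    { type: "VLM_Vila" },
  ],
  // Prompt engineering: constrain the VLM to a fixed gesture vocabulary so
  // its text output can be mapped unambiguously onto movement commands
  system_prompt: "You see a video feed of a person. Describe only their hand \
gesture, answering with exactly one of: thumbs up, open palm, \
point left, point right.",
}
```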

Challenges I ran into

  • It was difficult to connect to the robot at first, and some missing packages were hard to track down during setup
  • The robot was not fully charged due to shared use
  • The robot was inconsistent in getting input data from the camera

Accomplishments that I'm proud of

  • The robot can accurately recognize hand gestures and moves in the correct direction based on them
  • Learned about robot software tools such as Zenoh
  • Learned how to navigate and implement features in large legacy codebases

What's next for Untitled

  • Hand gesture recognition even with noise in the video (e.g., multiple people in the same frame doing different things)
  • Full body gesture recognition
