🧠 Inspiration

Setting up physical robots—especially open-source ones like SO-ARM100—can be frustrating. Documentation is scattered, videos are outdated, and it's easy to make small mistakes that cause real damage. We imagined a world where an AI watches your setup process and instantly warns you if something's off. That idea led to Orboh: a real-time setup assistant for physical robots, powered by vision and AI.


🚀 What it does

Orboh helps users verify whether their robot setup matches the correct procedure, using camera input and AI. It compares live video frames against predefined ground-truth states (from YAML or GitHub guides) and detects mistakes in real time. The system can notify users with spoken alerts, Slack messages, or a web interface, before anything goes wrong.


🛠️ How we built it

  1. 📱 We capture video of the robot setup using a smartphone camera.
  2. 🎞️ Using Python + OpenCV, we extract 1 frame per second from the live feed.
  3. ☁️ Each frame is uploaded to Amazon S3.
  4. 📘 A YAML-based system (so-arm-extractor) defines what the “correct” setup should look like.
  5. 👁️ Claude 3 Vision (via OpenRouter) compares the current frame with the ground truth.
  6. 🔊 Results will be delivered to the user via Amazon Polly (voice), Slack, or a web UI.
  7. 🗂️ All logs will be stored in Amazon DynamoDB.

At the time of submission, steps 1–3 are functional; the rest is under active integration.
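Steps 2–3 above can be sketched in a few lines of Python with OpenCV and boto3. The bucket name, key layout, and fallback frame rate below are placeholders, not our production values:

```python
def sampling_step(native_fps: float, target_fps: float = 1.0) -> int:
    """How many frames to skip so that capture approximates `target_fps`."""
    if native_fps <= 0:          # camera did not report a rate; assume 30 FPS
        native_fps = 30.0
    return max(int(round(native_fps / target_fps)), 1)


def extract_and_upload(video_source, bucket="orboh-frames", prefix="session-001"):
    """Read the feed, keep roughly 1 frame per second, and push each to S3.

    `bucket` and `prefix` are illustrative; point them at your own bucket.
    """
    import boto3  # lazy imports so sampling_step stays usable without them
    import cv2

    s3 = boto3.client("s3")
    cap = cv2.VideoCapture(video_source)
    step = sampling_step(cap.get(cv2.CAP_PROP_FPS))
    index = uploaded = 0
    while True:
        ok, frame = cap.read()
        if not ok:               # stream ended or connection dropped
            break
        if index % step == 0:
            ok_enc, buf = cv2.imencode(".jpg", frame)
            if ok_enc:
                key = f"{prefix}/frame_{uploaded:05d}.jpg"
                s3.put_object(Bucket=bucket, Key=key, Body=buf.tobytes())
                uploaded += 1
        index += 1
    cap.release()
    return uploaded
```

Sampling by frame index (rather than wall-clock time) keeps the logic simple and deterministic for recorded video, at the cost of drifting slightly if the live feed drops frames.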


⚠️ Challenges we ran into

  • Claude 3 Vision only supports still images, so we had to simulate real-time analysis by extracting frames at 1 FPS.
  • It was surprisingly hard to define “ground truth” states from inconsistent YouTube videos, blog posts, and GitHub repos.
  • The SO-ARM100 documentation was fragmented, making it unclear what exactly counted as a “correct” setup.
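To work around the still-image constraint, each sampled frame becomes one chat-completions request. A sketch of what that call to Claude via OpenRouter might look like follows; the model slug, prompt wording, and response parsing are assumptions, so check OpenRouter's current documentation before relying on them:

```python
import base64
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_payload(image_b64: str, expected_state: str,
                  model: str = "anthropic/claude-3-sonnet") -> dict:
    """Pair one base64-encoded frame with the expected setup state.

    Uses the OpenAI-style message format that OpenRouter accepts; the
    prompt text here is illustrative.
    """
    prompt = ("Does this frame match the expected setup state below? "
              "Answer MATCH or MISMATCH plus a one-line reason.\n\n"
              + expected_state)
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    }


def compare_frame(frame_path: str, expected_state: str) -> str:
    """Send one frame to OpenRouter and return the model's verdict text."""
    with open(frame_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_payload(image_b64, expected_state)).encode(),
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```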

🏆 Accomplishments that we're proud of

  • Built a working pipeline that turns live smartphone video into structured data for AI processing.
  • Successfully extracted and uploaded frames to S3 in real time.
  • Designed a modular YAML-based framework (so-arm-extractor) for defining setup correctness.
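To give a feel for what "defining setup correctness" means in practice, here is a hypothetical ground-truth step in the spirit of so-arm-extractor, plus a small validator. The field names and schema are purely illustrative, not the actual so-arm-extractor format:

```python
# A hypothetical so-arm-extractor step (field names are illustrative):
EXAMPLE_SPEC = """\
step: 3
name: attach-gripper-servo
expect:
  - part: gripper_servo
    state: mounted
  - part: cable_bundle
    state: routed-left
"""


def validate_spec(spec: dict) -> list:
    """Return a list of problems with a ground-truth step spec (empty = OK)."""
    problems = []
    for field in ("step", "name", "expect"):
        if field not in spec:
            problems.append(f"missing field: {field}")
    for item in spec.get("expect", []):
        if "part" not in item or "state" not in item:
            problems.append(f"incomplete expectation: {item!r}")
    return problems


def load_spec(text: str) -> dict:
    import yaml  # PyYAML; lazy so validate_spec works without it installed
    return yaml.safe_load(text)
```

Validating specs up front matters because a silently malformed ground-truth file would make every frame comparison report a mismatch.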

📚 What we learned

  • Robotics needs human-centered setup tooling—especially for physical prototyping and education.
  • Combining vision AI (Claude) with robotics setup logic opens up new possibilities.
  • Designing prompts for AI to understand physical-world tasks is very different from typical NLP.

🔮 What's next for Orboh

  • Complete integration of Claude Vision for real-time setup comparison.
  • Add support for spoken alerts (Amazon Polly) and visual feedback (Slack / Web UI).
  • Allow multi-camera support and future sensor inputs (e.g., torque sensors).
  • Release a plug-and-play tool for open-source robot kits to minimize setup errors worldwide.
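The planned Polly alerts could be wired up roughly as below. The voice, output filename, and message template are assumptions for the sketch, not a finished implementation:

```python
def alert_text(step_name: str, reason: str) -> str:
    """Compose the spoken warning for a failed check (wording is illustrative)."""
    return (f"Warning: setup check failed at step {step_name}. {reason} "
            "Please pause and re-check before continuing.")


def speak_alert(step_name: str, reason: str, out_path: str = "alert.mp3") -> str:
    """Synthesize the warning with Amazon Polly and write it to an MP3 file."""
    import boto3  # lazy import so alert_text stays usable without AWS installed

    polly = boto3.client("polly")
    resp = polly.synthesize_speech(
        Text=alert_text(step_name, reason),
        OutputFormat="mp3",
        VoiceId="Joanna",        # any Polly voice works here
    )
    with open(out_path, "wb") as f:
        f.write(resp["AudioStream"].read())
    return out_path
```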
