🧠 Inspiration
Setting up physical robots—especially open-source ones like SO-ARM100—can be frustrating. Documentation is scattered, videos are outdated, and it's easy to make small mistakes that cause real damage. We imagined a world where an AI watches your setup process and instantly warns you if something's off. That idea led to Orboh: a real-time setup assistant for physical robots, powered by vision and AI.
🚀 What it does
Orboh helps users verify if their robot setup matches the correct procedure, using camera input and AI. It compares live video frames to predefined ground-truth states (from YAML or GitHub guides) and detects mistakes in real time. The system can notify users with spoken alerts, Slack messages, or a web interface—before anything goes wrong.
🛠️ How we built it
- 📱 We capture video of the robot setup using a smartphone camera.
- 🎞️ Using Python + OpenCV, we extract 1 frame per second from the live feed.
- ☁️ Each frame is uploaded to Amazon S3.
- 📘 A YAML-based system (so-arm-extractor) defines what the “correct” setup should look like.
- 👁️ Claude 3 Vision (via OpenRouter) compares the current frame with the ground truth.
- 🔊 Results will be sent to the user via Amazon Polly (voice), Slack, or Web UI.
- 🗂️ Eventually, all logs will be stored in Amazon DynamoDB.
At the time of submission, steps 1–3 are functional; the rest is under active integration.
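The functional part of the pipeline (steps 1–3) can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the bucket name `orboh-frames`, the key layout, and the camera source are all assumptions, and it requires `opencv-python` and `boto3`.

```python
# Sketch of the capture pipeline: sample 1 frame/sec from a video feed
# and push each sampled frame to S3. Bucket name and key layout are
# hypothetical placeholders.
import time


def should_sample(frame_idx: int, stream_fps: int) -> bool:
    """Keep one frame per second: true on every stream_fps-th frame."""
    return frame_idx % stream_fps == 0


def s3_key_for(ts: float) -> str:
    """Deterministic per-frame object key (hypothetical naming scheme)."""
    return f"frames/{int(ts)}.jpg"


if __name__ == "__main__":
    # Requires opencv-python and boto3, plus AWS credentials configured.
    import cv2
    import boto3

    s3 = boto3.client("s3")
    cap = cv2.VideoCapture(0)  # smartphone feed exposed as a webcam/RTSP source
    fps = int(cap.get(cv2.CAP_PROP_FPS) or 30)
    idx = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if should_sample(idx, fps):
            encoded, buf = cv2.imencode(".jpg", frame)
            if encoded:
                s3.put_object(
                    Bucket="orboh-frames",  # hypothetical bucket
                    Key=s3_key_for(time.time()),
                    Body=buf.tobytes(),
                )
        idx += 1
    cap.release()
```

Sampling by frame index rather than wall-clock time keeps the rate stable even when processing briefly stalls.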
⚠️ Challenges we ran into
- Claude 3 Vision only supports still images, so we had to simulate real-time analysis by extracting frames at 1 FPS.
- It was surprisingly hard to define “ground truth” states from inconsistent YouTube videos, blog posts, and GitHub repos.
- The SO-ARM100 documentation was fragmented, making it unclear what exactly counted as a “correct” setup.
🏆 Accomplishments that we're proud of
- Built a working pipeline that turns live smartphone video into structured data for AI processing.
- Successfully extracted and uploaded frames to S3 in real time.
- Designed a modular YAML-based framework (so-arm-extractor) for defining setup correctness.
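To make the idea concrete, a ground-truth entry in such a framework might look like the following. The schema shown here is hypothetical, invented for illustration rather than taken from so-arm-extractor itself:

```yaml
# Hypothetical ground-truth step definition for one assembly stage.
step: mount_base_servo
description: "Base servo seated in the lower bracket, horn facing up"
expected:
  - servo_visible: true
  - horn_orientation: up
  - screws_fastened: 4
failure_hints:
  - "Servo horn facing down usually means the bracket is reversed."
```

Each step's `expected` fields become the criteria the vision model is asked to verify against the current frame.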
📚 What we learned
- Robotics needs human-centered setup tooling—especially for physical prototyping and education.
- Combining vision AI (Claude) with robotics setup logic opens up new possibilities.
- Designing prompts for AI to understand physical-world tasks is very different from typical NLP.
🔮 What's next for Orboh
- Complete integration of Claude Vision for real-time setup comparison.
- Add support for spoken alerts (Amazon Polly) and visual feedback (Slack / Web UI).
- Allow multi-camera support and future sensor inputs (e.g., torque sensors).
- Release a plug-and-play tool for open-source robot kits to minimize setup errors worldwide.
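The planned Claude Vision comparison could be wired up through OpenRouter's OpenAI-compatible chat endpoint, roughly as sketched below. The model identifier, prompt wording, and request shape are assumptions about the eventual integration, not finished code:

```python
# Hypothetical request builder for the planned frame-vs-ground-truth check.
# The payload follows OpenRouter's OpenAI-compatible chat format; the model
# name "anthropic/claude-3-sonnet" is an assumption.
import base64
import json


def build_comparison_request(frame_jpeg: bytes, ground_truth_step: dict) -> dict:
    """Build a chat payload asking the model to compare a frame to a step."""
    b64 = base64.b64encode(frame_jpeg).decode()
    return {
        "model": "anthropic/claude-3-sonnet",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "Does this frame match the expected setup state? "
                            f"Expected: {json.dumps(ground_truth_step)} "
                            "Answer MATCH or MISMATCH with a short reason."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
    }
```

The payload would then be POSTed to the OpenRouter chat completions endpoint with an API key; a MISMATCH reply triggers the alert path (Polly, Slack, or the Web UI).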