🧠 Inspiration

Setting up physical robots—especially open-source ones like SO-ARM100—can be frustrating. Documentation is scattered, videos are outdated, and it's easy to make small mistakes that cause real damage. We imagined a world where an AI watches your setup process and instantly warns you if something's off. That idea led to Orboh: a real-time setup assistant for physical robots, powered by vision and AI.


🚀 What it does

Orboh helps users verify whether their robot setup matches the correct procedure, using camera input and AI. It compares live video frames against predefined ground-truth states (from YAML or GitHub guides) and detects mistakes in real time. The system can notify users with spoken alerts, Slack messages, or a web interface, before anything goes wrong.


🛠️ How we built it

  1. 📱 We capture video of the robot setup using a smartphone camera.
  2. 🎞️ Using Python + OpenCV, we extract 1 frame per second from the live feed.
  3. ☁️ Each frame is uploaded to Amazon S3.
  4. 📘 A YAML-based system (so-arm-extractor) defines what the “correct” setup should look like.
  5. 👁️ Claude 3 Vision (via OpenRouter) compares the current frame with the ground truth.
  6. 🔊 Results will be delivered to the user via Amazon Polly (voice), Slack, or a web UI.
  7. 🗂️ All logs will be stored in Amazon DynamoDB.

At the time of submission, steps 1–3 are functional; the rest is under active integration.
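Steps 2–3 above can be sketched in a few lines of Python with OpenCV and boto3. The bucket name, key layout, and fallback frame rate below are placeholders, not our production values:

```python
def sampling_step(native_fps: float, target_fps: float = 1.0) -> int:
    """How many frames to skip so that capture approximates `target_fps`."""
    if native_fps <= 0:          # camera did not report a rate; assume 30 FPS
        native_fps = 30.0
    return max(int(round(native_fps / target_fps)), 1)


def extract_and_upload(video_source, bucket="orboh-frames", prefix="session-001"):
    """Read the feed, keep roughly 1 frame per second, and push each to S3.

    `bucket` and `prefix` are illustrative; point them at your own bucket.
    """
    import boto3  # lazy imports so sampling_step stays usable without them
    import cv2

    s3 = boto3.client("s3")
    cap = cv2.VideoCapture(video_source)
    step = sampling_step(cap.get(cv2.CAP_PROP_FPS))
    index = uploaded = 0
    while True:
        ok, frame = cap.read()
        if not ok:               # stream ended or connection dropped
            break
        if index % step == 0:
            ok_enc, buf = cv2.imencode(".jpg", frame)
            if ok_enc:
                key = f"{prefix}/frame_{uploaded:05d}.jpg"
                s3.put_object(Bucket=bucket, Key=key, Body=buf.tobytes())
                uploaded += 1
        index += 1
    cap.release()
    return uploaded
```

Sampling by frame index (rather than wall-clock time) keeps the logic simple and deterministic for recorded video, at the cost of drifting slightly if the live feed drops frames.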


⚠️ Challenges we ran into

  • Claude 3 Vision only supports still images, so we had to simulate real-time analysis by extracting frames at 1 FPS.
  • It was surprisingly hard to define “ground truth” states from inconsistent YouTube videos, blog posts, and GitHub repos.
  • The SO-ARM100 documentation was fragmented, making it unclear what exactly counted as a “correct” setup.
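To work around the still-image constraint, each sampled frame becomes one chat-completions request. A sketch of what that call to Claude via OpenRouter might look like follows; the model slug, prompt wording, and response parsing are assumptions, so check OpenRouter's current documentation before relying on them:

```python
import base64
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_payload(image_b64: str, expected_state: str,
                  model: str = "anthropic/claude-3-sonnet") -> dict:
    """Pair one base64-encoded frame with the expected setup state.

    Uses the OpenAI-style message format that OpenRouter accepts; the
    prompt text here is illustrative.
    """
    prompt = ("Does this frame match the expected setup state below? "
              "Answer MATCH or MISMATCH plus a one-line reason.\n\n"
              + expected_state)
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    }


def compare_frame(frame_path: str, expected_state: str) -> str:
    """Send one frame to OpenRouter and return the model's verdict text."""
    with open(frame_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_payload(image_b64, expected_state)).encode(),
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```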

🏆 Accomplishments that we're proud of

  • Built a working pipeline that turns live smartphone video into structured data for AI processing.
  • Successfully extracted and uploaded frames to S3 in real time.
  • Designed a modular YAML-based framework (so-arm-extractor) for defining setup correctness.
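To give a feel for what "defining setup correctness" means in practice, here is a hypothetical ground-truth step in the spirit of so-arm-extractor, plus a small validator. The field names and schema are purely illustrative, not the actual so-arm-extractor format:

```python
# A hypothetical so-arm-extractor step (field names are illustrative):
EXAMPLE_SPEC = """\
step: 3
name: attach-gripper-servo
expect:
  - part: gripper_servo
    state: mounted
  - part: cable_bundle
    state: routed-left
"""


def validate_spec(spec: dict) -> list:
    """Return a list of problems with a ground-truth step spec (empty = OK)."""
    problems = []
    for field in ("step", "name", "expect"):
        if field not in spec:
            problems.append(f"missing field: {field}")
    for item in spec.get("expect", []):
        if "part" not in item or "state" not in item:
            problems.append(f"incomplete expectation: {item!r}")
    return problems


def load_spec(text: str) -> dict:
    import yaml  # PyYAML; lazy so validate_spec works without it installed
    return yaml.safe_load(text)
```

Validating specs up front matters because a silently malformed ground-truth file would make every frame comparison report a mismatch.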

📚 What we learned

  • Robotics needs human-centered setup tooling—especially for physical prototyping and education.
  • Combining vision AI (Claude) with robotics setup logic opens up new possibilities.
  • Designing prompts for AI to understand physical-world tasks is very different from typical NLP.

🔮 What's next for Orboh

  • Complete integration of Claude Vision for real-time setup comparison.
  • Add support for spoken alerts (Amazon Polly) and visual feedback (Slack / Web UI).
  • Allow multi-camera support and future sensor inputs (e.g., torque sensors).
  • Release a plug-and-play tool for open-source robot kits to minimize setup errors worldwide.
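The planned Polly alerts could be wired up roughly as below. The voice, output filename, and message template are assumptions for the sketch, not a finished implementation:

```python
def alert_text(step_name: str, reason: str) -> str:
    """Compose the spoken warning for a failed check (wording is illustrative)."""
    return (f"Warning: setup check failed at step {step_name}. {reason} "
            "Please pause and re-check before continuing.")


def speak_alert(step_name: str, reason: str, out_path: str = "alert.mp3") -> str:
    """Synthesize the warning with Amazon Polly and write it to an MP3 file."""
    import boto3  # lazy import so alert_text stays usable without AWS installed

    polly = boto3.client("polly")
    resp = polly.synthesize_speech(
        Text=alert_text(step_name, reason),
        OutputFormat="mp3",
        VoiceId="Joanna",        # any Polly voice works here
    )
    with open(out_path, "wb") as f:
        f.write(resp["AudioStream"].read())
    return out_path
```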
