Inspiration
We wanted to pay homage to NYC's roots in publicly played chess games and chess hustling. Initially, we envisioned a robot chess hustler capable of beating players every time, dynamically trash-talking aloud, and physically moving real pieces. Raju's name was inspired by Gukesh Dommaraju, a chess grandmaster who earned the title at 12 years old. Like Gukesh, Raju is also poised for greatness at a young age (having learned to move pieces less than 24 hours ago).
What it does
Raju plays chess with an SO100 arm. All arm movements are controlled by a trained ML model that chooses moves based on the opponent's last move and the optimal response according to Stockfish. The player's experience is enhanced with trash talk: quips that reflect how confident Raju is in the game's outcome.
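One way the "confidence" behavior could be wired up is by mapping Stockfish's evaluation to a tone that gets baked into the Gemini prompt. This is a minimal sketch; the function names, thresholds, and prompt wording are all our assumptions, not Raju's actual implementation.

```python
# Hypothetical sketch: map a Stockfish centipawn evaluation (from Raju's
# perspective) to a tone word used in the trash-talk prompt.
# Thresholds are invented for illustration.
def quip_tone(centipawns: int) -> str:
    if centipawns > 300:
        return "gloating"
    if centipawns > 50:
        return "confident"
    if centipawns > -50:
        return "playfully neutral"
    return "nervous bravado"

def build_prompt(centipawns: int, last_move: str) -> str:
    """Assemble a one-shot prompt for the quip-generating LLM."""
    return (
        f"You are Raju, a chess-hustling robot in NYC. Your opponent just "
        f"played {last_move}. In one short sentence, trash-talk them in a "
        f"{quip_tone(centipawns)} tone."
    )
```

The resulting string would then be sent to Gemini, and the reply handed to the TTS step.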
How we built it
Raju's functionality is split into two recognizable "halves." The first half interprets the chessboard: a fine-tuned YOLO object detection model recognizes the pieces to determine the board state, and Stockfish then supplies the optimal next move. As the game progresses and the odds shift, Raju "trash talks" opponents by prompting Gemini for an appropriately confident quip, which is then spoken aloud with Fish Audio.

From there, Raju uses its second half: an ACT model trained to move a chess piece from a start square to a target square with the SO100. This process involves masking the input frames (and the video frames used for training) with an overlay that indicates where the piece should be picked up and placed. In training, these overlays were set manually for more than 50 examples. In normal use, the overlays are generated to mark targets above the chosen squares and are color-coded to distinguish pickup from placement.
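The overlay masking can be sketched as drawing color-coded boxes over the pick and place squares of each camera frame. Everything here is an assumption for illustration: the board origin, the square size in pixels, and the RGB color choices are invented, not taken from Raju's code.

```python
import numpy as np

# Sketch of the frame-masking idea (geometry values are assumptions):
# an 8x8 board at a known position in the camera frame, with each square
# mapped to a pixel box.  Pickup and placement squares get different
# overlay colors so the policy can tell them apart.
BOARD_ORIGIN = (40, 40)   # top-left pixel of square a8 (assumed)
SQUARE_PX = 60            # square size in pixels (assumed)

def square_to_box(square: str):
    """Map e.g. 'e2' to (x0, y0, x1, y1) pixel coordinates."""
    file = ord(square[0]) - ord("a")   # a..h -> 0..7, left to right
    rank = 8 - int(square[1])          # ranks 8..1 -> rows 0..7, top to bottom
    x0 = BOARD_ORIGIN[0] + file * SQUARE_PX
    y0 = BOARD_ORIGIN[1] + rank * SQUARE_PX
    return x0, y0, x0 + SQUARE_PX, y0 + SQUARE_PX

def mask_frame(frame: np.ndarray, pick_sq: str, place_sq: str) -> np.ndarray:
    """Blend color-coded overlays onto the squares the arm should use."""
    out = frame.copy()
    # Green (RGB) marks the pickup square, red the placement square.
    for sq, color in ((pick_sq, (0, 255, 0)), (place_sq, (255, 0, 0))):
        x0, y0, x1, y1 = square_to_box(sq)
        out[y0:y1, x0:x1] = 0.5 * out[y0:y1, x0:x1] + 0.5 * np.array(color)
    return out.astype(np.uint8)
```

The same function would be applied both to training clips (with manually chosen squares) and to live frames (with squares chosen from the Stockfish move).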
Challenges we ran into
Building Raju came with a lot of challenges. The first was interpreting the chessboard state. We were unable to find an out-of-the-box model (or LLM) capable of properly understanding our pieces and board from the camera view we'd chosen for training. Initially, we built a system in OpenCV that compared pre-move and post-move images to determine the most likely move made by the opponent. This worked sometimes, but wasn't consistent enough for an entire game due to the lack of contrast between pieces and their same-colored squares. To fix this, we took a few dozen photos of various board states, labeled them, and trained a YOLO model to recognize the pieces. Integrated with an algorithm that accounted for the previous board state, this solution proved very accurate.

The other major challenge was getting Raju to transport pieces from one tile to another. Mechanically, the stock SO100 gripper struggled to hold the rounded chess pieces consistently; redesigned gripper mandibles with centering indents and rubber solved this. On the controls side, we knew Raju would struggle to identify the target piece among 32 options on the board. After brainstorming, we decided to train a model on masked input frames that indicated the target tiles, leaving the transitions to be learned along with the pickup and placement routines. Our first model, trained to pick up pieces from twenty examples, took a few hours and was inconsistent due to the gripper issues. Our second (final) training run used more than double the data and a wider variety of movement situations. Disappointingly, after we gathered the examples, training only made it halfway through in the time available, so Raju isn't as fast and precise as he should be.
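The "previous board state" algorithm can be sketched as a simple diff between two detected states: find the square a piece vacated and the square a piece arrived on. The dict-of-squares representation and piece labels below are our assumptions about how the YOLO output might be organized, not Raju's actual data format.

```python
# Sketch of board-diff move inference.  Assumed representation: a dict
# mapping occupied square names to piece labels (e.g. "wN" = white knight),
# as might be assembled from the YOLO detections.
def infer_move(prev: dict, curr: dict):
    """Return a UCI-style move string like 'e2e4', or None if ambiguous."""
    # Squares a piece left entirely.
    vacated = [sq for sq in prev if sq not in curr]
    # Squares that are newly occupied, or whose occupant changed (a capture).
    arrived = [sq for sq in curr
               if sq not in prev or curr[sq] != prev.get(sq)]
    if len(vacated) == 1 and len(arrived) == 1:
        return vacated[0] + arrived[0]
    # Two pieces moved (castling), en passant, or a detection error:
    # fall back to other handling.
    return None
```

A real version would also cross-check the candidate against the legal moves in the current position to reject detection noise.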
Accomplishments that we're proud of
We're very proud of Raju, his abilities, and how much we've been able to learn during this project. None of us had ever interacted with SO100s, LeRobot, RunPod, online Stockfish APIs, or Fish Audio before this, yet we were able to bring them together to build something we find very entertaining. We took a big risk in training Raju on masked video frames, something without a precedent to guarantee success or a general model to tune from. Seeing Raju target pieces was a big payoff. Additionally, we're very happy with how our board state identification turned out: we tested a lot of methods, and ended up building a custom solution that worked better than any alternative on the internet.
What we learned
We went from knowing nothing about controlling robots with ML to being fully capable of modifying, initializing, training, and using them for specific functions.
What's next for Raju: Sassy and prodigal chess-playing robot
En passant and castling! Raju has no trained examples for these moves, but we believe he should be capable of making them by performing the moves in series: moving one piece, then another, as if he received a "second turn" of sorts. Realistically, this could be handled with hardcoded edge cases.
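The hardcoded edge cases could look something like the table below, which expands a special move into a sequence of the single pick-and-place actions Raju was trained on. This is a hypothetical sketch of the idea, not code from the project; it assumes the game logic has already confirmed that a king move like e1g1 really is a castle.

```python
# Hypothetical decomposition of special moves into Raju's one
# pick-and-place primitive.  Each tuple is (pick_square, place_square);
# a None place square means "remove the piece from the board"
# (the pawn captured en passant).
CASTLE_STEPS = {
    "e1g1": [("e1", "g1"), ("h1", "f1")],   # white kingside
    "e1c1": [("e1", "c1"), ("a1", "d1")],   # white queenside
    "e8g8": [("e8", "g8"), ("h8", "f8")],   # black kingside
    "e8c8": [("e8", "c8"), ("a8", "d8")],   # black queenside
}

def decompose(move: str, en_passant: bool = False):
    """Expand a UCI-style move into sequential arm actions."""
    if move in CASTLE_STEPS:
        return CASTLE_STEPS[move]
    src, dst = move[:2], move[2:4]
    if en_passant:
        # The captured pawn sits on the destination file, source rank.
        captured = dst[0] + src[1]
        return [(captured, None), (src, dst)]
    return [(src, dst)]
```

Each action in the returned list would be executed as its own masked-frame episode, giving Raju his "second turn."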
