MyAcrobot - Physics-Driven AI Olympics

Welcome, Coach Ed Burch!

Your mission is to guide High Bar AI gymnast Trent Dimas to Olympic glory, just like the duo did in '92. As Trent executes a complete rotation, your coaching will determine the perfect moment to let go. Can the duo secure gold, silver, or bronze on the leaderboard? Let’s find out!

Inspiration

MyAcrobot is a physics-based simulation game inspired by the teamwork of U.S. Olympic gymnast Trent Dimas and Coach Ed Burch during their 1992 Olympic gold medal win. Before the event, Coach Ed Burch confidently predicted that "our high bar will stand out." His prediction proved accurate: Dimas scored an impressive 9.875, finishing ahead of gymnasts Grigory Misutin and Andreas Wecker, who tied at 9.837. This partnership between coach and athlete proved to be exactly what was needed to secure the gold medal.

What it does

MyAcrobot is a physics-based simulation game with a unique emphasis on human-AI collaboration. Unlike traditional games where players compete against AI, MyAcrobot fosters teamwork. Players work alongside Trent, the AI gymnast, to guide his movements by pushing, pulling, and aligning his pendulums. The goal is to help Trent achieve precise landings in the target quadrant goal zone by signaling the perfect moment for him to dismount, while maximizing the alignment count along the way.

Gameplay Overview

Core Gameplay

- The game is set on a 2D grid with x and y coordinates, where Trent (the AI gymnast) is represented as a pendulum, with his hands attached to a fixed point, the bar, at the center of the grid.
- Trent's goal is to perform a full rotation over the bar.
- The player's goal is to tell Trent when to dismount so that he lands in the target goal zone before the "time until landing" countdown timer runs out, and to give Trent the extra push to rotate over the bar and help align his torso and legs.
- Players also have the option to disable the AI and take full control of Trent. Selecting "Switch Off AI" from the menu (the AI toggle switch) gives the coach complete control over Trent’s movements.

Scoring, Difficulty, and Customizing the Environment

Scoring System
Points are awarded for:
- Landing at least one pendulum link in the target zone.
- Earning alignment points for precise positioning.
- Completing an over-the-bar motion (full rotation not required) for bonus points.

Personalization Options
- Username: Choose a clever username to represent yourself or generate a random one using the open-source Chance.js library.
- Difficulty Levels: Easy (1 goal), Medium (2 goals), Hard (3 goals), Expert (4 goals).
- Environmental Customization: Players can tailor gameplay to their preferences and create unique challenges for themselves and the AI by adjusting:
  - Number of pendulum links
  - Air friction level
  - Pendulum length
  - Pendulum width

Leaderboard

After successfully landing within the required quadrant goal, players can compete for a spot on the leaderboard.

How AI and I built it

The front end, powered by Bootstrap 5, ensures seamless gameplay on both mobile and desktop, adapting to touch controls or mouse interactions based on the user's device. The gymnastics environment is built with Matter.js, delivering a physics-driven 2D grid that supports realistic, real-time interactions between the AI, the player, and the pendulum.

Players earn points by landing pendulum links in target zones after dismounting, aligning them during momentum buildup, and completing full rotations over the bar with Trent. Successful landings in the goal quadrant contribute to their score.

Customizable options, such as the number of pendulum links, air friction, pendulum length, and pendulum width, enhance replayability and encourage experimentation. These mechanics foster collaboration between the AI and the player, enabling them to refine strategies, overcome challenges, and help Trent achieve full bar rotations, dismounts, and target landings.

Reinforcement learning AI was chosen for this project to handle the continuous and dynamic nature of a gymnastics high bar environment. The agent learns in real-time from six observations of pendulum positions and velocities, allowing it to determine its precise location on the x-y grid and make decisions for completing a full rotation.
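As a rough illustration of what those six observations contain, the sketch below decodes them back into joint angles and angular velocities. It assumes the standard Gymnasium Acrobot observation layout (cosine and sine of each joint angle, followed by the two angular velocities); the function name `decode_observation` is hypothetical and not taken from the project.

```python
import math

def decode_observation(obs):
    """Recover joint angles and angular velocities from a 6-value observation.

    Assumes the Gymnasium Acrobot layout:
    [cos(theta1), sin(theta1), cos(theta2), sin(theta2), omega1, omega2].
    """
    cos1, sin1, cos2, sin2, omega1, omega2 = obs
    theta1 = math.atan2(sin1, cos1)  # first link, measured from straight down
    theta2 = math.atan2(sin2, cos2)  # second link, measured relative to the first
    return theta1, theta2, omega1, omega2

# Hanging straight down at rest: both angles and both velocities are zero.
hanging = decode_observation([1.0, 0.0, 1.0, 0.0, 0.0, 0.0])
```

Storing angles as cosine/sine pairs avoids the discontinuity at ±π, which is why the observation has six values rather than four.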

The game builds on Gymnasium's Acrobot environment while mirroring the Matter.js front-end simulation to deliver realistic, synchronized physics. Both environments operate on 2D grids, ensuring seamless integration and consistency. The agent's neural network is a Deep Q-Network (DQN), which processes state data through computational layers to determine the best action for the current state. It’s all about deciding the smartest move based on what’s happening at the moment!

The reward system leverages precise mathematical computations to evaluate the gymnast's performance. A method calculates the gymnast's foot position (y-coordinate) using pendulum angles and link lengths. This calculation is critical for determining if the gymnast completes a full rotation over the bar and successfully lands in the target goal zones. The resulting y-coordinate serves as the basis for assigning rewards. Rewards are structured to guide the agent toward mastering advanced mechanics:

High Reward: Achieved by flipping above the bar (y > 0.0).
Moderate Reward: Granted for nearing the bar (-1.5 < y ≤ 0.0).
Penalty: Issued for remaining below the bar (y ≤ -1.5).
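The tiers above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: it assumes a two-link pendulum with unit link lengths (the Gymnasium Acrobot defaults), the bar at y = 0 with up as positive, and placeholder reward magnitudes, since the writeup gives the thresholds but not the values. The names `foot_height` and `tiered_reward` are hypothetical.

```python
import math

def foot_height(theta1, theta2, link1=1.0, link2=1.0):
    """Height of the free end (the feet) relative to the bar at y = 0.

    theta1 is measured from straight down; theta2 is relative to the first link.
    """
    return -link1 * math.cos(theta1) - link2 * math.cos(theta1 + theta2)

def tiered_reward(y, high=1.0, moderate=0.1, penalty=-1.0):
    """Map foot height to a reward tier; the magnitudes are placeholder assumptions."""
    if y > 0.0:
        return high       # flipped above the bar
    if y > -1.5:
        return moderate   # nearing the bar (-1.5 < y <= 0.0)
    return penalty        # still hanging well below the bar (y <= -1.5)

# Hanging straight down puts the feet at y = -2.0, which earns the penalty.
reward_hanging = tiered_reward(foot_height(0.0, 0.0))
# A full handstand puts the feet at y = 2.0, which earns the high reward.
reward_handstand = tiered_reward(foot_height(math.pi, 0.0))
```

With unit links the feet range from y = -2.0 to y = 2.0, so the -1.5 threshold penalizes only the lowest part of the swing.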

The agent's training process employs two neural networks: an online network and a target network. This dual-network architecture enhances stability. The target network, typically a periodically refreshed copy of the online network, supplies the temporal-difference (TD) targets, which combine the immediate reward with a discounted estimate of future rewards. The online network's Q-values are then trained toward those targets. The same Q-values feed into the epsilon-greedy strategy, which balances exploration and exploitation.
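To make the temporal-difference idea concrete, here is a minimal sketch of the standard one-step DQN target, written in plain Python rather than PyTorch so it stays dependency-free. The discount factor `gamma` and the function names are assumptions for illustration; the project's actual hyperparameters are not stated.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step TD target: r + gamma * max over a' of Q_target(s', a').

    next_q_values are the target network's Q-values for the next state.
    At the end of an episode there is no future reward to bootstrap from.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)

def td_error(online_q, target):
    """Difference the online network is trained to shrink."""
    return target - online_q

# The online network predicted 1.5 for the chosen action; the target network
# sees Q-values [0.5, 2.0, -1.0] in the next state.
target = td_target(reward=1.0, next_q_values=[0.5, 2.0, -1.0], done=False, gamma=0.5)
error = td_error(online_q=1.5, target=target)
```

Because the target comes from a network that changes more slowly than the one being trained, the value the online network is chasing does not shift on every update.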

An essential component of the agent's training process is the epsilon-greedy strategy, which balances exploration (trying new actions) and exploitation (choosing the best-known actions). To refine this balance over time, epsilon decay is implemented, gradually shifting the agent's behavior toward exploitation as it becomes more confident in its learned strategies.

The challenge lies in finding the perfect balance between the rate of decay and the level of risk-taking. This balance ensures the agent continues to explore enough to discover optimal strategies while leveraging its existing knowledge to maximize performance.
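A minimal sketch of epsilon-greedy selection with multiplicative decay is shown below. The decay rate and the floor are illustrative assumptions, not the project's tuned values, and the function names are hypothetical.

```python
import random

def select_action(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

def decay_epsilon(epsilon, rate=0.995, floor=0.05):
    """Shrink epsilon after each episode, but never below the floor."""
    return max(floor, epsilon * rate)

# With epsilon = 0 the choice is purely greedy: action 1 has the highest Q-value.
greedy_action = select_action([0.1, 0.9, 0.3], epsilon=0.0)

# Epsilon starts fully exploratory and settles at the floor after enough episodes.
epsilon = 1.0
for _ in range(2000):
    epsilon = decay_epsilon(epsilon)
```

A rate closer to 1.0 keeps the agent exploring longer, while the floor preserves a small amount of risk-taking even late in training.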

The Adam optimizer, used through PyTorch, adjusts the network's weights and biases during training. Its adaptive, per-parameter step sizes helped keep training stable and improved model performance.

Challenges AI and I ran into

Developing a dynamic, physics-based RL model, integrating it with the front end, and hosting it on the web all brought significant hurdles:

Implementing a custom reward system, replay buffer, and fine-tuning a Deep Q-Network (DQN).
Understanding advanced concepts like geometry, non-linear equations, and reward systems.
Resolving the lack of clarity in Gymnasium documentation through extensive problem-solving.
Synchronizing the front-end with AI using WebSocket connections for real-time updates.
Ensuring cross-platform compatibility with responsive design for desktop and mobile devices.
Balancing AI independence with pauses for player coaching feedback.
Making a game work across multiple browsers and devices.
Knowing when to disable touch controls for mobile devices during screen overlays.

What AI and I learned

This project reinforced the value of persistence, adaptability, and a growth mindset. Overcoming challenges often involved trial and error, leading to breakthroughs. Gaining deeper insights into reinforcement learning’s mathematical foundations, not to mention the math itself, was both rewarding and essential to building a capable AI.

Closing

The journey was both technically and personally fulfilling. Amazon Q Developer simplified complex tasks, streamlined model adjustments, and added creative touches like confetti animations. AWS provided a robust infrastructure for hosting, scaling, and versioning, enabling me to turn an idea into a functional, public-facing application. Together, Amazon Q’s problem-solving capabilities and AWS’s scalability transformed MyAcrobot into a rewarding project of innovation and growth.

Updates

Tim Trueblood started this project — Jan 13, 2025 09:35 PM EST
