What it does

PPO Speedster is an AI agent trained to play the CarRacing-v3 environment from Gymnasium. It takes the raw 96x96-pixel image from the game as its only input and decides how much to steer, accelerate, and brake.

The final 2-million-step model can successfully navigate complex turns and achieve high scores (over 900 points) without any human rules. The submitted notebook automatically downloads this pre-trained model and runs it, allowing anyone to verify the agent's performance.
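To make that interface concrete, here is a minimal sketch (an illustration assuming Gymnasium with the Box2D extra installed, not code from the project itself) showing the pixel observation and the continuous action space the agent works with:

```python
# Minimal interface check; assumes gymnasium + the box2d extra are installed.
import gymnasium as gym

env = gym.make("CarRacing-v3")
obs, info = env.reset(seed=0)

print(obs.shape)         # (96, 96, 3): the raw RGB image, the agent's only input
print(env.action_space)  # Box([-1, 0, 0], [1, 1, 1]): [steering, gas, brake]

# One random step; at inference time the trained policy supplies the action.
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
```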

How we built it

This project was built entirely in Python using a few key libraries:

Environment: Gymnasium (formerly Gym) provided the CarRacing-v3 game.

Algorithm: We used Stable-Baselines3 to implement PPO (Proximal Policy Optimization), a state-of-the-art method for continuous control; a condensed training sketch follows this list.

Policy: The CnnPolicy was chosen so the agent could learn directly from the game's pixel "vision."

Training: The agent was trained for 2 million steps over 9 hours on a local PC (NVIDIA GeForce RTX 3050 GPU) after initial attempts on Google Colab ran into session time limits.

Submission: The final .zip model was uploaded to Google Drive. The submitted Google Colab notebook uses gdown to automatically download the trained model and moviepy to record a video of its performance, satisfying the competition's reproducibility rule.
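Put together, the training setup looks roughly like the snippet below. This is a condensed illustration of the pieces named above, not the project's exact script; the n_steps=1024 value is the one discussed under "What we learned".

```python
# Condensed training sketch: Stable-Baselines3 PPO with a CnnPolicy
# learning directly from CarRacing-v3 pixels.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CarRacing-v3")

model = PPO(
    "CnnPolicy",     # convolutional policy: input is the raw 96x96 image
    env,
    n_steps=1024,    # the value used for the 2M-step run (see "What we learned")
    verbose=1,
    device="cuda",   # trained locally on the RTX 3050
)
model.learn(total_timesteps=2_000_000)
model.save("ppo_speedster_2m")  # produces the .zip uploaded to Google Drive
env.close()
```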

Challenges we ran into

The biggest challenge was the training process and the "Day 2" problems.

Environment Limits: Google Colab's free GPU limits and 12-hour disconnects caused the loss of our first models. This forced a switch to a more reliable (but slower) local PC training session.

Catastrophic Forgetting: Our 2M-step model was "jittery." When we tried to fine-tune it with a lower learning rate, the agent catastrophically forgot its driving skills and became much worse, teaching us a hard lesson about model stability.

Reproducibility: A major hurdle was meeting the "no manual intervention" rule for the submission. We solved this by creating a clean notebook that only downloads the final model (trained locally in VS Code on the GeForce RTX GPU) and runs it, allowing any judge to see the result in minutes; the flow is sketched below.
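In outline, the notebook's flow looks roughly like this (the Drive file ID and filenames are placeholders, not the real ones, and the moviepy import assumes a pre-2.0 version):

```python
# Rough sketch of the submission notebook; <DRIVE_FILE_ID> and filenames
# are placeholders.
import gdown
import gymnasium as gym
from stable_baselines3 import PPO
from moviepy.editor import ImageSequenceClip  # moviepy < 2.0 import path

# 1. Fetch the pre-trained model from Google Drive.
gdown.download(id="<DRIVE_FILE_ID>", output="ppo_speedster.zip", quiet=False)

# 2. Load it and roll out one episode, collecting rendered frames.
env = gym.make("CarRacing-v3", render_mode="rgb_array")
model = PPO.load("ppo_speedster.zip")

frames = []
obs, _ = env.reset(seed=0)
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    frames.append(env.render())
    done = terminated or truncated
env.close()

# 3. Write the rollout to a video file.
ImageSequenceClip(frames, fps=30).write_videofile("ppo_speedster.mp4")
```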

Accomplishments that we're proud of

We are most proud of the final agent's driving skill. Seeing the model, which started by randomly crashing, learn to anticipate and navigate sharp turns was a huge success. The final model achieves a score of over 900 points, proving it learned a complex policy.

We're also proud of the final submission pipeline. Figuring out how to train locally but present a clean, runnable, and rule-compliant notebook on Colab was a great engineering challenge.

What we learned

This project was a deep dive into the practical, often-frustrating side of RL.

Hyperparameters are critical: The "jittery" driving was a direct result of a low n_steps value (1024); shorter rollouts shrink the horizon over which PPO estimates advantages, encouraging short-term, "twitchy" corrections.

Training is an experiment: Our failed fine-tuning attempts taught us about "catastrophic forgetting," a key challenge in AI; the attempt is sketched after this list.

Environment is key: The trade-offs between a fast (but unreliable) cloud GPU and a slow (but stable) local PC are a major part of the MLOps challenge.
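For context, the failed fine-tuning attempt looked roughly like this in Stable-Baselines3 terms (an illustrative sketch; the learning-rate value and filenames are placeholders):

```python
# Illustrative resume-training sketch; filenames and the LR are placeholders.
# In our case this degraded the policy rather than polishing it.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CarRacing-v3")

# custom_objects overrides the stored hyperparameter when loading.
model = PPO.load(
    "ppo_speedster_2m.zip",
    env=env,
    custom_objects={"learning_rate": 1e-5},  # the "lower" LR -- placeholder value
)

# Continue the step counter from 2M instead of restarting it at zero.
model.learn(total_timesteps=500_000, reset_num_timesteps=False)
model.save("ppo_speedster_finetuned")
```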

What's next for PPO Speedster

The next step is to fix the instability. We will train a brand-new agent from scratch using the "stable" hyperparameters (especially a much larger n_steps=4096) to create a smooth, perfect lap; a sketch follows below. After that, we want to apply this PPO framework to the other competition challenges, like Mario and Snake, to see how the agent's "vision" adapts to new games.
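The planned retraining run, in sketch form; only n_steps is the deliberate change, and the remaining settings (including the step budget) are assumptions carried over from the earlier training sketch:

```python
# Planned "stable" retraining from scratch; n_steps=4096 is the key change.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CarRacing-v3")

model = PPO(
    "CnnPolicy",
    env,
    n_steps=4096,  # 4x longer rollouts to discourage short-term, twitchy steering
    verbose=1,
)
model.learn(total_timesteps=2_000_000)  # budget is an assumption, not yet fixed
model.save("ppo_speedster_v2")
```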

Built With

  • python
  • gymnasium (carracing-v3)
  • ufal.pybox2d
  • pygame
  • pytorch
  • stable-baselines3
  • gdown
  • moviepy
  • google colab
  • google drive