Inspiration

The climate crisis demands revolutionary solutions. Fusion energy could provide nearly limitless, clean power from fuel extracted from seawater, with zero carbon emissions and no long-lived high-level radioactive waste. But there is a catch: the biggest barrier to commercial fusion isn't building the reactor; it's controlling 100-million-degree plasma that can collapse in milliseconds. Traditional control systems struggle with plasma's chaotic, nonlinear behavior. We realized that reinforcement learning (RL) could be the key to mastering plasma dynamics. This project became our mission: use AI to unlock fusion energy.

We found a paper that introduces Gym-TORAX, a new Python package built on top of Google DeepMind’s TORAX simulator, a high-fidelity, differentiable tokamak physics engine. Gym-TORAX exposes a standard Gymnasium interface for reinforcement learning, allowing agents to interact with a controlled plasma physics environment.

Using this framework, we were able to design our own hybrid control environment with custom safety guards and physics-derived objectives, enabling our RL agent to learn plasma stabilization strategies that mirror the challenges of real-world tokamak operation, without ever risking an actual reactor.

What it does

  • Monitors critical plasma parameters in real time: the stability parameters \((\beta_N,\, q_{\min},\, q_{95})\), vertical position (z_cm), and vertical drift velocity (dZ/dt).

  • Predicts and prevents disruptions: The RL agent learns to detect early warning signs of instability and automatically adjusts magnetic coil currents to stabilize the plasma before a disruption occurs.

  • Self-corrects violations: When parameters drift out of safe bounds, the system applies corrective actions within milliseconds, faster than human operators could respond.

  • Quantifies impact: Built-in calculators estimate $50M+ in annual savings per tokamak from avoided disruptions.
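The savings estimate reduces to simple cost-avoidance arithmetic. A minimal sketch, assuming the calculator multiplies the number of disruptions expected per year by the fraction the agent prevents and by an average cost per disruption; the function name and all inputs below are hypothetical placeholders, not the project's actual figures:

```python
def annual_savings_usd(disruptions_per_year: float,
                       fraction_prevented: float,
                       cost_per_disruption_usd: float) -> float:
    """Hypothetical cost-avoidance estimate: avoided events x average cost per event."""
    if not 0.0 <= fraction_prevented <= 1.0:
        raise ValueError("fraction_prevented must lie in [0, 1]")
    avoided = disruptions_per_year * fraction_prevented
    return avoided * cost_per_disruption_usd


# Illustrative inputs only: 20 disruptions/year, half of them prevented, $5M each.
print(f"${annual_savings_usd(20, 0.5, 5e6):,.0f}")  # $50,000,000
```

The result is only as good as the inputs, so the fraction prevented should come from evaluation runs rather than be assumed.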

How we built it

1. Environment Setup: gym-TORAX

We used gym-TORAX, a physics-accurate tokamak simulation built on Google DeepMind's TORAX solver. This environment models:

  • Plasma current evolution
  • Temperature and density profiles
  • Magnetic equilibrium
  • Fusion power output

The environment provides a 50+-dimensional observation space and a continuous action space for controlling poloidal field coils, electron cyclotron resonance heating (ECRH), and neutral beam injection (NBI).
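Feeding a nested observation into an off-the-shelf actor network usually means flattening it into one vector with a stable feature ordering. A minimal sketch, assuming observations arrive as nested dicts of scalars and 1-D lists; the key names in the example are hypothetical and are not gym-TORAX's actual schema (NumPy arrays would need `.tolist()` first). When the observation space is defined as a Gymnasium `Dict` space, `gymnasium.spaces.flatten` serves the same purpose.

```python
from typing import Any, Dict, List, Tuple


def flatten_obs(obs: Dict[str, Any], prefix: str = "") -> Tuple[List[str], List[float]]:
    """Flatten a nested dict into (feature_names, values), ordered by sorted key path."""
    names: List[str] = []
    values: List[float] = []
    for key in sorted(obs):
        path = f"{prefix}/{key}" if prefix else key
        item = obs[key]
        if isinstance(item, dict):
            sub_names, sub_values = flatten_obs(item, path)  # recurse into sub-dict
            names.extend(sub_names)
            values.extend(sub_values)
        elif isinstance(item, (list, tuple)):
            for i, x in enumerate(item):  # 1-D profile: one feature per grid point
                names.append(f"{path}[{i}]")
                values.append(float(x))
        else:
            names.append(path)  # scalar feature
            values.append(float(item))
    return names, values


# Hypothetical observation layout, for illustration only.
obs = {"scalars": {"Ip": 15.0}, "profiles": {"T_e": [10.0, 8.0], "n_e": [1.0]}}
names, values = flatten_obs(obs)
print(names)  # ['profiles/T_e[0]', 'profiles/T_e[1]', 'profiles/n_e[0]', 'scalars/Ip']
```

Sorting the keys keeps feature positions identical from step to step, which the policy network relies on.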

2. RL Agent Selection: Soft Actor-Critic

Tokamak control is continuous-action, noisy, and safety-critical. Many RL algorithms converge toward a deterministic policy, but in chaotic continuous systems like plasma physics, becoming too deterministic too early can trap the agent in suboptimal behavior.

Soft Actor-Critic (SAC) adds an entropy term to the standard RL objective:

\(J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[\, r(s_t, a_t) + \alpha\, \mathcal{H}\left(\pi(\cdot \mid s_t)\right) \right]\)

This means the policy is rewarded not only for the per-step reward \(r(s_t, a_t)\) but also for maintaining sufficient randomness (high entropy \(\mathcal{H}\)).
The coefficient α controls the exploration–exploitation trade-off: larger α encourages exploration, smaller α emphasizes exploitation.
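In practice, the entropy term enters SAC's critic update through a soft Bellman target. A minimal sketch of that target in its common twin-critic form, using scalar inputs for readability (a real implementation operates on batched tensors); `gaussian_entropy` shows how the entropy of a 1-D Gaussian policy grows with its standard deviation, which is what a larger α rewards:

```python
import math


def gaussian_entropy(std: float) -> float:
    """Differential entropy of a 1-D Gaussian N(mu, std^2): 0.5 * ln(2*pi*e*std^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)


def soft_td_target(reward: float, done: bool, q1_next: float, q2_next: float,
                   log_prob_next: float, gamma: float = 0.99, alpha: float = 0.2) -> float:
    """SAC critic target: r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s')).

    a' is sampled from the current policy at s'. Since -log pi(a'|s') is a
    one-sample entropy estimate, a larger alpha raises the value of staying stochastic.
    """
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * (1.0 - float(done)) * soft_value


print(gaussian_entropy(0.5) < gaussian_entropy(2.0))  # True: wider policies have higher entropy
print(soft_td_target(1.0, False, q1_next=2.0, q2_next=3.0, log_prob_next=-1.0, gamma=0.5))
```

Taking the minimum of two critics is the standard guard against Q-value overestimation; SAC implementations also commonly tune α automatically toward a target entropy instead of fixing it.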

3. Safety Monitoring Systems

We implemented two critical safety monitoring systems as custom Gymnasium wrappers:

Shape Guard (shape_guard.py): Monitors the plasma stability parameters \((\beta_N,\, q_{\min},\, q_{95})\) and applies adaptive reward shaping:

  • Safe-state bonus: +3× when all parameters are within bounds
  • Self-fixing bonus: +2× when the agent recovers from a violation
  • Violation penalty: -0.1× to -0.4×, scaled by violation severity
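A minimal sketch of this shaping rule as a pure function. The bounds are hypothetical placeholders, not the values in shape_guard.py. We also assume the multipliers apply to a base reward scale `scale`, that the self-fixing bonus stacks on the safe-state bonus, and that severity is the relative distance past the violated bound, mapped linearly onto the -0.1x to -0.4x range:

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Bounds:
    lo: float
    hi: float


# Hypothetical safe windows for illustration; not the thresholds in shape_guard.py.
SAFE_BOUNDS: Dict[str, Bounds] = {
    "beta_n": Bounds(0.0, 3.0),
    "q_min": Bounds(1.0, float("inf")),
    "q95": Bounds(3.0, float("inf")),
}


def severity(value: float, b: Bounds) -> float:
    """0 inside the window; otherwise relative distance past the violated edge, clipped to 1."""
    if b.lo <= value <= b.hi:
        return 0.0
    edge = b.lo if value < b.lo else b.hi
    return min(abs(value - edge) / max(abs(edge), 1e-6), 1.0)


def shape_guard_term(params: Dict[str, float], prev_violated: bool,
                     scale: float = 1.0) -> Tuple[float, bool]:
    """Return (shaping term, violated_now) following the bonus/penalty scheme above."""
    worst = max(severity(params[k], b) for k, b in SAFE_BOUNDS.items())
    if worst == 0.0:
        bonus = 3.0 * scale  # safe-state bonus
        if prev_violated:
            bonus += 2.0 * scale  # self-fixing bonus (assumed to stack)
        return bonus, False
    # Violation penalty: linear in severity, from -0.1x (barely out) to -0.4x (severe).
    return -(0.1 + 0.3 * worst) * scale, True


print(shape_guard_term({"beta_n": 2.0, "q_min": 1.5, "q95": 4.0}, prev_violated=True))  # (5.0, False)
```

Returning `violated_now` lets the caller carry it forward as `prev_violated` on the next step, which is what makes the self-fixing bonus possible.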

Vertical Guard (vertical_guard.py): Tracks vertical plasma displacement to prevent VDEs (Vertical Displacement Events). Since z_cm wasn't directly available in the state space, we estimated it from plasma geometry, using the up-down asymmetry in triangularity as a proxy:

\(z_{\text{cm}} \approx a_{\text{minor}} \cdot \frac{\delta_{\text{upper}} - \delta_{\text{lower}}}{2} \cdot 10\)
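The proxy, plus a finite-difference estimate of the drift velocity dZ/dt, fits in a few lines. A minimal sketch, assuming the same factor of 10 as the formula above and a fixed control timestep `dt`; the `max_abs_z` and `max_abs_dzdt` alarm thresholds are hypothetical placeholders, not the values used in vertical_guard.py:

```python
def z_cm_proxy(a_minor: float, delta_upper: float, delta_lower: float) -> float:
    """Vertical-position proxy from up-down triangularity asymmetry (factor 10 as in the formula)."""
    return a_minor * (delta_upper - delta_lower) / 2.0 * 10.0


def drift_velocity(z_prev: float, z_now: float, dt: float) -> float:
    """Backward finite difference for dZ/dt over one control step."""
    return (z_now - z_prev) / dt


def vde_alarm(z: float, dzdt: float, max_abs_z: float = 5.0, max_abs_dzdt: float = 50.0) -> bool:
    """Flag a possible VDE when position or drift speed leaves a (hypothetical) safe window."""
    return abs(z) > max_abs_z or abs(dzdt) > max_abs_dzdt


z0 = z_cm_proxy(2.0, 0.45, 0.45)  # up-down symmetric shape -> 0.0
z1 = z_cm_proxy(2.0, 0.50, 0.40)  # asymmetric shape -> about 1.0
print(vde_alarm(z1, drift_velocity(z0, z1, dt=0.01)))  # True: drift speed exceeds the placeholder limit
```

Checking the drift speed as well as the position lets the guard react while the plasma is still near the midplane, before the displacement itself becomes large.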

We then created iter_hybrid_shape_guard_env.py, which integrates both guards into a unified reward function, teaching the agent to avoid violations proactively rather than recover from them reactively.
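A minimal sketch of how guard terms can be folded into one reward around a Gymnasium-style environment. A real implementation would likely subclass `gymnasium.Wrapper`; this duck-typed version only assumes the wrapped object has Gymnasium's `reset()` and five-tuple `step()` signatures. The guard functions, their weights, the `info` keys they read, and the stub environment are all hypothetical:

```python
from typing import Any, Callable, Dict, Sequence, Tuple

# A guard maps the step's info dict to a shaping term (positive = bonus, negative = penalty).
Guard = Callable[[Dict[str, Any]], float]


class GuardedRewardEnv:
    """Adds weighted guard terms to the base reward of a Gymnasium-style env."""

    def __init__(self, env: Any, guards: Sequence[Guard], weights: Sequence[float]):
        if len(guards) != len(weights):
            raise ValueError("need one weight per guard")
        self.env = env
        self.guards = list(guards)
        self.weights = list(weights)

    def reset(self, **kwargs: Any) -> Tuple[Any, Dict[str, Any]]:
        return self.env.reset(**kwargs)

    def step(self, action: Any) -> Tuple[Any, float, bool, bool, Dict[str, Any]]:
        obs, reward, terminated, truncated, info = self.env.step(action)
        shaping = sum(w * g(info) for w, g in zip(self.weights, self.guards))
        # Keep both components visible for logging and debugging.
        info = {**info, "base_reward": reward, "shaping": shaping}
        return obs, reward + shaping, terminated, truncated, info


def q_min_guard(info: Dict[str, Any]) -> float:  # hypothetical guard
    return 1.0 if info["q_min"] > 1.0 else -0.5


def vertical_guard(info: Dict[str, Any]) -> float:  # hypothetical guard
    return -abs(info["z_cm"])


class _StubEnv:  # stands in for the real simulator
    def reset(self, **kwargs: Any):
        return [0.0], {}

    def step(self, action: Any):
        return [0.0], 1.0, False, False, {"q_min": 1.2, "z_cm": 0.3}


env = GuardedRewardEnv(_StubEnv(), [q_min_guard, vertical_guard], [1.0, 2.0])
env.reset()
obs, r, terminated, truncated, info = env.step(0.0)
print(round(r, 3))  # 1.4 = base 1.0 + 1.0 * 1.0 + 2.0 * (-0.3)
```

Logging the base reward and shaping term separately makes it easier to check that the agent is improving the physics objective rather than only collecting guard bonuses.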

4. Interactive Visualizations

We built several real-time visualization tools:

  • 3D Tokamak Chamber (matplotlib): Rotating view with interactive coil controls and live vertical position tracking
  • Plasma Shape Tracker (matplotlib): Real-time particle trails with color-coded safety status
  • Three.js WebSocket Visualization: 3D swirling particle effects with real-time data streaming
  • Next.js Dashboard: Production-ready web app with calculators, animations, and Python integration

Challenges we ran into

1. The Missing z_cm Problem

The vertical position wasn't directly available in gym-TORAX observations. We spent hours searching through nested dictionaries of 50+ state variables, reading TORAX source code, and reverse-engineering the physics. Eventually, we derived our own computation from triangularity and minor radius.

2. Multi-Objective Optimization

The agent faces competing objectives: maximize fusion power (high β_N, high plasma current) vs. maximize safety (low disruption risk). Navigating the Pareto frontier, the set of trade-offs where neither objective can improve without degrading the other, required careful reward shaping and hyperparameter tuning.
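One way to make the trade-off concrete is to evaluate several candidate policies (for example, trained with different reward weights) on both objectives and keep only the non-dominated ones. A minimal sketch, assuming each policy is summarized by a hypothetical (fusion_power, safety_score) pair where higher is better for both:

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (fusion_power, safety_score); higher is better for both


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is at least as good on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points (O(n^2), fine for a handful of candidate policies)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical evaluation results for five candidate policies.
candidates = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0), (0.5, 0.5)]
print(pareto_front(candidates))  # [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0)]
```

Choosing among the surviving policies is then an explicit operating decision (how much fusion power to trade for margin) rather than something hidden inside a single scalar reward.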

Accomplishments that we're proud of

Real-time self-correction - The agent detects and fixes violations within milliseconds.

$50M+ annual savings quantified - Impact calculations estimating cost avoidance per tokamak.

What we learned

Physics Deep Dive

We gained a deep understanding of tokamak physics, including:

  • How magnetic confinement works and why plasma is so unstable
  • The relationship between safety factor (q), beta (β), and disruption risk
  • Why Vertical Displacement Events are so catastrophic ($5M+ per event)
  • How plasma shape parameters interact in complex, nonlinear ways
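The β link is easiest to see through normalized beta, the standard dimensionless quantity that the Shape Guard monitors: \(\beta_N = \beta_T[\%] \cdot a\,[\text{m}] \cdot B_T\,[\text{T}] / I_p\,[\text{MA}]\). Ideal-MHD stability caps it (the Troyon limit, roughly \(\beta_N \approx 2.8\) without wall stabilization). At fixed pressure, raising \(I_p\) lowers \(\beta_N\) but also lowers \(q_{95}\), since q scales inversely with current, which is one reason these limits pull against each other. A small helper, with illustrative inputs rather than values from our runs:

```python
def normalized_beta(beta_t_percent: float, a_minor_m: float,
                    b_t_tesla: float, ip_ma: float) -> float:
    """beta_N = beta_T[%] * a[m] * B_T[T] / I_p[MA] (Troyon normalization)."""
    if ip_ma <= 0.0:
        raise ValueError("plasma current must be positive")
    return beta_t_percent * a_minor_m * b_t_tesla / ip_ma


# Illustrative numbers only: beta_T = 2.5 %, a = 2.0 m, B_T = 5.3 T, I_p = 15 MA.
bn = normalized_beta(2.5, 2.0, 5.3, 15.0)
print(f"beta_N = {bn:.2f}")  # beta_N = 1.77
```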

What's next for Fusion Lab

  1. Uncertainty Quantification: Add Bayesian neural networks or ensemble methods to estimate confidence in safety predictions

  2. Open-Source Safety Framework: Release our safety guard architecture as a public good for the fusion research community
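For the ensemble option, a common recipe is to train several models with different random seeds and treat their disagreement as an uncertainty signal. A minimal sketch, assuming each ensemble member outputs a disruption-risk score in [0, 1]; the `max_std` threshold is a hypothetical placeholder:

```python
import statistics
from typing import Sequence, Tuple


def ensemble_risk(scores: Sequence[float], max_std: float = 0.1) -> Tuple[float, float, bool]:
    """Return (mean risk, spread across members, whether the ensemble is confident)."""
    if len(scores) < 2:
        raise ValueError("need at least two ensemble members")
    mean = statistics.fmean(scores)
    spread = statistics.pstdev(scores)  # disagreement between members
    return mean, spread, spread <= max_std


print(ensemble_risk([0.20, 0.22, 0.18])[2])  # True: members agree
print(ensemble_risk([0.05, 0.60, 0.30])[2])  # False: members disagree, so fall back to conservative control
```

Low confidence does not by itself mean high risk; it signals that the controller is outside the region where its safety predictions can be trusted.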
