Adaptive Online Calibration of Dissipative Cat Qubits Under Hardware Drift

Inspiration

As quantum hardware scales toward practical fault tolerance, calibration drift has emerged as one of the most persistent engineering challenges. Model-based calibration strategies struggle with the non-stationary, interdependent nature of real device parameters — a problem that only worsens as systems grow more complex.

Dissipative cat qubits caught our attention because they offer a fundamentally different error structure compared to transmon-based architectures. By encoding logical information in superpositions of coherent states $|\alpha\rangle$ and $|-\alpha\rangle$ within a harmonic oscillator, cat qubits exhibit an exponentially biased noise channel: bit-flip errors are suppressed exponentially with cat size $|\alpha|^2$, while phase-flip errors grow only linearly. This bias, characterized by the ratio $\eta = T_Z / T_X$, is precisely what makes cat qubits attractive for hardware-efficient quantum error correction.

However, maintaining this bias under realistic hardware conditions — where drive amplitudes shift, resonator frequencies wander, and loss rates fluctuate — requires continuous, adaptive recalibration. This challenge inspired us to build a model-free, measurement-based optimizer that can track and compensate for drift in real time.

What We Learned

The Physics

A cat qubit is stabilized through engineered two-photon dissipation. In the physical implementation, a storage mode $a$ is coupled to a lossy buffer mode $b$ via a two-to-one photon exchange interaction:

$$\frac{H}{\hbar} = g_2^* \hat{a}^2 \hat{b}^\dagger + g_2 (\hat{a}^\dagger)^2 \hat{b} - \epsilon_d \hat{b}^\dagger - \epsilon_d^* \hat{b}$$

with dissipation channels $L_b = \sqrt{\kappa_b}\,\hat{b}$ (fast buffer decay) and $L_a = \sqrt{\kappa_a}\,\hat{a}$ (unwanted single-photon storage loss). When the buffer decay rate satisfies $\kappa_b \gg |g_2|$, adiabatic elimination yields an effective two-photon dissipation on the storage mode with:

$$\epsilon_2 = \frac{2 g_2 \epsilon_d}{\kappa_b}, \quad \kappa_2 = \frac{4|g_2|^2}{\kappa_b}, \quad \alpha \approx \sqrt{\frac{2}{\kappa_2}\left(\epsilon_2 - \frac{\kappa_a}{4}\right)}$$

The key insight is that $g_2$ (complex) and $\epsilon_d$ (complex) — four real knobs total — control the cat size and consequently both logical lifetimes $T_Z$ and $T_X$.
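The adiabatic-elimination formulas above can be sketched numerically. This is a minimal illustration with made-up parameter values (the function name and the units are assumptions, not taken from the project code):

```python
import numpy as np

def effective_cat_params(g2, eps_d, kappa_b, kappa_a):
    """Effective two-photon drive eps_2, two-photon loss rate kappa_2,
    and steady-state cat amplitude alpha after adiabatically
    eliminating the fast buffer mode (kappa_b >> |g2|)."""
    eps_2 = 2 * g2 * eps_d / kappa_b
    kappa_2 = 4 * abs(g2) ** 2 / kappa_b
    alpha = np.sqrt((2 / kappa_2) * (eps_2 - kappa_a / 4))
    return eps_2, kappa_2, alpha

# Illustrative values only (e.g. units of 2*pi*MHz).
eps_2, kappa_2, alpha = effective_cat_params(
    g2=1.0, eps_d=4.0, kappa_b=10.0, kappa_a=0.01)
print(abs(alpha) ** 2)  # mean photon number |alpha|^2 of the stabilized cat
```

Note how the single-photon loss $\kappa_a$ slightly shrinks the cat: it subtracts from the effective drive before the square root.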

The Optimization Challenge

The optimizer must pursue two competing objectives simultaneously:

  1. Maximize absolute lifetimes $T_Z$ and $T_X$
  2. Achieve a target noise bias $\eta = T_Z / T_X \approx 200$

This is non-trivial because naively maximizing $\eta$ could be achieved by making $T_X$ tiny (a terrible qubit with great bias). We needed a reward function that captures both goals without one dominating the other.

How We Built It

Reward Function Design

This was our core contribution. We designed a log-scale reward function:

$$R = \underbrace{\log(T_Z) + \log(T_X)}_{\text{lifetime score}} - \lambda \underbrace{\left(\log(\eta) - \log(\eta_{\text{target}})\right)^2}_{\text{bias penalty}}$$

The design choices were deliberate:

  • Log scale for lifetimes: $T_Z \sim 50\,\mu s$ and $T_X \sim 0.2\,\mu s$ differ by two orders of magnitude. Taking logarithms compresses them onto comparable scales ($\log(50) \approx 3.9$ vs $\log(0.2) \approx -1.6$), so neither dominates.
  • Log-ratio for bias penalty: Using $(\log \eta - \log \eta_{\text{target}})^2$ instead of $(\eta - \eta_{\text{target}})^2$ avoids catastrophic scaling. If $\eta = 300$ and target is $200$, the linear penalty is $(100)^2 = 10{,}000$ while the log penalty is $(\log 1.5)^2 \approx 0.16$ — much more manageable and proportional.
  • Tunable trade-off: The parameter $\lambda$ controls the balance between lifetime maximization and bias targeting.
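Putting these choices together, the reward is a few lines of code. This sketch uses assumed default values for $\eta_{\text{target}}$ and $\lambda$ taken from the discussion above:

```python
import numpy as np

def reward(T_Z, T_X, eta_target=200.0, lam=1.0):
    """Log-scale reward: lifetime score minus a squared log-bias penalty.
    lam trades off lifetime maximization against bias targeting."""
    eta = T_Z / T_X
    lifetime_score = np.log(T_Z) + np.log(T_X)
    bias_penalty = (np.log(eta) - np.log(eta_target)) ** 2
    return lifetime_score - lam * bias_penalty

# T_Z = 50 us, T_X = 0.25 us gives eta = 200 exactly: the penalty
# vanishes and only the lifetime score log(50 * 0.25) remains.
print(reward(50.0, 0.25))
```

Because both terms live on a log scale, doubling either lifetime adds the same constant to the score, and the penalty depends only on the *ratio* of $\eta$ to its target.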

Lifetime Extraction

Each reward evaluation requires measuring both $T_Z$ and $T_X$:

  • $T_Z$ (bit-flip lifetime): Initialize in the logical $|{+z}\rangle = |\alpha\rangle$ state, evolve under the full master equation for $200\,\mu s$, track $\langle\sigma_z^L\rangle$ decay, and extract $T_Z$ via robust exponential fitting ($y = Ae^{-t/\tau} + C$ with soft-$\ell_1$ loss).
  • $T_X$ (phase-flip lifetime): Initialize in $|{+x}\rangle = (|\alpha\rangle + |{-\alpha}\rangle)/\sqrt{2}$, evolve for $1\,\mu s$, track $\langle\sigma_x^L\rangle$ (photon parity operator), and fit similarly.

The asymmetric simulation times ($200\,\mu s$ vs $1\,\mu s$) reflect the vastly different timescales of the two decay processes.
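The robust fit described above can be sketched with `scipy.optimize.least_squares` and its built-in `soft_l1` loss (the initial-guess heuristics here are illustrative assumptions, not the project's exact choices):

```python
import numpy as np
from scipy.optimize import least_squares

def fit_lifetime(t, y):
    """Fit y = A*exp(-t/tau) + C with a soft-l1 robust loss; return tau."""
    def residuals(p):
        A, tau, C = p
        return A * np.exp(-t / tau) + C - y
    # Heuristic initial guess: full decay amplitude, half the window, offset.
    p0 = [y[0] - y[-1], t[-1] / 2, y[-1]]
    fit = least_squares(residuals, p0, loss="soft_l1")
    return fit.x[1]

# Synthetic <sigma_z^L> decay with tau = 50 (arbitrary units) plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 200.0, 101)
y = np.exp(-t / 50.0) + 0.01 * rng.standard_normal(t.size)
print(fit_lifetime(t, y))  # close to 50
```

The soft-$\ell_1$ loss downweights outlier points, which matters when rare bit-flip trajectories produce noisy tails in the $T_Z$ decay curve.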

Online CMA-ES Optimizer

We used a separable CMA-ES (Covariance Matrix Adaptation Evolution Strategy) in an ask/tell loop running every epoch, directly following the structure of the challenge notebook's $\pi$-pulse drift example:

for each epoch:
    1. Sample 8 candidate parameter vectors from the optimizer's distribution
    2. Evaluate each candidate's reward (with drift secretly applied)
    3. Report (parameters, reward) pairs back to the optimizer
    4. Optimizer updates its mean and covariance toward higher-reward regions

The optimizer maintains a Gaussian distribution over the 4-dimensional parameter space $(\text{Re}(g_2), \text{Im}(g_2), \text{Re}(\epsilon_d), \text{Im}(\epsilon_d))$ and continuously shifts this distribution to track the moving optimum. Crucially, the optimizer never sees the drift directly — it only observes the reward values and must infer that conditions have changed.
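The ask/tell loop can be illustrated with a toy separable evolution strategy over the four real knobs. This is a deliberately simplified numpy stand-in, not the sep-CMA-ES implementation the project actually used; it only shows the interface and the distribution-tracking idea, here against a stationary quadratic surrogate reward:

```python
import numpy as np

rng = np.random.default_rng(1)

class DiagonalES:
    """Toy separable evolution strategy with an ask/tell interface
    (simplified stand-in for sep-CMA-ES: diagonal covariance only)."""
    def __init__(self, x0, sigma0, popsize=8):
        self.mean = np.asarray(x0, dtype=float)
        self.sigma = np.full_like(self.mean, sigma0)
        self.popsize = popsize

    def ask(self):
        # Sample candidates from the current Gaussian distribution.
        self._pop = self.mean + self.sigma * rng.standard_normal(
            (self.popsize, self.mean.size))
        return self._pop

    def tell(self, rewards):
        # Shift the mean toward the better half of the population
        # and adapt the per-dimension step sizes.
        elite = self._pop[np.argsort(rewards)[-self.popsize // 2:]]
        self.mean = elite.mean(axis=0)
        self.sigma = 0.9 * self.sigma + 0.1 * elite.std(axis=0)

# Surrogate reward peaked at x* = (Re g2, Im g2, Re eps_d, Im eps_d).
target = np.array([1.0, -1.0, 0.5, 0.0])
es = DiagonalES(x0=np.zeros(4), sigma0=0.5)
for epoch in range(100):
    candidates = es.ask()
    rewards = [-np.sum((c - target) ** 2) for c in candidates]
    es.tell(rewards)
print(np.round(es.mean, 2))  # converges near the target knobs
```

Because the optimizer only ever sees `(candidate, reward)` pairs, the same loop keeps working when the reward landscape drifts underneath it: the sampled population continually re-detects where the optimum has moved.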

Drift Modeling

We implemented two types of drift to test optimizer responsiveness:

Sinusoidal drift — slow, continuous parameter wandering that mimics thermal fluctuations and gradual hardware aging:

$$\Delta_{\text{Re}(g_2)}(t) = 0.15 \sin(2\pi \cdot 0.008 \cdot t), \quad \Delta_{\text{Re}(\epsilon_d)}(t) = 0.3 \sin(2\pi \cdot 0.01 \cdot t + 1.0)$$

Step-function drift — a sudden parameter jump at epoch 50 that simulates abrupt environmental changes (e.g., a temperature spike or a cosmic ray hit):

$$\Delta_{\text{step}}(t) = \begin{cases} 0 & t < 50 \\ (0.2,\; 0.05,\; 0.5,\; 0.1) & t \geq 50 \end{cases}$$

The step function is particularly informative because it reveals the optimizer's recovery time — how many epochs it takes to re-converge after a sudden perturbation.
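Both drift models are straightforward to express as functions of the epoch index, returning an offset vector over the four knobs $(\mathrm{Re}\,g_2, \mathrm{Im}\,g_2, \mathrm{Re}\,\epsilon_d, \mathrm{Im}\,\epsilon_d)$. A sketch, with the undrifted components assumed to be zero:

```python
import numpy as np

def sinusoidal_drift(t):
    """Slow wandering on Re(g2) and Re(eps_d); other knobs undrifted."""
    return np.array([
        0.15 * np.sin(2 * np.pi * 0.008 * t),      # Re(g2)
        0.0,                                       # Im(g2)
        0.3 * np.sin(2 * np.pi * 0.01 * t + 1.0),  # Re(eps_d)
        0.0,                                       # Im(eps_d)
    ])

def step_drift(t, t_step=50):
    """Sudden jump on all four knobs at epoch t_step."""
    jump = np.array([0.2, 0.05, 0.5, 0.1])
    return jump if t >= t_step else np.zeros(4)

print(step_drift(49), step_drift(50))  # zero before the jump, then the offset
```

In the evaluation loop, the drift offset is added to the candidate parameters before simulation, so the optimizer experiences it only through degraded rewards.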

Visualization

We used Wigner function plots to provide visual confirmation that the optimizer is maintaining a well-formed cat state. The Wigner function $W(x, p)$ is a quasi-probability distribution in phase space: a healthy cat qubit appears as two well-separated Gaussian blobs at $\pm\alpha$ with quantum interference fringes between them. Under drift with a fixed policy, the blobs degrade and shift; the adaptive optimizer keeps them clean and properly positioned.

Challenges We Faced

Computational cost. Each reward evaluation requires two full master equation simulations on a $75$-dimensional Hilbert space ($15 \times 5$ tensor product). With 8 candidates per epoch and 100 epochs, the static optimization alone requires 1,600 simulations. We mitigated this by using JAX-accelerated dynamiqs solvers and keeping the Hilbert space truncation tight.

Reward function balancing. Our first attempt used a linear-scale loss: $(\eta - \eta_{\text{target}})^2 - (T_Z + T_X \cdot \eta_{\text{target}})$. This failed because the bias penalty term ($\sim 10^4$) overwhelmed the lifetime term ($\sim 60$), causing the optimizer to chase bias matching at the expense of absolute lifetime. Switching to log scale resolved this completely.

Target bias selection. We initially set $\eta_{\text{target}} = 0.01$ (a misinterpretation), which would have asked the optimizer to make bit-flips 100× more frequent than phase-flips — the exact opposite of what makes cat qubits useful. Correcting this to $\eta_{\text{target}} = 200$ aligned the optimization with the physical motivation.

Logical operator construction. Measuring $\langle\sigma_x^L\rangle$ requires knowing the cat size $\alpha$, which itself depends on the parameters being optimized. If the estimated $\alpha$ is wrong, the logical $\sigma_z^L$ projector is constructed incorrectly and $T_Z$ measurements become unreliable. We addressed this by recomputing $\alpha$ from the adiabatic elimination formula at each evaluation, ensuring consistency between the physical simulation and the measurement operators.
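The construction of these measurement operators can be sketched in the Fock basis: $\sigma_x^L$ is the photon parity $e^{i\pi \hat{a}^\dagger \hat{a}}$, and $\sigma_z^L$ can be approximated from the two coherent states once $\alpha$ is known (valid when $|\alpha\rangle$ and $|{-\alpha}\rangle$ are nearly orthogonal). The function names here are illustrative, not the project's:

```python
import numpy as np
from math import factorial

def coherent(N, alpha):
    """Coherent state |alpha> truncated to N Fock levels."""
    n = np.arange(N)
    return np.exp(-abs(alpha) ** 2 / 2) * alpha ** n / np.sqrt(
        [factorial(int(k)) for k in n])

def logical_ops(N, alpha):
    """Approximate logical operators for a cat of amplitude alpha."""
    # sigma_x^L: photon-number parity, diag((-1)^n).
    sx = np.diag((-1.0) ** np.arange(N))
    # sigma_z^L: distinguishes |alpha> from |-alpha> (approximately,
    # since the coherent states overlap by exp(-2|alpha|^2)).
    plus, minus = coherent(N, alpha), coherent(N, -alpha)
    sz = np.outer(plus, plus.conj()) - np.outer(minus, minus.conj())
    return sx, sz

sx, sz = logical_ops(N=15, alpha=2.0)
plus = coherent(15, 2.0)
print(plus.conj() @ sz @ plus)  # ~ +1 for the |+z> = |alpha> state
```

If $\alpha$ used here disagrees with the $\alpha$ the dynamics actually stabilize, `sz` projects onto the wrong coherent states and the fitted $T_Z$ is meaningless, which is why recomputing $\alpha$ from the adiabatic-elimination formula at every evaluation matters.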

Results

Our adaptive CMA-ES optimizer successfully:

  • Maintained high lifetimes $T_Z$ and $T_X$ while keeping the bias near the target $\eta = 200$
  • Tracked sinusoidal drift with minimal performance degradation compared to the no-drift baseline
  • Recovered from step-function perturbations within approximately 10–15 epochs, demonstrating practical robustness for real-time hardware recalibration
  • Outperformed the fixed-parameter policy under drift, as visualized through both quantitative reward metrics and qualitative Wigner function comparisons

Built With

  • dynamiqs
  • jupyter