Online Optimization of Dissipative Cat Qubits

YQuantum 2026 — Alice & Bob Challenge

Overview

Quantum error correction is essential for fault-tolerant quantum computation, but standard approaches demand a large qubit overhead. Cat qubits offer a shortcut: they encode quantum information in superpositions of coherent states of a harmonic oscillator, so that bit-flip errors are exponentially suppressed with increasing cat size $|\alpha|^2$ while phase-flip errors grow only polynomially. Recent experiments have pushed bit-flip times past 100 seconds and demonstrated real-time error correction beyond break-even. Optimal performance, however, requires continuous tuning of control parameters whose optimal values drift with the hardware in real time.

We built a framework to address the following question: what combination of reward signal and black-box optimization algorithm most effectively tracks the optimal operating point of a dissipative cat qubit under realistic hardware conditions?

The Physical Model

The dissipative cat qubit lives in two coupled bosonic modes: a storage cavity hosting the logical qubit, and a buffer cavity that mediates engineered two-photon dissipation. The system Hamiltonian, then, is:

$$\frac{H}{\hbar} = g_2^* \, a^2 b^\dagger + g_2 \, (a^\dagger)^2 b - \varepsilon_d \, b^\dagger - \varepsilon_d^* \, b$$

where $g_2 \in \mathbb{C}$ is the two-photon exchange coupling and $\varepsilon_d \in \mathbb{C}$ is the buffer drive amplitude. The parametric interaction converts pairs of storage photons into buffer photons and back, while single-photon loss on both modes (at rates $\kappa_a$ for the storage and $\kappa_b$ for the buffer, with $\kappa_b \gg \kappa_a$) is captured by Lindblad dissipators. When the buffer relaxes fast enough, it can be adiabatically eliminated, leaving an effective two-photon dissipation that stabilizes the manifold $\text{span}\{|\alpha\rangle, |-\alpha\rangle\}$.
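
For reference, the standard adiabatic-elimination result for a Hamiltonian of this form can be written compactly. This is a sketch that assumes $\kappa_b \gg |g_2|$; the symbols $L_2$ (effective jump operator) and $\kappa_2$ (effective two-photon loss rate) are introduced here for illustration and do not appear elsewhere in this writeup:

$$L_2 = \sqrt{\kappa_2}\,\left(a^2 - \alpha^2\right), \qquad \kappa_2 = \frac{4|g_2|^2}{\kappa_b}, \qquad \alpha^2 = \frac{\varepsilon_d}{g_2^*}$$

Since $a^2|\pm\alpha\rangle = \alpha^2|\pm\alpha\rangle$, both coherent states are annihilated by $L_2$, which is why the two-dimensional manifold is a steady state of the effective dynamics. With the sign convention of the Hamiltonian above, the phase of $\alpha^2$ is $\arg(g_2 \varepsilon_d)$.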

The logical qubit is encoded in even and odd cat states, the symmetric and antisymmetric superpositions of these coherent states. A key figure of merit in this project is the bias, $\eta = T_Z / T_X$. The bit-flip time $T_Z$ grows exponentially as $e^{2|\alpha|^2}$ because flipping between $|\alpha\rangle$ and $|-\alpha\rangle$ requires a macroscopic tunneling event through $|\alpha|^2$ photon losses. The phase-flip time $T_X$ scales polynomially as $1/(\kappa_a |\alpha|^2)$, since a single photon loss suffices to flip parity. Our goal is to maximize both $T_Z$ and $T_X$ while holding $\eta$ at a prescribed threshold.
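
To make the trade-off concrete, the two scalings quoted above can be evaluated numerically. The following is a minimal sketch, not the project's code: the function names are hypothetical, and the prefactor of the exponential (`tz_prefactor`) is an assumption, since the text only gives the scaling of $T_Z$ and not its absolute value.

```python
import math


def lifetimes(nbar, kappa_a, tz_prefactor=1.0):
    """Return (T_Z, T_X, eta) using the scalings quoted in the text.

    nbar         -- cat size |alpha|^2
    kappa_a      -- storage single-photon loss rate
    tz_prefactor -- ASSUMED prefactor of exp(2*nbar); not given in the text
    """
    t_z = tz_prefactor * math.exp(2.0 * nbar)
    t_x = 1.0 / (kappa_a * nbar)
    return t_z, t_x, t_z / t_x


def smallest_nbar_for_bias(eta_target, kappa_a, step=0.01, nbar_max=50.0):
    """Scan nbar upward on a grid; return the first value whose bias reaches eta_target."""
    for k in range(1, int(nbar_max / step) + 1):
        nbar = k * step
        if lifetimes(nbar, kappa_a)[2] >= eta_target:
            return nbar
    return None  # target bias not reachable below nbar_max


# Example: in units where kappa_a = 1, find the cat size that gives a bias of 100.
nbar_needed = smallest_nbar_for_bias(100.0, kappa_a=1.0)
```

Because $T_X$ shrinks as the cat grows while $T_Z$ explodes, fixing $\eta$ effectively pins the cat size, and the optimizer then has to find the best lifetimes compatible with that size.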

The Measurement Challenge: You Don't Know $\alpha$

One of the trickiest aspects of this problem is that you don't actually know $\alpha$. There are heuristics, but they aren't experimentally accessible. So, you can't just prepare a nice $|+z\rangle = |\alpha\rangle$ eigenstate and watch it decay, because the logical $Z$ operator itself is defined in terms of $\alpha$:

$$Z_L = |\alpha\rangle\langle\alpha| - |-\alpha\rangle\langle-\alpha|$$

This raises the question: how do you measure $T_Z$ and $T_X$ without knowing the very parameter that defines your logical basis? Our approach follows a vacuum-start protocol. For $T_X$, we exploit the fact that the parity operator $P = e^{i\pi a^\dagger a}$ is completely $\alpha$-independent: it only records whether the photon number is even or odd. We start from the Fock vacuum $|0\rangle_a \otimes |0\rangle_b$, let the two-photon dissipation evolve it into the even cat state (which has definite parity), then measure the parity decay:

$$T_X = -\frac{1}{m}, \qquad m = \text{slope of } \ln|\langle P \rangle(t)| \text{ vs } t \text{ (post-settle)}$$
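
The slope extraction can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the project's implementation: `fit_lifetime` is a hypothetical helper, the trace is synthetic, and the samples are assumed to be taken after the settle phase so that the decay is a single exponential.

```python
import math


def fit_lifetime(times, values):
    """Ordinary least-squares slope m of ln|values| vs times; returns T = -1/m."""
    logs = [math.log(abs(v)) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    return -1.0 / slope


# Synthetic post-settle parity trace with a known lifetime of 5.0 (arbitrary units).
times = [0.5 * k for k in range(1, 21)]
parity = [0.9 * math.exp(-t / 5.0) for t in times]
t_x_est = fit_lifetime(times, parity)
```

Fitting the logarithm makes the estimate independent of the initial contrast (the factor 0.9 here), which matters because the settled cat does not start with perfect parity.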

For $T_Z$, we extract $\alpha$ from the data itself once the cat has formed, at the settle time $t_s$: $|\alpha| = \sqrt{\langle a^\dagger a \rangle(t_s)}$ and $\theta = \arg(g_2 \varepsilon_d)/2$. Then we prepare $|\alpha_\text{est}\rangle$ and measure the quadrature decay along the cat axis:
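
As a small illustration of this estimation step, the two formulas above translate directly into code. The function name `estimate_alpha` and the example numbers are hypothetical; only the formulas themselves come from the text.

```python
import cmath
import math


def estimate_alpha(mean_photon_number, g2, eps_d):
    """Estimate the complex cat amplitude from settled data and control parameters.

    |alpha| comes from the measured storage photon number at the settle time,
    and the phase theta from the control parameters: theta = arg(g2 * eps_d) / 2.
    """
    magnitude = math.sqrt(mean_photon_number)
    theta = cmath.phase(g2 * eps_d) / 2.0
    return magnitude * cmath.exp(1j * theta), theta


# Example values (assumed for illustration): <a^dag a> = 4, g2 = 1, eps_d = 4i.
alpha_est, theta = estimate_alpha(4.0, 1.0 + 0.0j, 4.0j)
```

The phase is only defined modulo $\pi$, which is harmless here: shifting $\theta$ by $\pi$ swaps $|\alpha\rangle$ and $|-\alpha\rangle$ and leaves the cat manifold unchanged.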

$$Q_\theta = a \, e^{-i\theta} + a^\dagger \, e^{i\theta}$$

$$T_Z = -\frac{1}{m}, \qquad m = \text{slope of } \ln|\langle Q_\theta \rangle(t)| \text{ vs } t$$
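
A short worked step shows why this quadrature is a usable stand-in for $Z_L$. Writing $\alpha = |\alpha| e^{i\theta}$ and using $a|\pm\alpha\rangle = \pm\alpha|\pm\alpha\rangle$:

$$\langle \pm\alpha | Q_\theta | \pm\alpha \rangle = \pm\left(\alpha \, e^{-i\theta} + \alpha^* \, e^{i\theta}\right) = \pm 2|\alpha|$$

The sign of $\langle Q_\theta \rangle$ therefore identifies which coherent state the system occupies, and as bit flips equalize the two populations the expectation value relaxes from $2|\alpha|$ toward zero. This argument assumes the estimated $\theta$ matches the true cat axis; a misaligned axis reduces the initial signal but not, to first approximation, the decay rate.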

This two-phase protocol, in which we first settle from vacuum and then measure, avoids any heuristic assumptions about the cat size. The cost is two separate master equation solves and a break in JAX's JIT compilation, so we also consider cheaper heuristic rewards.

Reward Functions

We considered a set of potential reward/cost functions, spanning a range of physical fidelities.

At the fast end, JIT-compiled proxy rewards estimate lifetimes from single-point exponential decay measurements. A multi-point variant runs ordinary least-squares regression on log-transformed decay data for better noise robustness, at a higher computational cost. We also considered a photon-number reward, which simply targets a mean photon count and is the cheapest possible signal, and a fidelity reward that measures overlap with the target cat state.
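
The single-point idea can be sketched as follows. This is a minimal illustration assuming a pure exponential decay $y(t) = y_0 e^{-t/T}$; the function name and the handling of edge cases are assumptions, not the project's code.

```python
import math


def single_point_lifetime(y0, y_probe, t_probe):
    """Lifetime from two samples, assuming y(t) = y0 * exp(-t / T).

    y0      -- observable at t = 0
    y_probe -- observable at the probe time
    t_probe -- probe time (same units as the returned lifetime)
    """
    ratio = abs(y_probe / y0)
    if ratio >= 1.0:
        return math.inf  # no measurable decay within the probe window
    if ratio == 0.0:
        return 0.0  # signal fully gone before the probe time
    return -t_probe / math.log(ratio)


# Example: a signal that has dropped to exp(-2) of its initial value after t = 10.
t_est = single_point_lifetime(1.0, math.exp(-2.0), 10.0)
```

A single probe is cheap and keeps the whole reward inside one compiled function, but any noise on `y_probe` feeds straight into the estimate, and the result depends on choosing a probe time comparable to the lifetime being measured.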

In the middle sit the parity and vacuum rewards described above, which use the $\alpha$-free measurement protocol. An enhanced proxy augments lifetime estimates with physics-motivated penalties for buffer photon leakage, code-space confinement loss, and proximity to the cat formation threshold.

At the most expensive end, we can construct the full Lindbladian superoperator and extract $T_Z$ and $T_X$ directly from its slowest-decaying eigenvalues. This involves no curve fitting and no sensitivity to probe time, but the cost grows steeply with Hilbert space dimension: for a truncated dimension $N$, the superoperator is an $N^2 \times N^2$ matrix.
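
In equations, the spectral approach reads as follows. The notation ($\mathcal{L}$, $\lambda_k$, $v_k$) is introduced here for illustration:

$$\dot{\rho} = \mathcal{L}\rho, \qquad \mathcal{L} v_k = \lambda_k v_k, \qquad T_k = -\frac{1}{\text{Re}\,\lambda_k}$$

The steady state corresponds to $\lambda_0 = 0$, and every other eigenvalue has a negative real part that sets the decay rate of its eigenmode. Under the assumption $T_Z \gg T_X$, the nonzero eigenvalue closest to zero gives the bit-flip time and the phase-flip time comes from one of the next-slowest modes. Which eigenvalue belongs to which error channel is an identification that has to be checked, for example by inspecting the parity of the corresponding eigenoperator.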

Optimization Algorithms

We considered several black-box optimizers, all sharing a common ask–tell interface for fair comparison across the four-dimensional control space $(\text{Re}(g_2), \text{Im}(g_2), \text{Re}(\varepsilon_d), \text{Im}(\varepsilon_d))$.
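
For readers unfamiliar with the pattern, an ask-tell loop can be sketched as below. Everything here is hypothetical: `SimpleES` is a deliberately simplified isotropic evolution strategy (not CMA-ES and not the project's optimizer), and `toy_reward` is a stand-in quadratic rather than a lifetime-based reward.

```python
import random


class SimpleES:
    """Isotropic (mu, lambda) evolution strategy with an ask-tell interface.

    Simplified stand-in: the step size decays geometrically and no
    covariance matrix is adapted, unlike CMA-ES.
    """

    def __init__(self, x0, sigma=0.5, popsize=16, decay=0.95, seed=0):
        self.mean = list(x0)
        self.sigma = sigma
        self.popsize = popsize
        self.decay = decay
        self.rng = random.Random(seed)

    def ask(self):
        """Propose popsize candidate parameter vectors around the current mean."""
        return [[m + self.sigma * self.rng.gauss(0.0, 1.0) for m in self.mean]
                for _ in range(self.popsize)]

    def tell(self, candidates, rewards):
        """Move the mean to the average of the top quarter, ranked by reward."""
        ranked = sorted(zip(rewards, candidates), key=lambda rc: rc[0], reverse=True)
        elite = [c for _, c in ranked[: max(1, self.popsize // 4)]]
        self.mean = [sum(col) / len(elite) for col in zip(*elite)]
        self.sigma *= self.decay


def toy_reward(x, target=(1.0, 0.0, 4.0, 0.0)):
    """Hypothetical reward (higher is better) over (Re g2, Im g2, Re eps_d, Im eps_d)."""
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


opt = SimpleES(x0=[0.0, 0.0, 0.0, 0.0])
for _ in range(80):
    xs = opt.ask()                              # optimizer proposes controls
    opt.tell(xs, [toy_reward(x) for x in xs])   # experiment reports rewards
```

The value of the interface is that the optimizer never sees the simulator: swapping in a different reward or a different optimizer changes one line of the loop, which is what makes a like-for-like comparison possible.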

CMA-ES, the most standard method, maintains a multivariate Gaussian search distribution and adapts its covariance based on fitness-ranked samples. The CMA-ES/Adam hybrid alternates broad CMA-ES exploration with local gradient-based refinement through JAX autodiff. REINFORCE is a policy-gradient method from reinforcement learning, and PPO-Clip extends it with multi-epoch updates and a clipped surrogate objective that prevents large policy steps. We found CMA-ES to be the best-performing optimization algorithm.
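
The clipped surrogate mentioned above has a compact per-sample form, sketched here in plain Python. The function name is hypothetical and the default `clip_eps=0.2` is a commonly used value, assumed here rather than taken from the project.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO-Clip objective for one sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio     -- new-policy probability divided by old-policy probability
    advantage -- estimated advantage of the sampled action
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# A good action (A > 0) whose probability already rose by 50%:
# the objective is capped at 1.2 * A, so there is no incentive to push further.
capped = clipped_surrogate(1.5, 1.0)
```

Taking the minimum makes the bound one-sided: the policy gains nothing from moving the ratio outside the clip range in the favorable direction, but is still fully penalized when a move makes things worse.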

Drift: Hardware Doesn't Stay Still

We implemented several drift models: sinusoidal amplitude fluctuations on the coupling strength, smooth and square-wave frequency detuning, time-varying Kerr nonlinearity, and progressive measurement SNR degradation.
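
Two of these models are simple enough to sketch directly. The functional forms, function names, and parameters below are assumptions chosen for illustration; the text names the drift types but does not specify how they were parameterized.

```python
import math


def sinusoidal_amplitude(t, base, depth, period):
    """Multiplicative amplitude drift: base * (1 + depth * sin(2*pi*t / period)).

    base may be complex (e.g. the nominal g2); depth is the fractional swing.
    """
    return base * (1.0 + depth * math.sin(2.0 * math.pi * t / period))


def square_wave_detuning(t, amplitude, period):
    """Detuning that switches between +amplitude and -amplitude every half period."""
    phase = (t % period) / period
    return amplitude if phase < 0.5 else -amplitude


# Example: a 10% modulation of g2 = 1 with period 10, sampled at its peak.
g2_peak = sinusoidal_amplitude(2.5, 1.0, 0.1, 10.0)
```

Smooth drift tests whether an optimizer can follow a slowly moving optimum, while the square wave tests recovery after an abrupt jump, which is a harder case for methods that have already shrunk their search distribution.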

We generally found that CMA-ES retained considerable performance even under various forms of drift.

Extensions

We extended the framework beyond the standard four control parameters with two physically motivated additions. The moon cat extension adds a squeezing parameter $\lambda$ that deforms circular phase-space blobs into crescent-shaped "moon" states, theoretically offering significant improvement in $T_X$ at the same mean photon number, with the scaling exponent jumping from $\gamma \approx 1$ to $\gamma \approx 4.3$. The Zeno gate extension adds a single-photon drive $\varepsilon_Z$ that implements logical rotations while the two-photon dissipation continues stabilizing the manifold.

Implementation

All Lindblad master equation simulations run on dynamiqs, a JAX-based library for GPU-accelerated quantum dynamics. All code is included in the GitHub repository.