ADR-QE-007: QAOA MaxCut Implementation

February 6, 2026 · View on GitHub

Status: Proposed Date: 2026-02-06 Authors: ruv.io, RuVector Team Deciders: Architecture Review Board

Version History

VersionDateAuthorChanges
0.12026-02-06ruv.ioInitial QAOA MaxCut architecture proposal

Context

Combinatorial Optimization on Quantum Computers

The Quantum Approximate Optimization Algorithm (QAOA), introduced by Farhi, Goldstone, and Gutmann (2014), is a leading candidate for demonstrating quantum advantage on combinatorial optimization problems. QAOA constructs a parameterized quantum circuit that encodes the cost function of an optimization problem and uses classical outer-loop optimization to find parameters that maximize the expected cost.

MaxCut as the Canonical QAOA Problem

MaxCut is the prototypical problem for QAOA: given a graph G = (V, E), partition the vertices into two sets S and S-complement to maximize the number of edges crossing the partition.

MaxCut Example (5 vertices, 6 edges):

    0 ─── 1
    │ \   │
    │   \ │
    3 ─── 2

          4

Optimal cut: S = {0, 2, 4}, S' = {1, 3}
Cut value: 5 edges crossing (0-1, 0-3, 1-2, 2-3, 2-4)

The cost function is:

C(z) = sum_{(i,j) in E} (1 - z_i * z_j) / 2

where z_i in {+1, -1} encodes the partition assignment.

QAOA Circuit Structure

A depth-p QAOA circuit alternates two types of layers:

  1. Phase separation (encodes the problem): For each edge (i,j), apply exp(-i * gamma * Z_i Z_j / 2)
  2. Mixing (explores the solution space): For each qubit i, apply exp(-i * beta * X_i) = Rx(2*beta)
QAOA Circuit (p layers):

|+>  ──[Phase(gamma_1)]──[Mix(beta_1)]──[Phase(gamma_2)]──[Mix(beta_2)]── ... ──[Measure]

Parameters: gamma = [gamma_1, ..., gamma_p], beta = [beta_1, ..., beta_p]            │

                                                                              Classical
                                                                              Optimizer

Why QAOA Matters for ruQu

MotivationDetails
Optimization benchmarksStandard workload for evaluating quantum simulator performance
Graph problemsNatural integration with ruVector graph database (ruvector-graph)
Variational algorithmShares infrastructure with VQE (ADR-QE-005): parameterized circuits, expectation values, classical optimizers
Scalability studyQAOA depth and graph size provide tunable complexity for benchmarking
Agent integrationruVector agents can use QAOA to solve graph optimization tasks autonomously

Decision

1. Phase Separation Operator: Native Rzz Gate

The phase separation operator for MaxCut applies exp(-i * gamma * Z_i Z_j / 2) for each edge (i,j). We implement this as a native two-qubit operation via direct amplitude manipulation, avoiding CNOT decomposition.

Mathematical basis:

exp(-i * theta * Z_i Z_j / 2) acts on computational basis states as:

  |00> -> e^{-i*theta/2} |00>    (Z_i Z_j = +1)
  |01> -> e^{+i*theta/2} |01>    (Z_i Z_j = -1)
  |10> -> e^{+i*theta/2} |10>    (Z_i Z_j = -1)
  |11> -> e^{-i*theta/2} |11>    (Z_i Z_j = +1)

In the state vector, for each amplitude at index k:

  • Extract bits i and j from k
  • Compute parity = bit_i XOR bit_j
  • Apply phase: amp[k] *= exp(-i * theta * (-1)^parity / 2)
    • If parity = 0 (same bits): amp[k] *= exp(-i * theta / 2)
    • If parity = 1 (different bits): amp[k] *= exp(+i * theta / 2)
impl QuantumState {
    /// Apply Rzz(theta) = exp(-i * theta * Z_i Z_j / 2) via direct amplitude
    /// manipulation.
    ///
    /// For each basis state |k>:
    ///   - Compute parity of bits i and j in k
    ///   - Apply phase e^{-i * theta * (-1)^parity / 2}
    ///
    /// Complexity: O(2^n) -- single pass over state vector.
    /// Vectorizable: all amplitudes are independent (no swaps).
    ///
    /// Hardware equivalent: CNOT(i,j) + Rz(theta, j) + CNOT(i,j) = 3 gates.
    pub fn rzz(&mut self, theta: f64, qubit_i: usize, qubit_j: usize) {
        let phase_same = Complex64::from_polar(1.0, -theta / 2.0);
        let phase_diff = Complex64::from_polar(1.0, theta / 2.0);

        let mask_i = 1_usize << qubit_i;
        let mask_j = 1_usize << qubit_j;

        for k in 0..self.amplitudes.len() {
            let bit_i = (k & mask_i) >> qubit_i;
            let bit_j = (k & mask_j) >> qubit_j;
            let parity = bit_i ^ bit_j;

            if parity == 0 {
                self.amplitudes[k] *= phase_same;
            } else {
                self.amplitudes[k] *= phase_diff;
            }
        }
    }
}

Vectorization opportunity: The inner loop is a streaming operation over the amplitude array with no data dependencies between iterations. This is ideal for SIMD vectorization (AVX-512 can process 8 complex64 values per instruction) and parallelization across cores.

2. Mixing Operator

The mixing operator applies Rx(2*beta) to each qubit:

Rx(2*beta) = exp(-i * beta * X) = [[cos(beta), -i*sin(beta)],
                                     [-i*sin(beta), cos(beta)]]

This uses the standard single-qubit gate application from the simulator core:

impl QuantumState {
    /// Apply the QAOA mixing operator: Rx(2*beta) on each qubit.
    ///
    /// Complexity: O(n * 2^n) for n qubits.
    pub fn qaoa_mixing(&mut self, beta: f64) {
        for qubit in 0..self.n_qubits {
            self.rx(2.0 * beta, qubit);
        }
    }
}

3. QAOA Circuit Construction

A convenience function builds the full QAOA circuit from a graph and parameters:

/// A graph represented as an edge list with optional weights.
pub struct Graph {
    /// Number of vertices
    pub n_vertices: usize,
    /// Edges: (vertex_i, vertex_j, weight)
    pub edges: Vec<(usize, usize, f64)>,
}

impl Graph {
    /// Construct from adjacency list.
    pub fn from_adjacency_list(adj: &[Vec<usize>]) -> Self;

    /// Construct from edge list (unweighted, weight = 1.0).
    pub fn from_edge_list(n_vertices: usize, edges: &[(usize, usize)]) -> Self;

    /// Load from ruVector graph query result.
    pub fn from_ruvector_query(result: &GraphQueryResult) -> Self;
}

/// QAOA configuration.
pub struct QaoaConfig {
    /// Graph defining the MaxCut instance
    pub graph: Graph,
    /// QAOA depth (number of layers)
    pub p: usize,
    /// Gamma parameters (phase separation angles), length = p
    pub gammas: Vec<f64>,
    /// Beta parameters (mixing angles), length = p
    pub betas: Vec<f64>,
}

/// Build and simulate a QAOA circuit for MaxCut.
///
/// Circuit structure for depth p:
///   1. Initialize |+>^n (Hadamard on all qubits)
///   2. For layer l = 1..p:
///      a. Phase separation: Rzz(gamma_l, i, j) for each edge (i,j)
///      b. Mixing: Rx(2*beta_l) on each qubit
///   3. Return final state
pub fn build_qaoa_circuit(config: &QaoaConfig) -> QuantumState {
    let n = config.graph.n_vertices;
    let mut state = QuantumState::new(n);

    // Step 1: Initialize uniform superposition
    state.hadamard_all();

    // Step 2: Alternating phase separation and mixing layers
    for layer in 0..config.p {
        let gamma = config.gammas[layer];
        let beta = config.betas[layer];

        // Phase separation: apply Rzz for each edge
        for &(i, j, weight) in &config.graph.edges {
            state.rzz(gamma * weight, i, j);
        }

        // Mixing: Rx(2*beta) on each qubit
        state.qaoa_mixing(beta);
    }

    state
}

Pseudocode for the complete QAOA MaxCut solver:

pub fn qaoa_maxcut(
    graph: &Graph,
    p: usize,
    optimizer: &mut dyn ClassicalOptimizer,
    config: &QaoaOptConfig,
) -> QaoaResult {
    let n_params = 2 * p; // p gammas + p betas
    optimizer.initialize(n_params);

    let mut params = config.initial_params.clone()
        .unwrap_or_else(|| {
            // Standard initialization: gamma in [0, pi], beta in [0, pi/2]
            let mut p_init = vec![0.0; n_params];
            for i in 0..p {
                p_init[i] = 0.5;          // gamma_i
                p_init[p + i] = 0.25;     // beta_i
            }
            p_init
        });

    let mut best_cost = f64::NEG_INFINITY;
    let mut best_params = params.clone();
    let mut history = Vec::new();

    for iteration in 0..config.max_iterations {
        let gammas = params[..p].to_vec();
        let betas = params[p..].to_vec();

        // Build and simulate circuit
        let qaoa_config = QaoaConfig {
            graph: graph.clone(),
            p,
            gammas,
            betas,
        };
        let state = build_qaoa_circuit(&qaoa_config);

        // Evaluate MaxCut cost function
        let cost = maxcut_expectation(&state, graph);

        if cost > best_cost {
            best_cost = cost;
            best_params = params.clone();
        }

        // Gradient computation (parameter-shift rule, same as VQE)
        let grad = if optimizer.needs_gradient() {
            Some(qaoa_gradient(graph, p, &params))
        } else {
            None
        };

        history.push(QaoaIteration { iteration, cost, params: params.clone() });

        let result = optimizer.step(&params, -cost, grad.as_deref());
        // Note: negate cost because optimizer minimizes
        params = result.new_params;

        if result.converged {
            break;
        }
    }

    // Sample the final state to get candidate cuts
    let final_state = build_qaoa_circuit(&QaoaConfig {
        graph: graph.clone(),
        p,
        gammas: best_params[..p].to_vec(),
        betas: best_params[p..].to_vec(),
    });
    let best_cut = sample_maxcut(&final_state, graph, config.sample_shots);

    QaoaResult {
        best_cost,
        best_params,
        best_cut,
        iterations: history.len(),
        history,
        approximation_ratio: best_cost / graph.max_cut_upper_bound(),
    }
}

4. Cost Function Evaluation

The MaxCut cost function in Pauli operator form is:

C = sum_{(i,j) in E} w_{ij} * (1 - Z_i Z_j) / 2

This reuses the PauliSum expectation API from ADR-QE-005:

/// Compute the MaxCut cost as the expectation value of the cost Hamiltonian.
///
/// C = sum_{(i,j) in E} w_ij * (1 - Z_i Z_j) / 2
///   = sum_{(i,j) in E} w_ij/2 - sum_{(i,j) in E} w_ij/2 * Z_i Z_j
///   = const - sum_{(i,j)} w_ij/2 * <Z_i Z_j>
///
/// Each Z_i Z_j expectation is computed via the efficient diagonal trick:
/// <psi| Z_i Z_j |psi> = sum_k |amp_k|^2 * (-1)^{bit_i(k) XOR bit_j(k)}
pub fn maxcut_expectation(state: &QuantumState, graph: &Graph) -> f64 {
    let mut cost = 0.0;

    for &(i, j, weight) in &graph.edges {
        let mask_i = 1_usize << i;
        let mask_j = 1_usize << j;

        let mut zz_expectation = 0.0;
        for k in 0..state.amplitudes.len() {
            let bit_i = (k & mask_i) >> i;
            let bit_j = (k & mask_j) >> j;
            let parity = bit_i ^ bit_j;
            let sign = 1.0 - 2.0 * parity as f64; // +1 if same, -1 if different
            zz_expectation += state.amplitudes[k].norm_sqr() * sign;
        }

        cost += weight * (1.0 - zz_expectation) / 2.0;
    }

    cost
}

Optimization: Since Z_i Z_j is diagonal in the computational basis, the expectation reduces to a weighted sum over probabilities. No amplitude swapping is needed, and the computation is embarrassingly parallel.

5. Sampling Mode

In addition to exact expectation values, we support sampling the final state to obtain candidate cuts:

/// Sample the QAOA state to find candidate MaxCut solutions.
///
/// Returns the best cut found across `shots` samples.
pub fn sample_maxcut(
    state: &QuantumState,
    graph: &Graph,
    shots: usize,
) -> MaxCutSolution {
    let probabilities: Vec<f64> = state.amplitudes.iter()
        .map(|a| a.norm_sqr())
        .collect();

    let mut best_cut_value = 0.0;
    let mut best_bitstring = 0_usize;
    let mut rng = thread_rng();

    for _ in 0..shots {
        // Sample from probability distribution
        let sample = sample_from_distribution(&probabilities, &mut rng);

        // Evaluate cut value for this bitstring
        let cut_value = evaluate_cut(sample, graph);

        if cut_value > best_cut_value {
            best_cut_value = cut_value;
            best_bitstring = sample;
        }
    }

    MaxCutSolution {
        partition: best_bitstring,
        cut_value: best_cut_value,
        set_s: (0..graph.n_vertices)
            .filter(|&v| (best_bitstring >> v) & 1 == 1)
            .collect(),
        set_s_complement: (0..graph.n_vertices)
            .filter(|&v| (best_bitstring >> v) & 1 == 0)
            .collect(),
    }
}

6. Graph Interface

Three input modes cover common use cases:

impl Graph {
    /// From adjacency list (unweighted).
    ///
    /// Example: adj[0] = [1, 3] means vertex 0 connects to 1 and 3.
    pub fn from_adjacency_list(adj: &[Vec<usize>]) -> Self {
        let n = adj.len();
        let mut edges = Vec::new();
        let mut seen = std::collections::HashSet::new();

        for (u, neighbors) in adj.iter().enumerate() {
            for &v in neighbors {
                let edge = if u < v { (u, v) } else { (v, u) };
                if seen.insert(edge) {
                    edges.push((edge.0, edge.1, 1.0));
                }
            }
        }

        Self { n_vertices: n, edges }
    }

    /// From edge list with uniform weight.
    pub fn from_edge_list(n_vertices: usize, edge_list: &[(usize, usize)]) -> Self {
        Self {
            n_vertices,
            edges: edge_list.iter().map(|&(u, v)| (u, v, 1.0)).collect(),
        }
    }

    /// From ruVector graph database query result.
    ///
    /// Enables QAOA MaxCut on graphs stored in ruvector-graph.
    pub fn from_ruvector_query(result: &GraphQueryResult) -> Self {
        // Convert ruvector-graph nodes and edges to QAOA format
        // Vertex IDs are remapped to contiguous 0..n range
        todo!()
    }
}

7. Tensor Network Optimization for Sparse Graphs

For sparse or planar graphs, the QAOA state can be represented more efficiently using tensor network contraction. The key insight is that QAOA circuits have a structure dictated by the graph topology:

Tensor Network View of QAOA:

  Qubit 0: ──[H]──[Rzz(0,1)]──[Rzz(0,3)]──[Rx]── ...
  Qubit 1: ──[H]──[Rzz(0,1)]──[Rzz(1,2)]──[Rx]── ...
  Qubit 2: ──[H]──[Rzz(1,2)]──[Rzz(2,3)]──[Rx]── ...
  Qubit 3: ──[H]──[Rzz(0,3)]──[Rzz(2,3)]──[Rx]── ...

For a planar graph with treewidth w, tensor contraction costs O(2^w * poly(n))
instead of O(2^n). For many practical graphs, w << n.
/// Detect graph treewidth and decide simulation strategy.
pub fn select_simulation_strategy(graph: &Graph) -> SimulationStrategy {
    let treewidth = estimate_treewidth(graph);
    let n = graph.n_vertices;

    if treewidth <= 20 && n > 24 {
        // Tensor network contraction is cheaper than full state vector
        SimulationStrategy::TensorNetwork {
            contraction_order: compute_contraction_order(graph),
            estimated_cost: (1 << treewidth) * n * n,
        }
    } else {
        SimulationStrategy::StateVector {
            estimated_cost: 1 << n,
        }
    }
}

pub enum SimulationStrategy {
    StateVector { estimated_cost: usize },
    TensorNetwork {
        contraction_order: Vec<ContractionStep>,
        estimated_cost: usize,
    },
}

8. Performance Analysis

Gate Counts and Timing

For a graph with n vertices, m edges, and QAOA depth p:

OperationGate Count per LayerTotal Gates (p layers)
Phase separation (Rzz)mp * m
Mixing (Rx)np * n
Total per layerm + np * (m + n)

Benchmark estimates:

ConfigurationnmpTotal GatesEstimated Time
Small triangle3316<0.01ms
Petersen graph1015375<0.1ms
Random d-reg (d=3)10155125<0.5ms
Grid 4x520313189~50ms
Grid 4x520315315~100ms
Random d-reg (d=4)20405400~200ms
Dense (complete)201903630~300ms
Sparse large24363216~5s
Dense large2427651500~30s

Memory requirements:

QubitsState Vector SizeMemory
101,02416 KB
1665,5361 MB
201,048,57616 MB
2416,777,216256 MB
28268,435,4564 GB

9. Integration with ruvector-graph

The connection to ruVector's graph database enables a powerful workflow:

┌─────────────────────────────────────────────────────────────────────┐
│                  QAOA MaxCut Pipeline                                 │
│                                                                       │
│  ┌──────────────┐     ┌────────────────┐     ┌──────────────────┐   │
│  │ ruvector-graph│     │  QAOA Engine   │     │  Result Store    │   │
│  │              │     │                │     │                  │   │
│  │  Query:      │────>│  Build circuit │────>│  Optimal cut     │   │
│  │  "find all   │     │  Optimize      │     │  Partition       │   │
│  │   connected  │     │  Sample        │     │  Approximation   │   │
│  │   subgraphs  │     │                │     │  ratio           │   │
│  │   of size k" │     │                │     │                  │   │
│  └──────────────┘     └────────────────┘     └──────────────────┘   │
│                                                                       │
│  Data Flow:                                                           │
│  1. Agent queries ruvector-graph for subgraph                        │
│  2. Graph converted to QAOA format via Graph::from_ruvector_query()  │
│  3. QAOA optimizer runs with configurable depth p                     │
│  4. Results stored in ruVector memory for pattern learning            │
│  5. Agent uses learned patterns to choose p and initial parameters    │
└─────────────────────────────────────────────────────────────────────┘

The ruvector-mincut integration is particularly relevant: the existing SubpolynomialMinCut algorithm (El-Hayek/Henzinger/Li, O(n^{o(1)}) amortized) provides exact min-cut values that serve as a lower bound for MaxCut verification. QAOA solutions can be validated against this classical baseline.


Consequences

Benefits

  1. Native Rzz gate via direct amplitude manipulation avoids CNOT decomposition, yielding a simpler and faster phase separation implementation
  2. PauliSum expectation API reuse from ADR-QE-005 provides a unified interface for all variational algorithms (VQE, QAOA, and future extensions)
  3. Graph interface flexibility supports adjacency lists, edge lists, and ruVector graph queries, covering the most common input formats
  4. Tensor network fallback for low-treewidth graphs extends QAOA to larger problem instances than pure state vector simulation allows
  5. ruvector-graph integration enables a seamless pipeline from graph storage to quantum optimization to result analysis

Risks

RiskProbabilityImpactMitigation
QAOA at low depth p gives poor approximation ratiosHighMediumSupport high-p QAOA, classical warm-starting
Treewidth estimation is NP-hard in generalMediumLowUse heuristic upper bounds (min-degree, greedy)
Parameter landscape has many local minimaMediumMediumMulti-start optimization, INTERP initialization
Large dense graphs exhaust memoryMediumHighTensor network fallback, graph coarsening

Trade-offs

DecisionAdvantageDisadvantage
Direct Rzz over CNOT decompositionSimpler, fasterNot a one-to-one hardware circuit mapping
Exact expectation over samplingNo statistical noiseDoes not model real hardware shot noise
Automatic strategy selectionTransparent to userAdditional complexity in simulation backend
Integrated graph interfaceSeamless workflowCoupling to ruvector-graph API

References

  • Farhi, E., Goldstone, J., Gutmann, S. "A Quantum Approximate Optimization Algorithm." arXiv:1411.4028 (2014)
  • Hadfield, S. et al. "From the Quantum Approximate Optimization Algorithm to a Quantum Alternating Operator Ansatz." Algorithms 12, 34 (2019)
  • Zhou, L. et al. "Quantum Approximate Optimization Algorithm: Performance, Mechanism, and Implementation on Near-Term Devices." Physical Review X 10, 021067 (2020)
  • Guerreschi, G.G., Matsuura, A.Y. "QAOA for Max-Cut requires hundreds of qubits for quantum speed-up." Scientific Reports 9, 6903 (2019)
  • ADR-001: ruQu Architecture - Classical Nervous System for Quantum Machines
  • ADR-QE-005: VQE Algorithm Support (shared parameterized circuit and optimizer infrastructure)
  • ADR-QE-006: Grover's Search Implementation (quantum state manipulation primitives)
  • ruvector-mincut: crates/ruvector-mincut/ - El-Hayek/Henzinger/Li subpolynomial min-cut
  • ruvector-graph: graph database integration for sourcing MaxCut instances