Performance Characteristics

QDD is a decision-diagram-based quantum circuit simulator. Its performance is best when the simulated state and operators have enough repeated structure for the decision diagram to share nodes. It can be slower when the circuit quickly creates many distinct amplitudes or many distinct edge weights, because the decision diagram then loses sharing and approaches the cost of an explicit state-vector representation with additional bookkeeping overhead.

This page gives practical guidance for choosing workloads where QDD is likely to work well.

When QDD Tends To Be Fast

QDD is well suited to circuits whose state evolution remains compact as a decision diagram:

Grover-style search circuits with repeated oracle and diffusion blocks.
Shor-style arithmetic circuits made from reversible modular arithmetic, controlled operations, and repeated structured subcircuits.
Circuits with many Clifford, permutation, controlled-NOT, Toffoli, and other operations that preserve regular structure.
Problems where a large Hilbert space is explored, but the amplitudes still fall into repeated patterns.

Grover and Shor circuits often have these properties. They may use many qubits and many gates, but the circuit structure is highly regular, and intermediate states often contain repeated subgraphs that a decision diagram can represent compactly.

When QDD Can Be Slow

QDD is less favorable for circuits that create many unrelated amplitudes:

VQE circuits with hardware-efficient ansatze such as EfficientSU2 or RealAmplitudes at large depth.
Circuits with many independent small-angle rotations.
Random or problem-instance-specific parameterized circuits with little repeated structure.
Optimization loops that evaluate thousands of slightly different parameter points.

In these cases, each parameter value can introduce distinct edge weights. When those weights are all different, the decision diagram cannot merge as many nodes. A state-vector simulator may then be faster because it performs dense linear algebra directly without maintaining the decision-diagram structure.

This does not mean VQE is unsupported. QDD can run VQE-style circuits, but it is usually not the workload where QDD has the strongest advantage.

Example: Grover With Qiskit’s Grover Class

The examples in this section use optional Qiskit ecosystem packages such as qiskit-algorithms and qiskit-nature in addition to QDD.

Qiskit Algorithms provides Grover and AmplificationProblem, so you do not need to write the Grover iterator by hand. The example below uses Qiskit’s PhaseOracleGate to turn a 3-SAT instance into a phase oracle, then runs the algorithm with QDD’s Sampler.

from qiskit.circuit.library.phase_oracle import PhaseOracleGate
from qiskit.synthesis.boolean.boolean_expression import BooleanExpression
from qiskit_algorithms import AmplificationProblem, Grover

from qdd.qdd_sampler import Sampler


input_3sat_instance = """
c example DIMACS-CNF 3-SAT
p cnf 3 5
-1 -2 -3 0
1 -2 3 0
1 2 -3 0
1 -2 -3 0
-1 2 3 0
"""

expression = BooleanExpression.from_dimacs(input_3sat_instance).expression
oracle = PhaseOracleGate(expression)
problem = AmplificationProblem(oracle)

grover = Grover(sampler=Sampler())
result = grover.amplify(problem)

print(result.assignment)

This is a favorable pattern for QDD because the algorithm repeatedly applies a structured oracle and diffusion operator. The search space grows with the number of variables, but the circuit can still contain repeated subgraphs that a decision diagram can share.

The same example can use MPI by constructing the sampler with backend options and launching the script with mpirun:

grover = Grover(sampler=Sampler(backend_options={"use_mpi": True}))

mpirun -n 4 python grover_3sat.py

This example follows the public Qiskit Algorithms Grover and 3-SAT examples.

Example: Shor From IBM Quantum’s Tutorial

Shor’s algorithm is another good match for a decision diagram simulator because its core is order finding through reversible modular arithmetic. The circuit is not necessarily small, but it is made from repeated arithmetic blocks, controlled modular multiplication, and an inverse QFT.

Unlike Grover and VQE, Qiskit documents Shor as a tutorial circuit rather than as a single main algorithm API class. For a generally available reference implementation, use IBM Quantum’s “Shor’s algorithm” tutorial. It constructs the order-finding circuit for N = 15 and a = 2 using Qiskit’s standard QFT and UnitaryGate building blocks, then runs the circuit with a Sampler and classically post-processes the measured phase.

The QDD-facing part is the same as for any Qiskit circuit:

from qiskit import transpile
from qiskit.circuit.library import QFT, UnitaryGate

from qdd import QddProvider

# Build the order-finding circuit as in IBM Quantum's Shor tutorial:
# https://qiskit.qotlabs.org/docs/tutorials/shors-algorithm
circuit = build_order_finding_circuit_for_N15_a2()

backend = QddProvider().get_backend("qasm_simulator")
compiled = transpile(circuit, backend, optimization_level=1)
counts = backend.run(compiled, shots=1024).result().get_counts()
print(counts)

The placeholder build_order_finding_circuit_for_N15_a2() is intentionally not a QDD-specific helper. It should be replaced by the circuit construction from the public IBM Quantum tutorial. That keeps the example independent of this repository’s benchmark code while documenting how to run the resulting circuit with QDD.

Example: H2 VQE With Qiskit Nature

Qiskit Nature provides chemistry-specific components for VQE. The example below uses its standard H2 setup: PySCFDriver builds the electronic structure problem, JordanWignerMapper maps it to qubits, and UCCSD with a HartreeFock initial state provides a chemically motivated ansatz.

import numpy as np
from qiskit_algorithms import VQE
from qiskit_algorithms.optimizers import SLSQP
from qiskit_nature.second_q.algorithms import GroundStateEigensolver
from qiskit_nature.second_q.circuit.library import HartreeFock, UCCSD
from qiskit_nature.second_q.drivers import PySCFDriver
from qiskit_nature.second_q.mappers import JordanWignerMapper

from qdd.qdd_estimator import Estimator


driver = PySCFDriver(atom="H 0 0 0; H 0 0 0.735", basis="sto-3g")
problem = driver.run()

mapper = JordanWignerMapper()

ansatz = UCCSD(
    problem.num_spatial_orbitals,
    problem.num_particles,
    mapper,
    initial_state=HartreeFock(
        problem.num_spatial_orbitals,
        problem.num_particles,
        mapper,
    ),
)

vqe = VQE(Estimator(default_precision=0.0), ansatz, SLSQP())
vqe.initial_point = np.zeros(ansatz.num_parameters)

solver = GroundStateEigensolver(mapper, vqe)
result = solver.solve(problem)

print(result.total_energies[0])

This is a realistic and convenient way to express a molecular VQE problem, but it is also representative of workloads where QDD may be slower than a dense state-vector simulator. UCC-style and hardware-efficient ansatze contain many parameterized rotations. During an optimizer loop, each evaluation uses a different parameter vector, which often produces many distinct decision-diagram edge weights and reduces node sharing.

For this class of workloads, use a state-vector simulator as a baseline and choose QDD only when measurements show that the specific ansatz and Hamiltonian remain compact.

Practical Selection Guide

Use QDD first when:

the circuit contains repeated structured blocks;
the algorithm is search, arithmetic, or reversible logic heavy;
you expect many amplitudes or suboperators to be identical or closely reusable;
memory use is the limiting factor for a state-vector simulator.

Benchmark against a state-vector simulator when:

the circuit is dominated by arbitrary rx, ry, rz, u, or controlled rotation gates with many distinct angles;
the circuit resembles a random parameterized ansatz;
the workload is an optimizer loop that repeatedly evaluates slightly different circuits;
the decision diagram grows quickly during early test runs.

For a new workload, start with a smaller instance, compare QDD with a state-vector backend, and scale only if the QDD run shows the expected compact representation advantage.

Benchmark Script

The repository includes bench/compare_simulators.py to compare QDD with Qiskit Aer statevector and matrix-product-state simulation on the example workloads above:

python bench/compare_simulators.py --benchmark grover --grover-qubits 20
python bench/compare_simulators.py --benchmark shor --counting-qubits 22
python bench/compare_simulators.py --benchmark vqe --molecule h4 --vqe-reps 8
python bench/compare_simulators.py --benchmark all

The VQE benchmark builds molecular Hamiltonians with Qiskit Nature. By default it uses a deeper EfficientSU2 ansatz to make the small-angle-rotation workload large enough to time; pass --vqe-ansatz uccsd to use the chemistry-specific UCCSD ansatz instead. Install the project with the test extra before running the benchmark:

pip install .[test]

The defaults are intentionally larger than smoke tests so the timing differences are visible. For a quick functional check, reduce the scale, for example --grover-qubits 12, --counting-qubits 8, or --vqe-steps 1. The VQE benchmark is shot based: each Pauli term is measured with --vqe-shots shots for each optimizer evaluation. The output also includes simple correctness checks: Grover reports the measured probability of the marked prefix, Shor reports the measured probability of recovering order 4 for N = 15, and VQE reports the sampled energy difference from Aer statevector.

Example Timing Results

The following measurements were taken on one development environment with .venv_qdd/bin/python bench/compare_simulators.py. They are intended as a representative comparison, not as portable absolute performance numbers.

Grover prefix search used 20 qubits, 256 shots, marked prefix 101010101010, and the automatically selected Grover iteration count. The correctness metric is the measured probability that the sampled bitstring starts with the marked prefix.

Simulator	Transpile [s]	Run [s]	Total [s]	Correctness
Aer statevector	0.080887	7.784454	7.865341	prefix success 1.000
Aer MPS	1.398227	12.111329	13.509557	prefix success 1.000
QDD	0.022978	0.109171	0.132149	prefix success 1.000

This result matches the expected QDD behavior: the Grover oracle and diffusion operator are highly structured, and the state remains compact enough for the decision diagram representation to share nodes effectively.

Shor order finding used N = 15, base a = 2, 22 counting qubits, and 256 shots. The correctness metric is the probability that classical post-processing recovers the expected order 4.

Simulator	Transpile [s]	Run [s]	Total [s]	Correctness
Aer statevector	0.087232	9.524342	9.611574	order-4 success 0.512
Aer MPS	0.064690	0.100180	0.164869	order-4 success 0.508
QDD	0.039742	0.197238	0.236979	order-4 success 0.469

This also broadly matches the expected picture for arithmetic circuits, but it shows an important nuance: MPS can be very competitive when the particular arithmetic circuit has low effective entanglement in the chosen qubit order. QDD remains much faster than Aer statevector in this run, while MPS is fastest for this small N = 15 order-finding instance.

VQE used the H4 Hamiltonian from Qiskit Nature, EfficientSU2(reps=8), one optimizer evaluation, and 128 shots per Pauli term. The correctness metric is reported as the sampled energy difference from Aer statevector, using the same initial parameters and shot count.

Simulator	Transpile [s]	Run [s]	Total [s]	Energy Delta vs Aer Statevector
Aer statevector	2.635268	1.146089	3.781358	+0.000000
Aer MPS	2.710340	5.776522	8.486862	+0.019080
QDD	2.356933	1.914973	4.271906	-0.023605

This VQE result needs careful interpretation. It is consistent with the guidance above when QDD is compared with dense statevector simulation: QDD is slower than Aer statevector on this small-angle-rotation workload. However, QDD is faster than Aer MPS in this specific run. That does not mean VQE is generally favorable for QDD. The EfficientSU2 ansatz uses full entanglement, which is often unfavorable for MPS because MPS performance depends strongly on low entanglement and a good one-dimensional qubit ordering. In contrast, this particular VQE instance is still small enough that QDD’s overhead does not dominate as much as it would for larger or more random parameterized circuits. For VQE-style workloads, treat Aer statevector as the primary baseline and measure the specific ansatz before choosing QDD.

References

Qiskit Algorithms: Grover’s algorithm examples, https://qiskit-community.github.io/qiskit-algorithms/tutorials/07_grover_examples.html
Qiskit Nature: getting started with H2 VQE, https://qiskit-community.github.io/qiskit-nature/getting_started.html
IBM Quantum Documentation: Shor’s algorithm, https://qiskit.qotlabs.org/docs/tutorials/shors-algorithm