Performance Characteristics
QDD is a decision-diagram-based quantum circuit simulator. Its performance is best when the simulated state and operators have enough repeated structure for the decision diagram to share nodes. It can be slower when the circuit quickly creates many distinct amplitudes or many distinct edge weights, because the decision diagram then loses sharing and approaches the cost of an explicit state-vector representation with additional bookkeeping overhead.
This page gives practical guidance for choosing workloads where QDD is likely to work well.
When QDD Tends To Be Fast
QDD is well suited to circuits whose state evolution remains compact as a decision diagram:
Grover-style search circuits with repeated oracle and diffusion blocks.
Shor-style arithmetic circuits made from reversible modular arithmetic, controlled operations, and repeated structured subcircuits.
Circuits with many Clifford, permutation, controlled-NOT, Toffoli, and other operations that preserve regular structure.
Problems where a large Hilbert space is explored, but the amplitudes still fall into repeated patterns.
Grover and Shor circuits often have these properties. They may use many qubits and many gates, but the circuit structure is highly regular, and intermediate states often contain repeated subgraphs that a decision diagram can represent compactly.
When QDD Can Be Slow
QDD is less favorable for circuits that create many unrelated amplitudes:
VQE circuits with hardware-efficient ansatze such as
EfficientSU2orRealAmplitudesat large depth.Circuits with many independent small-angle rotations.
Random or problem-instance-specific parameterized circuits with little repeated structure.
Optimization loops that evaluate thousands of slightly different parameter points.
In these cases, each parameter value can introduce distinct edge weights. When those weights are all different, the decision diagram cannot merge as many nodes. A state-vector simulator may then be faster because it performs dense linear algebra directly without maintaining the decision-diagram structure.
This does not mean VQE is unsupported. QDD can run VQE-style circuits, but it is usually not the workload where QDD has the strongest advantage.
Example: Grover With Qiskit’s Grover Class
The examples in this section use optional Qiskit ecosystem packages such as
qiskit-algorithms and qiskit-nature in addition to QDD.
Qiskit Algorithms provides Grover and AmplificationProblem, so you do not
need to write the Grover iterator by hand. The example below uses Qiskit’s
PhaseOracleGate to turn a 3-SAT instance into a phase oracle, then runs the
algorithm with QDD’s Sampler.
from qiskit.circuit.library.phase_oracle import PhaseOracleGate
from qiskit.synthesis.boolean.boolean_expression import BooleanExpression
from qiskit_algorithms import AmplificationProblem, Grover
from qdd.qdd_sampler import Sampler
input_3sat_instance = """
c example DIMACS-CNF 3-SAT
p cnf 3 5
-1 -2 -3 0
1 -2 3 0
1 2 -3 0
1 -2 -3 0
-1 2 3 0
"""
expression = BooleanExpression.from_dimacs(input_3sat_instance).expression
oracle = PhaseOracleGate(expression)
problem = AmplificationProblem(oracle)
grover = Grover(sampler=Sampler())
result = grover.amplify(problem)
print(result.assignment)
This is a favorable pattern for QDD because the algorithm repeatedly applies a structured oracle and diffusion operator. The search space grows with the number of variables, but the circuit can still contain repeated subgraphs that a decision diagram can share.
The same example can use MPI by constructing the sampler with backend options
and launching the script with mpirun:
grover = Grover(sampler=Sampler(backend_options={"use_mpi": True}))
mpirun -n 4 python grover_3sat.py
This example follows the public Qiskit Algorithms Grover and 3-SAT examples.
Example: Shor From IBM Quantum’s Tutorial
Shor’s algorithm is another good match for a decision diagram simulator because its core is order finding through reversible modular arithmetic. The circuit is not necessarily small, but it is made from repeated arithmetic blocks, controlled modular multiplication, and an inverse QFT.
Unlike Grover and VQE, Qiskit documents Shor as a tutorial circuit rather than
as a single main algorithm API class. For a generally available reference
implementation, use IBM Quantum’s “Shor’s algorithm” tutorial. It constructs
the order-finding circuit for N = 15 and a = 2 using Qiskit’s standard
QFT and UnitaryGate building blocks, then runs the circuit with a Sampler
and classically post-processes the measured phase.
The QDD-facing part is the same as for any Qiskit circuit:
from qiskit import transpile
from qiskit.circuit.library import QFT, UnitaryGate
from qdd import QddProvider
# Build the order-finding circuit as in IBM Quantum's Shor tutorial:
# https://qiskit.qotlabs.org/docs/tutorials/shors-algorithm
circuit = build_order_finding_circuit_for_N15_a2()
backend = QddProvider().get_backend("qasm_simulator")
compiled = transpile(circuit, backend, optimization_level=1)
counts = backend.run(compiled, shots=1024).result().get_counts()
print(counts)
The placeholder build_order_finding_circuit_for_N15_a2() is intentionally not
a QDD-specific helper. It should be replaced by the circuit construction from
the public IBM Quantum tutorial. That keeps the example independent of this
repository’s benchmark code while documenting how to run the resulting circuit
with QDD.
Example: H2 VQE With Qiskit Nature
Qiskit Nature provides chemistry-specific components for VQE. The example
below uses its standard H2 setup: PySCFDriver builds the electronic
structure problem, JordanWignerMapper maps it to qubits, and UCCSD with a
HartreeFock initial state provides a chemically motivated ansatz.
import numpy as np
from qiskit_algorithms import VQE
from qiskit_algorithms.optimizers import SLSQP
from qiskit_nature.second_q.algorithms import GroundStateEigensolver
from qiskit_nature.second_q.circuit.library import HartreeFock, UCCSD
from qiskit_nature.second_q.drivers import PySCFDriver
from qiskit_nature.second_q.mappers import JordanWignerMapper
from qdd.qdd_estimator import Estimator
driver = PySCFDriver(atom="H 0 0 0; H 0 0 0.735", basis="sto-3g")
problem = driver.run()
mapper = JordanWignerMapper()
ansatz = UCCSD(
problem.num_spatial_orbitals,
problem.num_particles,
mapper,
initial_state=HartreeFock(
problem.num_spatial_orbitals,
problem.num_particles,
mapper,
),
)
vqe = VQE(Estimator(default_precision=0.0), ansatz, SLSQP())
vqe.initial_point = np.zeros(ansatz.num_parameters)
solver = GroundStateEigensolver(mapper, vqe)
result = solver.solve(problem)
print(result.total_energies[0])
This is a realistic and convenient way to express a molecular VQE problem, but it is also representative of workloads where QDD may be slower than a dense state-vector simulator. UCC-style and hardware-efficient ansatze contain many parameterized rotations. During an optimizer loop, each evaluation uses a different parameter vector, which often produces many distinct decision-diagram edge weights and reduces node sharing.
For this class of workloads, use a state-vector simulator as a baseline and choose QDD only when measurements show that the specific ansatz and Hamiltonian remain compact.
Practical Selection Guide
Use QDD first when:
the circuit contains repeated structured blocks;
the algorithm is search, arithmetic, or reversible logic heavy;
you expect many amplitudes or suboperators to be identical or closely reusable;
memory use is the limiting factor for a state-vector simulator.
Benchmark against a state-vector simulator when:
the circuit is dominated by arbitrary
rx,ry,rz,u, or controlled rotation gates with many distinct angles;the circuit resembles a random parameterized ansatz;
the workload is an optimizer loop that repeatedly evaluates slightly different circuits;
the decision diagram grows quickly during early test runs.
For a new workload, start with a smaller instance, compare QDD with a state-vector backend, and scale only if the QDD run shows the expected compact representation advantage.
Benchmark Script
The repository includes bench/compare_simulators.py to compare QDD with
Qiskit Aer statevector and matrix-product-state simulation on the example
workloads above:
python bench/compare_simulators.py --benchmark grover --grover-qubits 20
python bench/compare_simulators.py --benchmark shor --counting-qubits 22
python bench/compare_simulators.py --benchmark vqe --molecule h4 --vqe-reps 8
python bench/compare_simulators.py --benchmark all
The VQE benchmark builds molecular Hamiltonians with Qiskit Nature. By default it
uses a deeper EfficientSU2 ansatz to make the small-angle-rotation workload
large enough to time; pass --vqe-ansatz uccsd to use the chemistry-specific
UCCSD ansatz instead. Install the project with the test extra before running
the benchmark:
pip install .[test]
The defaults are intentionally larger than smoke tests so the timing
differences are visible. For a quick functional check, reduce the scale, for
example --grover-qubits 12, --counting-qubits 8, or --vqe-steps 1.
The VQE benchmark is shot based: each Pauli term is measured with
--vqe-shots shots for each optimizer evaluation.
The output also includes simple correctness checks: Grover reports the measured
probability of the marked prefix, Shor reports the measured probability of
recovering order 4 for N = 15, and VQE reports the sampled energy difference
from Aer statevector.
Example Timing Results
The following measurements were taken on one development environment with
.venv_qdd/bin/python bench/compare_simulators.py. They are intended as a
representative comparison, not as portable absolute performance numbers.
Grover prefix search used 20 qubits, 256 shots, marked prefix 101010101010,
and the automatically selected Grover iteration count. The correctness metric
is the measured probability that the sampled bitstring starts with the marked
prefix.
Simulator |
Transpile [s] |
Run [s] |
Total [s] |
Correctness |
|---|---|---|---|---|
Aer statevector |
0.080887 |
7.784454 |
7.865341 |
prefix success 1.000 |
Aer MPS |
1.398227 |
12.111329 |
13.509557 |
prefix success 1.000 |
QDD |
0.022978 |
0.109171 |
0.132149 |
prefix success 1.000 |
This result matches the expected QDD behavior: the Grover oracle and diffusion operator are highly structured, and the state remains compact enough for the decision diagram representation to share nodes effectively.
Shor order finding used N = 15, base a = 2, 22 counting qubits, and 256
shots. The correctness metric is the probability that classical
post-processing recovers the expected order 4.
Simulator |
Transpile [s] |
Run [s] |
Total [s] |
Correctness |
|---|---|---|---|---|
Aer statevector |
0.087232 |
9.524342 |
9.611574 |
order-4 success 0.512 |
Aer MPS |
0.064690 |
0.100180 |
0.164869 |
order-4 success 0.508 |
QDD |
0.039742 |
0.197238 |
0.236979 |
order-4 success 0.469 |
This also broadly matches the expected picture for arithmetic circuits, but it
shows an important nuance: MPS can be very competitive when the particular
arithmetic circuit has low effective entanglement in the chosen qubit order.
QDD remains much faster than Aer statevector in this run, while MPS is fastest
for this small N = 15 order-finding instance.
VQE used the H4 Hamiltonian from Qiskit Nature, EfficientSU2(reps=8), one
optimizer evaluation, and 128 shots per Pauli term. The correctness metric is
reported as the sampled energy difference from Aer statevector, using the same
initial parameters and shot count.
Simulator |
Transpile [s] |
Run [s] |
Total [s] |
Energy Delta vs Aer Statevector |
|---|---|---|---|---|
Aer statevector |
2.635268 |
1.146089 |
3.781358 |
+0.000000 |
Aer MPS |
2.710340 |
5.776522 |
8.486862 |
+0.019080 |
QDD |
2.356933 |
1.914973 |
4.271906 |
-0.023605 |
This VQE result needs careful interpretation. It is consistent with the
guidance above when QDD is compared with dense statevector simulation: QDD is
slower than Aer statevector on this small-angle-rotation workload. However, QDD
is faster than Aer MPS in this specific run. That does not mean VQE is
generally favorable for QDD. The EfficientSU2 ansatz uses full entanglement,
which is often unfavorable for MPS because MPS performance depends strongly on
low entanglement and a good one-dimensional qubit ordering. In contrast, this
particular VQE instance is still small enough that QDD’s overhead does not
dominate as much as it would for larger or more random parameterized circuits.
For VQE-style workloads, treat Aer statevector as the primary baseline and
measure the specific ansatz before choosing QDD.
References
Qiskit Algorithms: Grover’s algorithm examples, https://qiskit-community.github.io/qiskit-algorithms/tutorials/07_grover_examples.html
Qiskit Nature: getting started with H2 VQE, https://qiskit-community.github.io/qiskit-nature/getting_started.html
IBM Quantum Documentation: Shor’s algorithm, https://qiskit.qotlabs.org/docs/tutorials/shors-algorithm