# Performance Characteristics

QDD is a decision-diagram-based quantum circuit simulator. Its performance is
best when the simulated state and operators have enough repeated structure for
the decision diagram to share nodes. It can be slower when the circuit quickly
creates many distinct amplitudes or many distinct edge weights, because the
decision diagram then loses sharing and approaches the cost of an explicit
state-vector representation with additional bookkeeping overhead.

This page gives practical guidance for choosing workloads where QDD is likely
to work well.

## When QDD Tends To Be Fast

QDD is well suited to circuits whose state evolution remains compact as a
decision diagram:

- Grover-style search circuits with repeated oracle and diffusion blocks.
- Shor-style arithmetic circuits made from reversible modular arithmetic,
  controlled operations, and repeated structured subcircuits.
- Circuits with many Clifford, permutation, controlled-NOT, Toffoli, and other
  operations that preserve regular structure.
- Problems where a large Hilbert space is explored, but the amplitudes still
  fall into repeated patterns.

Grover and Shor circuits often have these properties. They may use many qubits
and many gates, but the circuit structure is highly regular, and intermediate
states often contain repeated subgraphs that a decision diagram can represent
compactly.

## When QDD Can Be Slow

QDD is less favorable for circuits that create many unrelated amplitudes:

- VQE circuits with hardware-efficient ansatze such as `EfficientSU2` or
  `RealAmplitudes` at large depth.
- Circuits with many independent small-angle rotations.
- Random or problem-instance-specific parameterized circuits with little
  repeated structure.
- Optimization loops that evaluate thousands of slightly different parameter
  points.

In these cases, each parameter value can introduce distinct edge weights. When
those weights are all different, the decision diagram cannot merge as many
nodes. A state-vector simulator may then be faster because it performs dense
linear algebra directly without maintaining the decision-diagram structure.

This does not mean VQE is unsupported. QDD can run VQE-style circuits, but it is
usually not the workload where QDD has the strongest advantage.

## Example: Grover With Qiskit's Grover Class

The examples in this section use optional Qiskit ecosystem packages such as
`qiskit-algorithms` and `qiskit-nature` in addition to QDD.

Qiskit Algorithms provides `Grover` and `AmplificationProblem`, so you do not
need to write the Grover iterator by hand. The example below uses Qiskit's
`PhaseOracleGate` to turn a 3-SAT instance into a phase oracle, then runs the
algorithm with QDD's Sampler.

```python
from qiskit.circuit.library.phase_oracle import PhaseOracleGate
from qiskit.synthesis.boolean.boolean_expression import BooleanExpression
from qiskit_algorithms import AmplificationProblem, Grover

from qdd.qdd_sampler import Sampler


input_3sat_instance = """
c example DIMACS-CNF 3-SAT
p cnf 3 5
-1 -2 -3 0
1 -2 3 0
1 2 -3 0
1 -2 -3 0
-1 2 3 0
"""

expression = BooleanExpression.from_dimacs(input_3sat_instance).expression
oracle = PhaseOracleGate(expression)
problem = AmplificationProblem(oracle)

grover = Grover(sampler=Sampler())
result = grover.amplify(problem)

print(result.assignment)
```

This is a favorable pattern for QDD because the algorithm repeatedly applies a
structured oracle and diffusion operator. The search space grows with the
number of variables, but the circuit can still contain repeated subgraphs that
a decision diagram can share.

The same example can use MPI by constructing the sampler with backend options
and launching the script with `mpirun`:

```python
grover = Grover(sampler=Sampler(backend_options={"use_mpi": True}))
```

```sh
mpirun -n 4 python grover_3sat.py
```

This example follows the public Qiskit Algorithms Grover and 3-SAT examples.

## Example: Shor From IBM Quantum's Tutorial

Shor's algorithm is another good match for a decision diagram simulator because
its core is order finding through reversible modular arithmetic. The circuit is
not necessarily small, but it is made from repeated arithmetic blocks,
controlled modular multiplication, and an inverse QFT.

Unlike Grover and VQE, Qiskit documents Shor as a tutorial circuit rather than
as a single main algorithm API class. For a generally available reference
implementation, use IBM Quantum's "Shor's algorithm" tutorial. It constructs
the order-finding circuit for `N = 15` and `a = 2` using Qiskit's standard
`QFT` and `UnitaryGate` building blocks, then runs the circuit with a Sampler
and classically post-processes the measured phase.

The QDD-facing part is the same as for any Qiskit circuit:

```python
from qiskit import transpile
from qiskit.circuit.library import QFT, UnitaryGate

from qdd import QddProvider

# Build the order-finding circuit as in IBM Quantum's Shor tutorial:
# https://qiskit.qotlabs.org/docs/tutorials/shors-algorithm
circuit = build_order_finding_circuit_for_N15_a2()

backend = QddProvider().get_backend("qasm_simulator")
compiled = transpile(circuit, backend, optimization_level=1)
counts = backend.run(compiled, shots=1024).result().get_counts()
print(counts)
```

The placeholder `build_order_finding_circuit_for_N15_a2()` is intentionally not
a QDD-specific helper. It should be replaced by the circuit construction from
the public IBM Quantum tutorial. That keeps the example independent of this
repository's benchmark code while documenting how to run the resulting circuit
with QDD.

## Example: H2 VQE With Qiskit Nature

Qiskit Nature provides chemistry-specific components for VQE. The example
below uses its standard H2 setup: `PySCFDriver` builds the electronic
structure problem, `JordanWignerMapper` maps it to qubits, and `UCCSD` with a
`HartreeFock` initial state provides a chemically motivated ansatz.

```python
import numpy as np
from qiskit_algorithms import VQE
from qiskit_algorithms.optimizers import SLSQP
from qiskit_nature.second_q.algorithms import GroundStateEigensolver
from qiskit_nature.second_q.circuit.library import HartreeFock, UCCSD
from qiskit_nature.second_q.drivers import PySCFDriver
from qiskit_nature.second_q.mappers import JordanWignerMapper

from qdd.qdd_estimator import Estimator


driver = PySCFDriver(atom="H 0 0 0; H 0 0 0.735", basis="sto-3g")
problem = driver.run()

mapper = JordanWignerMapper()

ansatz = UCCSD(
    problem.num_spatial_orbitals,
    problem.num_particles,
    mapper,
    initial_state=HartreeFock(
        problem.num_spatial_orbitals,
        problem.num_particles,
        mapper,
    ),
)

vqe = VQE(Estimator(default_precision=0.0), ansatz, SLSQP())
vqe.initial_point = np.zeros(ansatz.num_parameters)

solver = GroundStateEigensolver(mapper, vqe)
result = solver.solve(problem)

print(result.total_energies[0])
```

This is a realistic and convenient way to express a molecular VQE problem, but
it is also representative of workloads where QDD may be slower than a dense
state-vector simulator. UCC-style and hardware-efficient ansatze contain many
parameterized rotations. During an optimizer loop, each evaluation uses a
different parameter vector, which often produces many distinct
decision-diagram edge weights and reduces node sharing.

For this class of workloads, use a state-vector simulator as a baseline and
choose QDD only when measurements show that the specific ansatz and Hamiltonian
remain compact.

## Practical Selection Guide

Use QDD first when:

- the circuit contains repeated structured blocks;
- the algorithm is search, arithmetic, or reversible logic heavy;
- you expect many amplitudes or suboperators to be identical or closely
  reusable;
- memory use is the limiting factor for a state-vector simulator.

Benchmark against a state-vector simulator when:

- the circuit is dominated by arbitrary `rx`, `ry`, `rz`, `u`, or controlled
  rotation gates with many distinct angles;
- the circuit resembles a random parameterized ansatz;
- the workload is an optimizer loop that repeatedly evaluates slightly
  different circuits;
- the decision diagram grows quickly during early test runs.

For a new workload, start with a smaller instance, compare QDD with a
state-vector backend, and scale only if the QDD run shows the expected compact
representation advantage.

## Benchmark Script

The repository includes `bench/compare_simulators.py` to compare QDD with
Qiskit Aer statevector and matrix-product-state simulation on the example
workloads above:

```sh
python bench/compare_simulators.py --benchmark grover --grover-qubits 20
python bench/compare_simulators.py --benchmark shor --counting-qubits 22
python bench/compare_simulators.py --benchmark vqe --molecule h4 --vqe-reps 8
python bench/compare_simulators.py --benchmark all
```

The VQE benchmark builds molecular Hamiltonians with Qiskit Nature. By default it
uses a deeper `EfficientSU2` ansatz to make the small-angle-rotation workload
large enough to time; pass `--vqe-ansatz uccsd` to use the chemistry-specific
UCCSD ansatz instead. Install the project with the `test` extra before running
the benchmark:

```sh
pip install .[test]
```

The defaults are intentionally larger than smoke tests so the timing
differences are visible. For a quick functional check, reduce the scale, for
example `--grover-qubits 12`, `--counting-qubits 8`, or `--vqe-steps 1`.
The VQE benchmark is shot based: each Pauli term is measured with
`--vqe-shots` shots for each optimizer evaluation.
The output also includes simple correctness checks: Grover reports the measured
probability of the marked prefix, Shor reports the measured probability of
recovering order 4 for `N = 15`, and VQE reports the sampled energy difference
from Aer statevector.

### Example Timing Results

The following measurements were taken on one development environment with
`.venv_qdd/bin/python bench/compare_simulators.py`. They are intended as a
representative comparison, not as portable absolute performance numbers.

Grover prefix search used 20 qubits, 256 shots, marked prefix `101010101010`,
and the automatically selected Grover iteration count. The correctness metric
is the measured probability that the sampled bitstring starts with the marked
prefix.

| Simulator | Transpile [s] | Run [s] | Total [s] | Correctness |
| --- | ---: | ---: | ---: | ---: |
| Aer statevector | 0.080887 | 7.784454 | 7.865341 | prefix success 1.000 |
| Aer MPS | 1.398227 | 12.111329 | 13.509557 | prefix success 1.000 |
| QDD | 0.022978 | 0.109171 | 0.132149 | prefix success 1.000 |

This result matches the expected QDD behavior: the Grover oracle and diffusion
operator are highly structured, and the state remains compact enough for the
decision diagram representation to share nodes effectively.

Shor order finding used `N = 15`, base `a = 2`, 22 counting qubits, and 256
shots. The correctness metric is the probability that classical
post-processing recovers the expected order 4.

| Simulator | Transpile [s] | Run [s] | Total [s] | Correctness |
| --- | ---: | ---: | ---: | ---: |
| Aer statevector | 0.087232 | 9.524342 | 9.611574 | order-4 success 0.512 |
| Aer MPS | 0.064690 | 0.100180 | 0.164869 | order-4 success 0.508 |
| QDD | 0.039742 | 0.197238 | 0.236979 | order-4 success 0.469 |

This also broadly matches the expected picture for arithmetic circuits, but it
shows an important nuance: MPS can be very competitive when the particular
arithmetic circuit has low effective entanglement in the chosen qubit order.
QDD remains much faster than Aer statevector in this run, while MPS is fastest
for this small `N = 15` order-finding instance.

VQE used the H4 Hamiltonian from Qiskit Nature, `EfficientSU2(reps=8)`, one
optimizer evaluation, and 128 shots per Pauli term. The correctness metric is
reported as the sampled energy difference from Aer statevector, using the same
initial parameters and shot count.

| Simulator | Transpile [s] | Run [s] | Total [s] | Energy Delta vs Aer Statevector |
| --- | ---: | ---: | ---: | ---: |
| Aer statevector | 2.635268 | 1.146089 | 3.781358 | +0.000000 |
| Aer MPS | 2.710340 | 5.776522 | 8.486862 | +0.019080 |
| QDD | 2.356933 | 1.914973 | 4.271906 | -0.023605 |

This VQE result needs careful interpretation. It is consistent with the
guidance above when QDD is compared with dense statevector simulation: QDD is
slower than Aer statevector on this small-angle-rotation workload. However, QDD
is faster than Aer MPS in this specific run. That does not mean VQE is
generally favorable for QDD. The `EfficientSU2` ansatz uses full entanglement,
which is often unfavorable for MPS because MPS performance depends strongly on
low entanglement and a good one-dimensional qubit ordering. In contrast, this
particular VQE instance is still small enough that QDD's overhead does not
dominate as much as it would for larger or more random parameterized circuits.
For VQE-style workloads, treat Aer statevector as the primary baseline and
measure the specific ansatz before choosing QDD.

## References

- Qiskit Algorithms: Grover's algorithm examples,
  https://qiskit-community.github.io/qiskit-algorithms/tutorials/07_grover_examples.html
- Qiskit Nature: getting started with H2 VQE,
  https://qiskit-community.github.io/qiskit-nature/getting_started.html
- IBM Quantum Documentation: Shor's algorithm,
  https://qiskit.qotlabs.org/docs/tutorials/shors-algorithm