Usage of sample applications

October 10, 2025

qsim and qsimh are designed to be extensible to a variety of applications. The base versions of each are qsim_base and qsimh_base; sample extensions are provided in the apps directory. To compile the code, run make qsim. Binaries of the form qsim(h)_*.x are then added to the apps directory.

Sample circuits are provided in the circuits directory.

qsim_base usage

```
./qsim_base.x -c circuit_file -d maxtime -t num_threads -f max_fused_size -v verbosity -z
```
| Flag | Description |
|---|---|
| -c circuit_file | circuit file to run |
| -d maxtime | maximum time |
| -t num_threads | number of threads to use |
| -f max_fused_size | maximum fused gate size |
| -v verbosity | verbosity level (0,1,2,3,4,5) |
| -z | set flush-to-zero and denormals-are-zeros MXCSR control flags |

qsim_base computes all the amplitudes but prints only the first eight of them (or fewer for 1- or 2-qubit circuits).

Verbosity levels are described in the following table.

| Verbosity level | Description |
|---|---|
| 0 | no additional information |
| 1 | add total simulation runtime |
| 2 | add initialization runtime and fuser runtime |
| 3 | add basic fuser statistics |
| 4 | add simulation runtime for each fused gate |
| 5 | additional fuser information (qubit indices for each fused gate) |

Example:

```
./qsim_base.x -c ../circuits/circuit_q24 -d 16 -t 8 -v 1
```

qsim_von_neumann usage

```
./qsim_von_neumann.x -c circuit_file -d maxtime -t num_threads -f max_fused_size -v verbosity -z
```
| Flag | Description |
|---|---|
| -c circuit_file | circuit file to run |
| -d maxtime | maximum time |
| -t num_threads | number of threads to use |
| -f max_fused_size | maximum fused gate size |
| -v verbosity | verbosity level (0,1,2,3,4,5) |
| -z | set flush-to-zero and denormals-are-zeros MXCSR control flags |

qsim_von_neumann computes all the amplitudes and calculates the von Neumann entropy. This can be slow for large circuits and small thread counts, because computing logarithms is expensive.
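The following sketch illustrates where the cost comes from. It assumes that the reported entropy is -Σ p ln p over the output probabilities p = |a|², using the natural logarithm. Both the formula and the log base are assumptions, not taken from the qsim source. Under that assumption, one logarithm is evaluated per amplitude, which means 2^n logarithms for n qubits. The function name output_entropy is hypothetical.

```python
import math


def output_entropy(amplitudes):
    """Return -sum(p * ln p) over the probabilities p = |a|^2.

    Assumption: this mirrors the quantity reported by qsim_von_neumann.
    Zero-probability terms are skipped, since p * ln p -> 0 as p -> 0.
    """
    total = 0.0
    for a in amplitudes:
        p = abs(a) ** 2
        if p > 0.0:
            total -= p * math.log(p)  # one logarithm per nonzero amplitude
    return total


# Uniform superposition on 2 qubits: four amplitudes equal to 1/2.
uniform = [0.5 + 0j] * 4
print(output_entropy(uniform))  # ln 4, approximately 1.386
```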

Example:

```
./qsim_von_neumann.x -c ../circuits/circuit_q24 -d 16 -t 4 -v 1
```

qsim_amplitudes usage

```
./qsim_amplitudes.x -c circuit_file \
                    -d times_to_save_results \
                    -i input_files \
                    -o output_files \
                    -f max_fused_size \
                    -t num_threads -v verbosity -z
```
| Flag | Description |
|---|---|
| -c circuit_file | circuit file to run |
| -d times_to_save_results | comma-separated list of circuit times to save results at |
| -i input_files | comma-separated list of bitstring input files |
| -o output_files | comma-separated list of amplitude output files |
| -t num_threads | number of threads to use |
| -f max_fused_size | maximum fused gate size |
| -v verbosity | verbosity level (0,1,2,3,4,5) |
| -z | set flush-to-zero and denormals-are-zeros MXCSR control flags |

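qsim_amplitudes reads input files of bitstrings, computes the corresponding amplitudes at the specified times and writes them to output files. The three lists are matched by position: the bitstrings in the k-th input file are evaluated at the k-th time given with -d, and the results are written to the k-th output file.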

Bitstring files should contain bitstrings (one bitstring per line) in text format.
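The following Python helper produces such a file, for example to try qsim_amplitudes on random bitstrings. It assumes that each line is a string of 0 and 1 characters with one character per qubit. The mapping from characters to qubit indices is not specified here, so check it against your circuit before relying on specific bitstrings. The function name and the output file name are hypothetical.

```python
import random


def write_bitstring_file(path, num_qubits, count, seed=0):
    """Write `count` random bitstrings of length `num_qubits`, one per line."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    bitstrings = ["".join(rng.choice("01") for _ in range(num_qubits))
                  for _ in range(count)]
    with open(path, "w") as f:
        f.write("\n".join(bitstrings) + "\n")
    return bitstrings


# Ten random 24-character bitstrings, e.g. for circuit_q24.
write_bitstring_file("my_bitstrings_q24", num_qubits=24, count=10)
```

The resulting file can then be passed to qsim_amplitudes with -i.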

Example:

```
./qsim_amplitudes.x -c ../circuits/circuit_q24 -t 4 -d 16,24 \
                    -i ../circuits/bitstrings_q24_s1,../circuits/bitstrings_q24_s2 \
                    -o ampl_q24_s1,ampl_q24_s2 -v 1
```

qsim_qtrajectory_cuda usage

```
./qsim_qtrajectory_cuda.x -c circuit_file \
                          -d times_to_calculate_observables \
                          -a amplitude_damping_const \
                          -p phase_damping_const \
                          -t traj0 -n num_trajectories \
                          -f max_fused_size \
                          -v verbosity
```
| Flag | Description |
|---|---|
| -c circuit_file | circuit file to run |
| -d times_to_calculate_observables | comma-separated list of circuit times to calculate observables at |
| -a amplitude_damping_const | amplitude damping constant |
| -p phase_damping_const | phase damping constant |
| -t traj0 | starting trajectory |
| -n num_trajectories | number of trajectories to run starting with traj0 |
| -f max_fused_size | maximum fused gate size |
| -v verbosity | verbosity level (0,1,2,3,4,5) |

qsim_qtrajectory_cuda runs on GPUs. It performs quantum trajectory simulations with amplitude-damping and phase-damping noise channels and calculates observables (the X operator on each qubit) at the specified times.
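The -t and -n flags select a contiguous range of trajectories, so a large run can presumably be split into batches, for example across several jobs. The sketch below generates the (traj0, num_trajectories) pairs for such batches. It also combines per-batch averages of an observable into an overall average, weighting each batch by its trajectory count. It assumes you have already extracted the per-batch, per-qubit averages from the program output; that output format is not described here. The helper names are hypothetical.

```python
def trajectory_batches(total, batch_size):
    """Split trajectories 0..total-1 into (traj0, n) pairs for -t and -n."""
    return [(t0, min(batch_size, total - t0))
            for t0 in range(0, total, batch_size)]


def combine_means(batch_results):
    """Combine per-batch averages into an overall average.

    batch_results: list of (n, means), where n is the number of trajectories
    in the batch and means[q] is that batch's average observable on qubit q.
    Each batch is weighted by n.
    """
    total = sum(n for n, _ in batch_results)
    num_qubits = len(batch_results[0][1])
    return [sum(n * means[q] for n, means in batch_results) / total
            for q in range(num_qubits)]


# One command per batch of at most 100 trajectories, 250 trajectories in total.
for t0, n in trajectory_batches(250, 100):
    print(f"./qsim_qtrajectory_cuda.x -c ../circuits/circuit_q24 -d 8,16,32 "
          f"-a 0.005 -p 0.005 -t {t0} -n {n} -f 4 -v 0")
```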

Example:

```
./qsim_qtrajectory_cuda.x -c ../circuits/circuit_q24 -d 8,16,32 -a 0.005 -p 0.005 -t 0 -n 100 -f 4 -v 0
```

qsimh_base usage

```
./qsimh_base.x -c circuit_file \
               -d maxtime \
               -k part1_qubits \
               -w prefix \
               -p num_prefix_gates \
               -r num_root_gates \
               -t num_threads -v verbosity -z
```
| Flag | Description |
|---|---|
| -c circuit_file | circuit file to run |
| -d maxtime | maximum time |
| -k part1_qubits | comma-separated list of qubit indices for part 1 |
| -w prefix | prefix value |
| -p num_prefix_gates | number of prefix gates |
| -r num_root_gates | number of root gates |
| -t num_threads | number of threads to use |
| -v verbosity | verbosity level (0,1,4,5) |
| -z | set flush-to-zero and denormals-are-zeros MXCSR control flags |

qsimh_base computes and prints only the first eight amplitudes. It uses the hybrid Schrödinger-Feynman method: the lattice is split into two parts, and the gates that act across the cut are handled by summing over paths. A two-level checkpointing scheme is used to improve performance. Suppose there are N gates on the cut. These are split into three groups, p + r + s = N, where p is the number of "prefix" gates, r is the number of "root" gates and s is the number of "suffix" gates. The first checkpoint is taken after applying all the gates up to and including the prefix gates, and the second checkpoint after applying all the gates up to and including the root gates. A full summation over all paths is performed for the root and suffix gates.
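The following Python sketch illustrates the bookkeeping behind p + r + s = N. It assumes that each cut gate i splits into branches[i] terms, so the number of paths through a group of cut gates is the product of their branch counts. The branch counts in the usage line (10 cut gates with 4 branches each) are hypothetical. A single run fixes one prefix path and sums over the root and suffix paths.

```python
from math import prod


def path_counts(branches, p, r):
    """Count paths for a cut whose gates are listed in circuit-file order.

    branches[i] is the (assumed) number of terms cut gate i splits into.
    The first p gates are prefix gates, the next r are root gates and the
    remaining s = N - p - r are suffix gates.
    """
    if p + r > len(branches):
        raise ValueError("p + r exceeds the number of cut gates")
    prefix, root, suffix = branches[:p], branches[p:p + r], branches[p + r:]
    return {
        "s": len(suffix),
        "prefix_values": prod(prefix),      # distinct prefix paths (-w values)
        "root_paths_per_run": prod(root),   # each starts from checkpoint 1
        "paths_per_run": prod(root) * prod(suffix),
        "total_paths": prod(branches),
    }


print(path_counts([4] * 10, p=3, r=4))
```

With p = 0 there is a single prefix path, so one run covers every path.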

The path through the prefix gates is selected by the prefix value (-w). This value is formed by bit-shifting the path indices of the individual prefix gates and combining them, in the order in which the prefix gates occur in the circuit file. It is primarily used for distributed execution; see the Distributed execution section below for details.
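The exact layout of the prefix value is not spelled out here. Purely as an illustration, the sketch below assumes that each prefix gate's path index occupies a fixed number of bits (bits_per_gate). It also assumes the first prefix gate in the circuit file occupies the lowest bits. qsimh may pack the indices differently, so treat encode_prefix and decode_prefix as hypothetical helpers, not a description of the actual format.

```python
def encode_prefix(indices, bits_per_gate):
    """Pack per-gate path indices into one integer (first gate in lowest bits)."""
    value = 0
    for i, idx in enumerate(indices):
        if not 0 <= idx < (1 << bits_per_gate):
            raise ValueError(f"path index {idx} does not fit in {bits_per_gate} bits")
        value |= idx << (i * bits_per_gate)
    return value


def decode_prefix(value, num_gates, bits_per_gate):
    """Inverse of encode_prefix: recover the path index of each prefix gate."""
    mask = (1 << bits_per_gate) - 1
    return [(value >> (i * bits_per_gate)) & mask for i in range(num_gates)]


w = encode_prefix([1, 0, 3], bits_per_gate=2)
print(w, decode_prefix(w, num_gates=3, bits_per_gate=2))
```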

Example (running on one machine):

```
./qsimh_base.x -c ../circuits/circuit_q30 -d 16 \
               -k 0,1,2,6,7,8,12,13,14,18,19,20,24,25,26 \
               -t 8 -w 0 -p 0 -r 5 -v 1
```

Choosing flag values for qsimh

-k defines how the lattice is split: it lists the qubits that belong to part 1, and all other qubits form part 2. In the example above, the 30-qubit lattice has the structure below (the cut is denoted by the | symbol):

```
 0    1    2 |  3    4    5
 6    7    8 |  9   10   11
12   13   14 | 15   16   17
18   19   20 | 21   22   23
24   25   26 | 27   28   29
```

Deciding which cuts are optimal for a given circuit is computationally hard. However, splitting the grid into roughly equal parts with the fewest cuts possible (as is done for the lattice above) usually gives reasonable performance.
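For a rectangular grid numbered row by row, as in the lattice above, the -k list for a vertical cut can be generated instead of typed by hand. The sketch below also counts the nearest-neighbour couplers that cross the cut, as a rough proxy for how many gates will land on it. It assumes only horizontal and vertical nearest-neighbour couplings, which may not match your circuit. The function names are hypothetical.

```python
def part1_qubits(rows, cols, cut_col):
    """Qubits in columns [0, cut_col) of a row-major rows x cols grid."""
    return [r * cols + c for r in range(rows) for c in range(cut_col)]


def crossing_couplers(rows, cols, part1):
    """Count nearest-neighbour grid couplers with one end in each part."""
    part1 = set(part1)
    count = 0
    for r in range(rows):
        for c in range(cols):
            q = r * cols + c
            for nr, nc in ((r, c + 1), (r + 1, c)):  # right and down neighbours
                if nr < rows and nc < cols:
                    if (q in part1) != ((nr * cols + nc) in part1):
                        count += 1
    return count


k = part1_qubits(rows=5, cols=6, cut_col=3)
print("-k " + ",".join(map(str, k)))
print(crossing_couplers(5, 6, k))
```

For the 5 x 6 lattice above, this reproduces the -k list from the qsimh_base example and finds one crossing coupler per row.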

The runtime of an execution is heavily influenced by -p, because a single run does not sum over the "prefix" gates. Instead, it evaluates only the one prefix path selected by -w, so a larger -p leaves fewer paths for that run to sum over. See the "Distributed execution" section below for details.

-r implicitly specifies the number of "suffix" gates: the total number of gates on the cut minus the values of -p and -r. For performance, the "suffix" gates should typically be the gates on the cut with the largest "time" values.

qsimh_amplitudes usage

```
./qsimh_amplitudes.x -c circuit_file \
                     -d maxtime \
                     -k part1_qubits \
                     -w prefix \
                     -p num_prefix_gates \
                     -r num_root_gates \
                     -i input_file -o output_file \
                     -t num_threads -v verbosity -z
```
| Flag | Description |
|---|---|
| -c circuit_file | circuit file to run |
| -d maxtime | maximum time |
| -k part1_qubits | comma-separated list of qubit indices for part 1 |
| -w prefix | prefix value |
| -p num_prefix_gates | number of prefix gates |
| -r num_root_gates | number of root gates |
| -i input_file | bitstring input file |
| -o output_file | amplitude output file |
| -t num_threads | number of threads to use |
| -v verbosity | verbosity level (0,1,4,5) |
| -z | set flush-to-zero and denormals-are-zeros MXCSR control flags |

qsimh_amplitudes reads the input file of bitstrings, computes the corresponding amplitudes and writes them to the output file. Like qsimh_base, it uses the hybrid Schrödinger-Feynman method described above.

Bitstring files should contain bitstrings (one bitstring per line) in text format.

Example (do not run this on a single machine; see below):

```
./qsimh_amplitudes.x -c ../circuits/circuit_q40 -d 47 \
                     -k 0,1,2,3,4,5,6,7,8,9,10,13,14,15,16,17,23,24 \
                     -t 8 -w 0 -p 0 -r 13 \
                     -i ../circuits/bitstrings_q40_s1 -o ampl_q40_s1 -v 1
```

This command could take weeks to run, since parallelism on a single machine is limited by the -t flag and the available cores on the device. For large circuits like this, distributed execution is recommended.

Distributed execution

Setting -p greater than zero allows the workload of qsimh_amplitudes to be distributed across multiple machines. Each machine should use the same arguments to ./qsimh_amplitudes.x, except for the -w flag, which specifies the prefix path that machine evaluates. In the example below, each machine also writes to its own output file via -o.
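Only -w and the output file change between machines, so the per-machine commands can be generated with a short Python script. This sketch assumes that the prefix values to cover are 0, 1, 2 and so on up to num_prefix_values - 1. It also assumes that flag order does not matter to the program. num_prefix_values is a hypothetical parameter whose correct value depends on the prefix gates of your circuit; 2 is used here only to match the two machines shown in the example.

```python
import shlex


def machine_commands(common_args, output_stem, num_prefix_values):
    """Build one qsimh_amplitudes.x command per prefix value.

    Only -w and -o differ between the commands; all other flags are shared.
    """
    cmds = []
    for w in range(num_prefix_values):
        argv = ["./qsimh_amplitudes.x", *common_args,
                "-w", str(w), "-o", f"{output_stem}_w{w}"]
        cmds.append(shlex.join(argv))  # shell-safe quoting (Python 3.8+)
    return cmds


common = ["-c", "../circuits/circuit_q40", "-d", "47",
          "-k", "0,1,2,3,4,5,6,7,8,9,10,13,14,15,16,17,23,24",
          "-t", "8", "-p", "9", "-r", "4",
          "-i", "../circuits/bitstrings_q40_s1", "-v", "1"]
for cmd in machine_commands(common, "ampl_q40_s1", num_prefix_values=2):
    print(cmd)
```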

Example:

```
# Machine 1
./qsimh_amplitudes.x -c ../circuits/circuit_q40 -d 47 \
                     -k 0,1,2,3,4,5,6,7,8,9,10,13,14,15,16,17,23,24 \
                     -t 8 -w 0 -p 9 -r 4 \
                     -i ../circuits/bitstrings_q40_s1 -o ampl_q40_s1_w0 -v 1

# Machine 2
./qsimh_amplitudes.x -c ../circuits/circuit_q40 -d 47 \
                     -k 0,1,2,3,4,5,6,7,8,9,10,13,14,15,16,17,23,24 \
                     -t 8 -w 1 -p 9 -r 4 \
                     -i ../circuits/bitstrings_q40_s1 -o ampl_q40_s1_w1 -v 1

# ...additional executions...
```

Each execution above computes a partial amplitude for each of the specified bitstrings. Summing these partial results across executions gives the final amplitudes, with a fidelity that depends on the number of paths executed.
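Once the partial results are collected, the summation can be done with a short Python script. This sketch assumes that each output line has the form "bitstring real imag", separated by whitespace; confirm this against your own output files. The demo file names and values are made up for illustration. For real runs, pass the files produced for each -w value, such as ampl_q40_s1_w0 and ampl_q40_s1_w1.

```python
def read_amplitudes(path):
    """Parse lines of the (assumed) form '<bitstring> <real> <imag>'."""
    amps = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            bits, re_part, im_part = fields[:3]
            amps[bits] = complex(float(re_part), float(im_part))
    return amps


def sum_partial_amplitudes(paths):
    """Add the partial amplitudes for each bitstring across all runs."""
    total = {}
    for path in paths:
        for bits, a in read_amplitudes(path).items():
            total[bits] = total.get(bits, 0j) + a
    return total


# Two tiny hypothetical partial-result files standing in for -w 0 and -w 1.
with open("demo_w0", "w") as f:
    f.write("0101 0.25 -0.5\n1100 0.125 0\n")
with open("demo_w1", "w") as f:
    f.write("0101 0.25 0.5\n1100 -0.125 0.25\n")
result = sum_partial_amplitudes(["demo_w0", "demo_w1"])
print(result)
```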