Cyclic-banded-matrix

June 14, 2026 · View on GitHub

Cyclic-banded-matrix is fortran 90+ source code for solving a system Ax=b with A represented by a cyclic band matrix, aka periodic band matrix, or cyclic/periodic banded matrix. These are all square matrices, and arise from problems with periodic boundary conditions.

The algorithm is detailed in band_matrix.pdf. I have not found any previous version of this algorithm and it is entirely my derivation. If I have reinvented the wheel, there being nothing new under the sun, it would be interesting to see the previous attribution.

The error rate increases more rapidly with N than an LU decomposition, but is O(N*KU) in speed. The matrix needs to be diagonally dominant.

Usage

Two routines are included, dctsv.f90 which solves periodic tridiagonal systems, and dcbsv.f90, which solves general banded periodic matrices. They are written to be reasonably easily incorporated into code using LAPACK (Univ. of Tennessee, Univ. of California Berkeley and NAG Ltd.November 2006) and the interfaces have been deliberately made as compatible as possible with their non-cyclic LAPACK counterparts dgbsv.f and dgtsv.f

Both use XERBLA error reporting and require linking with LAPACK & BLAS. Removing XERBLA from dctsv.f90 removes the LAPACK/BLAS dependency.

Removing the LAPACK/BLAS dependency in dcbsv requires substitution by intrinsic Fortran array operators instead of LAPACK & BLAS calls (done already by default) and substitution of LAPACK's dgesv.f & dgetr(fsi).f by another suitable general matrix solver (e.g. the included admittedly slower O(N^3) gauss-jordan.f90), not done by default. The intrinsic operators and calls to the alternate matrix solver are immediately after their respective commented out LAPACK calls, making it easy to switch.

The algorithm can be run forwards or backwards, with dcbsv_forward.f90, dcbsv_reverse.f90, or both simultaneously using OpenMP with dcbsv_parallel_omp.f90. The parallel OMP versions can be nearly twice as fast as the non parallel versions depending on the parameters. There is also a 4-way parallel solver dcbsv_4x4parallel_omp.f90, which might be faster in certain problems, but the speed of the algorithm through the loops is offset by the complexity of the setup of the matrices.

A fortran 90 module LapackInterface.f90 is included for compatibility with FORTRAN 77. Note that the module only needs to be compiled once and is only needed for LAPACK/BLAS compatibility.

Band matrices are defined per LAPACK convention https://www.netlib.org/lapack/lug/node124.html for non-periodic matrices where the columns of the full matrix are the columns of the banded form while the rows of the banded form are the diagonals of the full matrix. For periodic banded matrices, the definition of the banded form is similar with the rows of the full matrix as the columns of the banded form while the rows of the banded form are the diagonals of the full matrix. See band_matrix.pdf

The band diagonal storage scheme is illustrated by the following example, when N = 9, KU = 2

 AB(2*KU+2-mod(N+KU+1+i-j,N),i)                  A(i,j)=aij  
 a18  a29  a31  a42  a53 a64 a75 a86 a97         a11 a12 a13  0   0   0   0 a18  a19
 a19  a21  a32  a43  a54 a65 a76 a87 a98         a21 a22 a23 a24  0   0   0  0   a29
 a11  a22  a33  a44  a55 a66 a77 a88 a99         a31 a32 a33 a34 a35  0   0  0    0
 a12  a23  a34  a45  a56 a67 a78 a89 a91         0   a42 a43 a44 a45 a46  0  0    0
 a13  a24  a35  a46  a57 a68 a79 a81 a92         0   0   a53 a54 a55 a56 a57 0    0
                                                 0   0   0   a64 a65 a66 a67 a68  0
                                                 0   0   0   0   a75 a76 a77 a78 a79
                                                 a81 0   0   0    0  a86 a87 a88 a89
                                                 a91 a92 0   0    0   0  a97 a98 a99

A fortran 90 module AB_matrix_fct.f90 to work with the defined periodic banded matrix and their non-cyclic LAPACK counterpart is included. Note that the modules only need to be compiled once. The main loops of the algorithms are included as forward_loop.f90 and backward_loop.f90.

Two simple test programs are included, testme.f90 using full matrix routines as well as the LAPACK banded noncyclic routines dgtsv and dgbsv for comparison and testme_omp.f90 for larger N with a simple tridiagonal solver thomas.f90 for comparison purposes. CityPlots https://math.nist.gov/MatrixMarket/ of the matrices can be generated with the included Mathematica(R) notebook CityPlot.nb

cmake is the easiest way to make the test programs, the default is to make Linux dynamically linked execuatbles, with the added benefit that if you want to generate static Windows(R)/MingW executables, it is easy to edit CMakeLists.txt, uncomment/changing the lines that say this:

#set(CMAKE_HOST_NAME Windows)
#set(CMAKE_SYSTEM Windows)
set(CMAKE_HOST_NAME Linux)
set(CMAKE_SYSTEM UNIX) 
to this:
#set(CMAKE_HOST_NAME Windows)
set(CMAKE_SYSTEM Windows)
#set(CMAKE_HOST_NAME Linux)
#set(CMAKE_SYSTEM UNIX)

for cross-compiling support. CMakeLists.txt assumes you've got the cross-compiling tools under /usr/x86_64-w64-mingw32, and you've already compiled static versions of OpenMP, LAPACK and BLAS, in /usr/lib. Obviously if you want a static version under Linux, or a dynamic version under MingW, changes will have to be made in CMakeLists.txt

To build under Windows, in addition, uncomment set(CMAKE_HOST_NAME Windows)

After that, under Linux, run

  cmake ./    
  make  
  Executables:
  ./testme_cyclic (or testme_cyclic.exe)
  ./testme_cyclic_omp (or testme_cyclic_omp.exe)

You can also use an IDE to open the CMakeList.txt file. I have used Qt successfully for this, and in fact, that is easiest particularly under Windows as Qt automatically can configure the toolchain.

Alternatively, you can do it manually without cmake (some examples below, only for dynamic Linux build)

non-parallel version, gfortran instructions:

gfortran -c LapackInterface.f90 AB_matrix_fct.f90
gfortran testme.f90 dctsv.f90 dcbsv_forward.f90 dcbsv_reverse.f90 forward_loop.f90 backward_loop.f90 gauss-jordan.f90 -llapack -lblas

parallel versions with OpenMP, different compilers:

gfortran -fopenmp -O3 testme_omp.f90 dcbsv_4x4parallel_omp.f90 dcbsv_parallel_omp.f90 dcbsv_forward.f90 dcbsv_reverse.f90 dctsv_parallel_omp.f90 forward_loop.f90 backward_loop.f90 thomas.f90 LapackInterface.f90 AB_matrix_fct.f90 -llapack -lblas

ifort -qopenmp -shared-intel testme_omp.f90 dcbsv_parallel_omp.f90 dcbsv_4x4parallel_omp.f90 dcbsv_forward.f90 dcbsv_reverse.f90 dctsv_parallel_omp.f90 forward_loop.f90 backward_loop.f90 thomas.f90 LapackInterface.f90 AB_matrix_fct.f90 gauss-jordan.f90 -llapack -lblas

nvfortran -mp testme_omp.f90 dcbsv_parallel_omp.f90 dcbsv_4x4parallel_omp.f90 dcbsv_forward.f90 dcbsv_reverse.f90 dctsv_parallel_omp.f90 forward_loop.f90 backward_loop.f90 thomas.f90 LapackInterface.f90 AB_matrix_fct.f90 gauss-jordan.f90 -llapack -lblas

all of the above three exmples result in a generic executable ./a.out

gfortran is used as the default compiler with cmake, but ifort or nvfortran are easily configured.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

LICENSE