Performance benchmarks using the Rust Scientific Library (Russell)

June 24, 2026

Introduction

This project contains some performance benchmarks using the Rust Scientific Library (Russell).

Currently, we present some results involving the sparse linear solvers from russell_sparse.

The computations presented here use all features (intel_mkl, local_sparse, and cudss). Thus, UMFPACK and MUMPS are compiled locally with Intel MKL, whereas the calculations with cuDSS use the Linux binary distributed by NVIDIA. See the system and libraries information used in the benchmarks.

Solvers for large sparse linear systems

Note that NVIDIA cuDSS is in Preview Mode 🚧.

The results are available in PDF format and also listed below.

We test the linear system Ax = b where A is the coefficient matrix, x is the solution vector, and b is the right-hand side vector. The coefficient matrices are taken from the SuiteSparse Matrix Collection, except where noted in the list below. The right-hand side vector is filled with ones, i.e., we study the solution of

A x = 1

The relative error is calculated as

RelativeError = max(|A x - 1|) / max(|A| + 1)
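The error measure above can be illustrated with a minimal sketch in plain Rust. It does not use the russell_sparse API; the `Triplet` type, the `relative_error` function, and the 2-by-2 example matrix are hypothetical names and data introduced only for illustration. The sketch assumes the matrix is given in coordinate (triplet) form and that the maxima are taken over the entries of the residual vector and of the matrix, respectively.

```rust
/// Sparse matrix entry in coordinate (triplet) form: (row, column, value).
/// This type is an assumption of the sketch, not a russell_sparse type.
type Triplet = (usize, usize, f64);

/// Computes max(|A x - 1|) / (max(|A|) + 1) for an n-by-n matrix given as triplets.
/// Duplicate (row, column) entries are summed in the product A x, while max(|A|)
/// is taken over the individual triplet values (an assumption of this sketch).
fn relative_error(n: usize, triplets: &[Triplet], x: &[f64]) -> f64 {
    assert_eq!(x.len(), n);
    // residual r = A x - 1, accumulated starting from r = -1
    let mut r = vec![-1.0; n];
    let mut max_abs_a = 0.0_f64;
    for &(i, j, aij) in triplets {
        r[i] += aij * x[j];
        max_abs_a = max_abs_a.max(aij.abs());
    }
    // largest absolute entry of the residual
    let max_abs_r = r.iter().fold(0.0_f64, |acc, v| acc.max(v.abs()));
    max_abs_r / (max_abs_a + 1.0)
}

fn main() {
    // A = [[2, 0], [1, 4]], so A x = 1 has the exact solution x = [0.5, 0.125]
    let a: [Triplet; 3] = [(0, 0, 2.0), (1, 0, 1.0), (1, 1, 4.0)];
    let exact = [0.5, 0.125];
    let perturbed = [0.5, 0.126];
    println!("exact:     {:.2e}", relative_error(2, &a, &exact));
    println!("perturbed: {:.2e}", relative_error(2, &a, &perturbed));
}
```

Adding one to the denominator keeps the measure well defined for matrices with very small entries; for matrices with very large entries, the same scaling makes the reported error very small.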

The real-valued tested matrices are:

  1. bwm2000 (Bai) -- Brusselator wave model in transport interaction of chemical solutions (1992)
  2. rdb5000 (Bai) -- Reaction-diffusion brusselator model (1994)
  3. Goodwin_040 (Goodwin) -- Finite element, Navier-Stokes & other transport equations (2018)
  4. fp (MKS) -- 2-D Fokker Planck eqn, electron dyn. in external field (2006)
  5. xenon1 (Ronis) -- Complex zeolite, sodalite crystals (2001)
  6. twotone (ATandT) -- Harmonic balance method (2001)
  7. Raj1 (Rajat) -- Circuit Simulation Problem (2007)
  8. boyd2 (GHS_indef) -- Optimization problem (2004)
  9. Goodwin_071 (Goodwin) -- Finite element, Navier-Stokes & other transport equations (2018)
  10. darcy003 (GHS_indef) -- Discretization using mixed FE of Darcy (2002)
  11. rma10 (Bova) -- 3D CFD model, Charleston harbor (1997)
  12. helm2d03 (GHS_indef) -- Helmholtz eq on a unit square (2004)
  13. stomach (Norris) -- 3D electro-physical model of a duodenum (2003)
  14. oilpan (GHS_psdef) -- Structural problem (2004)
  15. ASIC_680k (Sandia) -- Circuit simulation matrix (2006)
  16. tmt_unsym (CEMW) -- Electromagnetics problem (2008)
  17. Goodwin_127 (Goodwin) -- Finite element, Navier-Stokes & other transport equations (2018)
  18. pre2 (ATandT) -- Harmonic balance method (2001)
  19. marine1 (Martin) -- Chemical oceanography; a marine nitrogen cycle inverse model (2018)
  20. torso1 (Norris) -- Finite differences and boundary element, 2D model of torso (2003)
  21. atmosmodd (Bourchtein) -- CFD analysis of atmospheric models (2009)
  22. atmosmodl (Bourchtein) -- CFD analysis of atmospheric models (2009)
  23. memchip (Freescale) -- Circuit simulation problem (2010)
  24. Freescale1 (Freescale) -- Circuit simulation problem (2008)
  25. rajat31 (Rajat) -- Circuit simulation problem (2006)
  26. Transport (Janna) -- 3D finite element flow and transport (2012)
  27. inline_1 (GHS_psdef) -- Structural problem, stiffness matrix (2004)
  28. PFlow_742 (Janna) -- 3D pressure-temperature evolution in porous media (2014)
  29. Emilia_923 (Janna) -- Geomechanical model for CO2 sequestration (2011)
  30. dielFilterV2real (Dziekonski) -- FEM in electromagnetics (2011)
  31. Flan_1565 (Janna) -- Structural problem, 3D model of a steel flange (2011)
  32. pres-cylin (Pedroso) -- FEM stiffness matrix of a pressurized cylinder (Tet10 with 1,711,464 DOF). Not from the Collection. From Pedroso DM (2024) Caveats of three direct linear solvers for finite element analyses.

The complex-valued tested matrices are:

  1. mhd1280b (Bai) -- Alfven spectra in magnetohydrodynamics (1994)
  2. mplate (Cote) -- Vibro-acoustic problem (1997)
  3. RFdevice (Rost) -- Semiconductor device simulation (2007)
  4. vfem (CEMW) -- Electromagnetics, vector finite element (2008)
  5. fem_filter (Lee) -- FEM band-pass microwave filter 500MHz (2008)
  6. Chevron4 (Chevron) -- Temporal freq domain seismic modeling (2012)
  7. mono_500Hz (FreeFieldTechnologies) -- 3D vibro-acoustic problem, aircraft engine nacelle (2008)
  8. kim2 (Kim) -- 2D 676-by-676 complex mesh (2002)
  9. fem_hifreq_circuit (Lee) -- FEM Maxwell equations for hi-freq circuit (2009)
  10. dielFilterV3clx (Dziekonski) -- High-order vector finite element method in EM (2011)

Additional notes:

  1. Each problem is solved ten times. The reported error is the maximum among the runs. The reported time is the average over the runs after discarding outliers.
  2. Total time includes initialization (memory allocation + symbolic factorization), numeric factorization, and solve.
  3. Column NNZ (number of non-zeros) corresponds to Pattern Entries reported by the SuiteSparse Matrix Collection.
  4. Column Sym shows if symmetry information was provided to the solver. An asterisk means positive definite information was also given.
  5. oom (out-of-memory) indicates that the symbolic factorization was terminated due to insufficient memory.
  6. Bold values highlight significant results, such as large errors or high computation times.
  7. The hybrid memory mode is enabled for the pres-cylin matrix and cuDSS.
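Notes 1 and 2 can be sketched as follows. This is not the code used to produce the results. The `Run` struct, its field names, and the functions below are hypothetical, and the outlier rule (discarding samples outside 1.5 times the interquartile range) is an assumption, since the notes do not state which rule is applied.

```rust
/// Timings (in seconds) and error of a single run; the field names are illustrative.
struct Run {
    initialize: f64, // memory allocation + symbolic factorization
    factorize: f64,  // numeric factorization
    solve: f64,
    error: f64,
}

impl Run {
    /// Total time as described in note 2.
    fn total(&self) -> f64 {
        self.initialize + self.factorize + self.solve
    }
}

/// Quantile of an ascending-sorted slice using linear interpolation.
fn quantile(sorted: &[f64], p: f64) -> f64 {
    let pos = p * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    sorted[lo] + (pos - lo as f64) * (sorted[hi] - sorted[lo])
}

/// Mean of the samples inside the 1.5 x IQR fences (assumed outlier rule).
fn mean_without_outliers(samples: &[f64]) -> f64 {
    assert!(!samples.is_empty());
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let (q1, q3) = (quantile(&sorted, 0.25), quantile(&sorted, 0.75));
    let iqr = q3 - q1;
    let (lo, hi) = (q1 - 1.5 * iqr, q3 + 1.5 * iqr);
    let kept: Vec<f64> = sorted.into_iter().filter(|t| *t >= lo && *t <= hi).collect();
    kept.iter().sum::<f64>() / kept.len() as f64
}

/// Returns (average total time without outliers, maximum error among runs).
fn summarize(runs: &[Run]) -> (f64, f64) {
    let totals: Vec<f64> = runs.iter().map(Run::total).collect();
    let max_error = runs.iter().fold(0.0_f64, |acc, r| acc.max(r.error));
    (mean_without_outliers(&totals), max_error)
}

fn main() {
    // ten made-up runs; the last one is disturbed and should be discarded
    let totals = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 9.0];
    let runs: Vec<Run> = totals
        .iter()
        .enumerate()
        .map(|(k, &t)| Run {
            initialize: 0.5 * t,
            factorize: 0.4 * t,
            solve: 0.1 * t,
            error: 1e-15 * (k + 1) as f64,
        })
        .collect();
    let (time, error) = summarize(&runs);
    println!("time = {:.3}s, error = {:.2e}", time, error);
}
```

Taking the maximum error while averaging the time is a conservative pairing: a single inaccurate run is always reported, whereas a single slow run caused by system noise does not distort the timing.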

Results

Information about the tested matrices

Real-valued matrices:

| Matrix | Nrow | NNZ | Sym |
|---|---|---|---|
| bwm2000 | 2,000 | 7,996 | No |
| rdb5000 | 5,000 | 29,600 | No |
| Goodwin_040 | 17,922 | 561,677 | No |
| fp | 7,548 | 848,553 | No |
| xenon1 | 48,600 | 1,181,120 | No |
| twotone | 120,750 | 1,224,224 | No |
| Raj1 | 263,743 | 1,302,464 | No |
| boyd2 | 466,316 | 1,500,397 | Yes |
| Goodwin_071 | 56,021 | 1,797,934 | No |
| darcy003 | 389,874 | 2,101,242 | Yes |
| rma10 | 46,835 | 2,374,001 | No |
| helm2d03 | 392,257 | 2,741,935 | Yes |
| stomach | 213,360 | 3,021,648 | No |
| oilpan | 73,752 | 3,597,188 | Yes* |
| ASIC_680k | 682,862 | 3,871,773 | No |
| tmt_unsym | 917,825 | 4,584,801 | No |
| Goodwin_127 | 178,437 | 5,778,545 | No |
| pre2 | 659,033 | 5,959,282 | No |
| marine1 | 400,320 | 6,226,538 | No |
| torso1 | 116,158 | 8,516,500 | No |
| atmosmodd | 1,270,432 | 8,814,880 | No |
| atmosmodl | 1,489,752 | 10,319,760 | No |
| memchip | 2,707,524 | 14,810,202 | No |
| Freescale1 | 3,428,755 | 18,920,347 | No |
| rajat31 | 4,690,002 | 20,316,253 | No |
| Transport | 1,602,111 | 23,500,731 | No |
| inline_1 | 503,712 | 36,816,342 | Yes* |
| PFlow_742 | 742,793 | 37,138,461 | Yes* |
| Emilia_923 | 923,136 | 41,005,206 | Yes* |
| dielFilterV2 | 1,157,456 | 48,538,952 | Yes |
| Flan_1565 | 1,564,794 | 117,406,044 | Yes* |
| pres-cylin | 1,711,464 | 133,562,188 | Yes* |

Complex-valued matrices:

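| Matrix | Nrow | NNZ | Sym |
|---|---|---|---|
| mhd1280b | 1,280 | 22,778 | No |
| mplate | 5,962 | 142,190 | Yes |
| RFdevice | 74,104 | 365,580 | No |
| vfem | 93,476 | 1,434,636 | No |
| fem_filter | 74,062 | 1,731,206 | No |
| Chevron4 | 711,450 | 6,376,412 | No |
| mono_500Hz | 169,410 | 5,036,288 | No |
| kim2 | 456,976 | 11,330,020 | No |
| fem_hifreq_circuit | 491,100 | 20,239,237 | No |
| dielFilterV3clx | 420,408 | 32,886,208 | Yes |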

Calculations on Arch Linux with cuDSS and MUMPS. Real-valued matrices.

| Matrix | cuDSS MA | cuDSS Time | cuDSS Error | MUMPS Time | MUMPS Error |
|---|---|---|---|---|---|
| bwm2000 | Auto | 8.633ms | 8.21e-16 | 6.260ms | 7.50e-16 |
| rdb5000 | Auto | 30.087ms | 4.06e-13 | 7.442ms | 5.37e-13 |
| Goodwin_040 | MaxMinDiag | 71.095ms | 2.82e-12 | 106.490ms | 1.65e-11 |
| fp | Auto | 155.376ms | 4.98e-9 | 164.943ms | 1.16e-20 |
| xenon1 | Auto | 194.980ms | 1.32e-39 | 308.937ms | 6.74e-40 |
| twotone | Auto | 772.871ms | 4.73e-13 | 879.251ms | 1.93e-13 |
| Raj1 | Auto | 883.834ms | 1.21e-11 | 901.208ms | 3.63e-13 |
| boyd2 | None | 996.540ms | 3.24e-11 | 755.545ms | 8.98e-11 |
| Goodwin_071 | MaxMinDiag | 174.813ms | 8.65e-12 | 258.507ms | 8.38e-11 |
| darcy003 | MaxMinDiagAlt | 2.438s | 5.64e-7 | 10.667s | 1.75e-10 |
| rma10 | Auto | 241.711ms | 9.04e-16 | 150.619ms | 1.35e-16 |
| helm2d03 | None | 1.239s | 1.68e-10 | 1.440s | 3.82e-10 |
| stomach | Auto | 1.280s | 2.71e-15 | 1.579s | 1.49e-15 |
| oilpan | None | 102.687ms | 4.65e-15 | 185.724ms | 2.22e-15 |
| ASIC_680k | Auto | 3.458s | 7.33e-11 | 1m17.353s | 8.09e-11 |
| tmt_unsym | Auto | 2.569s | 4.95e-7 | 3.061s | 1.43e-7 |
| Goodwin_127 | MaxMinDiag | 372.312ms | 6.83e-11 | 846.998ms | 6.20e-10 |
| pre2 | Auto | 3.428s | 2.15e-14 | 4.383s | 2.60e-14 |
| marine1 | Auto | 4.032s | 1.70e-8 | 5.352s | 6.25e-14 |
| torso1 | MaxDiagCount | 1.050s | 5.90e-7 | 1.847s | 3.75e-6 |
| atmosmodd | Auto | 18.668s | 1.44e-16 | 42.718s | 2.48e-16 |
| atmosmodl | Auto | 18.165s | 9.34e-18 | 38.733s | 7.60e-18 |
| memchip | Auto | 6.941s | 3.23e-15 | 7.546s | 3.40e-15 |
| Freescale1 | Auto | 8.076s | 9.39e-10 | 9.580s | 7.05e-10 |
| rajat31 | Auto | 17.285s | 3.26e-14 | 15.175s | 8.73e-15 |
| Transport | Auto | 21.417s | 7.21e-10 | 41.188s | 3.81e-10 |
| inline_1 | None | 1.762s | 5.86e-15 | 3.261s | 3.38e-15 |
| PFlow_742 | None | 9.792s | 1.77e-10 | 10.706s | 4.17e-11 |
| Emilia_923 | None | 15.903s | 1.50e-22 | 37.580s | 5.27e-23 |
| dielFilterV2 | None | 7.007s | 1.38e-11 | 12.388s | 1.49e-11 |
| Flan_1565 | None | 10.212s | 2.24e-15 | 19.735s | 9.98e-16 |
| pres-cylin | None | 2m4.270s | 8.89e-13 | 1m18.368s | 5.09e-13 |

Calculations on Arch Linux with cuDSS and MUMPS. Complex-valued matrices.

| Matrix | cuDSS MA | cuDSS Time | cuDSS Error | MUMPS Time | MUMPS Error |
|---|---|---|---|---|---|
| mhd1280b | Auto | 7.281ms | 1.57e-15 | 4.551ms | 1.17e-15 |
| mplate | None | 78.587ms | 1.13e-10 | 167.040ms | 6.28e-11 |
| RFdevice | MaxDiagSum | 570.478ms | 3.21e-2 | 1.548s | 1.56e-7 |
| vfem | Auto | 865.606ms | 5.29e-8 | 1.403s | 8.74e-8 |
| fem_filter | Auto | 713.189ms | 1.81e-10 | 1.039s | 1.44e-10 |
| Chevron4 | Auto | 2.480s | 8.51e-11 | 3.139s | 4.16e-11 |
| mono_500Hz | Auto | 2.735s | 2.23e-10 | 4.830s | 5.99e-9 |
| kim2 | Auto | 13.916s | 0.00e0 | 4.346s | 2.73e-18 |
| fem_hifreq_circuit | Auto | 4.871s | 1.60e-10 | 9.018s | 1.28e-10 |
| dielFilterV3clx | None | 2.184s | 4.31e-10 | 3.969s | 1.73e-11 |

Calculations on Arch Linux with UMFPACK and MUMPS. Real-valued matrices.

| Matrix | UMFPACK Time | UMFPACK Error | MUMPS Time | MUMPS Error |
|---|---|---|---|---|
| bwm2000 | 713.627µs | 5.30e-16 | 6.260ms | 7.50e-16 |
| rdb5000 | 11.504ms | 1.93e-15 | 7.442ms | 5.37e-13 |
| Goodwin_040 | 104.256ms | 6.62e-13 | 106.490ms | 1.65e-11 |
| fp | 149.956ms | 2.57e-21 | 164.943ms | 1.16e-20 |
| xenon1 | 582.982ms | 5.55e-40 | 308.937ms | 6.74e-40 |
| twotone | 334.710ms | 1.13e-14 | 879.251ms | 1.93e-13 |
| Raj1 | oom | oom | 901.208ms | 3.63e-13 |
| boyd2 | 1m19.604s | 2.86e-13 | 755.545ms | 8.98e-11 |
| Goodwin_071 | 418.565ms | 2.77e-12 | 258.507ms | 8.38e-11 |
| darcy003 | 1.072s | 6.95e-12 | 10.667s | 1.75e-10 |
| rma10 | 398.716ms | 1.57e-17 | 150.619ms | 1.35e-16 |
| helm2d03 | 1.375s | 2.37e-13 | 1.440s | 3.82e-10 |
| stomach | 1.681s | 5.13e-16 | 1.579s | 1.49e-15 |
| oilpan | 481.117ms | 1.26e-15 | 185.724ms | 2.22e-15 |
| ASIC_680k | 1.242s | 2.20e-11 | 1m17.353s | 8.09e-11 |
| tmt_unsym | 3.655s | 5.96e-8 | 3.061s | 1.43e-7 |
| Goodwin_127 | 1.380s | 6.03e-12 | 846.998ms | 6.20e-10 |
| pre2 | oom | oom | 4.383s | 2.60e-14 |
| marine1 | oom | oom | 5.352s | 6.25e-14 |
| torso1 | 1.314s | 3.82e-8 | 1.847s | 3.75e-6 |
| atmosmodd | oom | oom | 42.718s | 2.48e-16 |
| atmosmodl | oom | oom | 38.733s | 7.60e-18 |
| memchip | 28.947s | 2.04e-15 | 7.546s | 3.40e-15 |
| Freescale1 | 28.693s | 7.05e-10 | 9.580s | 7.05e-10 |
| rajat31 | oom | oom | 15.175s | 8.73e-15 |
| Transport | oom | oom | 41.188s | 3.81e-10 |
| inline_1 | oom | oom | 3.261s | 3.38e-15 |
| PFlow_742 | oom | oom | 10.706s | 4.17e-11 |
| Emilia_923 | oom | oom | 37.580s | 5.27e-23 |
| dielFilterV2 | oom | oom | 12.388s | 1.49e-11 |
| Flan_1565 | oom | oom | 19.735s | 9.98e-16 |
| pres-cylin | oom | oom | 1m18.368s | 5.09e-13 |

Calculations on Arch Linux with UMFPACK and MUMPS. Complex-valued matrices.

| Matrix | UMFPACK Time | UMFPACK Error | MUMPS Time | MUMPS Error |
|---|---|---|---|---|
| mhd1280b | 345.764µs | 7.41e-16 | 4.551ms | 1.17e-15 |
| mplate | 412.395ms | 2.32e-11 | 167.040ms | 6.28e-11 |
| RFdevice | 3.901s | 9.16e-14 | 1.548s | 1.56e-7 |
| vfem | 5.132s | 2.07e-8 | 1.403s | 8.74e-8 |
| fem_filter | 2.366s | 6.32e-11 | 1.039s | 1.44e-10 |
| Chevron4 | 4.528s | 1.45e-11 | 3.139s | 4.16e-11 |
| mono_500Hz | oom | oom | 4.830s | 5.99e-9 |
| kim2 | oom | oom | 4.346s | 2.73e-18 |
| fem_hifreq_circuit | oom | oom | 9.018s | 1.28e-10 |
| dielFilterV3clx | oom | oom | 3.969s | 1.73e-11 |

System and libraries information

System information:

--- OS ---
NAME="Arch Linux"
KERNEL=7.0.9-arch2-1

--- GPU ---
GPU[0]: NVIDIA GeForce RTX 4090, 595.71.05, 24564 MiB

--- CPU ---
Architecture:        x86_64
CPU(s):              32
On-line CPU(s) list: 0-31
Model name:          13th Gen Intel(R) Core(TM) i9-13900KF
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           1
CPU(s) scaling MHz:  32%
CPU max MHz:         5800.0000
CPU min MHz:         800.0000
BogoMIPS:            5990.40
L1d cache:           896 KiB (24 instances)
L1i cache:           1.3 MiB (24 instances)
L2 cache:            32 MiB (12 instances)
L3 cache:            36 MiB (1 instance)
NUMA node0 CPU(s):   0-31
Vulnerability L1tf:  Not affected

--- Memory ---
MemTotal:       32616244 kB
MemFree:         5895504 kB
MemAvailable:   23187532 kB
SwapTotal:      36806824 kB

Libraries information:

cuDSS: 0.8.0.10 (CUDA 13)
MUMPS: 5.9.0
SuiteSparse: latest (from GitHub)