Low Precision Fortran (LPF)
June 15, 2026
Low Precision Float types in Fortran.
Version: 0.1.0
DOI: 10.5281/zenodo.20442279
Copyright 2026 by Martin Köhler
Usage
Fortran does not provide an interface to low precision float types such as fp16 or bfloat16. This library implements both on top of an int16 datatype, so that they are interoperable between C and Fortran by reinterpreting pointers. Furthermore, it uses Fortran 2008 features to provide an easy-to-use interface on the Fortran side. By declaring a variable of the type
use lpf_fp16
type(FP16) :: x
and using it like any other REAL variable, fp16 can be used in Fortran.
Analogously, the bf16 support can be used via
use lpf_bf16
type(BF16) :: y
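The int16-based storage mentioned at the start of this section can be illustrated for bf16 with standard Fortran alone. The following is a minimal sketch, not LPF's actual code: the program and variable names are hypothetical, and it assumes plain truncation of the lower 16 bits, whereas a real implementation would likely round.

```fortran
program bf16_storage_sketch
  use, intrinsic :: iso_fortran_env, only: int16, int32, real32
  implicit none
  real(real32)   :: r, back
  integer(int32) :: bits32
  integer(int16) :: bits16

  r = 3.1415927_real32

  ! Reinterpret the real32 value as a 32-bit integer; no bit is changed.
  bits32 = transfer(r, bits32)

  ! Keep the upper 16 bits (sign, 8 exponent bits, 7 mantissa bits).
  ! shifta sign-extends, so the result always fits into an int16.
  ! Assumption: plain truncation instead of rounding.
  bits16 = int(shifta(bits32, 16), int16)

  ! Decode again: move the 16 bits back into the upper half of a real32.
  bits32 = shiftl(int(bits16, int32), 16)
  back   = transfer(bits32, back)

  print '(a, f10.7)', 'real32 input   : ', r
  print '(a, z4.4)',  'bf16 bits (hex): ', bits16
  print '(a, f10.7)', 'decoded bf16   : ', back
end program bf16_storage_sketch
```

For the input above, the stored pattern is 4049 in hexadecimal, which decodes to 3.140625. The difference to the input is the precision lost by keeping only 7 mantissa bits.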
Support for FP8 is provided via E4M3 and E5M2 formats:
use lpf_fp8_e4m3
type(FP8_E4M3) :: x_e4m3
and
use lpf_fp8_e5m2
type(FP8_E5M2) :: y_e5m2
All of these types work like any other numerical type, but with one difference: if they are
involved in a computation with INTEGER or REAL(real32) datatypes, the
whole computation is performed in low precision.
Since Fortran does not support user-defined type literals, values of the low precision types are created by assignment or by an explicit conversion with the type name, for example FP16(...):
x = 1.0E0
x = FP16(1.0D0)
y = BF16(1.0D0)
x_e4m3 = FP8_E4M3(1.0D0)
y_e5m2 = FP8_E5M2(1.0D0)
Converting back is possible using assignment or the type-cast functions real and
dble:
type(fp16) :: x
real :: xr
double precision :: xd
xr = real(x)
xd = dble(x)
An implicit upcast to real or double precision is not allowed. This must be
done manually by calling real or dble. The kind parameter of the real call
is not supported, since it must be a compile-time constant, which cannot be
evaluated by an overloaded function.
[!caution] Precision Leak To avoid precision leaks when using Low Precision Fortran, its arithmetic differs from the way low precision arithmetic is implemented in other programming languages:
Every operation, where at least one operand is of a low precision type, is performed in the low precision.
This is necessary to get expressions like x + 1.0 working without storing the
1.0 in a variable before. If you want this operation to be performed in
real32, you have to use real(x) + 1.0.
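The difference between the two forms can be made visible with a small program. This is a sketch based only on the interface described in this README; the program and variable names are hypothetical, and it has not been checked against a particular LPF release.

```fortran
program precision_leak_demo
  use lpf_fp16
  implicit none
  type(FP16) :: x, y_low
  real       :: y_single

  x = 1.0E-3

  ! Mixed expression: following the rule above, the addition runs in fp16.
  y_low = x + 1.0

  ! Explicit upcast first: the addition runs in real32.
  y_single = real(x) + 1.0

  write(*, '(A, DT"E"(14,7))') 'fp16 addition   : ', y_low
  write(*, '(A, ES14.7)')      'real32 addition : ', y_single
end program precision_leak_demo
```

Since fp16 has a spacing of 2**(-10) between neighbouring values just above 1.0, the two results are expected to differ in the fourth decimal digit.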
Input / Output Operations
The datatypes only support formatted output with WRITE and PRINT
statements. In FORMAT statements, the DT(W,P) edit descriptor can be
used, where W is the overall width of the number and P the precision.
Alternatively, scientific notation can be selected with DT"E"(W,P). In practice,
this looks like:
type(fp16) :: x
x = 3.14
write(*, '(DT"E"(10,4))') x
Math Functions
The low precision datatypes support all standard Fortran Math functions.
- Core Arithmetic: Support for +, -, *, /, and **.
- Math Functions: Comprehensive support for single-argument functions (abs, sin, cos, exp, log, sqrt, etc.), two-argument functions (atan2, hypot), and specialized trigonometric functions.
- Array Reductions: maxval, minval, maxloc, minloc, norm2, matmul, dot_product, sum, product
For fp16 and bf16, the basic math functions are implemented using
their REAL(real32) counterparts with typecasts and the implementations from libm
instead of the ones from the Fortran compilers. FP8 operations are implemented
via emulation and lookup tables for efficiency.
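To illustrate the lookup-table idea, the following self-contained sketch builds a 256-entry decode table for E5M2. It is not LPF's actual table: the program and variable names are hypothetical, and it assumes the usual E5M2 layout (1 sign bit, 5 exponent bits with bias 15, 2 mantissa bits, IEEE-style infinities and NaNs).

```fortran
program e5m2_table_sketch
  use, intrinsic :: iso_fortran_env, only: real32
  use, intrinsic :: ieee_arithmetic, only: ieee_value, &
       ieee_positive_inf, ieee_quiet_nan
  implicit none
  real(real32) :: table(0:255), v
  integer      :: code, sgn, e, m

  do code = 0, 255
    sgn = ibits(code, 7, 1)   ! sign bit
    e   = ibits(code, 2, 5)   ! 5 exponent bits
    m   = ibits(code, 0, 2)   ! 2 mantissa bits
    if (e == 31) then
      ! All exponent bits set: infinity (m == 0) or NaN (m /= 0).
      if (m == 0) then
        v = ieee_value(v, ieee_positive_inf)
      else
        v = ieee_value(v, ieee_quiet_nan)
      end if
    else if (e == 0) then
      ! Zero and subnormal numbers: no implicit leading one.
      v = real(m, real32) / 4.0_real32 * 2.0_real32**(-14)
    else
      ! Normal numbers: implicit leading one, exponent bias 15.
      v = (1.0_real32 + real(m, real32) / 4.0_real32) * 2.0_real32**(e - 15)
    end if
    if (sgn == 1) v = -v
    table(code) = v
  end do

  ! Code 60 (hex 3C) has exponent 15 and mantissa 0; code 123 (hex 7B)
  ! is the largest finite E5M2 value.
  print '(a, es12.4)', 'table(60)  = ', table(60)
  print '(a, es12.4)', 'table(123) = ', table(123)
end program e5m2_table_sketch
```

With such a table, converting an FP8 value to real32 is a single array access, which is the kind of saving a table-based emulation aims for.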
ToDo
- Support for unformatted IO
- AVX512FP16 and emulation support at once, but dispatching causes an overhead.
Issues / Missing Functionality
MAXLOC/MAXVAL/MINVAL/MINLOC
The maxloc, maxval, minval, and minloc functions only support arrays of up to
4 dimensions. The optional MASK argument of maxval and minval is not yet supported.
The maxloc and minloc functions do not yet support the mask, kind, and
back arguments.
MIN/MAX
The min and max functions involving fp16 or bf16 datatypes support up to
three inputs at the moment.
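Until more arguments are supported, a larger maximum can be formed by nesting calls. This is a sketch under the assumption that max applied to FP16 arguments returns an FP16 value; the program and variable names are hypothetical.

```fortran
program max_of_four_sketch
  use lpf_fp16
  implicit none
  type(FP16) :: a, b, c, d, m

  a = 1.0E0
  b = 4.0E0
  c = 2.0E0
  d = 3.0E0

  ! max accepts at most three low precision arguments at the moment,
  ! so the fourth value is folded in with a second, nested call.
  m = max(max(a, b, c), d)

  write(*, '(DT(10,4))') m
end program max_of_four_sketch
```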
NORM2
The norm2 interface allows up to rank 3 objects.
AMD AOCC
The AMD AOCC compiler does not support overloading the write interface
in a completely private environment and making the function public afterwards.
For this reason, we do not support the AMD compiler suite.
LLVM clang/flang
The flang-new compiler complains about ambiguous interface names. According to § 7.5.10/Note11 in the Fortran 2023 standard, this warning is a false positive and can be safely ignored.
Installation
The library currently works with gcc compilers supporting the _Float16
datatype and the -mf16c switch (starting with GCC 11). For proper hardware
support of fp16, a CPU with AVX512FP16 is required.
FP8 support is implemented via emulation and lookup tables, and therefore does not require specific hardware flags for basic functionality.
Building
The build process is controlled via the following CMake options:
| Option | Default | Description |
|---|---|---|
| LPF_DEBUG | OFF | Enable Debug Symbol generation |
| LPF_NATIVE | OFF | Enable Native Build |
| LPF_BLAS | ON | Enable BLAS and LAPACK |
| LPF_FP8_E4M3 | ON | Support for FP8_E4M3 |
| LPF_FP8_E5M2 | ON | Support for FP8_E5M2 |
| LPF_AVX512_FP16 | OFF | Enable AVX512-FP16 Instructions |
| LPF_AVX512_BF16 | OFF | Enable AVX512-BF16 Instructions |
| LPF_EXAMPLES_TLAPACK | OFF | Build TLAPACK examples |
| LPF_TESTING | ON | Build the LPF example programs and tests |
| CMAKE_BUILD_TYPE | Release | Choose the type of build (e.g., Debug, Release) |
| ENABLE_COVERAGE | OFF | Enable Coverage Build |
cmake -S . -B build-dir -DCMAKE_INSTALL_PREFIX=INSTALL-LOCATION
cmake --build build-dir
cmake --build build-dir --target install
Native fp16/bf16 support
Some CPUs, like Intel Xeons of the Sapphire Rapids series or Apple Silicon chips, have (partial) hardware support for these types. By specifying the proper C flags, this support can be used:
x86-64
This requires the AVX512FP16 and/or the AVX512BF16 capability. This can be
checked using
lscpu | grep -o -E '(avx512_fp16|avx512_bf16)'
Depending on which features are available, the corresponding compiler switches can be added, either by specifying the C flags
-DCMAKE_C_FLAGS="-mavx512fp16 -mavx512bf16"
or enabling the LPF_AVX512_FP16 and/or LPF_AVX512_BF16 option as listed
above.
AArch64 and Apple Silicon
In case of an fp16 or bf16 aware AArch64 CPU, the compiler flags will be
-DCMAKE_C_FLAGS="-march=armv8.4-a+fp16+bf16"
Alternatively, one can add
-DLPF_NATIVE=ON
to the cmake command line in order to try the cpu-native compilation, if
supported by the target.
Usage in other projects
The project can either be used as a subproject or integrated using CMake's
FetchContent mechanism.
License
The library is licensed under the GNU LGPL, version 3 or (at your option) any later version. See LICENSE for details.