Low Precision Fortran (LPF)

June 15, 2026

Low Precision Float types in Fortran.

Version: 0.1.0

DOI: 10.5281/zenodo.20442279

Usage

Since Fortran does not provide an interface to low precision float types such as fp16 or bfloat16, this library implements both on top of an int16 datatype, so that they become interoperable between C and Fortran via a reinterpretation of pointers. Furthermore, it uses Fortran 2008 features to provide an easy-to-use interface on the Fortran side. By defining a type

 use lpf_fp16
 type(FP16) :: x

and using it like any other REAL variable, fp16 can be used in Fortran. Analogously, the bf16 support can be used via

 use lpf_bf16
 type(BF16) :: y

Support for FP8 is provided via E4M3 and E5M2 formats:

 use lpf_fp8_e4m3
 type(FP8_E4M3) :: x_e4m3

and

 use lpf_fp8_e5m2
 type(FP8_E5M2) :: y_e5m2

Both work like any other numerical type, but with one difference: if they are involved in a computation with INTEGER or REAL(real32) datatypes, the whole computation is performed in low precision.

Since Fortran does not support user-defined type literals, low precision values are created by assignment or by calling the type constructor:

x = 1.0E0
x = FP16(1.0D0)
y = BF16(1.0D0)
x_e4m3 = FP8_E4M3(1.0D0)
y_e5m2 = FP8_E5M2(1.0D0)

Converting back is possible using assignment or the type-cast functions real and dble:

type(fp16) :: x
real :: xr
double precision :: xd

xr = real(x)
xd = dble(x)

An implicit upcast to real or double precision is not allowed. It must be done manually by calling real or dble. The kind parameter of the real call is not supported, since it must be a compile-time constant, which cannot be evaluated by an overloaded function.

[!caution] Precision Leak: To avoid precision leaks when using Low Precision Fortran, the arithmetic differs from the way low precision arithmetic is implemented in other programming languages:

Every operation in which at least one operand is of a low precision type is performed in low precision.

This is necessary to make expressions like x + 1.0 work without first storing the 1.0 in a variable. If you want this operation to be performed in real32, you have to use real(x) + 1.0.
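The rule above can be made visible with a value at the edge of fp16's integer range. The following is a minimal sketch, assuming the library is built and linked and that the module, type, and conversion names behave as described in this README. The program name and the choice of 2048 are illustrative only. Under IEEE round-to-nearest-even, fp16 cannot represent 2049, so the first sum is expected to stay at 2048 while the explicitly upcast sum yields 2049.

```fortran
program precision_rule_demo
  ! Assumption: lpf_fp16, FP16, and real() are used as shown in this README.
  use lpf_fp16
  implicit none

  type(FP16) :: x
  type(FP16) :: low     ! result kept in fp16
  real       :: high    ! result computed in real32

  x = 2048.0            ! 2**11, exactly representable in fp16

  low  = x + 1.0        ! 1.0 is treated as fp16; the sum is done in fp16
  high = real(x) + 1.0  ! explicit upcast; the sum is done in real32

  print *, 'fp16   sum: ', real(low)
  print *, 'real32 sum: ', high
end program precision_rule_demo
```

If the two printed values differ, the difference is the rounding that happens inside the low precision operation, not in the conversion for printing.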

Input / Output Operations

The datatypes only support formatted output with WRITE and PRINT statements. In FORMAT statements, the DT(W,P) edit descriptor can be used, where W is the overall field width and P the precision. Alternatively, scientific notation can be selected with DT"E"(W,P). In practice, this looks like:

type(fp16) :: x
x = 3.14
write(*, '(DT"E"(10,4))') x

Math Functions

The low precision datatypes support all standard Fortran Math functions.

Core Arithmetic: Support for +, -, *, /, and **.
Math Functions: Comprehensive support for single-argument functions (abs, sin, cos, exp, log, sqrt, etc.), two-argument functions (atan2, hypot), and specialized trigonometric functions.
Array Reductions:
- maxval, minval, maxloc, minloc
- norm2, matmul
- dot_product, sum, product

For fp16 and bf16, the basic math functions are implemented using their REAL(real32) counterparts with typecasts, relying on the implementations from libm instead of the ones from the Fortran compilers. FP8 operations are implemented via emulation and lookup tables for efficiency.
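The typecast approach for bf16 can be sketched in plain Fortran, without the library. This is an illustration under stated assumptions, not the library's actual code: the function names are hypothetical, the value is stored in an int16 as described in the Usage section, and narrowing uses round-to-nearest-even. A bf16 value is the upper half of a real32 bit pattern, so widening is a shift and the math itself runs in real32.

```fortran
program bf16_typecast_sketch
  use, intrinsic :: iso_fortran_env, only: int16, int32, real32
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none

  integer(int16) :: two, root

  two  = real32_to_bf16(2.0_real32)
  root = bf16_sqrt(two)
  print *, 'sqrt(2) rounded to bf16: ', bf16_to_real32(root)

  ! 1.0, 2.0 and 4.0 are exactly representable, so these checks must hold.
  if (bf16_to_real32(real32_to_bf16(1.0_real32)) /= 1.0_real32) error stop 'round trip'
  if (bf16_to_real32(bf16_sqrt(real32_to_bf16(4.0_real32))) /= 2.0_real32) error stop 'sqrt'

contains

  ! Widen: the 16 stored bits become the upper half of a real32 bit pattern.
  pure function bf16_to_real32(b) result(x)
    integer(int16), intent(in) :: b
    real(real32)   :: x
    integer(int32) :: bits
    bits = ishft(iand(int(b, int32), 65535_int32), 16)
    x = transfer(bits, x)
  end function bf16_to_real32

  ! Narrow: keep the upper 16 bits, rounding to nearest even.
  pure function real32_to_bf16(x) result(b)
    real(real32), intent(in) :: x
    integer(int16) :: b
    integer(int32) :: bits, upper
    bits = transfer(x, bits)
    if (ieee_is_nan(x)) then
      upper = ior(ishft(bits, -16), 64_int32)   ! set a mantissa bit so NaN stays NaN
    else
      upper = ishft(bits, -16)
      bits  = bits + 32767_int32 + iand(upper, 1_int32)
      upper = ishft(bits, -16)
    end if
    ! Map 0..65535 onto the signed int16 range without overflow.
    if (upper >= 32768_int32) upper = upper - 65536_int32
    b = int(upper, int16)
  end function real32_to_bf16

  ! Math function pattern: upcast, compute in real32, round back.
  pure function bf16_sqrt(b) result(r)
    integer(int16), intent(in) :: b
    integer(int16) :: r
    r = real32_to_bf16(sqrt(bf16_to_real32(b)))
  end function bf16_sqrt

end program bf16_typecast_sketch
```

The same upcast, compute, round-back pattern applies to any single-argument function. Only the call in the middle changes.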

ToDo

Support for unformatted IO
Support AVX512FP16 and emulation at once; the required dispatching causes an overhead.

Issues / Missing Functionality

MAXLOC/MAXVAL/MINVAL/MINLOC

The maxloc, maxval, minval, and minloc family only supports arrays of up to rank 4. The optional MASK argument of maxval and minval is not yet supported. The maxloc and minloc functions do not yet support the mask, kind, and back arguments.

MIN/MAX

The min and max functions involving fp16 or bf16 datatypes support up to three inputs at the moment.
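Until more arguments are supported, calls can be nested. The following is a small sketch, assuming that max on FP16 arguments returns an FP16 value that can be passed to another max call. The program and variable names are illustrative.

```fortran
program max_of_four
  ! Assumption: lpf_fp16, FP16, max, and real() behave as described in this README.
  use lpf_fp16
  implicit none

  type(FP16) :: a, b, c, d, biggest

  a = 1.0
  b = 4.0
  c = 2.0
  d = 3.0

  ! max takes at most three low precision arguments, so nest the calls.
  biggest = max(max(a, b, c), d)

  print *, real(biggest)
end program max_of_four
```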

NORM2

The norm2 interface accepts arrays of up to rank 3.

AMD AOCC

The AMD AOCC compiler does not support overloading the write interface in a completely private environment and making the function public afterwards. For this reason, the AMD compiler suite is not supported.

LLVM clang/flang

The flang-new compiler complains about ambiguous interface names. According to § 7.5.10/Note 11 of the Fortran 2023 standard, this warning is a false positive and can be safely ignored.

Installation

The library currently works with gcc compilers supporting the _Float16 datatype and the -mf16c switch (starting with GCC 11). For proper hardware support of fp16, a CPU with AVX512FP16 is required.

FP8 support is implemented via emulation and lookup tables, and therefore does not require specific hardware flags for basic functionality.
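The lookup-table idea can be sketched in plain Fortran. This is an illustration, not the library's implementation. It assumes the common E5M2 layout (1 sign bit, 5 exponent bits with bias 15, 2 mantissa bits, IEEE-style infinities and NaNs), and the names decode_e5m2 and table are hypothetical. With only 256 bit patterns, every value can be decoded once and later conversions become a single array access.

```fortran
program fp8_e5m2_table_sketch
  use, intrinsic :: iso_fortran_env, only: real32
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_positive_inf, &
                                           ieee_quiet_nan, ieee_is_nan
  implicit none

  real(real32) :: table(0:255)   ! one decoded value per 8-bit pattern
  integer :: code

  do code = 0, 255
    table(code) = decode_e5m2(code)
  end do

  ! Spot checks: 0x3C = 1.0, 0x40 = 2.0, 0xBC = -1.0, 0x01 = 2**(-16),
  ! 0x7B = largest finite value, 0x7D = NaN.
  if (table(60)  /=  1.0_real32)          error stop 'bad 1.0'
  if (table(64)  /=  2.0_real32)          error stop 'bad 2.0'
  if (table(188) /= -1.0_real32)          error stop 'bad -1.0'
  if (table(1)   /=  2.0_real32**(-16))   error stop 'bad subnormal'
  if (table(123) /=  57344.0_real32)      error stop 'bad max'
  if (.not. ieee_is_nan(table(125)))      error stop 'bad NaN'

  print *, 'largest finite E5M2 value: ', table(123)

contains

  ! Decode one 8-bit pattern (0..255) into real32.
  pure function decode_e5m2(code) result(x)
    integer, intent(in) :: code
    real(real32) :: x
    integer :: e, m

    e = iand(ishft(code, -2), 31)   ! 5 exponent bits
    m = iand(code, 3)               ! 2 mantissa bits

    if (e == 31) then
      if (m == 0) then
        x = ieee_value(x, ieee_positive_inf)
      else
        x = ieee_value(x, ieee_quiet_nan)
      end if
    else if (e == 0) then
      x = real(m, real32) * 2.0_real32**(-16)   ! subnormal: (m/4) * 2**(-14)
    else
      x = (1.0_real32 + real(m, real32) / 4.0_real32) * 2.0_real32**(e - 15)
    end if

    if (iand(code, 128) /= 0) x = -x            ! sign bit
  end function decode_e5m2

end program fp8_e5m2_table_sketch
```

Because the table is built in software, nothing here depends on hardware FP8 or FP16 instructions, which matches the statement that no specific hardware flags are needed.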

Building

The build process is controlled via the following CMake options:

Option	Default	Description
`LPF_DEBUG`	`OFF`	Enable Debug Symbol generation
`LPF_NATIVE`	`OFF`	Enable Native Build
`LPF_BLAS`	`ON`	Enable BLAS and LAPACK
`LPF_FP8_E4M3`	`ON`	Support for FP8_E4M3
`LPF_FP8_E5M2`	`ON`	Support for FP8_E5M2
`LPF_AVX512_FP16`	`OFF`	Enable AVX512-FP16 Instructions
`LPF_AVX512_BF16`	`OFF`	Enable AVX512-BF16 Instructions
`LPF_EXAMPLES_TLAPACK`	`OFF`	Build TLAPACK examples
`LPF_TESTING`	`ON`	Build the LPF example programs and tests
`CMAKE_BUILD_TYPE`	`Release`	Choose the type of build (e.g., Debug, Release)
`ENABLE_COVERAGE`	`OFF`	Enable Coverage Build

cmake -S . -B build-dir -DCMAKE_INSTALL_PREFIX=INSTALL-LOCATION
cmake --build build-dir
cmake --build build-dir --target install

Native fp16/bf16 support

Some CPUs, like Intel Xeons of the Sapphire Rapids series or Apple Silicon chips, have (partial) support in hardware. By specifying the proper C flags, these can be used:

x86-64

This requires the AVX512FP16 and/or the AVX512BF16 capability. This can be checked using

lscpu | grep -o -E '(avx512_fp16|avx512_bf16)'

Depending on which features are available, the corresponding compiler switches can be added, either by specifying the C flags

-DCMAKE_C_FLAGS="-mavx512fp16 -mavx512bf16"

or enabling the LPF_AVX512_FP16 and/or LPF_AVX512_BF16 option as listed above.

AArch64 and Apple Silicon

In case of an fp16 or bf16 aware AArch64 CPU, the compiler flags will be

-DCMAKE_C_FLAGS="-march=armv8.4-a+fp16+bf16"

Alternatively, one can add

-DLPF_NATIVE=ON

to the cmake command line in order to try the cpu-native compilation, if supported by the target.

Usage in other projects

The project can either be used as a subproject or integrated using CMake's FetchContent mechanism.

License

The library is licensed under the GNU LGPL, version 3 or, at your option, any later version. See LICENSE for details.