FastPFor for Rust

March 23, 2026 · View on GitHub

Fast integer compression for Rust — both a pure-Rust implementation and a wrapper around the C++ FastPFor library. Supports 32-bit (and for some codecs 64-bit) integers. Based on the Decoding billions of integers per second through vectorization, 2012 paper.

The Rust decoder is about 29% faster than the C++ version. The Rust implementation contains no unsafe code, and when built without the cpp feature this crate has #![forbid(unsafe_code)].

Usage

Rust Implementation (default)

The simplest way is FastPFor256 — a composite codec that handles any input length by compressing aligned 256-element blocks with FastPForBlock256 and encoding any leftover values with VariableByte.

use fastpfor::{AnyLenCodec, FastPFor256};

let mut codec = FastPFor256::default();
let input: Vec<u32> = (0..1000).collect();

let mut encoded = Vec::new();
codec.encode(&input, &mut encoded).unwrap();

let mut decoded = Vec::new();
codec.decode(&encoded, &mut decoded, None).unwrap();

assert_eq!(decoded, input);

For block-aligned inputs you can use the lower-level BlockCodec API:

use fastpfor::{BlockCodec, FastPForBlock256, slice_to_blocks};

let mut codec = FastPForBlock256::default();
let input: Vec<u32> = (0..512).collect();   // exactly 2 blocks of 256

let (blocks, remainder) = slice_to_blocks::<FastPForBlock256>(&input);
assert_eq!(blocks.len(), 2);
assert!(remainder.is_empty());

let mut encoded = Vec::new();
codec.encode_blocks(blocks, &mut encoded).unwrap();

let mut decoded = Vec::new();
codec.decode_blocks(&encoded, Some(u32::try_from(blocks.len() * 256).expect("block count fits in u32")), &mut decoded).unwrap();

assert_eq!(decoded, input);

C++ Wrapper (`cpp` feature)

Enable the cpp feature in Cargo.toml:

fastpfor = { version = "0.1", features = ["cpp"] }

All C++ codecs implement the same AnyLenCodec trait (encode / decode), so the usage pattern is identical to the Rust examples above — just swap the codec type, e.g. cpp::CppFastPFor128::new().

Thread safety: C++ codec instances have internal state and are not thread-safe. Create one instance per thread or synchronize access externally.

Crate Features

Feature	Default	Description
`rust`	yes	Pure-Rust implementation — no `unsafe`, no build dependencies
`cpp`	no	C++ wrapper via CXX — requires a C++14 compiler with SIMD support
`cpp_portable`	no	Enables `cpp`, compiles C++ with SSE4.2 baseline (runs on any x86-64 from ~2008+)
`cpp_native`	no	Enables `cpp`, compiles C++ with `-march=native` for maximum throughput on the build machine

The FASTPFOR_SIMD_MODE environment variable (portable or native) can override the SIMD mode at build time.

Recommendation: Use cpp_portable (not cpp_native) for distributable binaries.

Supported Algorithms

Rust (`rust` feature)

Rust block codecs require block-aligned input. CompositeCodec chains a block codec with a tail codec (e.g. VariableByte) to handle arbitrary-length input. FastPFor256 and FastPFor128 are type aliases for such composites.

Codec	Description
`FastPFor256`	`CompositeCodec` of `FastPForBlock256` + `VariableByte`
`FastPFor128`	`CompositeCodec` of `FastPForBlock128` + `VariableByte`
`VariableByte`	Variable-byte encoding, MSB is opposite to protobuf's varint
`JustCopy`	No compression; useful as a baseline
`FastPForBlock256`	`FastPFor` with 256-element blocks; block-aligned input only
`FastPForBlock128`	`FastPFor` with 128-element blocks; block-aligned input only

C++ (`cpp` feature)

All C++ codecs are composite (any-length) and implement AnyLenCodec only. u64-capable codecs (CppFastPFor128, CppFastPFor256, CppVarInt) also implement BlockCodec64 with encode64 / decode64.

Codec	Notes
`CppFastPFor128`	`FastPFor + VByte` composite, 128-element blocks. Also supports `u64`.
`CppFastPFor256`	`FastPFor + VByte` composite, 256-element blocks. Also supports `u64`.
`CppSimdFastPFor128`	SIMD-optimized 128-element variant
`CppSimdFastPFor256`	SIMD-optimized 256-element variant
`CppBP32`	Binary packing, 32-bit blocks
`CppFastBinaryPacking8`	Binary packing, 8-bit groups
`CppFastBinaryPacking16`	Binary packing, 16-bit groups
`CppFastBinaryPacking32`	Binary packing, 32-bit groups
`CppSimdBinaryPacking`	SIMD-optimized binary packing
`CppPFor`	Patched frame-of-reference
`CppSimplePFor`	Simplified `PFor` variant
`CppNewPFor`	`PFor` with improved exception handling
`CppOptPFor`	Optimized `PFor`
`CppPFor2008`	Reference implementation from original paper
`CppSimdPFor`	SIMD `PFor`
`CppSimdSimplePFor`	SIMD `SimplePFor`
`CppSimdNewPFor`	SIMD `NewPFor`
`CppSimdOptPFor`	SIMD `OptPFor`
`CppSimple16`	16 packing modes in 32-bit words
`CppSimple9`	9 packing modes
`CppSimple9Rle`	Simple9 with run-length encoding
`CppSimple8b`	8 packing modes in 64-bit words
`CppSimple8bRle`	Simple8b with run-length encoding
`CppSimdGroupSimple`	SIMD group-simple encoding
`CppSimdGroupSimpleRingBuf`	SIMD group-simple with ring buffer
`CppVByte`	Standard variable-byte encoding
`CppMaskedVByte`	SIMD masked variable-byte
`CppStreamVByte`	SIMD stream variable-byte
`CppVarInt`	Standard varint. Also supports `u64`.
`CppVarIntGb`	Group varint
`CppCopy`	No compression (baseline)

Benchmarks

Decoding

Using Linux x86-64 running just bench::cpp-vs-rust-decode native. The values below are time measurements; smaller values indicate faster decoding.

name	cpp (ns)	rust (ns)	% faster
`clustered/1024`	643.24	392.93	38.91%
`clustered/4096`	1986	1414.8	28.76%
`sequential/1024`	653.69	396.02	39.42%
`sequential/4096`	2106	1476.2	29.91%
`sparse/1024`	428.8	352.38	17.82%
`sparse/4096`	1114	1179.5	-5.88%
`uniform_large_value_distribution/1024`	286.74	153.06	46.62%
`uniform_large_value_distribution/4096`	748.19	558.05	25.41%
`uniform_small_value_distribution/1024`	606.4	405.44	33.14%
`uniform_small_value_distribution/4096`	2017.3	1403.7	30.42%

Rust encoding has not yet been fully optimized or verified.

Build Requirements

Rust feature (rust, the default): no additional dependencies.
C++ feature (cpp): requires a C++14-capable compiler with SIMD intrinsics. See FastPFor C++ requirements.

Linux

The default GitHub Actions runner has all needed dependencies.

For local development:

# This list may be incomplete
sudo apt-get install build-essential

libsimde-dev is optional. On ARM/aarch64, the C++ build fetches SIMDe via CMake and the CXX bridge reuses that include path automatically.

macOS

On Apple Silicon, SIMDe installation is usually not required — the C++ build fetches it via CMake.

If you prefer a Homebrew fallback:

brew install simde
export CXXFLAGS="-I/opt/homebrew/include"
export CFLAGS="-I/opt/homebrew/include"

Development

This project uses just as a task runner:

cargo install just   # install once
just                 # list available commands
just test            # run all tests

License

Licensed under either of

Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or https://opensource.org/licenses/MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual-licensed as above, without any additional terms or conditions.