FastPFor for Rust

March 23, 2026


Fast integer compression for Rust: both a pure-Rust implementation and a wrapper around the C++ FastPFor library. Supports 32-bit integers, and 64-bit integers for some codecs. Based on the 2012 paper "Decoding billions of integers per second through vectorization".

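The Rust decoder is about 29% faster than the C++ version overall; in the per-case decoding benchmarks below, the gain ranges from about 18% to 47%, with one case (sparse/4096) about 6% slower. The Rust implementation contains no unsafe code, and when built without the cpp feature this crate has #![forbid(unsafe_code)].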

Usage

Rust Implementation (default)

The simplest entry point is FastPFor256, a composite codec that handles input of any length: it compresses aligned 256-element blocks with FastPForBlock256 and encodes any leftover values with VariableByte.

```rust
use fastpfor::{AnyLenCodec, FastPFor256};

let mut codec = FastPFor256::default();
let input: Vec<u32> = (0..1000).collect();

let mut encoded = Vec::new();
codec.encode(&input, &mut encoded).unwrap();

let mut decoded = Vec::new();
codec.decode(&encoded, &mut decoded, None).unwrap();

assert_eq!(decoded, input);
```
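The split that FastPFor256 performs before encoding can be illustrated with plain slice arithmetic. The sketch below is a hypothetical helper (split_aligned is not part of the crate's API) showing how 1000 input values divide into a block-aligned prefix, handled by the block codec, and a tail, handled by VariableByte.

```rust
/// Hypothetical helper: splits `input` into the longest prefix whose length
/// is a multiple of BLOCK, plus the leftover tail.
fn split_aligned<const BLOCK: usize>(input: &[u32]) -> (&[u32], &[u32]) {
    // Largest multiple of BLOCK that does not exceed the input length.
    let aligned_len = input.len() / BLOCK * BLOCK;
    input.split_at(aligned_len)
}

fn main() {
    let input: Vec<u32> = (0..1000).collect();
    let (blocks, tail) = split_aligned::<256>(&input);

    // 1000 = 3 * 256 + 232: three full blocks, 232 values left for the tail codec.
    assert_eq!(blocks.len(), 768);
    assert_eq!(tail.len(), 232);
    println!("{} values in blocks, {} in tail", blocks.len(), tail.len());
}
```

With 512 inputs, as in the block-aligned example, the tail would be empty and the block codec alone suffices.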

For block-aligned inputs you can use the lower-level BlockCodec API:

```rust
use fastpfor::{BlockCodec, FastPForBlock256, slice_to_blocks};

let mut codec = FastPForBlock256::default();
let input: Vec<u32> = (0..512).collect(); // exactly 2 blocks of 256

let (blocks, remainder) = slice_to_blocks::<FastPForBlock256>(&input);
assert_eq!(blocks.len(), 2);
assert!(remainder.is_empty());

let mut encoded = Vec::new();
codec.encode_blocks(blocks, &mut encoded).unwrap();

// Expected number of decoded values: 2 blocks * 256 values each.
let expected_len = u32::try_from(blocks.len() * 256).expect("value count fits in u32");

let mut decoded = Vec::new();
codec.decode_blocks(&encoded, Some(expected_len), &mut decoded).unwrap();

assert_eq!(decoded, input);
```

C++ Wrapper (cpp feature)

Enable the cpp feature in Cargo.toml:

```toml
[dependencies]
fastpfor = { version = "0.1", features = ["cpp"] }
```

All C++ codecs implement the same AnyLenCodec trait (encode / decode), so the usage pattern matches the Rust examples above; only the codec type changes, e.g. cpp::CppFastPFor128::new().

Thread safety: C++ codec instances have internal state and are not thread-safe. Create one instance per thread or synchronize access externally.

Crate Features

| Feature | Default | Description |
|---|---|---|
| rust | yes | Pure-Rust implementation; no unsafe code, no build dependencies |
| cpp | no | C++ wrapper via CXX; requires a C++14 compiler with SIMD support |
| cpp_portable | no | Enables cpp and compiles the C++ code with an SSE4.2 baseline (runs on most x86-64 CPUs from about 2008 onward) |
| cpp_native | no | Enables cpp and compiles the C++ code with -march=native for maximum throughput on the build machine |

The FASTPFOR_SIMD_MODE environment variable (portable or native) can override the SIMD mode at build time.

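Recommendation: use cpp_portable (not cpp_native) for distributable binaries. A -march=native build may emit instructions that other CPUs lack, causing illegal-instruction crashes on machines other than the build host.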

Supported Algorithms

Rust (rust feature)

Rust block codecs require block-aligned input. CompositeCodec chains a block codec with a tail codec (e.g. VariableByte) to handle arbitrary-length input. FastPFor256 and FastPFor128 are type aliases for such composites.

| Codec | Description |
|---|---|
| FastPFor256 | CompositeCodec of FastPForBlock256 + VariableByte |
| FastPFor128 | CompositeCodec of FastPForBlock128 + VariableByte |
| VariableByte | Variable-byte encoding; the MSB convention is the opposite of protobuf's varint (the MSB marks the last byte of a value rather than a continuation) |
| JustCopy | No compression; useful as a baseline |
| FastPForBlock256 | FastPFor with 256-element blocks; block-aligned input only |
| FastPForBlock128 | FastPFor with 128-element blocks; block-aligned input only |
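To make the VariableByte MSB convention concrete, here is a minimal standalone sketch, assuming the layout used by the C++ FastPFor library: 7 payload bits per byte, least-significant group first, with the MSB set only on the final byte of each value. The functions vbyte_encode and vbyte_decode are hypothetical names for illustration, not the crate's API.

```rust
/// Sketch of FastPFor-style variable-byte encoding (assumed layout):
/// MSB clear = more bytes follow, MSB set = last byte of this value.
fn vbyte_encode(values: &[u32], out: &mut Vec<u8>) {
    for &v in values {
        let mut v = v;
        while v >= 0x80 {
            out.push((v & 0x7F) as u8); // low 7 bits, MSB clear
            v >>= 7;
        }
        out.push(v as u8 | 0x80); // final 7 bits, MSB set as terminator
    }
}

/// Inverse of `vbyte_encode`; assumes well-formed input (at most 5 bytes per value).
fn vbyte_decode(bytes: &[u8]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut acc: u32 = 0;
    let mut shift = 0;
    for &b in bytes {
        acc |= u32::from(b & 0x7F) << shift;
        if b & 0x80 != 0 {
            // Terminator byte: the value is complete.
            out.push(acc);
            acc = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    out
}

fn main() {
    let mut buf = Vec::new();
    vbyte_encode(&[300], &mut buf);
    assert_eq!(buf, [0x2C, 0x82]);

    let values = [0u32, 127, 128, 300, u32::MAX];
    let mut enc = Vec::new();
    vbyte_encode(&values, &mut enc);
    assert_eq!(vbyte_decode(&enc), values);
}
```

For 300 this layout yields [0x2C, 0x82], whereas protobuf's varint yields [0xAC, 0x02]: the payload bits are the same, only the meaning of the MSB is flipped.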

C++ (cpp feature)

All C++ codecs are composite (any-length) and implement AnyLenCodec only. u64-capable codecs (CppFastPFor128, CppFastPFor256, CppVarInt) also implement BlockCodec64 with encode64 / decode64.

| Codec | Notes |
|---|---|
| CppFastPFor128 | FastPFor + VByte composite, 128-element blocks. Also supports u64. |
| CppFastPFor256 | FastPFor + VByte composite, 256-element blocks. Also supports u64. |
| CppSimdFastPFor128 | SIMD-optimized 128-element variant |
| CppSimdFastPFor256 | SIMD-optimized 256-element variant |
| CppBP32 | Binary packing, 32-bit blocks |
| CppFastBinaryPacking8 | Binary packing, 8-bit groups |
| CppFastBinaryPacking16 | Binary packing, 16-bit groups |
| CppFastBinaryPacking32 | Binary packing, 32-bit groups |
| CppSimdBinaryPacking | SIMD-optimized binary packing |
| CppPFor | Patched frame-of-reference |
| CppSimplePFor | Simplified PFor variant |
| CppNewPFor | PFor with improved exception handling |
| CppOptPFor | Optimized PFor |
| CppPFor2008 | Reference implementation from original paper |
| CppSimdPFor | SIMD PFor |
| CppSimdSimplePFor | SIMD SimplePFor |
| CppSimdNewPFor | SIMD NewPFor |
| CppSimdOptPFor | SIMD OptPFor |
| CppSimple16 | 16 packing modes in 32-bit words |
| CppSimple9 | 9 packing modes |
| CppSimple9Rle | Simple9 with run-length encoding |
| CppSimple8b | 8 packing modes in 64-bit words |
| CppSimple8bRle | Simple8b with run-length encoding |
| CppSimdGroupSimple | SIMD group-simple encoding |
| CppSimdGroupSimpleRingBuf | SIMD group-simple with ring buffer |
| CppVByte | Standard variable-byte encoding |
| CppMaskedVByte | SIMD masked variable-byte |
| CppStreamVByte | SIMD stream variable-byte |
| CppVarInt | Standard varint. Also supports u64. |
| CppVarIntGb | Group varint |
| CppCopy | No compression (baseline) |
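Many of the codecs above build on binary packing: every value in a block is stored with the same bit width b, chosen to fit the block's largest value. The sketch below shows that idea in plain scalar Rust; it omits the SIMD layouts, block headers, and exception patching that the real FastPFor and PFor codecs use, and bit_width, pack, and unpack are hypothetical names.

```rust
/// Bits needed for the largest value in `block` (0 for an all-zero block).
fn bit_width(block: &[u32]) -> u32 {
    let max = block.iter().copied().max().unwrap_or(0);
    32 - max.leading_zeros()
}

/// Packs each value into exactly `b` bits, LSB-first, into 32-bit words.
/// Assumes every value fits in `b` bits and `b <= 32`.
fn pack(block: &[u32], b: u32) -> Vec<u32> {
    debug_assert!(b <= 32);
    if b == 0 {
        return Vec::new(); // all values are zero; nothing to store
    }
    let total_bits = block.len() * b as usize;
    let mut out = vec![0u32; total_bits.div_ceil(32)];
    for (i, &v) in block.iter().enumerate() {
        let bit = i * b as usize;
        let (word, offset) = (bit / 32, (bit % 32) as u32);
        out[word] |= v << offset;
        if offset + b > 32 {
            // The value straddles a word boundary: spill its high bits.
            out[word + 1] |= v >> (32 - offset);
        }
    }
    out
}

/// Recovers `n` values of width `b` from `packed`.
fn unpack(packed: &[u32], b: u32, n: usize) -> Vec<u32> {
    if b == 0 {
        return vec![0; n];
    }
    let mask = if b == 32 { u32::MAX } else { (1u32 << b) - 1 };
    (0..n)
        .map(|i| {
            let bit = i * b as usize;
            let (word, offset) = (bit / 32, (bit % 32) as u32);
            let mut v = packed[word] >> offset;
            if offset + b > 32 {
                v |= packed[word + 1] << (32 - offset);
            }
            v & mask
        })
        .collect()
}

fn main() {
    let block: Vec<u32> = (0..128).map(|i| i % 10).collect(); // max is 9
    let b = bit_width(&block);
    assert_eq!(b, 4);
    let packed = pack(&block, b);
    assert_eq!(packed.len(), 16); // 128 values * 4 bits = 512 bits = 16 words
    assert_eq!(unpack(&packed, b, block.len()), block);
}
```

PFor-family codecs refine this by choosing a smaller b and storing the few values that do not fit as exceptions, so a single outlier does not inflate the width of the whole block.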

Benchmarks

Decoding

Measured on Linux x86-64 by running just bench::cpp-vs-rust-decode native. Values are decode times in nanoseconds, so smaller values mean faster decoding; a negative % faster means the Rust decoder was slower.

| name | cpp (ns) | rust (ns) | % faster |
|---|---:|---:|---:|
| clustered/1024 | 643.24 | 392.93 | 38.91% |
| clustered/4096 | 1986 | 1414.8 | 28.76% |
| sequential/1024 | 653.69 | 396.02 | 39.42% |
| sequential/4096 | 2106 | 1476.2 | 29.91% |
| sparse/1024 | 428.83 | 352.38 | 17.82% |
| sparse/4096 | 1114 | 1179.5 | -5.88% |
| uniform_large_value_distribution/1024 | 286.74 | 153.06 | 46.62% |
| uniform_large_value_distribution/4096 | 748.19 | 558.05 | 25.41% |
| uniform_small_value_distribution/1024 | 606.4 | 405.44 | 33.14% |
| uniform_small_value_distribution/4096 | 2017.3 | 1403.7 | 30.42% |
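The % faster column is consistent with the time saved relative to the C++ time, (cpp - rust) / cpp * 100. A small sketch (percent_faster is a hypothetical helper) reproduces two rows of the table:

```rust
/// Time saved by the Rust decoder relative to C++, in percent.
/// Negative results mean the Rust decoder was slower.
fn percent_faster(cpp_ns: f64, rust_ns: f64) -> f64 {
    (cpp_ns - rust_ns) / cpp_ns * 100.0
}

fn main() {
    // (benchmark name, cpp ns, rust ns) taken from the table above.
    let rows = [
        ("clustered/1024", 643.24, 392.93),
        ("sparse/4096", 1114.0, 1179.5),
    ];
    for (name, cpp, rust) in rows {
        println!("{name}: {:.2}%", percent_faster(cpp, rust));
    }
}
```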

Rust encoding has not yet been fully optimized or verified.

Build Requirements

  • Rust feature (rust, the default): no additional dependencies.
  • C++ feature (cpp): requires a C++14-capable compiler with SIMD intrinsics. See FastPFor C++ requirements.

Linux

The default GitHub Actions runner has all needed dependencies.

For local development:

```sh
# This list may be incomplete
sudo apt-get install build-essential
```

libsimde-dev is optional. On ARM/aarch64, the C++ build fetches SIMDe via CMake and the CXX bridge reuses that include path automatically.

macOS

On Apple Silicon, SIMDe installation is usually not required — the C++ build fetches it via CMake.

If you prefer a Homebrew fallback:

```sh
brew install simde
export CXXFLAGS="-I/opt/homebrew/include"
export CFLAGS="-I/opt/homebrew/include"
```

Development

This project uses just as a task runner:

```sh
cargo install just   # install once
just                 # list available commands
just test            # run all tests
```

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual-licensed as above, without any additional terms or conditions.