vle

June 24, 2016 ยท View on GitHub

  • VLE is a simple variable-length encoder/decoder (C99)(C++03)
  • VLE is simple. Format is 7-bit packing, MSB stream terminator.
  • VLE is streamable. Designed to encode and decode integers with low overhead.
  • VLE is embeddable. Header-only. No external deps. C and C++ APIs provided.
  • VLE is extendable. Signed/unsigned integers of any size (8/16/32/64/...) are supported.
  • VLE is zlib/libpng licensed.

Quick tutorial

  • You want to serialize an struct { uint16_t len; uint64_t buffer[6]; } to disk, network, etc...
  • You could just flush a 50-bytes stream, or you could flush a VLE stream instead.
  • This VLE stream will range from 7-bytes (best-case, 43-bytes saved) up to 63-bytes (worst-case, 13-bytes overhead).

Features

  • For any 8-bit sequence, the VLE stream will range from len up to 2*len bytes.
  • For any 16-bit sequence, the VLE stream will range from len up to 3*len bytes.
  • For any 32-bit sequence, the VLE stream will range from len up to 5*len bytes.
  • For any 64-bit sequence, the VLE stream will range from len up to 10*len bytes.
  • Magnitude of value determinates size of encoded stream. Magnitude of type does not matter.
    • All byte(0), short(0), int(0), int64(0)... serialize to a 1-byte stream.
    • All byte(127), short(127), int(127), int64(127)... serialize to a 1-byte stream.
    • All byte(128), short(128), int(128), int64(128)... serialize to a 2-bytes stream.
    • All byte(255), short(255), int(255), int64(255)... serialize to a 2-bytes stream.
    • All short(16383), int(16383), int64(16383)... serialize to a 2-bytes stream.
    • All short(16384), int(16384), int64(16384)... serialize to a 3-bytes stream.
    • And so on... (see range tables below).
  • Rule: The closer to zero integer you encode, the smaller the stream you decode.
    • Note: Negative integers are rearranged to meet this criteria (see appendix).

VLE stream format

  • All non-significative (zero) bits on the left are discarded.
  • Bytes are repacked into 7-bit components and glued together until a LSB/MSB bit is found.
  • Additionally, signed integers are pre-encoded and post-decoded to a more efficient signed number representation.

API, C

  • Basic API. Allows streaming and fine control.
  • Encoders do not append null character at end of string.
  • Decoders do not need null character at end of string.
  • All functions assume buffers are preallocated to worst-case scenarios.
  • All functions return integer of streamed bytes.
 VLE_API uint64_t vle_encode_u(  uint8_t *buffer, uint64_t value );
 VLE_API uint64_t vle_encode_i(  uint8_t *buffer,  int64_t value );
 VLE_API uint64_t vle_decode_u( uint64_t *value, const uint8_t *buffer );
 VLE_API uint64_t vle_decode_i(  int64_t *value, const uint8_t *buffer );

API, C++

vlei:: {
  string   encode( int64_t );
  int64_t  decode( string );
}
vleu:: {
  string   encode( uint64_t );
  uint64_t decode( string );
}

VLE unsigned, 64-bit ranges

 1 byte  [0..127]
 2 bytes [128..16383]
 3 bytes [16384..2097151]
 4 bytes [2097152..268435455]
 5 bytes [268435456..34359738367]
 6 bytes [34359738368..4398046511103]
 7 bytes [4398046511104..562949953421311]
 8 bytes [562949953421312..72057594037927935]
 9 bytes [72057594037927936..9223372036854775807]
10 bytes [9223372036854775808..18446744073709551615]

VLE signed, 64-bit ranges

10 bytes [-9223372036854775808..-4611686018427387905]
 9 bytes [-4611686018427387904..-36028797018963969]
 8 bytes [-36028797018963968..-281474976710657]
 7 bytes [-281474976710656..-2199023255553]
 6 bytes [-2199023255552..-17179869185]
 5 bytes [-17179869184..-134217729]
 4 bytes [-134217728..-1048577]
 3 bytes [-1048576..-8193]
 2 bytes [-8192..-65]
 1 byte  [-64..63]
 2 bytes [64..8191]
 3 bytes [8192..1048575]
 4 bytes [1048576..134217727]
 5 bytes [134217728..17179869183]
 6 bytes [17179869184..2199023255551]
 7 bytes [2199023255552..281474976710655]
 8 bytes [281474976710656..36028797018963967]
 9 bytes [36028797018963968..4611686018427387903]
10 bytes [4611686018427387904..9223372036854775807]

used in

  • collage, a diff/patch library.
  • bundle, a de/compression library.

appendix: sign theory and conversion functions

##Signed-MagnitudeComplement-2VLEi (*)
+70 1110 111111 0
+60 1100 110110 0
+50 1010 101101 0
+40 1000 100100 0
+30 0110 011011 0
+20 0100 010010 0
+10 0010 001001 0
+00 0000 000000 0
-01 000----------
-11 0011 111000 1
-21 0101 110001 1
-31 0111 101010 1
-41 1001 100011 1
-51 1011 011100 1
-61 1101 010101 1
-71 1111 001110 1
-8----1 000111 1

(*): compared to C2 (default signed integer), bits are shared or inverted

c2tovlei(n) {
    return n & MSB ? ~(n<<1) : (n<<1);
}
vleitoc2(n) {
    return n & LSB ? ~(n>>1) : (n>>1);
}