Universally Unique Lexicographically Sortable Identifier in Nim

April 28, 2020



ulid



UUID can be suboptimal for many use cases because:

  • It isn't the most character efficient way of encoding 128 bits of randomness
  • UUID v1/v2 is impractical in many environments, as it requires access to a unique, stable MAC address
  • UUID v3/v5 requires a unique seed and produces randomly distributed IDs, which can cause fragmentation in many data structures
  • UUID v4 provides no information other than randomness, which can cause fragmentation in many data structures

Instead, herein is proposed ULID:

  • 128-bit compatibility with UUID
  • 1.21e+24 unique ULIDs per millisecond
  • Lexicographically sortable!
  • Canonically encoded as a 26 character string, as opposed to the 36 character UUID
  • Uses Crockford's base32 for better efficiency and readability (5 bits per character)
  • Case insensitive
  • No special characters (URL safe)

Nim

Installation

nimble install ulid

Usage

import ulid

echo ulid()  # prints a newly generated 26-character ULID

Implementations in other languages

From the community!

Language       Author             Binary Implementation
=======================================================
C++            suyash
Objective-C    ricardopereira
Crystal        SuperPaintman
Dart           isoos
Delphi         matinusso
Erlang         savonarola
Elixir         merongivian
Go             imdario
Go             oklog
Haskell        steven777400
Java           huxi
Java           azam
Java           Lewiscowles1986
Julia          ararslan
Lua            Tieske
.NET           RobThree
.NET           fvilers
Nim            adelq
Perl 5         bk
PHP            Lewiscowles1986
Python         mdipierro
Ruby           rafaelsales
Rust           mmacedoeu
Rust           dylanhart
Swift          simonwhitehouse
Tcl            dbohdan

Specification

Below is the current specification of ULID as implemented in this repository.

Note: the binary format has not been implemented.

 01AN4Z07BY      79KA1307SR9X4MV3

|----------|    |----------------|
 Timestamp          Randomness
   48bits             80bits

Components

Timestamp

  • 48 bit integer
  • UNIX-time in milliseconds
  • Won't run out of space till the year 10889 AD.
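The overflow year can be estimated directly from the 48-bit range. The snippet below is a rough back-of-the-envelope check, assuming a mean Gregorian year of 365.2425 days; the names are illustrative and not part of this library.

```nim
# Rough estimate of when a 48-bit millisecond counter overflows.
const
  maxMillis = (1'i64 shl 48) - 1                 # largest 48-bit value
  msPerYear = 1000.0 * 60 * 60 * 24 * 365.2425   # mean Gregorian year in ms

let yearsAfterEpoch = float(maxMillis) / msPerYear
let lastYear = 1970 + int(yearsAfterEpoch)       # truncate to a whole year

echo lastYear  # 10889
```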

Randomness

  • 80 bits
  • Cryptographically secure source of randomness, if possible
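As a minimal sketch of how the 80 random bits might be obtained, the example below reads 10 bytes from the operating system's secure generator. It assumes Nim 1.6 or later, where `std/sysrand` provides `urandom`; `randomComponent` is a hypothetical helper and not necessarily how this library gathers its randomness.

```nim
import std/sysrand  # assumption: Nim >= 1.6

proc randomComponent(): seq[byte] =
  ## Returns 10 bytes (80 bits) from the OS's secure random source.
  ## `urandom` raises OSError if the source is unavailable.
  result = urandom(10)

let r = randomComponent()
echo r.len * 8  # 80
```

With 80 bits there are 2^80 (about 1.21e+24) possible random components per millisecond.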

Sorting

The left-most character must be sorted first, and the right-most character sorted last (lexical order). The default ASCII character set must be used. Within the same millisecond, sort order is not guaranteed.
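Because the timestamp occupies the leading characters and the alphabet is in ascending ASCII order, a plain string sort should be enough to order ULIDs by time. The sketch below uses three illustrative ULID strings (sample values, not output of this library) and the standard `sorted` proc.

```nim
import std/algorithm

# Illustrative sample ULIDs, deliberately out of order.
let ids = @[
  "01ARZ3NDEKTSV4RRFFQ69G5FAV",
  "01BX5ZZKBKACTAV9WEVGEMMVRZ",
  "01AN4Z07BY79KA1307SR9X4MV3",
]

# The default string comparison is byte-wise, i.e. ASCII order.
let ordered = sorted(ids)

for id in ordered:
  echo id
```

Since comparison is byte-wise, mixed-case input would need to be normalised (for example to upper case) before sorting.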

Encoding

Crockford's Base32 is used as shown. This alphabet excludes the letters I, L, O, and U to avoid confusion and abuse.

0123456789ABCDEFGHJKMNPQRSTVWXYZ
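To illustrate the encoding, here is a small sketch that writes an integer as a fixed number of Crockford base32 characters, taking 5 bits per character. `encodeBase32` is a hypothetical helper written for this example and may differ from the library's own procs.

```nim
const Alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

proc encodeBase32(value: int64, length: int): string =
  ## Encodes `value` as `length` characters, most significant first.
  result = newString(length)
  var v = value
  for i in countdown(length - 1, 0):
    result[i] = Alphabet[int(v and 31)]  # low 5 bits select one symbol
    v = v shr 5

# A millisecond timestamp encoded into the 10-character time component.
echo encodeBase32(1465824320894'i64, 10)  # 01AN4Z07BY
```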

Binary Layout and Byte Order

The components are encoded as 16 octets. Each component is encoded with the Most Significant Byte first (network byte order).

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+
|                      32_bit_uint_time_high                    |
+-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+
|     16_bit_uint_time_low      |       16_bit_uint_random      |
+-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+
|                       32_bit_uint_random                      |
+-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+
|                       32_bit_uint_random                      |
+-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-+
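Since this repository does not implement the binary format, the following is only a sketch of how the layout above could be packed into 16 bytes. `toBinary` is a hypothetical helper, and the randomness bytes in the usage lines are placeholders rather than real random data.

```nim
proc toBinary(timestampMs: int64, randomness: array[10, byte]): array[16, byte] =
  ## Bytes 0..5: 48-bit timestamp, most significant byte first.
  ## Bytes 6..15: the 80 bits of randomness, in order.
  for i in 0 .. 5:
    result[i] = byte((timestampMs shr (8 * (5 - i))) and 0xFF)
  for i in 0 .. 9:
    result[6 + i] = randomness[i]

var rnd: array[10, byte]
for i in 0 .. 9:
  rnd[i] = byte(i)  # placeholder values, not random

let bin = toBinary(1465824320894'i64, rnd)
echo bin
```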

String Representation

ttttttttttrrrrrrrrrrrrrrrr

where
t is Timestamp
r is Randomness
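Going the other way, a ULID string can be split at character 10 and the timestamp decoded back to milliseconds. This is a sketch under the assumption that the input is a well-formed 26-character ULID; `decodeTime` is a hypothetical helper, not a proc exported by this library.

```nim
import std/strutils

const Alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

proc decodeTime(id: string): int64 =
  ## Decodes the 10-character timestamp prefix into UNIX milliseconds.
  doAssert id.len == 26, "a ULID string has 26 characters"
  for c in id[0 ..< 10].toUpperAscii:     # the encoding is case insensitive
    let digit = Alphabet.find(c)
    doAssert digit >= 0, "character outside the Crockford alphabet"
    result = result * 32 + int64(digit)

let id = "01AN4Z07BY79KA1307SR9X4MV3"
echo id[0 ..< 10]    # timestamp part:  01AN4Z07BY
echo id[10 ..< 26]   # randomness part: 79KA1307SR9X4MV3
echo decodeTime(id)  # 1465824320894
```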

Prior Art

Inspired by:

Partly inspired by:

Test Suite

nimble test

Performance

nimble perf
ulid.nim                                                  time/iter  iters/s
============================================================================
encode_time                                                319.93ns    3.13M
encode_random                                                3.57us  280.33K
ulid                                                         3.85us  259.78K