README.txt

June 18, 2020

muxzcat: tiny and portable .xz and .lzma decompression filter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
muxzcat is a decompression filter for .xz and .lzma compressed files, implemented in C (also works in C++), Perl 5 and Java. It is portable, platform-independent and backwards-compatible with old (pre-2000) versions of Perl, Java, C, Linux and Windows. The C version can be compiled statically linked, self-contained and size-optimized for Linux i386: it uses only the system calls read(2), write(2), _exit(2), mmap(2) and mremap(2), and it doesn't use the standard C library (libc).
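For illustration, the general shape of a stdin-to-stdout filter built on read(2) and write(2) can be sketched as below. This is not code from muxzcat.c: the function name copy_fd and the buffer size are assumptions, the sketch copies bytes through unchanged instead of decompressing them, and it reaches the system calls through the usual libc wrappers rather than calling them directly.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper, not taken from muxzcat.c: copies everything from
 * in_fd to out_fd using only read(2) and write(2).
 * Returns 0 on success and -1 on an I/O error.
 */
int copy_fd(int in_fd, int out_fd) {
  char buf[4096];  /* Buffer size chosen arbitrarily for this sketch. */
  for (;;) {
    ssize_t got = read(in_fd, buf, sizeof(buf));
    ssize_t done = 0;
    if (got == 0) return 0;  /* End of input. */
    if (got < 0) {
      if (errno == EINTR) continue;  /* Interrupted by a signal, retry. */
      return -1;
    }
    /* write(2) may accept fewer bytes than requested, so loop until done. */
    while (done < got) {
      ssize_t put = write(out_fd, buf + done, (size_t)(got - done));
      if (put < 0) {
        if (errno == EINTR) continue;
        return -1;
      }
      done += put;
    }
  }
}
```

A filter built this way could finish with `_exit(copy_fd(0, 1) == 0 ? 0 : 1);', so that any failure shows up as a non-zero exit status.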

The following binaries are released on https://github.com/pts/muxzcat/releases :

  • muxzcat.xtiny: tiny Linux i386 executable (also runs on Linux amd64)
  • muxzcat.upx: tiny Linux i386 executable, compressed with upxbc
  • muxzcat.exe: Windows i386 Win32 executable using kernel32.dll only
  • muxzcat.darwinc32: macOS i386 executable
  • muxzcat.darwinc64: macOS amd64 (x86_64) executable
  • muxzcat.pl: Perl script which works on Perl 5.004_04 or later
  • muxzcat.class: Java command-line application which works on Java 1.0.2 or later
  • muxzcatj12.jar: Java-with-lib command-line application which works on Java 1.2 or later

muxzcat.c is size-optimized for Linux i386 (also runs on Linux amd64) with `xtiny gcc': the final statically linked executable is 8136 bytes, and with upxbc (`upxbc --elftiny -f -o muxzcat.upx muxzcat') it can be compressed to 5123 bytes. (Compare it with xzcat-only busybox on Linux i386, which is >20 KiB.)

muxzcat.c is size-optimized for Windows i386 (also runs on Windows amd64) with gcc-mingw32 and some command-line flags (see muxzcat.exe in the Makefile): the final muxzcat.exe is 10752 bytes. (Compare it with xzdec.exe in https://fossies.org/windows/misc/xz-5.2.4-windows.zip/ , which is >70 KiB.)

To use the C implementation (muxzcat.c), either download the binary executable from https://github.com/pts/muxzcat/releases or compile it from source (see the beginning of muxzcat.c for compilation instructions), and then run it with any of:

$ ./muxzcat <input.xz >output.bin
$ ./muxzcat <input.lzma >output.bin

Error is indicated as a non-zero exit status.

It ignores command-line flags, so you can specify e.g. `-cd'.

Here is how to use the Perl implementation (muxzcat.pl):

$ perl muxzcat.pl <input.xz >output.bin
$ perl muxzcat.pl <input.lzma >output.bin

Error is indicated as a non-zero exit status.

It ignores command-line flags, so you can specify e.g. `-cd'.

Here is how to use the Java implementation (muxzcat.java). After compilation or downloading of muxzcat.class, run with any of:

$ java muxzcat <input.xz >output.bin
$ java muxzcat <input.lzma >output.bin

Error is indicated as a non-zero exit status.

It ignores command-line flags, so you can specify e.g. `-cd'.

Here is how to use the Java-with-lib implementation:

$ java -jar muxzcatj12.jar <input.xz >output.bin

Error is indicated as a non-zero exit status and an exception report on stderr.

It ignores command-line flags, so you can specify e.g. `-cd'.

muxzcat is a drop-in replacement for the following commands:

$ xz -cd <input.xz >output.bin
$ unxz -cd <input.xz >output.bin
$ xzcat -cd <input.xz >output.bin
$ xzdec -cd <input.xz >output.bin
$ busybox xz -cd <input.xz >output.bin
$ busybox unxz -cd <input.xz >output.bin
$ busybox xzcat -cd <input.xz >output.bin
$ xz -cd <input.lzma >output.bin
$ unxz -cd <input.lzma >output.bin
$ xzcat -cd <input.lzma >output.bin
$ lzma -cd <input.lzma >output.bin
$ unlzma -cd <input.lzma >output.bin
$ lzmadec -cd <input.lzma >output.bin
$ busybox lzma -cd <input.lzma >output.bin
$ busybox unlzma -cd <input.lzma >output.bin
$ busybox lzmadec -cd <input.lzma >output.bin
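Because the same command accepts both .xz and .lzma input, a decompressor has to tell the two formats apart from the first bytes of the stream. A minimal sketch of one way to do that follows. How muxzcat itself detects the format is not described here, so the function name detect_format, the enum and the .lzma plausibility check are all assumptions; only the 6-byte .xz magic and the layout of the .lzma properties byte are taken from the file formats.

```c
#include <stddef.h>
#include <string.h>

enum stream_format { FORMAT_UNKNOWN = 0, FORMAT_XZ = 1, FORMAT_LZMA = 2 };

/* Hypothetical helper: guesses the container format from the first bytes
 * of the input. Not taken from muxzcat.c.
 */
enum stream_format detect_format(const unsigned char *p, size_t size) {
  /* Every .xz stream starts with these 6 magic bytes. */
  static const unsigned char xz_magic[6] = {0xFD, '7', 'z', 'X', 'Z', 0x00};
  if (size >= 6 && memcmp(p, xz_magic, 6) == 0) return FORMAT_XZ;
  /* .lzma has no magic bytes. Its 13-byte header starts with a properties
   * byte encoding (pb * 5 + lp) * 9 + lc, so valid values are below
   * 9 * 5 * 5 == 225. This is only a plausibility check, not a proof.
   */
  if (size >= 13 && p[0] < 225) return FORMAT_LZMA;
  return FORMAT_UNKNOWN;
}
```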

muxzcat is free software, GNU GPL >=2.0. There is NO WARRANTY. Use at your own risk.

Limitations of muxzcat.c, muxzcat.pl and muxzcat.java:

  • In the worst case it keeps 2 times the compression dictionary size in dynamic memory (also multiply it by 3 for realloc overhead), and it needs 130 KiB of memory on top of it: readBuf is about 64 KiB, CLzmaDec.prob is about 28 KiB, the rest is decompressBuf (containing the entire uncompressed data) and a small constant overhead.
  • It doesn't support dictionary sizes larger than 1610612736 (~1.61 GB). The default for xz (-6) is 8 MiB. (This is not a problem in practice, because even the output of `xz -9e' uses only 64 MiB dictionary size.)
  • It doesn't support uncompressed data larger than 1610612736 (~1.61 GB).
  • For .xz it supports only LZMA2 (no other filters such as BCJ).
  • For .lzma it doesn't work with files with 5 <= lc + lp <= 12.
  • It doesn't verify checksums (e.g. CRC-32 or CRC-64).
  • It extracts the first stream only, and it ignores the index.
  • muaxzcat.c, muxzcat.java and muxzcat.pl keep the entire uncompressed output data in memory, and they have a limit of 1610612736 (~1.61 GB). FYI linux-4.20.5.tar is about half as much, 854855680 bytes.
  • muxzcat.java doesn't work with Avian 0.6 (OutOfMemoryError).
  • muxzcat.java uses the maximum amount of memory (~1.61 GB) for some .lzma files which don't have their uncompressed size specified. This includes files created by `xz --format=lzma'.
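The header-related limitations above (the lc + lp restriction, the size limits and the missing uncompressed size) can be made concrete with a short sketch of parsing the 13-byte .lzma header. This is not code from muxzcat.c: the names lzma_header, parse_lzma_header and is_supported are made up for illustration, and the checks only mirror the limits listed above.

```c
#include <stddef.h>
#include <stdint.h>

#define LZMA_HEADER_SIZE 13
/* Size limit quoted in the list above (1.5 GiB). */
#define MAX_SUPPORTED_SIZE UINT64_C(1610612736)

/* Hypothetical struct holding the decoded .lzma header fields. */
struct lzma_header {
  unsigned lc, lp, pb;         /* Literal context, literal pos, pos bits. */
  uint32_t dict_size;          /* Dictionary size in bytes. */
  uint64_t uncompressed_size;  /* Only meaningful if size_known is set. */
  int size_known;              /* 0 if the header stores all-ones bytes. */
};

/* Decodes the 13-byte .lzma header. Returns 0 on success, -1 if the input
 * is too short or the properties byte is out of range.
 */
int parse_lzma_header(const unsigned char *p, size_t size,
                      struct lzma_header *h) {
  unsigned props, i;
  if (size < LZMA_HEADER_SIZE || p[0] >= 9 * 5 * 5) return -1;
  props = p[0];  /* Encodes (pb * 5 + lp) * 9 + lc. */
  h->lc = props % 9;
  props /= 9;
  h->lp = props % 5;
  h->pb = props / 5;
  h->dict_size = 0;  /* Bytes 1..4, little-endian. */
  for (i = 0; i < 4; ++i) h->dict_size |= (uint32_t)p[1 + i] << (8 * i);
  h->uncompressed_size = 0;  /* Bytes 5..12, little-endian. */
  for (i = 0; i < 8; ++i) {
    h->uncompressed_size |= (uint64_t)p[5 + i] << (8 * i);
  }
  /* Eight 0xFF bytes mean that the uncompressed size is not specified. */
  h->size_known = h->uncompressed_size != UINT64_MAX;
  return 0;
}

/* Applies the limits from the list above; returns 1 if the file is in scope. */
int is_supported(const struct lzma_header *h) {
  if (h->lc + h->lp > 4) return 0;
  if (h->dict_size > MAX_SUPPORTED_SIZE) return 0;
  if (h->size_known && h->uncompressed_size > MAX_SUPPORTED_SIZE) return 0;
  return 1;
}
```

For the header bytes 5D 00 00 80 00 followed by eight FF bytes, this yields lc=3, lp=0, pb=2, an 8 MiB dictionary and an unspecified uncompressed size.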

Limitations of Java-with-lib muxzcatj12.jar:

  • It doesn't work with Avian 0.6 (it uses some classes not available there).
  • Its memory usage is constant + 100 KiB + dictionary size, so it doesn't keep the entire uncompressed data in memory.

muxzcat is portable because:

  • It works with old (pre-2000) C and C++ compilers.
  • It works on old (pre-2000) Linux systems: Linux 2.6 and later.
  • It works on old (pre-2000) Windows systems: Windows 95 and later.
  • It works on old macOS systems: Mac OS X 10.4 (released on 2005-04-29, the first release which supports Intel CPUs: i386 and amd64) and later.
  • It has good library compatibility by not using any libraries on Linux (not even libc.so.6) and using only 6 functions in kernel32.dll on Windows.
  • It works with old (pre-2000) versions of Java: the minimum is Java 1.0.2 (released on 1995-09-16).
  • It works with old (pre-2000) versions of Perl: the minimum is Perl 5.004_04 (released on 1997-10-15).

Which Java program to use: muxzcat.java or muxzcatj12.jar?

  • Most users should use muxzcatj12.jar, because it needs less memory, it is faster than muxzcat.java, and it also verifies checksums.
  • On very old systems where Java >=1.0.2 is available, but Java >=1.2 isn't, muxzcat.java should be used.
  • If maximum compatibility with muaxzcat.c (also muxzcat.c) and muxzcat.pl is desired, then muxzcat.java should be used.

Based on decompression speed measurements of linux-4.20.5.tar.xz, size-optimized muxzcat.c (on Linux i386) is about 1.07 times slower than speed-optimized xzcat (on Linux amd64).

Based on decompression speed measurements of a ~2 MiB .tar.xz file and of the ~100 MiB linux-4.20.5.tar.xz, size-optimized muxzcat.c (on Linux i386) is about 285 times faster than muxzcat.pl (on perl compiled for Linux amd64). The C and Perl implementations are derived from the same codebase (muxzcat.c, itself derived from the source files in 7z922.tar.bz2). Part of the slowness is because LZMA decompression needs 32-bit unsigned arithmetic, and perl compiled for Linux amd64 can do 64-bit signed arithmetic, so the inputs of some operators (e.g. >>, <, ==) need to be bit-masked to get correct results. (Fortunately % and / are not used in LZMA decompression, because they would be even slower when emulated on the wrong signedness.) muxzcat.pl is even slower (by a factor of 1.1017) than that on Perls which can do 32-bit signed arithmetic, because special handling is needed for negative inputs of <.
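The need for bit-masking can be illustrated in C by deliberately doing the arithmetic in int64_t, which stands in here for Perl's 64-bit signed integers. This is a sketch, not code from muxzcat.pl or muxzcat.c; the helper names u32, sub_raw, lt32, eq32 and shr32 are made up for the example.

```c
#include <stdint.h>

#define MASK32 INT64_C(0xFFFFFFFF)

/* Reduces a 64-bit signed value to the 32-bit unsigned value it stands for. */
int64_t u32(int64_t v) { return v & MASK32; }

/* Subtraction done in 64-bit signed arithmetic and left unmasked on purpose:
 * the result is negative where 32-bit unsigned arithmetic would wrap around.
 */
int64_t sub_raw(int64_t a, int64_t b) { return a - b; }

/* Operators whose inputs must be masked first to match 32-bit unsigned
 * semantics, as described in the paragraph above.
 */
int lt32(int64_t a, int64_t b) { return u32(a) < u32(b); }
int eq32(int64_t a, int64_t b) { return u32(a) == u32(b); }
int64_t shr32(int64_t a, unsigned n) { return u32(a) >> n; }
```

Here sub_raw(1, 2) yields -1, while the 32-bit unsigned result is 4294967295. Comparing the unmasked -1 with 5 using a plain < would claim it is smaller, whereas lt32 gives the answer that 32-bit unsigned arithmetic requires.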

More information about decompression speed differences between C and Perl: https://ptspts.blogspot.com/2019/02/speed-of-in-memory-algorithms-in.html

muxzcat.pl is compatible with recent versions of Perl 5 (e.g. Perl 5.24) and very old versions of Perl 5 (e.g. Perl 5.004_04, released on 1997-10-15).

Based on decompression speed measurements of the ~100 MiB linux-4.20.5.tar.xz, size-optimized muxzcat.c (on Linux i386) is about 1.347 times faster than muxzcatj12.jar (with java 1.8 compiled for Linux amd64). The C and Java implementations are completely different; they don't share code.

muxzcatj12.jar needs Java 1.2 (released on 1998-12) or any more recent Java, e.g. Java 8. Versions earlier than 1.2 don't work, because they lack java.util.Arrays.{equals,fill}.

Based on decompression speed measurements of the ~100 MiB linux-4.20.5.tar.xz, size-optimized muxzcat.c (on Linux i386) is about 1.439 times faster than muxzcat.java (with java 1.8 compiled for Linux amd64). The C and Java implementations are derived from the same codebase (muxzcat.c, itself derived from the source files in 7z922.tar.bz2).

muxzcat.java needs Java 1.0.2 (released on 1995-09-16) or any more recent Java, e.g. Java 8.

If you need a tiny decompressor for .gz, .zip and Flate compressed files implemented in C, see https://github.com/pts/pts-zcat .

If you need a tiny extractor and self-extractor for .7z archives implemented in C, see https://github.com/pts/pts-tiny-7z-sfx .

END