XZ Utils is a stream compressor in the style of the traditional gzip and bzip2. The file extension is .xz (also .tar.xz and .txz). XZ Utils is the successor to LZMA Utils, whose container format is now considered legacy, though XZ Utils still handles it. The legacy format is identified by the suffix .lzma (also .tar.lzma).
7zip is more like the .zip format, in the sense that an archive can contain multiple compressed files. The program also handles multiple formats, .zip as well as .tar.gz and .tar.bz2. Its Unix implementation is called p7zip. Use 7zip if you see .7z (also .tar.7z).
A lesser-known tool, lzip, is written in the style of gzip and zlib, and produces .lz files (also .tar.lz and .tlz). These are not to be confused with the .lzma files produced by LZMA Utils, as the two formats are not compatible.
According to a Google search today, there are 27,000 results for "tar.xz," 23,700 results for "tar.7z," 22,200 results for "tar.lzma," and only 2,200 results for "tar.lz." GNU tar recognizes the suffixes of all the formats above except .7z, and assigns the -J option to .xz files.
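As a minimal sketch of what the -J option looks like in practice, the following creates and lists a small .tar.xz archive. The directory and file names (demo, a.txt) are hypothetical placeholders, and the script assumes both tar and xz are installed; it skips itself otherwise.

```shell
# Skip the demo when the prerequisites are missing.
command -v tar >/dev/null 2>&1 && command -v xz >/dev/null 2>&1 || {
    echo "tar or xz not found; skipping" >&2
    exit 0
}

workdir=$(mktemp -d)                 # scratch directory for the demo
mkdir "$workdir/demo"                # hypothetical input directory
printf 'hello\n' > "$workdir/demo/a.txt"

# -c create, -J filter through xz, -f archive file name
tar -C "$workdir" -cJf "$workdir/demo.tar.xz" demo

# -t list the contents; -J again selects xz for reading
listing=$(tar -tJf "$workdir/demo.tar.xz")
printf '%s\n' "$listing"

# Decompressed with xz alone, the same file is a plain tar stream.
xz -dc "$workdir/demo.tar.xz" | tar -tf - >/dev/null

rm -rf "$workdir"
```

The last pipeline shows the layering: tar only hands the stream to xz, which is why the standalone xz tools work on the same file.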
The following is an informal benchmark run. I compiled these programs using the default Makefile's compiler optimization flags without taking note of what they are, so I might be comparing apples and oranges.
$ time zcat hugefile.gz | 7za a -si hugefile.7z
...
real 20m2.066s
user 31m17.788s
sys 0m14.417s

7zip seems to have implemented multi-threading, and is able to scale to 150% CPU time.
$ time zcat hugefile.gz | xz > hugefile.xz
real 38m5.059s
user 38m6.863s
sys 0m10.712s

xz promises to implement multi-threading support in the future.
$ time zcat hugefile.gz | lzip > hugefile.lz
real 43m59.658s
user 43m50.838s
sys 0m7.064s

These are the resulting file sizes.
$ ls -l
total 699264
-rw------- 1 liulk grad2 147791612 Apr 12 23:56 hugefile.7z
-rw------- 1 liulk grad2 238094571 Apr 12 22:00 hugefile.gz
-rw------- 1 liulk grad2 111189035 Apr 13 00:09 hugefile.lz
-rw------- 1 liulk grad2 112989212 Apr 13 00:03 hugefile.xz

The difference is not that great. The implementations seem to be of comparable quality, sacrificing more time in order to achieve smaller files. My personal preference at the moment is xz, because multi-threaded support is promised, GNU tar has a dedicated -J option for the format, and xz comes with a suite of utilities (unxz, xzless, xzcat, etc.) analogous to their gzip counterparts (gunzip, zless, zcat, etc.).
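To put the file sizes from the listing into relative terms, here is a small sketch that expresses each archive as a percentage of the gzip original. The byte counts are copied from the ls -l output; the variable names are only illustrative.

```shell
# Sizes in bytes, copied from the ls -l listing in this post.
sizes='hugefile.gz 238094571
hugefile.7z 147791612
hugefile.xz 112989212
hugefile.lz 111189035'

# Print each file size as a percentage of the gzip baseline.
report=$(printf '%s\n' "$sizes" | awk '
    NR == 1 { base = $2 }                         # first row is the gzip file
    { printf "%s %.1f%%\n", $1, 100 * $2 / base }
')
printf '%s\n' "$report"
```

By this measure the .xz and .lz files both land near 47% of the gzip size, within about one percentage point of each other, while the .7z file comes out around 62%.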
Update (May 19): I just noticed today that lzip has a parallel implementation plzip that can scale to multiple processors. The timing result is as follows:
$ time xzcat hugefile.xz | plzip > hugefile.lz
real 6m46.324s
user 40m7.743s
sys 0m19.721s

It scaled to all the available CPU on the shared computing node I tested on.
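The scaling remarks can be checked with a little arithmetic: CPU utilization is roughly (user + sys) / real. The sketch below applies that to the four timings reported in this post. Note that time measures the whole pipeline, so the decompressor (zcat or xzcat) is included in each figure, and the run labels are just the compressor names.

```shell
# Timings copied from the runs in this post: name, real, user, sys.
timings='7za 20m2.066s 31m17.788s 0m14.417s
xz 38m5.059s 38m6.863s 0m10.712s
lzip 43m59.658s 43m50.838s 0m7.064s
plzip 6m46.324s 40m7.743s 0m19.721s'

cpu_report=$(printf '%s\n' "$timings" | awk '
    # Convert a bash time value such as "20m2.066s" to seconds.
    function secs(t,    p) {
        split(t, p, "m")
        sub(/s$/, "", p[2])
        return p[1] * 60 + p[2]
    }
    {
        real = secs($2)
        cpu  = secs($3) + secs($4)
        # name, wall-clock seconds, CPU time as a percentage of wall-clock
        printf "%-6s %7.0fs %4.0f%%\n", $1, real, 100 * cpu / real
    }
')
printf '%s\n' "$cpu_report"
```

Run as written, this gives about 157% for the 7za pipeline, about 100% for xz and lzip, and just under 600% for plzip, which would correspond to roughly six busy cores on that node.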