scripts:files:compress_directory_into_tarxz

I came up with this before tar had the -J option for XZ compression. The pipeline below is equivalent to -J and still works.

tar -cf - input/ | xz -z -k -6e -T 0 -c -vv - > output.tar.xz

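This runs tar on the folder to create a single uncompressed archive stream, then compresses that stream with xz. -T sets the number of threads; 0 uses as many as there are CPU cores. -6e is the compression level: -9 gives the best ratio and -0 is the fastest. Append e to any level (as in -6e) to spend more CPU cycles for a marginally better ratio. -c writes the result to stdout and -vv prints verbose progress. -z (compress) is the default, and -k (keep the input) has no effect when reading from stdin.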
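For comparison, here is a minimal sketch of the pipeline next to the one-step -J form. It assumes GNU tar and xz 5.2 or newer (for -T). The scratch directory, the sample file and the names piped.tar.xz and direct.tar.xz are made up for illustration. XZ_OPT is the environment variable xz reads for default options; other tar implementations that compress internally may ignore it.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area with a tiny sample directory to compress.
work=$(mktemp -d)
cd "$work"
mkdir input
printf 'hello\n' > input/a.txt

# Pipeline form from this page: tar streams the archive, xz compresses it.
tar -cf - input/ | xz -6e -T 0 -c > piped.tar.xz

# One-step form: tar runs xz itself via -J; XZ_OPT passes the level and threads.
XZ_OPT='-6e -T0' tar -cJf direct.tar.xz input/

# Check both archives for corruption, then list the contents of one.
xz -t piped.tar.xz direct.tar.xz
tar -tJf direct.tar.xz
```

Both files should extract to the same tree; the compressed bytes themselves may differ slightly between the two forms.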

7-Zip with LZMA2 uses the same compression algorithm as XZ. 7-Zip gets a marginally better ratio, but on *NIX systems like Linux you should use tar.xz, since the 7z format does not store Unix file attributes and metadata such as owner and group.

At the time of writing this update, if you use XZ, I suggest you consider zstd. It's great. At the higher levels it can hit the same ratios as XZ while being much faster, especially when decompressing (and XZ is already faster than bzip2). Meanwhile, the lower levels both compress better than gzip and run faster. Levels 1 and 2 are great: they are fast enough that you're almost guaranteed to be bottlenecked by the disk (unless you have a massive RAID array or something), and the ratio is still similar to gzip's.
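If you want to check these claims on your own data, a rough comparison can be sketched as below. This is only an illustration: the sample data, the chosen levels and the one-second timer resolution of date +%s are all assumptions, and tools that are not installed are skipped. Point the tar command at a real directory to get meaningful numbers.

```shell
#!/bin/sh
set -eu

# Hypothetical sample data; replace input/ with a real directory for useful numbers.
work=$(mktemp -d)
cd "$work"
mkdir input
seq 1 20000 > input/numbers.txt
tar -cf sample.tar input/

# Compress the same tar stream with each tool and report size and elapsed time.
for cmd in "gzip -6" "xz -6" "zstd -3" "zstd -19"; do
    tool=${cmd%% *}
    # Skip any compressor that is not installed.
    command -v "$tool" >/dev/null 2>&1 || continue
    start=$(date +%s)
    size=$($cmd -c < sample.tar | wc -c)
    end=$(date +%s)
    echo "$cmd: $size bytes in $((end - start)) s"
done
```

On small inputs every tool finishes in under a second, so the timings only become informative with a few hundred megabytes or more.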

You can do something very similar with zstd (Zstandard). Zstd prefers to have the whole file rather than a stream, so that it knows the size in advance, but streaming as shown is convenient and doesn't need any extra disk space.

tar -cf - input/ | zstd -22 --ultra - > output.tar.zst

Zstd levels normally go from 1 to 19, but you can use --ultra to enable levels up to 22. I have found that -19 is a bit worse than xz -9, but is so much faster, especially with decompression. -22 can sometimes beat xz -9 and is still faster when decompressing.
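A round trip with zstd can be sketched like this. It assumes a zstd build with multithreading support (for -T0); the scratch directory, sample file and the restored/ directory are made up for illustration. It uses -19 rather than --ultra -22 because the ultra levels need far more memory; swap the level in if you want it.

```shell
#!/bin/sh
set -eu

# Skip quietly if zstd is not installed.
command -v zstd >/dev/null 2>&1 || { echo "zstd not found, skipping" >&2; exit 0; }

# Hypothetical scratch area with a tiny sample directory to compress.
work=$(mktemp -d)
cd "$work"
mkdir input
printf 'hello zstd\n' > input/a.txt

# Compress: -19 is the highest normal level, -T0 uses all cores, -q hides progress.
tar -cf - input/ | zstd -19 -T0 -q -c > output.tar.zst

# Test the compressed file for corruption without writing anything.
zstd -t -q output.tar.zst

# Decompress to stdout and let tar extract into a separate directory.
mkdir restored
zstd -dc output.tar.zst | tar -xf - -C restored

# Confirm the restored file matches the original byte for byte.
cmp input/a.txt restored/input/a.txt
```

Recent GNU tar versions can also read the file directly with tar --zstd -xf output.tar.zst, but the pipe works with any tar.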

You can also use --adapt to change the level dynamically, but I don't trust this mode after it segfaulted on me while compressing. It also just bumps the level up to -22 within a minute anyway, even when the job is CPU-bottlenecked and not I/O-bottlenecked… I hope it'll improve, since it's neat.

  • scripts/files/compress_directory_into_tarxz.txt
  • Last modified: 2023-03-27 02:28
  • by Tony