Working with Files

Compress a folder to tar.xz

I came up with this before tar had the -J option for XZ compression. The pipeline below is equivalent and still works.

tar -cf - input/ | xz -z -k -6e -T 0 -c -vv - > output.tar.xz

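This runs tar on the folder to create a single uncompressed archive stream, then compresses that stream with xz. -T sets the number of threads; 0 uses all available cores. -6e is the compression preset: -9 gives the best ratio and -0 is the fastest. Adding e (as in -6e) spends more CPU time when compressing for a marginally better ratio. -z (compress) and -k (keep the input) have no effect when reading from stdin, so they are harmless but optional.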

7-Zip with LZMA2 uses the same compression algorithm as XZ. 7-Zip gets a marginally better ratio, but on *NIX systems like Linux you should use tar.xz, since the 7z format does not store file attributes and metadata such as owner and group.
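
For reference, here is a small sketch of the pipeline next to the newer built-in form, plus a quick integrity check. The function names are made up for illustration, and the XZ_OPT route assumes GNU tar (which runs the xz program for -J); other tar implementations may ignore it.

```shell
# Hypothetical helper names; the flags are standard tar/xz options.

# Compress a folder with the pipeline shown above (all threads, preset 6 extreme).
pack_xz() {
    # $1 = folder to archive, $2 = output file
    tar -cf - "$1" | xz -6e -T 0 -c > "$2"
}

# Equivalent in one step: -J makes GNU tar run xz itself, and XZ_OPT
# passes the preset and thread count through to it.
pack_xz_builtin() {
    # $1 = folder to archive, $2 = output file
    XZ_OPT='-6e -T0' tar -cJf "$2" "$1"
}

# Test the compressed stream, then list the archive without extracting it.
check_xz() {
    # $1 = archive
    xz -t "$1" && tar -tJf "$1"
}
```

Something like `pack_xz input output.tar.xz && check_xz output.tar.xz` should then list every path in the archive if it is intact.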

Using zstandard instead

At the time of writing this update, if you use XZ, I suggest you consider zstd. It's great. At the higher levels it can hit the same ratios as XZ while being much faster, especially when decompressing (and XZ is already faster than bzip2). Meanwhile, the lower levels compress better than gzip and are also faster. Levels 1/2 are great since you're almost guaranteed to be bottlenecked by the disk (unless you have a massive RAID array or something), while the ratio is still similar to gzip.

You can do something very similar with zstandard. Zstd would normally prefer a whole file over a stream so that it knows the size in advance, but streaming as shown is convenient and doesn't need any extra disk space.

tar -cf - input/ | zstd --ultra -22 - > output.tar.zst

Zstd levels normally go from 1 to 19, but --ultra unlocks levels up to 22. I have found that -19 is a bit worse than xz -9 but much faster, especially when decompressing. -22 can sometimes beat xz -9 and is still faster when decompressing.

You can also use --adapt to change the level dynamically, but I don't trust this mode after it segfaulted on me while compressing. It also just bumps the level up to 22 within a minute anyway, even though the job is CPU-bottlenecked and not I/O-bound… I hope it improves, since it's a neat idea.
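
Going the other way works the same: stream the decompressed data back into tar. Below is a rough sketch with made-up function names. -T0 (use all cores) and -q (quiet) are regular zstd options, but they are my additions and not part of the command above.

```shell
# Hypothetical helper names; zstd reads stdin when no input file is given.

# Compress a folder at a given level (default 19); levels above 19 need --ultra.
pack_zst() {
    # $1 = folder, $2 = output file, $3 = optional level
    level="${3:-19}"
    if [ "$level" -gt 19 ]; then
        tar -cf - "$1" | zstd --ultra "-$level" -T0 -q -c > "$2"
    else
        tar -cf - "$1" | zstd "-$level" -T0 -q -c > "$2"
    fi
}

# Extract by streaming the decompressed archive into tar.
unpack_zst() {
    # $1 = archive, $2 = destination directory (must already exist)
    zstd -d -c "$1" | tar -xf - -C "$2"
}
```

For example, `pack_zst input output.tar.zst 22` followed later by `unpack_zst output.tar.zst restored/`.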

Convert .bin and .cue to .iso

I have a bunch of old family albums on DVDs (because they're old and flash drives were $$$ back then). I transferred them, but some are in .bin/.cue format since they are multitrack. I have no idea how to mount these, but I can mount .iso on Linux so I needed to convert them.

I found a thread with solutions that worked: all three of the top answers did the job.

Merge multiple PDFs together

Merging multiple PDFs can be done with several programs. I used to use ImageMagick, but it can destroy the resolution. Instead, ghostscript, pdftk and pdfunite provide nice options.

Using pdfunite (part of Poppler):

pdfunite in1.pdf in2.pdf out.pdf

Make sure you put out.pdf at the end, otherwise pdfunite treats the last input as the output and overwrites it! This has no loss in resolution (unlike the ghostscript version).
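
To guard against the overwrite mistake, a small wrapper can take the output name first and refuse to touch a file that already exists. This is only a sketch; merge_pdfs is a made-up name. The comments also list what I believe are the usual pdftk and ghostscript invocations, for comparison.

```shell
# merge_pdfs OUT IN1 IN2 [...]  (hypothetical helper name)
# Takes the output first and refuses to touch an existing file, so a
# forgotten output name cannot clobber one of the inputs.
#
# Comparable commands with the other tools (typical usage, not from these notes):
#   pdftk in1.pdf in2.pdf cat output out.pdf
#   gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=out.pdf in1.pdf in2.pdf
merge_pdfs() {
    if [ "$#" -lt 3 ]; then
        echo "usage: merge_pdfs out.pdf in1.pdf in2.pdf [...]" >&2
        return 2
    fi
    out="$1"
    shift
    if [ -e "$out" ]; then
        echo "refusing to overwrite existing file: $out" >&2
        return 1
    fi
    # pdfunite expects the inputs first and the output last.
    pdfunite "$@" "$out"
}
```

Usage would be `merge_pdfs out.pdf in1.pdf in2.pdf`.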

Cannot unmount USB drive

If you try unmounting a disk as follows (wherever you mounted it)

sudo umount /run/media/username/FlashDrive

and you get an error umount: target is busy, it can be very annoying to find the process that is using something from the drive.

You can use `lsof` to check which processes are accessing the files.

lsof | grep '/run/media/username/FlashDrive'

This will show you which process is using which file, which can be useful.
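
Piping all of lsof through grep can be slow on a busy system. As a sketch (who_holds is a made-up name), you can hand the mount point straight to lsof, or fall back to fuser from psmisc if lsof is not installed:

```shell
# who_holds MOUNTPOINT  (hypothetical helper name)
who_holds() {
    if command -v lsof >/dev/null 2>&1; then
        # Given a mount point, lsof lists every open file on that filesystem.
        lsof "$1"
    elif command -v fuser >/dev/null 2>&1; then
        # -m: anything on the same filesystem, -v: show user, PID and command.
        fuser -vm "$1"
    else
        echo "neither lsof nor fuser is installed" >&2
        return 127
    fi
}
```

For example, `who_holds /run/media/username/FlashDrive`. Processes owned by other users may only show up when this is run with sudo.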

Checksum entire folders

Generate checksum

To generate/calculate MD5 checksums for an entire folder, we can do this recursively with find.

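find . -type f ! -name checksums.txt -exec md5sum "{}" + > checksums.txt

This will generate the checksum for every file in the folder and save it to a text file. The ! -name checksums.txt test skips the list itself, which the shell creates before find starts and which would otherwise fail verification later.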

Technically, MD5 isn't the best choice anymore, but it's fast and nothing we're doing here needs security anyway, so it's a good quick sanity check.

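NOTE: I recommend using SHA-256 or SHA-512 over MD5 for anything important. I have found that SHA-512 is actually faster than SHA-256 (likely because it works on 64-bit words, which suits 64-bit CPUs that lack SHA hardware acceleration), so if you have the space for the longer hashes, you might as well use it.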

find . -type f ! -name checksums.txt -exec sha512sum "{}" + > checksums.txt

Verify checksum

To verify that all the files are intact, run the check from the same folder the checksums were generated in, since the list stores relative paths.

md5sum -c checksums.txt

This will run through and verify everything is good.

Alternatively, if you used SHA-512, then just replace md5sum with sha512sum.

sha512sum -c checksums.txt
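
Putting generation and verification together, here is a minimal sketch with made-up function names. It assumes GNU coreutils (for --quiet) and excludes any file named checksums.txt from the list.

```shell
# Hypothetical helper names. Each function body runs in a subshell,
# so the cd does not leak into the calling shell.

# make_sums DIR: write DIR/checksums.txt covering every other file under DIR.
make_sums() (
    cd "$1" || exit 1
    # Skip the list itself; sort by filename so two runs are easy to diff.
    find . -type f ! -name checksums.txt -exec sha512sum {} + | sort -k 2 > checksums.txt
)

# check_sums DIR: verify the list, printing only the files that fail.
check_sums() (
    cd "$1" || exit 1
    sha512sum -c --quiet checksums.txt
)
```

Run `make_sums ~/photos` before copying the folder, then `check_sums /path/to/copy` afterwards; the exit status is non-zero if any file is missing or differs.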

Note on hash functions

MD5 is not secure anymore, but I was curious if the speed is a worthwhile tradeoff.

To find out which method is faster, you can use OpenSSL:

openssl speed sha256 sha512 md5

For me, the results are as follows: MD5 > SHA-512 > SHA-256 at every block size except the smallest (16 bytes), where SHA-256 beats SHA-512.

 The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5              84098.48k   196370.90k   361047.62k   459072.69k   496273.01k   495352.10k
sha256           42210.06k    94892.37k   165544.29k   200256.39k   211776.94k   212728.10k
sha512           29820.25k   121823.76k   181359.94k   252842.09k   289037.40k   296808.81k
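
To make the gap easier to read, the last column (16384-byte blocks) can be turned into ratios relative to sha256. A small sketch, with relative_speed as a made-up name; it assumes the result rows look like the ones above.

```shell
# relative_speed: read `openssl speed` result rows on stdin and print each
# algorithm's last-column throughput relative to sha256.
relative_speed() {
    awk '
        # Result rows end in a number followed by "k", e.g. 495352.10k
        $NF ~ /^[0-9.]+k$/ { name[++n] = $1; rate[n] = $NF + 0 }
        $1 == "sha256" && $NF ~ /^[0-9.]+k$/ { base = $NF + 0 }
        END {
            if (base == 0) exit 1   # no sha256 row to compare against
            for (i = 1; i <= n; i++)
                printf "%-8s %.2fx\n", name[i], rate[i] / base
        }
    '
}
```

Fed the table above (for example via `openssl speed sha256 sha512 md5 2>/dev/null | relative_speed`), my numbers work out to md5 at about 2.33x and sha512 at about 1.40x the speed of sha256.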

Mount NAS storage in fstab

Create multi-session DVD and burn files

For the odd occasion where I need a DVD for some reason, I had been using K3B. Previously, I knew how to do it with mkisofs and cdrecord, but it's kind of tedious.

Turns out you can also just use growisofs from the command line.

growisofs -Z /dev/sr0 -speed=1 -R -J .

where -Z will start a new session on a blank disc, and -M can append to a past session.

That being said, multi-session discs from Windows don't like to mount on Linux for some reason. I have not found a solution, but photorec can still rescue your data if you have old files you're after. Photorec also works with /dev/sr0 if you have a disc with corrupted FS that you need to recover.

-speed=1 forces the slowest supported write speed. I think the advice is burn at half the supported speed of your disc. I go as slow as possible since I use a semi-broken laptop drive with a slimline SATA power adapter that is duct-taped to my desktop, which is quite vibration prone as well.

-R and -J enable the Rock Ridge and Joliet extensions, which allow filenames longer than 8 uppercase characters…
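
Since -Z and -M are easy to mix up, here is a sketch of a wrapper that only prints the command unless told to burn. burn_dvd, BURN and DVD_DEVICE are names I made up; the growisofs flags are the ones described above.

```shell
# burn_dvd MODE DIR  (hypothetical helper; MODE is "new" or "append")
# Prints the command unless BURN=1 is set, so nothing is written by accident.
burn_dvd() {
    case "$1" in
        new)    flag=-Z ;;   # first session on a blank disc
        append) flag=-M ;;   # add a session to a disc written earlier
        *) echo "usage: burn_dvd new|append DIR" >&2; return 2 ;;
    esac
    dev="${DVD_DEVICE:-/dev/sr0}"
    if [ "${BURN:-0}" = 1 ]; then
        growisofs "$flag" "$dev" -speed=1 -R -J "$2"
    else
        echo "growisofs $flag $dev -speed=1 -R -J $2"
    fi
}
```

`burn_dvd append photos` just prints the -M command for review; `BURN=1 burn_dvd append photos` actually runs it.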

Note: for some reason, the first -Z ended up burning and using like 300MB even though I had a couple of MB of documents to burn. Not sure why. K3B DOES burn it properly so there is something wrong here.

