


Backups and Data

I use Borg for my current active backup on my laptop. Since the laptop is only for schoolwork, there aren't a ton of other files to copy.

Borg has a few cool features:

  • Incremental backups only copy the differences
  • Ability to mount the backups easily for recovery
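A minimal sketch of that workflow, assuming a repo on an external drive (paths and archive names here are placeholders):

borg init --encryption=repokey /mnt/external/borg-repo
borg create --stats --compression zstd /mnt/external/borg-repo::docs-2022-04-26 ~/Documents
borg mount /mnt/external/borg-repo::docs-2022-04-26 /mnt/borg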

For my desktop, I generally use good old tar files for system setup. All important data on this system is copied to one of many external drives.

I used to use Syncthing to sync files across my systems. I later switched to unison over SSH through my own VPN. I now just use rsync for simplicity and speed.
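The rsync invocation itself is nothing fancy; something along these lines (hostname and paths are placeholders, and --delete mirrors deletions, so use it with care):

rsync -avh --delete ~/Documents/ user@desktop:/srv/backup/Documents/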

I have various drives, some old, some new, holding backups. Critical things like family documents are redundant across my main drive, at least two externals, and an encrypted copy on Backblaze B2.

Old drives with bad partitions are generally retired and become permanent backups: I load an important collection onto them and leave it. I try to power them up twice a year to make sure they still work.

I use PAR2 to create parity files, for both TAR and ZPAQ archives. There are two options: protect individual files/folders, so bitrot costs at most one file, or protect archives/containers, where bitrot can kill everything. However, parity at around 5% redundancy should be enough to protect an entire archive.
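For a single archive, a flat redundancy percentage is the simplest approach; roughly something like this (the archive name is just an example):

par2create -r5 files.zpaq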

If I have to encrypt anything, I use AES256 with GPG. If there is a fault in AES, let's be real, everyone will be screwed anyway.
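Symmetric encryption with GPG is a one-liner; something like the following, which prompts for a passphrase and writes files.tar.zst.gpg next to the original (the filename is an example):

gpg --symmetric --cipher-algo AES256 files.tar.zst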

There are a few options. Par2 (parchive) seems to be the standard here. Of course ZFS and BTRFS can keep track of parity bits in RAID setups, and ZFS will automatically correct bit errors from the parity drive (as long as you don't pull an LTT and actually scrub the drives periodically).
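On ZFS that just means actually running scrubs now and then, e.g. (with the pool name being whatever yours is called):

zpool scrub tank
zpool status tank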

There is also a parity mode with RAR, and Dvdisaster, though I have limited experience with both. Par2 is kind of inefficient with the overhead per file, and it was traditionally run on a single tar or zip, etc, not multiple files. Par2 supports multiple files (inefficiently). I'd still rather par2 a SquashFS or ZPAQ file: the space savings of those, plus single-file mode, is more efficient.

When trying the multi-file archive, I was annoyed that it would include the Digikam XMP sidecar files (used to avoid writing EXIF into videos and RAW photos), which is a waste of space.

So, instead of giving a path, you can give a list of filenames and filter for the larger files (or by name, like *.ARW or *.NEF).

par2create -v -m4096 -b32768 -c1023 -t24 -R rec -- (find -type f -size +200k)

This is intended for fish shell, hence no $. It uses a max of 4G of memory, the maximum allowed 32768 blocks (meaning max 32k files), with 1023 parity blocks (allowing us to lose 1023 of the 32768 original blocks and still be able to recover). -t24 uses 24 threads; change this for your system. rec is the output filename prefix, and -- marks subsequent arguments as file names (put a path here instead to just do every file).

When verifying, I also add -T1 to only do 1 file at a time. This is to avoid thrashing on my spinning rust hard drives; it might not matter on SSDs. It's especially important on optical media, where seeks are expensive, and the media is best treated as a linear archive like tape. (Yes, it's random access, but that laser carriage is slow compared to an HDD head, and it's a linear spiral groove anyway.)
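So verification ends up looking something like this (rec.par2 being the index file created above):

par2verify -T1 rec.par2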

I used to use XZ for compressing tar files. As of 2021-01, I have switched to Zstandard, since it offers the same or a slightly better ratio at -22, with MUCH faster decompression.
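As a rough example of what that looks like in practice (paths are placeholders; -22 needs --ultra, and -T0 uses all cores):

tar -I 'zstd -22 --ultra -T0' -cf documents.tar.zst Documents/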

I will try to do a larger writeup and comparison in the future.

For compressing large amounts of files for archiving, where I won't need frequent random access, I use zpaq -m4. -m5 is better but also takes forever. ZPAQ is particularly good if there are lots of duplicates.

For folders with large duplicates that I rarely need, I am testing out DwarFS, since it hits a good ratio with -l7 (which uses zstd -22) and allows random access. Anything above -l7 segfaults for me.

If you are low on space and don't care about CPU time, ZPAQ is a good fit: it's a journaled archive with a very good compression ratio.

-m3 works fine for maybe-already-compressed media like photos and videos you don't want to spend a year on. -m4 is great for more text-based files.

zpaq a files.zpaq Documents/ -m4

-m5 takes an eternity, and is technically better but generally not worth it. (Bonus points for the RAM usage on a 16c32t system…)
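Extraction is the mirror image; something like this (the -to directory is just an example):

zpaq x files.zpaq -to restored/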

These are read-only filesystems: you make them once, and the result is a compressed file, but it can be mounted.

SquashFS: pretty standard, used on many Linux installers. Similar ratio to tar; can use Zstandard. Not the best ratio, but deduplicated, and it allows random reads since it can be mounted.
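Roughly, creating and mounting one with zstd looks like this (names are examples; mounting needs root, or use squashfuse):

mksquashfs Documents/ docs.sqsh -comp zstd -Xcompression-level 22
mount -o loop docs.sqsh /mnt/sqsh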

For backup, Borg also allows mounting, and is deduplicated and compressed. Sqsh is just a “read only image” equivalent with compression.

DwarFS: found it on GitHub (by mhx?). Faster than zpaq, better than squashfs. I have not vetted the source code, so use at your own risk. It is a cool FS though.
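For reference, building and mounting an image goes something like this (level and paths per the notes above):

mkdwarfs -i Documents/ -o docs.dwarfs -l7
dwarfs docs.dwarfs /mnt/dwarfs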

This is something I didn't know existed until I tried doing it manually. Basically, say you have 5000 photos to back up.

Say you want to send them to older family members with DVDs (yes, I know, optical media in 2021, give me a break).

DVDs can fit 4.4 GiB each, so you want to take advantage of that and split the files evenly. Doing it by hand sucks.

fpart can take the folders and make file lists for several partitions, either by a target number of partitions OR by size.

e.g. to split into 5 partitions, with the file lists named list.0 and so on:

fpart -n 5 -o list -v .
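To split by size instead, e.g. roughly one single-layer DVD per list (the -s value is in bytes):

fpart -s 4700000000 -o list -v .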

The problem is, it doesn't actually move the files. If you don't care about folder structure (i.e. the discs are intended as a slideshow or something):

sed 's/^ *//' < list.0 | xargs -d '\n' mv --backup=t -v -t folder0

Where list.0 is the text list, and folder0 is the target.

The --backup=t makes sure that if there are duplicate file names, it will rename one automatically and not overwrite.

NOTE: THIS REMOVES THE DIRECTORY STRUCTURE, AND LUMPS EVERYTHING IN ONE FOLDER.

This is easy enough to then burn onto a CD/DVD with K3B. While you're at it, do yourself a favor and use PAR2 to add parity.

DVDs will get scratched, and I currently leave ~5% parity to recover files.
