by giis on 9/5/23, 3:07 AM with 164 comments
by istjohn on 9/5/23, 7:23 AM
- Use sane defaults for pool creation. ashift=12, lz4 compression, xattr=sa, acltype=posixacl, and atime=off. Don't even ask me.
- Make encryption just on or off instead of offering five or six options
- Generate the encryption key for me, set up the systemd service to decrypt the pool at startup, and prompt me to back up the key somewhere
- `zfs list` should show if a dataset is mounted or not, if it is encrypted or not, and if the encryption key is loaded or not
- No recursive datasets and use {pool}:{dataset} instead of {pool}/{dataset} to maintain a clear distinction between pools and datasets.
- Don't make me name pools or snapshots. Assign pools the name {hostname}-[A-Z]. Name snapshots {pool name}_{datetime created} and give them numerical shortcuts so I never have to type that all out
- Don't make me type disk IDs when creating pools. Store metadata on the disk so ZFS doesn't get confused if I set up a pool with `/dev/sda` and `/dev/sdb` references and then shuffle around the drives
- Always use `pv` to show progress
- Automatically set up weekly scrubs
- Automatically set up hourly/daily/weekly/monthly snapshots and snapshot pruning
- If I send to a disk without a pool, ask for confirmation and then create a new single disk pool for me with the same settings as on the sending pool
- collapse `zpool` and `zfs` into a single command
- Automatically use `--raw` when sending encrypted datasets, default to `--replicate` when sending, and use `-I` whenever possible when sending
- Provide an obvious way to mount and navigate a snapshot dataset instead of hiding the snapshot filesystem in a hidden directory
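For reference, the "sane defaults" from the first bullet can already be applied by hand at pool creation. A sketch (pool name and device paths are placeholders; stable /dev/disk/by-id paths stand in for /dev/sdX names):

```shell
# Hypothetical pool "tank" on two mirrored disks.
# -o sets pool properties, -O sets filesystem properties on the root dataset.
zpool create -o ashift=12 \
  -O compression=lz4 -O xattr=sa -O acltype=posixacl -O atime=off \
  tank mirror /dev/disk/by-id/diskA /dev/disk/by-id/diskB
```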
by vermaden on 9/5/23, 7:55 AM
- get to know the difference between zpool-attach(8) and zpool-replace(8).
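A sketch of the difference (pool and device names are placeholders):

```shell
# zpool attach adds a new device alongside an existing one, turning a
# single disk (or an n-way mirror) into an (n+1)-way mirror:
zpool attach tank da0 da1

# zpool replace swaps a (possibly failed) device for a new one; the old
# device is detached once resilvering completes:
zpool replace tank da0 da2
```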
- this one will tell you where your space is used:
# zfs list -t all -o space
NAME AVAIL USED USEDSNAP USEDDS USEDREFRESERV USEDCHILD
(...)
- ZFS Boot Environments is the best feature to protect your OS before major changes/upgrades--- this may be useful for a start: https://is.gd/BECTL
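On FreeBSD the typical workflow uses bectl(8); a minimal sketch (the boot environment name is arbitrary):

```shell
bectl create pre-upgrade      # checkpoint the current boot environment
# ... perform the risky upgrade ...
bectl list                    # inspect available boot environments
# if the upgrade went wrong, roll back:
bectl activate pre-upgrade && shutdown -r now
```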
- this command will tell you all history about ZFS pool config and its changes:
# zpool history poolname
History for 'poolname':
2023-06-20.14:03:08 zpool create poolname ada0p1
2023-06-20.14:03:08 zpool set autotrim=on poolname
2023-06-20.14:03:08 zfs set atime=off poolname
2023-06-20.14:03:08 zfs set compression=zstd poolname
2023-06-20.14:03:08 zfs set recordsize=1m poolname
(...)
- the guide misses one important piece of info:
--- you can create a 3-way mirror: requires 3 disks, any 2 may fail, and still no data is lost
--- you can create a 4-way mirror: requires 4 disks, any 3 may fail, and still no data is lost
--- you can create an N-way mirror: requires N disks, any N-1 may fail, and still no data is lost
(useful when data is most important and you do not have that many slots/disks)
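Creating an N-way mirror is just a matter of listing N devices under one mirror vdev; a sketch with placeholder names:

```shell
# 3-way mirror: any two of the three disks may fail without data loss.
zpool create tank mirror da0 da1 da2

# An existing mirror can be widened later with zpool attach:
zpool attach tank da0 da3    # now a 4-way mirror
```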
by customizable on 9/5/23, 7:35 AM
by qwertox on 9/5/23, 5:50 AM
[0] https://docs.freebsd.org/en/books/handbook/zfs/
[1] https://pthree.org/2012/04/17/install-zfs-on-debian-gnulinux...
by philsnow on 9/5/23, 5:42 AM
(I realize now after writing it that maybe snapchat should have occurred to me first, but I have never used it)
by tomxor on 9/5/23, 5:15 AM
I've been using ZFS in combination with rsync for backups for a long time, so I was fairly comfortable with it... and it all worked out, but it was a way bigger time sink than I expected - because I wanted to do it right - and there is a lot of misleading advice on the web, particularly when it comes to running databases and replication.
For databases (you really should at minimum do basic tuning like block size alignment), by far the best resource I found for MariaDB/InnoDB is from the Let's Encrypt people [0]. They give reasons for everything and cite multiple sources, which is gold. If you search around the web elsewhere you will find endless contradictory advice, anecdotes, and myths accompanied by incomplete and baseless theories. Ultimately you should also test this stuff and understand everything you tune (it's OK to decide not to tune something).
For replication, I can only recommend the man pages... yeah, really! ZFS gives you solid replication tools, but they are too agnostic; they are like git plumbing. They don't assume you're going to be doing it over SSH (even though that's almost always how it's used), so you have to plug it together yourself, and this feels scary at first, especially because you probably want it automated, which means considering edge cases... which is why everyone runs to something like syncoid. But there's something horrible I discovered about replication scripts like syncoid: they don't use ZFS's send --replicate mode! They try to reimplement it in Perl, for "greater flexibility", but incompletely. This is maddening when you are testing this stuff for the first time and find that all of the encryption roots break when you do a fresh restore, and that not all dataset properties are automatically synced. ZFS takes care of all of this if you simply use the built-in recursive "replicate" option. It's not that hard to script manually once you commit to it; just keep it simple, don't add a bunch of unnecessary crap into the pipeline like syncoid does (it actually slows things down if you test), just use pv to monitor progress and it will fly.
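The pipeline described above is short enough to write by hand; a sketch, assuming an encrypted source pool and an SSH-reachable backup host (all names are placeholders):

```shell
# Take a recursive snapshot of everything under tank/data.
zfs snapshot -r tank/data@2023-09-05

# Send the full replication stream (-R), raw so encrypted datasets travel
# without being decrypted (-w), incrementally since the previous snapshot
# (-I), with pv showing throughput in the middle of the pipe.
zfs send -R -w -I tank/data@2023-09-01 tank/data@2023-09-05 \
  | pv \
  | ssh backuphost zfs receive -F backup/data
```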
I might publish my replication scripts at some point because I feel like there are no good functional reference scripts for this stuff that deal with the basics without going nuts and reinventing replication badly like so many others.
by guerby on 9/5/23, 5:22 AM
My only surprise was the volblocksize default, which is pretty bad for most RAIDZ configurations: you need to increase it to avoid losing 50% of raw disk space...
Articles touching this topic :
https://openzfs.github.io/openzfs-docs/Basic%20Concepts/RAID...
https://www.delphix.com/blog/zfs-raidz-stripe-width-or-how-i...
And you end up on one of the ZFS "spreadsheets" out there:
ZFS overhead calc.xlsx https://docs.google.com/spreadsheets/d/1tf4qx1aMJp8Lo_R6gpT6...
RAID-Z parity cost https://docs.google.com/spreadsheets/d/1pdu_X2tR4ztF6_HLtJ-D...
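volblocksize can only be set when the zvol is created, so it has to be chosen up front; a sketch with placeholder names and a size picked to better match RAIDZ stripe width:

```shell
# Hypothetical 100G zvol for a VM disk on a RAIDZ pool; the default
# volblocksize is often too small for RAIDZ and wastes space on parity
# and padding.
zfs create -V 100G -o volblocksize=64k tank/vm-disk
```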
by tweetle_beetle on 9/5/23, 8:48 AM
The documentation in question was a PowerPoint presentation with difficult to read styling, somewhat evangelical language, lots of assumptions about knowledge and it was not regularly updated. It was vague on how much RAM was required, mainly just focused on having as much as possible. Needless to say I ignored all the red flags about the technology, the hype and my own knowledge and lost a load of data. Lots of lessons learnt.
by unethical_ban on 9/5/23, 3:39 PM
- All redundancy in ZFS is built into the vdev layer. Zpools are created from one or more vdevs, and no matter what, if you lose any single vdev in a zpool, the zpool is permanently destroyed.
- Historically RAIDZs (parity RAIDs) cannot be expanded by adding disks. The only way to grow a RAIDZ is to replace each disk in the array one at a time with a larger disk (and hope no disks fail during the rebuild). So in my very amateur opinion, I would only consider doing a RAIDZ if it is something like a RAIDZ2 or 3 with a large number of disks. For n<=6 and if the budget can stand it, I would do several mirrored vdevs. (Again as an amateur I am less familiar with RW performance metrics of various RAIDs so do more research for prod).
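A sketch of the mirrored-vdev layout suggested above (device names are placeholders); unlike RAIDZ, this layout grows easily by adding more mirror pairs:

```shell
# Pool striped across two 2-way mirrors ("RAID 10"-style layout).
zpool create tank mirror da0 da1 mirror da2 da3

# Expand later by adding another mirror vdev:
zpool add tank mirror da4 da5
```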
by mastax on 9/5/23, 9:07 AM
So there aren't any errors in files. There aren't any errors in devices. There aren't any errors detected in scrub(?). And yet at runtime I get a dozen new "errors" showing up in zpool status per day. How?
by totetsu on 9/5/23, 4:25 AM
- if you want to copy files, for example, and connect your drive to another system and mount your zpool there, it sets a pool-membership value on the filesystem, and when you put the drive back in your original system it won't boot unless you set it back. Which involved a chroot
- the default settings I had made a snapshot every time I apt installed something; because those snapshots included my home drive, when I deleted big files afterwards I didn't get any free space back until I figured out what was going on and arbitrarily deleted some old snapshots
- you can't just make a swap file and use it
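Regarding the last bullet: swap files on ZFS are unsupported, but the OpenZFS FAQ describes swapping to a zvol instead. A sketch, assuming a Linux system and a pool named tank:

```shell
# Create a dedicated zvol sized for swap, with block size matching the
# system page size and caching/sync settings recommended for swap use.
zfs create -V 8G -b "$(getconf PAGESIZE)" \
  -o logbias=throughput -o sync=always \
  -o primarycache=metadata \
  -o com.sun:auto-snapshot=false tank/swap

mkswap -f /dev/zvol/tank/swap
swapon /dev/zvol/tank/swap
```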
by idatum on 9/5/23, 5:14 AM
It's #3 where I need to do some more research/work. I need to spend some time sending snapshots/diffs to cloud blob storage and make sure I can restore. Yes, I know there is rsync.net.
Any experiences to share?
by asicsp on 9/5/23, 4:48 AM
by crawsome on 9/5/23, 1:19 PM
Haha, the only part of maintenance that I need to look up every time I do it is replacing a faulty hard drive.
Even this guide skips that.
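For the record, the drive-replacement dance is roughly this (pool and device names are placeholders):

```shell
zpool status tank            # identify the FAULTED/UNAVAIL device
zpool offline tank da2       # take it offline if it is still attached
# physically swap the disk, then point ZFS at the new device:
zpool replace tank da2 da5
zpool status tank            # watch resilver progress until it completes
```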
by znpy on 9/5/23, 9:00 AM
by dontupvoteme on 9/5/23, 4:05 AM
(Hey looks like it's a sore spot!)