[kwlug-disc] Docker Host Appliance
Chris Irwin
chris at chrisirwin.ca
Thu Jan 19 02:38:00 EST 2023
On Tue, Jan 17, 2023 at 03:45:20PM -0500, Doug Moen wrote:
>The advice for tuning OpenZFS performance for database and VM workloads
>is here:
>
>https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Workload%20Tuning.html#virtual-machines
>
>First, you create a "dedicated dataset" (dbs) or you store the VM image
>in a "zvol or raw file to avoid overhead".
What "overhead" are they attempting to avoid by using RAW files or
zvols? Because there's potentially downsides with those as well.
If it's just a recommendation for best-case performance, fine. But
generally, using a proper virtual disk format that allows your
hypervisor to take snapshots is also important.
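For comparison, the two approaches look roughly like this (the pool
name, sizes, and paths here are hypothetical):

```shell
# A zvol: a raw block device carved out of the pool, with a blocksize
# fixed at creation time. Requires root and an existing pool named "tank".
zfs create -V 20G -o volblocksize=16K tank/vm-disk

# A qcow2 image on a dataset: more overhead in the benchmarks the docs
# care about, but hypervisor-level snapshots (qemu-img, virsh) keep working.
qemu-img create -f qcow2 /tank/vms/guest.qcow2 20G
```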
>Then you tune the blocksize of the dedicated DB or VM storage to match
>the blocksize of the database instance or the VM guest filesystem (to
>avoid overhead from partial record modification).
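The tuning in question looks something like this (dataset names are my
invention; 16K matches InnoDB's default page size, and recordsize only
affects data written after the property is set):

```shell
# Dedicated dataset whose record size matches InnoDB's 16K page size,
# so a page rewrite touches exactly one record instead of part of a 128K one.
zfs create -o recordsize=16K tank/mysql
# Optional: bias the intent log toward throughput for streaming DB writes.
zfs set logbias=throughput tank/mysql
```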
Anyway, the fact that there are ZFS docs for tuning VM storage for
performance doesn't answer this earlier question:
>> People claim ZFS handles databases/VMs better than btrfs, but I don't
>> really see how since it appears to use the same COW semantics.
People specifically suggest disabling COW for VMs on BTRFS because your
files will become fragmented. That's the reason.
This is technically true, but maybe should be weighed similarly to
"mount filesystems with noatime because SSDs only have finite writes".
From what I understand, ZFS uses COW semantics similar to BTRFS's,
which means writes are never done in-place: new data is always written
elsewhere, and the file is updated to reference those new blocks.
Here's an example:
Assume you're starting with a 10GB contiguous, non-fragmented, raw VM
file. And for argument's sake, assume you have at least one snapshot of
this dataset or zvol (because of course you do; you're using ZFS, and
snapshots are free).
If you start making changes to that file (system upgrades in the VM,
etc), those writes are written elsewhere on the ZFS filesystem, and the
file is updated to reference the new data blocks. We know the original
contents are intact, because we can inspect them via the snapshot.
Now fast-forward a few years of this VM doing its thing, plus years'
worth of rolling hourly/daily/weekly/monthly snapshots.
Does this not cause fragmentation on the storage? Your live file will be
nowhere near contiguous on the physical disk.
If "No": Please explain how ZFS avoids this. Because I haven't seen it
discussed.
If "Yes": This is the same fragmentation issue that people warn about
with BTRFS, causing them to say it can't do VMs, or you should disable
COW, etc.
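To make the question concrete, here's a toy simulation of COW block
allocation (entirely my own sketch, not ZFS internals): a "file" is a
list of pointers into a growing allocation log, and an overwrite never
touches a block in place.

```python
import random

def cow_overwrite(ptrs, log, block_index, data):
    """Overwrite logical block `block_index` copy-on-write style:
    append the new data at the end of free space and move the pointer."""
    log.append(data)
    ptrs[block_index] = len(log) - 1

def contiguous_runs(ptrs):
    """Count runs of physically adjacent blocks; 1 means fully contiguous."""
    return 1 + sum(1 for a, b in zip(ptrs, ptrs[1:]) if b != a + 1)

random.seed(0)
log = list(range(1000))       # a freshly written 1000-block file...
ptrs = list(range(1000))      # ...laid out as one contiguous extent

before = contiguous_runs(ptrs)
for _ in range(500):          # years of scattered guest writes
    cow_overwrite(ptrs, log, random.randrange(1000), "new data")
after = contiguous_runs(ptrs)

print(before, after)          # the file splinters into many extents
```

Every overwrite lands at the end of free space, so the live file's
logical order diverges from its physical order, which is exactly the
fragmentation BTRFS gets blamed for.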
I'm not saying fragmentation of storage is the end of the world.
Read-ahead exists, caches exist. I'm just confused why BTRFS has a
reputation for doing VMs poorly when, as far as I can tell, ZFS has the
same behaviour.
The only reason I can think of that this wouldn't be an issue on ZFS is
that it is so massively RAM-hungry that it brute-forces the problem
with very aggressive read-ahead and caching (the ARC). BTRFS, on the
other hand, relies on the kernel's built-in page cache.
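For what it's worth, on OpenZFS on Linux you can watch and cap that
appetite (the 4 GiB cap is just an example value):

```shell
# Current ARC size and its target maximum, in bytes:
awk '$1 == "size" || $1 == "c_max" {print $1, $3}' /proc/spl/kstat/zfs/arcstats
# Cap the ARC at 4 GiB (requires root; writing 0 restores the default):
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max
```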
--
Chris Irwin
email: chris at chrisirwin.ca
xmpp: chris at chrisirwin.ca
web: https://chrisirwin.ca