ZFS

2024-10-24

Difference between scrub and resilver

Lifted from Haravikk on ServerFault.

The main scrubbing and resilvering processes in ZFS are essentially identical – in both cases records are being read and verified, and if necessary written out to any disk(s) with invalid (or missing) data.

Since ZFS is aware of which records a disk should have, it won’t bother trying to read records that shouldn’t exist. This means that during resilvering, new disks will see little or no read activity as there’s nothing to read (or at least ZFS doesn’t believe there is).

This also means that if a disk becomes unavailable and then available again, ZFS will resilver only the new records created since the disk went unavailable. Resilvering happens automatically in this way, whereas scrubs typically have to be initiated (either manually, or via a scheduled command).
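As a rough sketch of the difference in how the two get started, the commands below bring a disk back online (after which ZFS resilvers on its own) and then request a scrub explicitly. The pool name tank and disk name sdb are placeholders of mine, not from the original answer, and the run helper only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: "tank" and "sdb" are placeholder names, adjust for your pool.
POOL="${POOL:-tank}"
DISK="${DISK:-sdb}"

# Print commands by default; set DRY_RUN=0 to really run them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Bring a previously offline disk back. ZFS then resilvers the records
# the disk missed, with no further command needed.
run zpool online "$POOL" "$DISK"

# A scrub, by contrast, has to be requested explicitly.
run zpool scrub "$POOL"

# Check the progress of either operation.
run zpool status "$POOL"
```

For the scheduled variant, the same zpool scrub call can go in a cron job or systemd timer. Many distributions already ship one, so it's worth checking before adding your own.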

There is also a special “sequential resilver” option for mirrored vdevs that can be triggered using zpool attach -s or zpool replace -s – this performs a faster copy of all data without any checking, and initiates a deferred scrub to verify integrity later. This is good for quickly restoring redundancy, but should only be used if you’re confident that the existing data is correct (you run regular scrubs, or scrubbed before adding/replacing).
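A minimal sketch of the sequential variant, assuming a mirrored pool and an OpenZFS version that supports the -s flag. The pool and device names (tank, sdb, sdc) are made up for illustration, and the run helper prints the commands rather than executing them unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: pool and device names are placeholders, not from the post.
POOL="${POOL:-tank}"
OLD_DEV="${OLD_DEV:-sdb}"
NEW_DEV="${NEW_DEV:-sdc}"

# Print commands by default; set DRY_RUN=0 to really run them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Replace a failed mirror member using a sequential resilver.
# The copy itself is unverified; the scrub that checks it comes afterwards.
run zpool replace -s "$POOL" "$OLD_DEV" "$NEW_DEV"

# To add another side to an existing mirror instead, the attach form
# takes the existing device first and the new device second:
#   run zpool attach -s "$POOL" "$OLD_DEV" "$NEW_DEV"

# Watch the rebuild and the follow-up scrub.
run zpool status "$POOL"
```

Until that follow-up scrub reports no errors, the new disk's contents have been copied but not verified.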

Finally, there are some small differences in settings for scrub and resilver. In general a resilver is given a higher priority than a scrub since it’s more urgent (restoring or increasing redundancy), but higher priority doesn’t guarantee that it finishes sooner: write speed, the number of record copies available and other factors all play a part.

For example, when dealing with a mirror a resilver can be faster since it doesn’t need to read from all disks, but only if the new disk is fast enough (can be written to at least as quickly as the other disk(s) are read from). A scrub meanwhile always reads from all disks, so for a mirror vdev it can be more intensive. For a raidz1 both processes will read from all (existing) disks, so the resilver will be slower as it also requires writing to one. A raidz2 doesn’t need to read all disks, so it might gain a little speed, and so on.

Basically there’s no concrete answer to cover every setup. 😉

Specifically with regard to the original question:

If you know a disk has failed and want to replace it, and are using a mirrored vdev, then a sequential resilver + scrub (zpool replace -s) will be faster in terms of restoring redundancy and performance, but it’ll take longer overall before you know for sure that the data was fully restored without any errors since you need to wait for the deferred scrub. A regular resilver will take longer to finish copying the data, but is verified the moment it finishes.

However, if you’re talking about repairing data on a disk you still believe to be okay, then a scrub is the fastest option, as it will only copy data which fails verification. Otherwise the process is entirely reading and checking, so it’s almost always going to be faster.

In theory a resilver can be just as fast as a scrub, or even faster (since it’s higher priority), assuming you are copying onto a suitably fast new drive that’s optimised for continuous writing. In practice though that’s usually not going to be the case.

