Updating a ZFS Mirror
Categories: FreeBSD, Sysadmin.
A few days ago, while I was on the phone, my machine experienced a kernel panic. The backtrace pointed to a problem somewhere in the swap management code. I was in a hurry at the time and rebooted the machine without digging deeper into the problem.
The next day, I eventually realised that a hard disk was logically missing from the system and that the ZFS mirror it belonged to was running in degraded mode. Since this disk held a swap partition, the panic makes sense: some data was stored there and could not be paged out anymore.
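For reference, swapinfo(8) is a quick way to see which devices back the swap space; the commands below are an illustrative check, not output captured at the time:

# swapinfo -h            # list the active swap devices and their usage
# grep -w sw /etc/fstab  # swap devices configured at boot time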
# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH    ALTROOT
data   294G  68,2G   226G    23%  1.25x  DEGRADED  -
tank  1,81T   300G  1,52T    16%  1.06x  ONLINE    -
# zpool status data
  pool: data
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: none requested
config:

        NAME                                             STATE     READ WRITE CKSUM
        data                                             DEGRADED     0     0     0
          mirror-0                                       DEGRADED     0     0     0
            gptid/36711e52-a69e-11de-8adf-0018f38af467   ONLINE       0     0     0
            15152536002702365387                         UNAVAIL      0     0     0  was /dev/gptid/602da1ae-c474-11de-960d-0008a14dbca1

errors: No known data errors
This disk had already had problems in the past; I even had to improve the sysutils/smartmontools periodic script to take into account SMART attributes that had been failing at some point but had since recovered on that disk. This time, the disk was really dead, so there was no choice but to replace it. Fortunately, I have a bunch of spare hard disks on the shelf, so I took two 500 GB disks to replace the two 320 GB disks of the degraded ZFS pool.
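As an aside, a quick check with sysutils/smartmontools usually tells whether a disk is on its way out; the device name below is only an example:

# smartctl -H /dev/ada1   # overall health self-assessment (PASSED/FAILED)
# smartctl -A /dev/ada1   # vendor attributes: reallocated sectors, pending sectors, etc.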
I first replaced the broken disk with a new one, identified it using geom(8), and partitioned it using basically the same settings I used when installing FreeBSD on full ZFS:
# geom disk list ada1
Geom name: ada1
Providers:
1. Name: ada1
   Mediasize: 500107862016 (465G)
   Sectorsize: 512
   Mode: r2w2e3
   descr: ST3500418AS
   ident: (null)
   fwsectors: 63
   fwheads: 16

# geom part create -s GPT ada1
ada1 created
# geom part add -s 128 -t freebsd-boot ada1
ada1p1 added
# geom part add -s 4G -t freebsd-swap ada1
ada1p2 added
# geom part add -t freebsd-zfs ada1
ada1p3 added
# geom part show ada1
=>       34  976773101  ada1  GPT  (465G)
         34        128     1  freebsd-boot  (64k)
        162    8388608     2  freebsd-swap  (4.0G)
    8388770  968384365     3  freebsd-zfs  (461G)

# geom part bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
bootcode written to ada1
I then replaced the unavailable ZFS partition with the new one:
# zpool replace data 15152536002702365387 ada1p3
Make sure to wait until resilver is done before rebooting.

If you boot from pool 'data', you may need to update
boot code on newly attached disk 'ada1p3'.

Assuming you use GPT partitioning and 'da0' is your new boot disk
you may use the following command:

        gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
… waited a few moments for resilvering to finish:
% zpool status data
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Jan  2 19:26:44 2012
    330M scanned out of 68,2G at 4,07M/s, 4h44m to go
    329M resilvered, 0,47% done
config:

        NAME                                               STATE     READ WRITE CKSUM
        data                                               DEGRADED     0     0     0
          mirror-0                                         DEGRADED     0     0     0
            gptid/36711e52-a69e-11de-8adf-0018f38af467     ONLINE       0     0     0
            replacing-1                                    UNAVAIL      0     0     0
              15152536002702365387                         UNAVAIL      0     0     0  was /dev/gptid/602da1ae-c474-11de-960d-0008a14dbca1
              ada1p3                                       ONLINE       0     0     0  (resilvering)

errors: No known data errors
… then shut the system down, replaced the still-working 320 GB disk with a new 500 GB one and booted into FreeBSD again:
% zpool status data
  pool: data
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: resilvered 68,2G in 1h58m with 0 errors on Mon Jan  2 21:25:42 2012
config:

        NAME                                             STATE     READ WRITE CKSUM
        data                                             DEGRADED     0     0     0
          mirror-0                                       DEGRADED     0     0     0
            3635039039460500206                          UNAVAIL      0     0     0  was /dev/gptid/36711e52-a69e-11de-8adf-0018f38af467
            ada1p3                                       ONLINE       0     0     0

errors: No known data errors
Same story with the other disk:
# geom part create -s GPT ada0
ada0 created
# geom part add -s 128 -t freebsd-boot ada0
ada0p1 added
# geom part add -s 4G -t freebsd-swap ada0
ada0p2 added
# geom part add -t freebsd-zfs ada0
ada0p3 added
# geom part show ada0
=>       34  976773101  ada0  GPT  (465G)
         34        128     1  freebsd-boot  (64k)
        162    8388608     2  freebsd-swap  (4.0G)
    8388770  968384365     3  freebsd-zfs  (461G)

# geom part bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
bootcode written to ada0
# zpool replace data 3635039039460500206 ada0p3
Make sure to wait until resilver is done before rebooting.

If you boot from pool 'data', you may need to update
boot code on newly attached disk 'ada0p3'.

Assuming you use GPT partitioning and 'da0' is your new boot disk
you may use the following command:

        gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
When done, I saw that the available space on the data zpool was still the same:
# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
data   294G  65,3G   229G    22%  1.19x  ONLINE  -
tank  1,81T   300G  1,52T    16%  1.08x  ONLINE  -
This is because the autoexpand property is set to off by default; it should be set to on before replacing the disks if this feature is desired.
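The current value can be checked with zpool get; before the replacement it would have reported something like this (output shown as an illustration):

# zpool get autoexpand data
NAME  PROPERTY    VALUE   SOURCE
data  autoexpand  off     default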
# zpool set autoexpand=on data
Fortunately, it is possible to use zpool(8)'s online command to make ZFS take the newly available extra space into account for the pool:
# zpool online -e data ada0p3
# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
data   460G  65,3G   395G    14%  1.20x  ONLINE  -
tank  1,81T   300G  1,52T    16%  1.06x  ONLINE  -
Comments
On January 4, 2012, Freddie Cash wrote:
Just a note: you don't need to use zpool replace on mirror vdevs. Instead, just zpool detach the degraded disk from the vdev (thus turning it into a single-disk vdev) and zpool attach the new disk to the remaining disk (thus turning it back into a mirror vdev).
zpool replace is only needed for raidz vdevs.
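For this particular pool, the sequence would look something like this (an untested sketch using the device names from the article):

# zpool detach data 15152536002702365387
# zpool attach data gptid/36711e52-a69e-11de-8adf-0018f38af467 ada1p3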
On January 5, 2012, Romain Tartière wrote:
You are right: there is no need to zpool replace in this case. In fact, the man page indicates that this command does what you say in a single step (the only difference is the order of operations, and it's not critical):
zpool replace [-f] pool old_device [new_device]

    Replaces old_device with new_device.  This is equivalent to attaching
    new_device, waiting for it to resilver, and then detaching old_device.

Let's face it: I am a lazy sysadmin ;-)
On August 5, 2012, Firesock Serwalek wrote:
There may actually be an advantage to doing the replace method (and the order) over detach/attach. When you detach, certain details are removed from the disk, so if something goes wrong during the resilver, you can't just attempt to pop it back in. Presumably this is the reason behind the ordering of operations as well.
More details here:
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg15620.html