January 4, 2012

Updating a ZFS Mirror

Categories: FreeBSD, Sysadmin.

A few days ago, while I was on the phone, my machine experienced a kernel panic. The backtrace pointed to a problem somewhere in the swap management code. I was in a hurry at the time and rebooted the machine without taking the time to dig deeper into the problem.

The next day, I eventually realised that a hard disk was logically missing from the system and that the ZFS mirror it belonged to was running in degraded mode. Since this disk held a swap partition, the panic makes perfect sense: some pages were stored there and could no longer be paged back in.
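swapinfo(8) gives a quick view of the configured swap devices; in this setup each disk of the mirror also carries a swap partition, so a missing disk also means a missing swap device:

# swapinfo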

# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
data   294G  68,2G   226G    23%  1.25x  DEGRADED  -
tank  1,81T   300G  1,52T    16%  1.06x  ONLINE  -
# zpool status data
  pool: data
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: none requested
config:

NAME                                            STATE     READ WRITE CKSUM
data                                            DEGRADED     0     0     0
  mirror-0                                      DEGRADED     0     0     0
    gptid/36711e52-a69e-11de-8adf-0018f38af467  ONLINE       0     0     0
    15152536002702365387                        UNAVAIL      0     0     0  was /dev/gptid/602da1ae-c474-11de-960d-0008a14dbca1

errors: No known data errors

This disk had already had problems in the past; I even had to improve the sysutils/smartmontools periodic script to take into account SMART attributes that had been failing at some point but had since recovered. This time, the disk was really dead, so there was no other choice than to replace it. Fortunately, I had a bunch of spare hard disks on the shelf, so I took two 500 GB disks to replace the two 320 GB ones of the degraded ZFS pool.
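For reference, the daily SMART check from sysutils/smartmontools is driven by periodic.conf(5); it boils down to something along these lines (the device list here is only an example):

daily_status_smart_devices="/dev/ada0 /dev/ada1"

and a one-shot health check of a suspect disk can be done directly with smartctl(8):

# smartctl -H /dev/ada1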

I first replaced the broken disk with a new one, identified it using geom(8), and partitioned it using basically the same settings I used when installing FreeBSD on a full-ZFS setup:

# geom disk list ada1
Geom name: ada1
Providers:
1. Name: ada1
   Mediasize: 500107862016 (465G)
   Sectorsize: 512
   Mode: r2w2e3
   descr: ST3500418AS
   ident: (null)
   fwsectors: 63
   fwheads: 16

# geom part create -s GPT ada1
ada1 created
# geom part add -s 128 -t freebsd-boot ada1
ada1p1 added
# geom part add -s 4G -t freebsd-swap ada1
ada1p2 added
# geom part add -t freebsd-zfs ada1
ada1p3 added
# geom part show ada1
=>       34  976773101  ada1  GPT  (465G)
         34        128     1  freebsd-boot  (64k)
        162    8388608     2  freebsd-swap  (4.0G)
    8388770  968384365     3  freebsd-zfs  (461G)

# geom part bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
bootcode written to ada1
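Since this disk also provides swap, the new swap partition can be put back into service right away; assuming it is not already activated through /etc/fstab, something like:

# swapon /dev/ada1p2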

I then replaced the unavailable ZFS partition with the new one:

# zpool replace data 15152536002702365387 ada1p3
Make sure to wait until resilver is done before rebooting.

If you boot from pool 'data', you may need to update
boot code on newly attached disk 'ada1p3'.

Assuming you use GPT partitioning and 'da0' is your new boot disk
you may use the following command:

	gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0

… waited for resilvering to finish, keeping an eye on its progress:

% zpool status data
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Jan  2 19:26:44 2012
    330M scanned out of 68,2G at 4,07M/s, 4h44m to go
    329M resilvered, 0,47% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	data                                            DEGRADED     0     0     0
	  mirror-0                                      DEGRADED     0     0     0
	    gptid/36711e52-a69e-11de-8adf-0018f38af467  ONLINE       0     0     0
	    replacing-1                                 UNAVAIL      0     0     0
	      15152536002702365387                      UNAVAIL      0     0     0  was /dev/gptid/602da1ae-c474-11de-960d-0008a14dbca1
	      ada1p3                                    ONLINE       0     0     0  (resilvering)

errors: No known data errors

… then shut the system down, replaced the working 320 GB disk with a new 500 GB one, and booted into FreeBSD again:

% zpool status data
  pool: data
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
	the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: resilvered 68,2G in 1h58m with 0 errors on Mon Jan  2 21:25:42 2012
config:

	NAME                     STATE     READ WRITE CKSUM
	data                     DEGRADED     0     0     0
	  mirror-0               DEGRADED     0     0     0
	    3635039039460500206  UNAVAIL      0     0     0  was /dev/gptid/36711e52-a69e-11de-8adf-0018f38af467
	    ada1p3               ONLINE       0     0     0

errors: No known data errors

Same story with the other disk:

# geom part create -s GPT ada0
ada0 created
# geom part add -s 128 -t freebsd-boot ada0
ada0p1 added
# geom part add -s 4G -t freebsd-swap ada0
ada0p2 added
# geom part add -t freebsd-zfs ada0
ada0p3 added
# geom part show ada0
=>       34  976773101  ada0  GPT  (465G)
         34        128     1  freebsd-boot  (64k)
        162    8388608     2  freebsd-swap  (4.0G)
    8388770  968384365     3  freebsd-zfs  (461G)

# geom part bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
bootcode written to ada0
# zpool replace data 3635039039460500206 ada0p3
Make sure to wait until resilver is done before rebooting.

If you boot from pool 'data', you may need to update
boot code on newly attached disk 'ada0p3'.

Assuming you use GPT partitioning and 'da0' is your new boot disk
you may use the following command:

	gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0

When it was done, I noticed that the size of the data zpool was still the same, even though both disks were now bigger:

# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
data   294G  65,3G   229G    22%  1.19x  ONLINE  -
tank  1,81T   300G  1,52T    16%  1.08x  ONLINE  -

This is due to the autoexpand property, which is off by default and should be set to on before replacing the disks if automatic expansion is desired.

# zpool set autoexpand=on data
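The current value of the property can be checked with zpool get:

# zpool get autoexpand data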

Fortunately, it is possible to use zpool(8)'s online command to make ZFS take the extra space into account after the fact:

# zpool online -e data ada0p3
# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
data   460G  65,3G   395G    14%  1.20x  ONLINE  -
tank  1,81T   300G  1,52T    16%  1.06x  ONLINE  -

Comments

On January 4, 2012, Freddie Cash wrote:

Just a note: you don't need to use zpool replace on mirror vdevs. Instead, just zpool detach the degraded disk from the vdev (thus turning it into a single disk vdev). And zpool attach the new disk to the remaining disk (thus turning it back into a mirror vdev).

zpool replace is only needed for raidz vdevs.
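With the device names used in this post, that would look something like:

# zpool detach data 15152536002702365387
# zpool attach data gptid/36711e52-a69e-11de-8adf-0018f38af467 ada1p3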

On January 5, 2012, Romain Tartière wrote:

You are right: there is no need to zpool replace in this case. In fact, the man page indicates that this command does what you say in a single step (the only difference is the order of operations, and it's not critical):

zpool replace [-f] pool old_device [new_device]
    Replaces old_device with new_device.  This is equivalent to attaching
    new_device, waiting for it to resilver, and then detaching old_device.

Let's face it: I am a lazy sysadmin ;-)

On August 5, 2012, Firesock Serwalek wrote:

There may actually be an advantage to doing the replace method (and the order) over detach/attach. When you detach, certain details are removed from the disk, so if something goes wrong during the resilver, you can't just attempt to pop it back in. Presumably this is the reason behind the ordering of operations as well.

More details here:
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg15620.html
