After yesterday’s guide on setting up a Solaris NAS, I figure the next logical questions would be:
- How do I change out disks which have failed?
- How do I change out smaller disks for larger ones?
- Can I add more disks to my pool?
All three questions are easily answered and, for the most part, can be handled with a single tool: zpool.
First up: what happens when a disk fails? I’ve hot-removed one of the virtual hard disks from my array to simulate a failure and see what Solaris does.
root@opensolaris:/naspool/movies# zpool status naspool
  pool: naspool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        naspool     DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c8t1d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
            c8t3d0  ONLINE       0     0     0
            c8t4d0  FAULTED      3   953     0  too many errors

errors: No known data errors
Oh, dear. It looks like c8t4d0 has faulted and the pool is currently in a degraded state. But is our data still there?
root@opensolaris:/naspool/movies# ls -lah
total 29G
drwxr-xr-x 2 astro root    5 2009-10-10 10:01 .
drwxr-xr-x 5 astro root    5 2009-10-09 16:01 ..
-rw------T 1 root  root  10G 2009-10-10 09:55 10gigfile
-rw------T 1 root  root  10G 2009-10-10 09:58 10gigfile2
-rw------T 1 root  root 9.0G 2009-10-10 10:01 9gigfile
Ok, my data is safe for the time being, but with one disk down I don’t have any room for error. I’ll have to replace that hard disk with a new one. The first thing I’ll have to do is shut the system down so that I can add the new disk. While I’m adding a disk in VMware’s hardware interface, imagine yourself crawling under your desk with a screwdriver and an anti-static strap.
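For reference, here’s a clean way to power the box down before swapping hardware (a minimal sketch; the shutdown flags can vary between Solaris releases, so check your man page):

# -y: skip confirmation, -g0: no grace period, -i5: go to init state 5 (power off)
root@opensolaris:~# shutdown -y -g0 -i5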
Righto, so let’s say I had another 10GB disk lying around, and I’ve popped it into my server. Now all I need to do is tell Solaris that it’s there and that it should be used to replace the failed drive in my naspool array. So boot the system back up, log in, grab a command line, become root, and…
root@opensolaris:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c8t0d0
          /pci@0,0/pci15ad,1976@10/sd@0,0
       1. c8t1d0
          /pci@0,0/pci15ad,1976@10/sd@1,0
       2. c8t2d0
          /pci@0,0/pci15ad,1976@10/sd@2,0
       3. c8t3d0
          /pci@0,0/pci15ad,1976@10/sd@3,0
       4. c8t5d0
          /pci@0,0/pci15ad,1976@10/sd@5,0
Specify disk (enter its number):
Ok, our new disk is c8t5d0, the next SCSI disk in the chain after the old failed disk. Let’s use zpool to replace c8t4d0 with c8t5d0.
root@opensolaris:~# zpool replace naspool c8t4d0 c8t5d0
Tough, huh? Ok, so how’s naspool looking?
root@opensolaris:~# zpool status naspool
  pool: naspool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 4.26% done, 0h3m to go
config:

        NAME           STATE     READ WRITE CKSUM
        naspool        DEGRADED     0     0     0
          raidz1       DEGRADED     0     0     0
            c8t1d0     ONLINE       0     0     0
            c8t2d0     ONLINE       0     0     0
            c8t3d0     ONLINE       0     0     0
            replacing  DEGRADED     0     0    45
              c8t4d0   FAULTED      0     0     0  too many errors
              c8t5d0   ONLINE       0     0     0  25.3M resilvered

errors: No known data errors
So naspool is still degraded, but it is ‘resilvering’ the information onto the new disk, copying all of the data and parity info so that we’ll be back to a fully redundant state. As it’s running, we can monitor it:
root@opensolaris:~# zpool status naspool
  pool: naspool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h1m, 79.09% done, 0h0m to go
config:

        NAME           STATE     READ WRITE CKSUM
        naspool        DEGRADED     0     0     0
          raidz1       DEGRADED     0     0     0
            c8t1d0     ONLINE       0     0     0
            c8t2d0     ONLINE       0     0     0
            c8t3d0     ONLINE       0     0     0
            replacing  DEGRADED     0     0    67
              c8t4d0   FAULTED      0     0     0  too many errors
              c8t5d0   ONLINE       0     0     0  4.71G resilvered

errors: No known data errors
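If you’d rather not keep re-running zpool status by hand, a quick shell loop makes a passable progress monitor (just a sketch; tweak the sleep interval to taste and hit Ctrl-C when you’re done):

# print the resilver progress line every 30 seconds
root@opensolaris:~# while true; do zpool status naspool | grep scrub; sleep 30; done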
And when it’s finished, we’ll see that the resilver is complete:
root@opensolaris:~# zpool status naspool
  pool: naspool
 state: ONLINE
 scrub: resilver completed after 0h2m with 0 errors on Sat Oct 10 10:28:01 2009
config:

        NAME        STATE     READ WRITE CKSUM
        naspool     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
            c8t3d0  ONLINE       0     0     0
            c8t5d0  ONLINE       0     0     0  6.73G resilvered

errors: No known data errors
As you can see, resilvering disks is nice and quick. My array is now fully redundant again, and can suffer another disk failure without missing a beat. If you want some extra peace of mind after a replacement, you can also ask ZFS to verify every block in the pool against its checksums with a scrub (it reads the whole pool, so expect it to take a while on a big array):
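# walk the entire pool and verify checksums; progress shows up in zpool status
root@opensolaris:~# zpool scrub naspool
root@opensolaris:~# zpool status naspool

So that covers how to replace a failed disk. What’s next?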
The next step is to replace disks with larger ones to increase capacity on our array.
The method for doing this is pretty much identical to the way we replaced a failed disk: add a new disk, then tell Solaris to replace one with the other. For example, I’ll add in a new 50GB disk.
Then I’ll fire up Solaris, log in and check format to see what its new ID is.
root@opensolaris:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c8t0d0
          /pci@0,0/pci15ad,1976@10/sd@0,0
       1. c8t1d0
          /pci@0,0/pci15ad,1976@10/sd@1,0
       2. c8t2d0
          /pci@0,0/pci15ad,1976@10/sd@2,0
       3. c8t3d0
          /pci@0,0/pci15ad,1976@10/sd@3,0
       4. c8t5d0
          /pci@0,0/pci15ad,1976@10/sd@5,0
       5. c9t0d0
          /pci@0,0/pci15ad,790@11/pci15ad,1976@3/sd@0,0
Specify disk (enter its number): ^C
So there it is as c9t0d0. Let’s replace the first disk in the array with this new 50GB monster.
root@opensolaris:~# zpool replace naspool c8t1d0 c9t0d0
root@opensolaris:~# zpool status
  pool: naspool
 state: ONLINE
 scrub: resilver completed after 0h2m with 0 errors on Sun Oct 11 08:53:53 2009
config:

        NAME        STATE     READ WRITE CKSUM
        naspool     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0  9.69G resilvered
            c8t2d0  ONLINE       0     0     0
            c8t3d0  ONLINE       0     0     0
            c8t5d0  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c8t0d0s0  ONLINE       0     0     0

errors: No known data errors
So now that our new disk is up and running, do a quick reboot and check the new size of your zpool.
root@opensolaris:~# zpool list
NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
naspool  39.8G  39.1G   636M    98%  ONLINE  -
rpool    7.94G  3.40G  4.53G    42%  ONLINE  -
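As an aside, if a reboot is inconvenient, an export/import cycle may also get the pool to re-read the device sizes. A sketch, assuming nothing is using the pool while it’s exported:

# detach the pool from the system, then re-import it so device sizes are re-read
root@opensolaris:~# zpool export naspool
root@opensolaris:~# zpool import naspool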
Awesome. So let’s do the same for the other disks in the array:
root@opensolaris:~# zpool list
NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
naspool  39.8G  39.1G   636M    98%  ONLINE  -
rpool    7.94G  3.40G  4.53G    42%  ONLINE  -
root@opensolaris:~# zpool status naspool
  pool: naspool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        naspool     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
            c8t3d0  ONLINE       0     0     0
            c8t5d0  ONLINE       0     0     0

errors: No known data errors
root@opensolaris:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c8t0d0
          /pci@0,0/pci15ad,1976@10/sd@0,0
       1. c8t1d0
          /pci@0,0/pci15ad,1976@10/sd@1,0
       2. c8t2d0
          /pci@0,0/pci15ad,1976@10/sd@2,0
       3. c8t3d0
          /pci@0,0/pci15ad,1976@10/sd@3,0
       4. c8t5d0
          /pci@0,0/pci15ad,1976@10/sd@5,0
       5. c9t0d0
          /pci@0,0/pci15ad,790@11/pci15ad,1976@3/sd@0,0
       6. c9t1d0
          /pci@0,0/pci15ad,790@11/pci15ad,1976@3/sd@1,0
       7. c9t2d0
          /pci@0,0/pci15ad,790@11/pci15ad,1976@3/sd@2,0
       8. c9t3d0
          /pci@0,0/pci15ad,790@11/pci15ad,1976@3/sd@3,0
Specify disk (enter its number): ^C
root@opensolaris:~# zpool replace naspool c8t2d0 c9t1d0
root@opensolaris:~# zpool status naspool
  pool: naspool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.00% done, 0h0m to go
config:

        NAME           STATE     READ WRITE CKSUM
        naspool        ONLINE       0     0     0
          raidz1       ONLINE       0     0     0
            c9t0d0     ONLINE       0     0     0
            replacing  ONLINE       0     0     0
              c8t2d0   ONLINE       0     0     0
              c9t1d0   ONLINE       0     0     0  32.5K resilvered
            c8t3d0     ONLINE       0     0     0
            c8t5d0     ONLINE       0     0     0

errors: No known data errors
And go make a coffee until it’s finished resilvering, then replace the next disk:
root@opensolaris:~# zpool status naspool
  pool: naspool
 state: ONLINE
 scrub: resilver completed after 0h2m with 0 errors on Sun Oct 11 10:00:17 2009
config:

        NAME        STATE     READ WRITE CKSUM
        naspool     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0  9.78G resilvered
            c8t3d0  ONLINE       0     0     0
            c8t5d0  ONLINE       0     0     0

errors: No known data errors
root@opensolaris:~# zpool replace naspool c8t3d0 c9t2d0
And so on, until you’ve changed them all out, and then do a quick reboot to force the zpools to update to the new sizes. The final swap is shown below.
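For the record, the last swap in my setup would look like this, assuming the final new disk shows up as c9t3d0 as in the format listing above:

# replace the last of the original 10GB disks with the last new 50GB disk
root@opensolaris:~# zpool replace naspool c8t5d0 c9t3d0

And after the reboot, the pool shows its new size: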
root@opensolaris:~# zpool list
NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
naspool   200G  39.1G   161G    19%  ONLINE  -
rpool    7.94G  3.43G  4.51G    43%  ONLINE  -
However, at the time of writing, there seems to be a little problem: when your zpool is resized, your zfs filesystem isn’t. So even though my zpool is currently showing 200G of space, my zfs filesystem is still its original size:
root@opensolaris:~# zfs list
NAME             USED   AVAIL  REFER  MOUNTPOINT
naspool          29.3G   118G  32.9K  /naspool
naspool/movies   29.3G   118G  29.3G  /naspool/movies
naspool/music    28.4K   118G  28.4K  /naspool/music
naspool/photos   28.4K   118G  28.4K  /naspool/photos
I’m currently researching a solution to this problem which doesn’t involve creating a new filesystem and moving all the files over from the old one.
Update: It seems that once you’ve created a raidz pool, you cannot change how large it is. But there are ways around it. Check this article for info.
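For what it’s worth, newer ZFS releases add an autoexpand pool property and a ‘zpool online -e’ option that are meant to handle this growth automatically. I haven’t verified them on this build, so treat the following as a sketch:

# newer zpool versions only: grow the pool automatically once every disk in a vdev is bigger
root@opensolaris:~# zpool set autoexpand=on naspool
# or expand a single replaced device by hand
root@opensolaris:~# zpool online -e naspool c9t0d0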
Hello,
Thanks a lot for sharing this with us. This is really useful and saved me some time, as I wanted to do and test the same thing you’re doing here.
But could you please tell me more about that last problem, where zfs didn’t show the new size?
I’m researching building my own NAS computer… the thing I’m not sure about is whether I could use a combination of SATA and IDE drives of different sizes?
Thanks a lot 🙂
Hi Hasan,
I’ve just posted a new story about expanding an OpenSolaris NAS here. In a nutshell, you’re best off using a flat zfs filesystem, then adding mirror disks for security, as raidz cannot be expanded once it has been created.
Different sized IDE and SATA disks are fine, but getting mirroring working could be problematic.
Perhaps use raidz for the time being, but then aim to add a couple of big disks in a mirror down the track to replace the odd sized disks?
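For example, attaching a second disk to an existing single-disk vdev turns it into a two-way mirror. Something like this (c10t0d0 and c10t1d0 are hypothetical device IDs for the two big disks, with c10t0d0 assumed to already be in the pool):

# attach c10t1d0 to c10t0d0, converting the single-disk vdev into a two-way mirror
root@opensolaris:~# zpool attach naspool c10t0d0 c10t1d0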
Hi,
I can’t see any problem at all. According to your screenshots, you expanded the pool by replacing the 10GB drives with 50GB drives and got a raw pool capacity of 200GB instead of the 40GB before. With your raidz1 configuration, that gives (4-1) times the single-drive capacity for the zpool (150G). And your “zfs list” shows 118GB available space and 29.3GB used space, so everything is fine: the pool size expanded and the zfs filesystem expanded as well.
For testing, I put one 2GB and one 4GB drive in a raidz1 (senseless for production, but sufficient for testing), filled up the pool, and replaced the 2GB drive with an 8GB drive. After resilvering, I got 4GB of usable space on the existing ZFS on the zpool instead of the 2GB before, without a reboot (tested via samba). So raidz1 expansion by replacing drives with bigger ones step by step works without any problem. This was tested under FreeBSD 8.2 with zpool version 15 and zfs version 4.
Zfs filesystems use what they can get from the underlying zpool. Whether the pool is expanded by adding a second mirror, replacing mirror disks with bigger ones, or replacing raidz disks with bigger ones doesn’t matter.
Greets…
Hi Andiz,
Thanks for the info – it looks like they’ve updated ZFS (in FreeBSD at least, not sure about OpenSolaris). I’ll have to fire it up and have another look 🙂