In previous articles, I investigated using ZFS to build a home NAS, which is very simple and extremely effective.
In one of those articles, I discussed expanding that NAS by changing its disks out one by one. While it’s easy enough to swap the disks for larger ones, it doesn’t quite work: because the pool was created as raidz1, it will not increase in size.
That means that when you want to increase the size of your NAS down the track, you either have to copy all the files from your NAS onto an external device, then destroy and recreate your ZFS filesystem, or consider a different way of building the NAS in the first place.
So today I’m going to look at building a NAS with the same functionality as before, but with the ability to change the disks out down the track and easily increase the size of your home storage system.
The first thing I’ll need is a clean install of OpenSolaris. As before, I’ve done the install in VMware so that adding hardware is quick and easy.
Now let’s say that I’m setting this system up for the first time. I’ve gone out and bought four 1Tb disks which I intend to connect to my NAS and use to serve out data to my house. I’ll create four 10Gb disks in VMware to represent them.
Note that I’ve added all the disks to virtual SCSI controller 1, and that the system hard disk is on IDE. When we boot back into Solaris and run the format tool, you’ll see these locations reflected in the identifiers of each of the disks:
root@solaris1:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c7d0 <DEFAULT cyl 4092 alt 2 hd 128 sec 32>
          /pci@0,0/pci-ide@7,1/ide@0/cmdk@0,0
       1. c10t0d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci1000,30@11/sd@0,0
       2. c10t1d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci1000,30@11/sd@1,0
       3. c10t2d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci1000,30@11/sd@2,0
       4. c10t3d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci1000,30@11/sd@3,0
Specify disk (enter its number): ^C
So the c7 at the start of the 8Gb system disk refers to the IDE controller, c10 is the designation for the SCSI controller the 10Gb disks are running on, and the t number after that is essentially the disk (target) number.
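As a quick reference, a Solaris disk name breaks down like this (the trailing slice number only appears when you’re addressing a particular partition):

c10 t1 d0 s0
 |   |  |  └─ slice (partition) number, when present
 |   |  └─── disk/LUN number on that target
 |   └────── SCSI target number
 └────────── controller number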
Now we’ll go ahead and create our NAS. But instead of a raidz array like the one we created before, I’m going to start with just two disks, like so:
root@solaris1:~# zpool create naspool c10t0d0 c10t1d0
root@solaris1:~# zpool status naspool
  pool: naspool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        naspool     ONLINE       0     0     0
          c10t0d0   ONLINE       0     0     0
          c10t1d0   ONLINE       0     0     0

errors: No known data errors
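If you want to double-check the pool’s capacity at this point, zpool list will show it. Your exact figures will differ a little, but it should look something like this:

root@solaris1:~# zpool list naspool
NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
naspool  19.9G    76K  19.9G     0%  ONLINE  -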
I now have a 20Gb array of two disks at my disposal. But while I may have some storage space, there’s absolutely nothing to stop one of these disks dying and taking my data with it. Now I’ll attach the remaining two disks, one by one, as mirrors of the two disks already in the pool.
root@solaris1:~# zpool attach naspool c10t0d0 c10t2d0
root@solaris1:~# zpool attach naspool c10t1d0 c10t3d0
root@solaris1:~# zpool status naspool
  pool: naspool
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Sat Feb 6 11:26:06 2010
config:

        NAME         STATE     READ WRITE CKSUM
        naspool      ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c10t0d0  ONLINE       0     0     0
            c10t2d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c10t1d0  ONLINE       0     0     0
            c10t3d0  ONLINE       0     0     0  42.5K resilvered

errors: No known data errors
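A quick way to confirm everything is healthy, without reading through the full status output, is zpool status -x, which only reports pools that have a problem:

root@solaris1:~# zpool status -x
all pools are healthy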
Now we have two 10Gb mirrors striped together into one 20Gb pool. I’ll quickly create some filesystems on it:
root@solaris1:~# zfs create -o casesensitivity=mixed naspool/music
root@solaris1:~# zfs create -o casesensitivity=mixed naspool/photos
root@solaris1:~# zfs create -o casesensitivity=mixed naspool/movies
root@solaris1:~# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
naspool          168K  19.6G    22K  /naspool
naspool/movies    19K  19.6G    19K  /naspool/movies
naspool/music     19K  19.6G    19K  /naspool/music
naspool/photos    19K  19.6G    19K  /naspool/photos
So now we have our NAS and it’s functioning properly. It is also redundant: if a disk fails, we can simply replace it. So what happens if we fill up our NAS and decide that we want to make it larger? Let’s simulate filling the array:
root@solaris1:~# cd /naspool/movies/
root@solaris1:/naspool/movies# mkfile 19g 19Gb_File
root@solaris1:/naspool/movies# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
naspool         19.0G   574M    23K  /naspool
naspool/movies  19.0G   574M  19.0G  /naspool/movies
naspool/music     19K   574M    19K  /naspool/music
naspool/photos    19K   574M    19K  /naspool/photos
Oh dear. We’re quickly running out of space, and it’s obvious that we’ll need to go out and grab ourselves some new disks to increase the storage space on our server.
So let’s say that I head down to my local computer hardware store and buy two 2Tb disks. The first thing I need to do is remove the mirroring from one of my pairs of disks.
root@solaris1:~# zpool detach naspool c10t3d0
root@solaris1:~# zpool status naspool
  pool: naspool
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Sat Feb 6 11:26:06 2010
config:

        NAME         STATE     READ WRITE CKSUM
        naspool      ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c10t0d0  ONLINE       0     0     0
            c10t2d0  ONLINE       0     0     0
          c10t1d0    ONLINE       0     0     0

errors: No known data errors
Note that while the first two disks are still mirrored, c10t1d0 is now running on its own. I’ll now shut down my computer and replace c10t3d0 (the device I just detached from the array) with a new, bigger disk. I could have just added all the disks to VMware to start with, but I believe this better represents how things are in the real world, where we have a limited number of SATA connections on our motherboards 😉
So, here’s the old disk, which I’ll remove:
And when I create the new one, I’ll specify its SCSI location so it’s in the same place as the old disk.
So it will appear in the same location as the disk I just removed, but will now be 20Gb instead of 10Gb. This simulates replacing a 1Tb disk with a 2Tb one.
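As an aside, on real hardware with hot-swap bays you wouldn’t necessarily need to power down at all. Something along the lines of the following should let you unconfigure the bay before pulling the disk; the attachment point name (sata1/3) is just a placeholder here, so check the output of cfgadm -al on your own system first:

root@solaris1:~# cfgadm -al
root@solaris1:~# cfgadm -c unconfigure sata1/3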
When I boot back into Solaris, it will appear in format with the same identifier as before, but will now be twice the size:
root@solaris1:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c7d0 <DEFAULT cyl 4092 alt 2 hd 128 sec 32>
          /pci@0,0/pci-ide@7,1/ide@0/cmdk@0,0
       1. c10t0d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci1000,30@11/sd@0,0
       2. c10t1d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci1000,30@11/sd@1,0
       3. c10t2d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci1000,30@11/sd@2,0
       4. c10t3d0 <DEFAULT cyl 2608 alt 2 hd 255 sec 63>
          /pci@0,0/pci1000,30@11/sd@3,0
Specify disk (enter its number): ^C
So now that we have our larger disk up and running, we’ll use it to replace c10t1d0. Doing so will cleanly increase the size of our NAS by 10Gb (1Tb in the real world).
Before the operation our zpool looks like this:
root@solaris1:~# zpool status naspool
  pool: naspool
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        naspool      ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c10t0d0  ONLINE       0     0     0
            c10t2d0  ONLINE       0     0     0
          c10t1d0    ONLINE       0     0     0

errors: No known data errors
Then we replace the disk:
root@solaris1:~# zpool replace naspool c10t1d0 c10t3d0
And Solaris begins the replacement process.
root@solaris1:~# zpool status naspool
  pool: naspool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 1.95% done, 0h2m to go
config:

        NAME           STATE     READ WRITE CKSUM
        naspool        ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c10t0d0    ONLINE       0     0     0
            c10t2d0    ONLINE       0     0     0
          replacing    ONLINE       0     0     0
            c10t1d0    ONLINE       0     0     0
            c10t3d0    ONLINE       0     0     0  189M resilvered

errors: No known data errors
After the resilvering completes, c10t1d0 will be removed from the pool automatically, and c10t3d0 will take its place.
root@solaris1:~# zpool status naspool
  pool: naspool
 state: ONLINE
 scrub: resilver completed after 0h2m with 0 errors on Sat Feb 6 12:07:00 2010
config:

        NAME         STATE     READ WRITE CKSUM
        naspool      ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c10t0d0  ONLINE       0     0     0
            c10t2d0  ONLINE       0     0     0
          c10t3d0    ONLINE       0     0     0  9.50G resilvered

errors: No known data errors
Now that that’s done, all we need to do is shut down our machine and swap the 10Gb disk we just replaced out for the second 20Gb disk we created earlier. As it was c10t1d0, we know that it’s on what VMware considers SCSI 1:1.
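On real hardware it isn’t always obvious which physical disk a name like c10t1d0 corresponds to. The device path shown by format (the sd@1,0 part) gives you the target number, and a command like the one below will print the vendor, model and serial number recorded for that device, which you can match against the label on the physical disk (I’ve left the output out, since it depends entirely on your hardware):

root@solaris1:~# iostat -En c10t1d0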
The old disk:
The new disk:
When we boot up again, the new disk will be ready to add to our zpool to act as a mirror for c10t3d0.
root@solaris1:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c7d0 <DEFAULT cyl 4092 alt 2 hd 128 sec 32>
          /pci@0,0/pci-ide@7,1/ide@0/cmdk@0,0
       1. c10t0d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci1000,30@11/sd@0,0
       2. c10t1d0 <DEFAULT cyl 2608 alt 2 hd 255 sec 63>
          /pci@0,0/pci1000,30@11/sd@1,0
       3. c10t2d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci1000,30@11/sd@2,0
       4. c10t3d0 <VMware,-VMware Virtual S-1.0-20.00GB>
          /pci@0,0/pci1000,30@11/sd@3,0
Specify disk (enter its number): ^C
And we’ll attach it as the new mirror for c10t3d0.
root@solaris1:~# zpool attach naspool c10t3d0 c10t1d0
root@solaris1:~# zpool status naspool
  pool: naspool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 1.75% done, 0h2m to go
config:

        NAME         STATE     READ WRITE CKSUM
        naspool      ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c10t0d0  ONLINE       0     0     0
            c10t2d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c10t3d0  ONLINE       0     0     0
            c10t1d0  ONLINE       0     0     0  170M resilvered

errors: No known data errors
When it finishes resilvering, we’ll be back to a fully redundant state, and will have an extra 10Gb to play with on our zpool:
root@solaris1:~# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
naspool         19.0G  10.4G    23K  /naspool
naspool/movies  19.0G  10.4G  19.0G  /naspool/movies
naspool/music     19K  10.4G    19K  /naspool/music
naspool/photos    19K  10.4G    19K  /naspool/photos
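One note if you’re following along on a different OpenSolaris build: the pool won’t always grow by itself once the larger disks are in place. On builds that have the autoexpand pool property, you may need to turn it on (or expand the devices explicitly with zpool online -e) before the extra space shows up:

root@solaris1:~# zpool set autoexpand=on naspool
root@solaris1:~# zpool online -e naspool c10t3d0 c10t1d0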
Of course, our data is still there, safe and sound in its slightly larger home.
root@solaris1:~# ls -lh /naspool/movies/
total 20G
-rw------T   1 root     root         19G 2010-02-06 11:37 19Gb_File
If we wanted to, we could upgrade the first two disks in the same way: detach one side of the mirror, replace the remaining disk with a larger one, then attach the second new disk as its mirror.
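To sketch that out, the commands for the first mirror would look something like this, assuming the replacement disks show up under the same names (c10t0d0 and c10t2d0) after each physical swap, just as they did for the second mirror; adjust the device names for your own system:

root@solaris1:~# zpool detach naspool c10t2d0
(shut down, swap the disk at SCSI 1:2 for a larger one, and boot back up)
root@solaris1:~# zpool replace naspool c10t0d0 c10t2d0
(wait for the resilver to finish, then shut down and swap the disk at SCSI 1:0)
root@solaris1:~# zpool attach naspool c10t2d0 c10t0d0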
Conclusion.
So now that we’ve looked at the nuts and bolts of replacing disks in this sort of array, and seen that the ZFS filesystems expand correctly with this method, where does that leave us?
While raidz may seem like a great idea (striping and parity with single-disk redundancy), in practice it is better to build a pool from two or more disks and attach a mirror disk to each of them. The reasons for this are:
- It performs much better than a raidz array, purely because the system doesn’t need to calculate and write parity whenever you’re moving data around.
- It is much easier to upgrade the array or to replace a failed disk within it. All you need to do is break the mirroring (if one of the disks has failed, you would detach it first), then use the zpool replace command to bring the new disk(s) into your array.
However, there is a drawback to this way of doing things. Because you need a mirror disk for each disk in your array, you will have less usable storage than a raidz array built from the same disks. Four 1Tb disks would give roughly 2Tb of storage, as opposed to 3Tb with raidz1.
But less storage is a small price to pay when it’s this easy to change out a failed disk or upgrade your array.
Comments

Hello,
I’ve been reading your blog for a while and I like it. Thanks for sharing your expertise 🙂
I have a question please. I’m new to the NAS thing and actually I’m studying the best way to build my NAS.
I’ve seen FreeNAS, which I’m sure you know about. My question is, is it better to use FreeNAS or just install OpenSolaris? … would I be able to use different types and sizes of hard disks?
I’m trying to do something close to the Drobo. How can I do that??
My question might be too broad, excuse me for that because I’m new to the storage techniques.
Thanks again 🙂
Well, I’m not sure I’d call it ‘expertise,’ but thanks for the compliment 🙂
Finding something which emulated the Drobo with common PC hardware I had lying around was precisely what I was looking for when I started playing with OpenSolaris. I wanted the ability to mix disks of different speeds and sizes and store my data safely, as well as to replace failed disks or upgrade to larger disks on the fly.
While you can simulate certain aspects of the Drobo’s feature set with OpenSolaris, there are a few limitations at this stage.
For example, a raidz1 array (single-disk redundancy) cannot be expanded by adding larger disks. If you make the array 2Tb when you first create it, it will always be 2Tb until it is destroyed. You can replace the disks with larger ones, and have a second filesystem if you wish, but the original filesystem will remain the same size.
If you built a simple dynamic striped array, you could have the flexibility of disk replacement and dynamic expansion, but you would have to sacrifice data security in order to achieve it.
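(For what it’s worth, that kind of pool is just what we built at the very start of this article before attaching any mirrors; there is no mirror or raidz keyword at all, for example:

root@solaris1:~# zpool create naspool c10t0d0 c10t1d0

The catch is that losing any one disk takes the entire pool, and everything on it, with it.)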
As for FreeNAS, I’m not exactly sure which of its options might bring it a little closer to the Drobo’s feature set, but I do know that it offers a simplified version of the ZFS filesystem, with no desktop environment and web-based administration to boot.
I plan on looking into FreeNAS soon, and I might write an article or two.
Thanks,
Leigh.
You said:
“If you built a simple dynamic striped array, you could have the flexibility of disk replacement and dynamic expansion, but you would have to sacrifice data security in order to achieve it.”
Could you tell me how to build the array you are talking about? What security features would I lose?
You mean I won’t have any redundancy?