New to FreeNAS - How to set it up on multipath disks?

mhampton

Dabbler
Joined
Dec 4, 2015
Messages
15
I have a spare Dell R720 to experiment with FreeNAS. My goal is to build a prototype to test FreeNAS's capabilities before purchasing from iXsystems.

My R720 has 2 x SAS HBAs, each with 2 ports, for a total of 4 SAS ports. The R720 also has 2 x 10Gb network interfaces and 4 x 1Gb network interfaces.

The 4 SAS ports are connected to 2 Dell MD3220 arrays, each with 2 controllers, so that each disk is "visible" from the R720 via dual paths -- i.e. multipathing. Each disk is presented to the R720 as a separate LUN.

FreeNAS 9.3 installs fine; however, when I try to create a volume, it treats both paths as individual disks rather than as multiple paths to a single disk.

How do I get FreeNAS to recognize that these are multipath disks?

Note the following: the "View Disks" page doesn't show any multipath devices at all, but when I click "Volume Manager", I see all of the disks twice.
 

dlavigne

Guest
It is normal in a multipath setup for each path to a disk to show up as a separate disk device, but the FreeNAS middleware should detect that and join the paths into multipath devices, and those are what should be visible in the Volume Manager. Since this is not happening, are the SAS controllers plain HBAs or some form of RAID?
 

mhampton

Dabbler
Joined
Dec 4, 2015
Messages
15
I suspect you've got the PERC H810 for external SAS, which won't work as it's a hardware RAID controller and shouldn't be put anywhere near ZFS. Use the "SAS 6Gbps External HBA" optional cards instead.

No, it's not a PERC H810 -- if I had an MD1220 array, then I'd be using my H810, but that's not what I have installed in this server at the moment.

This is what I have: http://accessories.us.dell.com/sna/productdetail.aspx?c=us&l=en&s=dhs&cs=19&sku=342-0910

This SAS HBA uses this LSI chip: http://www.avagotech.com/products/server-storage/io-controllers/sas-2008
 

mhampton

Dabbler
Joined
Dec 4, 2015
Messages
15
It is normal in a multipath setup for each path to a disk to show up as a separate disk device, but the FreeNAS middleware should detect that and join the paths into multipath devices, and those are what should be visible in the Volume Manager. Since this is not happening, are the SAS controllers plain HBAs or some form of RAID?

The SAS HBAs are simple HBAs, not RAID controllers. The MD3220 controllers can aggregate the disks into RAID5 logical volumes, etc.; however, I instead set up each disk in the MD3220 as a separate LUN (per the newbie advice from this forum).

The R720 sees both paths to all of the disks (48 disks with 96 paths).

Running camcontrol devlist shows all 96 paths.

Running sg_vpd -p di daXX shows 48 distinct UUIDs on 96 paths.
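
For reference, a quick way to tally the paths in bulk is something like the loop below (a sketch only; it assumes the 96 paths show up as da0 through da95 and that camcontrol returns a usable serial for each LUN).

Code:
# Sketch: print each da device with its serial number, then count how
# many devices report each serial. With dual paths, every serial should
# appear exactly twice (48 serials across 96 da devices).
for i in $(seq 0 95); do
  echo "da$i $(camcontrol inquiry da$i -S)"
done | awk '{print $2}' | sort | uniq -c | sort -n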
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Ah, I think I understand it now. The MD3220's disk aggregation (RAIDing) is likely interfering with the ability of FreeBSD's geom_multipath to automatically group the paths together.

If I had an MD1220 array, then I'd be using my H810

You actually don't want to do this. RAID cards and ZFS do not mix, for a variety of ugly reasons. Ideally you'd want to swap in the management modules from an MD1220 JBOD and attach them to your SAS HBAs.

See the Dell PDF on deploying Storage Spaces on the MD12x0 series - the hardware requirements there are basically what you have to achieve for ZFS (no RAID components anywhere, pure JBOD all the way). See "Configuration 2" in their docs for an example of a dual-HBA head with MPIO set up to daisy-chained MD1220s.
 

mhampton

Dabbler
Joined
Dec 4, 2015
Messages
15
Ah, I think I understand it now. The MD3220's disk aggregation (RAIDing) is likely interfering with the ability of FreeBSD's geom_multipath to automatically group the paths together.



You actually don't want to do this. RAID cards and ZFS do not mix, for a variety of ugly reasons. Ideally you'd want to swap in the management modules from an MD1220 JBOD and attach them to your SAS HBAs.

See the Dell PDF on deploying Storage Spaces on the MD12x0 series - the hardware requirements there are basically what you have to achieve for ZFS (no RAID components anywhere, pure JBOD all the way). See "Configuration 2" in their docs for an example of a dual-HBA head with MPIO set up to daisy-chained MD1220s.

I understand that FreeNAS just wants JBOD, but that's not what I have -- I have an MD3220. If there's a way to "pass through" the raw disks to the host, I haven't found it. So instead I created 48 LUNs, one per disk, and presented them on both SAS paths.

In any case, running camcontrol devlist shows all 96 paths, and running sg_vpd -p di daXX shows 48 distinct UUIDs on 96 paths, as expected. Everything appears to be correct.

One thing: digging some more, I see FreeNAS is giving me an alert which could be related to the SAS controllers: "Firmware version 7 does not match driver version 20 for /dev/mps0" (same with mps1)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I understand that FreeNAS just wants JBOD, but that's not what I have -- I have an MD3220. If there's a way to "pass through" the raw disks to the host, I haven't found it. So instead I created 48 LUNs, one per disk, and presented them on both SAS paths.

Right, and I'm warning you that ZFS is a high-maintenance supermodel of a girlfriend that really, really wants things done her way. Other folks here may be "less diplomatic in their replies" when suggesting that you not let RAID get involved anywhere. You definitely won't see it anywhere in an iXsystems build. ;)

In any case, running camcontrol devlist shows all 96 paths, and running sg_vpd -p di daXX shows 48 distinct UUIDs on 96 paths, as expected. Everything appears to be correct.

Not sure why it wouldn't be automatically merging them into multipath devices, then. What does gmultipath status show? Is there anything under /dev/multipath?

One thing: digging some more, I see FreeNAS is giving me an alert which could be related to the SAS controllers: "Firmware version 7 does not match driver version 20 for /dev/mps0" (same with mps1)

This has to do with the firmware on your SAS cards. FreeNAS 9.3.1 expects phase 20 firmware on your cards and throws a warning if there's a mismatch. I don't think there's a P20 Dell firmware, so you'd need to flash those cards to the LSI/Avago IT firmware ... but that might make the MD3220 refuse to speak to them properly. Dunno.
 

mhampton

Dabbler
Joined
Dec 4, 2015
Messages
15
Not sure why it wouldn't be automatically merging them into multipath devices, then. What does gmultipath status show? Is there anything under /dev/multipath?

"gmultipath status" returns nothing. There is no /dev/multipath directory.

This has to do with the firmware on your SAS cards. FreeNAS 9.3.1 expects phase 20 firmware on your cards and throws a warning if there's a mismatch. I don't think there's a P20 Dell firmware, so you'd need to flash those cards to the LSI/Avago IT firmware ... but that might make the MD3220 refuse to speak to them properly. Dunno.

Yes, you're probably on to something with the firmware mismatch. I have the latest Dell firmware installed on the HBA, and that's version 7.15.08. I may try the LSI firmware if I can't get FreeNAS to properly recognize multipathing as-is.

One more thing: FreeNAS does recognize that the MD3220 disks are present. It can even create RAIDZ on them, and they work great as long as I pick the first path, and ignore the alternate path. I just can't get FreeNAS to do multipathing.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
"gmultipath status" returns nothing. There is no /dev/multipath directory.

You can try manually creating the multipath labels. If you think you know which two paths point to the same disk (I'm guessing you have da0 through da95, so try da0 and da48):

camcontrol inquiry da0 -S
camcontrol inquiry da48 -S

These should both yield the same serial number; if they do, then

gmultipath label -v disk0 /dev/da0 /dev/da48

should create the device /dev/multipath/disk0 and hopefully that pops up in the volume manager.

Then copy, paste, and iterate, repeating this for each pair of paths to a device.
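
If the pairing really is a constant offset of 48, a small shell loop could do the legwork -- just a sketch, and the serial check is there to guard against a wrong pairing:

Code:
# Sketch: assumes daN and da(N+48) are the two paths to the same disk;
# only label a pair when the serial numbers actually match.
for i in $(seq 0 47); do
  j=$((i + 48))
  s1=$(camcontrol inquiry da$i -S)
  s2=$(camcontrol inquiry da$j -S)
  if [ "$s1" = "$s2" ]; then
    gmultipath label -v disk$i /dev/da$i /dev/da$j
  else
    echo "serial mismatch: da$i vs da$j -- skipping"
  fi
done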

Yes, you're probably on to something with the firmware mismatch. I have the latest Dell firmware installed on the HBA, and that's version 7.15.08. I may try the LSI firmware if I can't get FreeNAS to properly recognize multipathing as-is.

Probably a good idea, although you may need to do some shenanigans similar to flashing the PERC H200 controllers in order to make the cards accept the LSI firmware. Back up the card BIOS and record your SAS address first, just in case you toast it during the process.

One more thing: FreeNAS does recognize that the MD3220 disks are present. It can even create RAIDZ on them, and they work great as long as I pick the first path, and ignore the alternate path. I just can't get FreeNAS to do multipathing.

I wonder if the MD3220 is presenting its own virtual-disk UUID for each path, or if it's trying to do active-active and FreeNAS is running active-passive. I recall an earlier post this year from someone who discovered that the "check for multipath" routine was basically one line that said "return False"

Code:
def _multipath_is_active(self, name, geom):
    return False


Can you make the MD3220 switch to active-passive?
 

mhampton

Dabbler
Joined
Dec 4, 2015
Messages
15
You can try manually creating the multipath labels. If you think you know which two paths point to the same disk (I'm guessing you have da0 through da95, so try da0 and da48):

camcontrol inquiry da0 -S
camcontrol inquiry da48 -S

These should both yield the same serial number; if they do, then

gmultipath label -v disk0 /dev/da0 /dev/da48

should create the device /dev/multipath/disk0 and hopefully that pops up in the volume manager.

Then copy, paste, and iterate, repeating this for each pair of paths to a device.

OK, that worked. I now have 48 multipath devices.

After a reboot I now have the "View Multipaths" tab on the FreeNAS Storage page for the first time, and Volume Manager sees the new multipath devices.

Probably a good idea, although you may need to do some shenanigans similar to flashing the PERC H200 controllers in order to make the cards accept the LSI firmware. Back up the card BIOS and record your SAS address first, just in case you toast it during the process.

I wonder if the MD3220 is presenting its own virtual-disk UUID for each path, or if it's trying to do active-active and FreeNAS is running active-passive. I recall an earlier post this year from someone who discovered that the "check for multipath" routine was basically one line that said "return False"

Code:
def _multipath_is_active(self, name, geom):
    return False


Can you make the MD3220 switch to active-passive?

The disks are, indeed, active/passive. The MD3220 was complaining that FreeNAS was not using the preferred path, so I used "gmultipath prefer" to make the OS match the MD3220's preferred paths. I see no way to change the MD3220 to active/active -- at least not when the disks are set up to simulate JBOD.
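
For anyone following along, the command looks roughly like this (the label and device names here are placeholders):

Code:
# Example with placeholder names: make da0 the active path for the
# multipath device disk0, then confirm which path is now active.
gmultipath prefer disk0 da0
gmultipath status disk0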

I think that at this point installing the LSI P20 firmware on the Dell SAS HBA cards would be going too far down the rabbit hole, so I'm going to continue with this configuration and see how far I get.

The next challenge is to make one big volume out of all of these disks. From what I have read, it is apparently not a good idea to make a single big RAIDZ vdev out of 48 disks, so it seems I need to make smaller RAIDZ groups (vdevs) and somehow marry them together in one pool.

This post seems to show the way, although it's a few years old: https://forums.freenas.org/index.php?threads/getting-the-most-out-of-zfs-pools.16
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
No, do NOT make a single VDEV out of 48 disks! The outcome won't be pretty. Read more about ZFS and the levels of redundancy... cyberjock has a great PPT presentation that breaks this stuff down very simply.

Assuming your goal is lots of storage, not a VM datastore, eight 6-disk RAIDZ2 vdevs seem reasonable and are OCD-approved. If you're planning this for a VM datastore, then striped mirrors (2-way or 3-way, depending on your risk tolerance) would be the way to go.
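
For illustration, the 8 x 6-disk RAIDZ2 layout looks roughly like this from the command line (multipath label names are placeholders, and the FreeNAS Volume Manager would build the equivalent for you):

Code:
# Sketch of the suggested layout: eight 6-disk RAIDZ2 vdevs striped into
# one pool. Only the first two vdevs are spelled out here; the remaining
# six follow the same pattern with disk12 through disk47.
zpool create tank \
  raidz2 multipath/disk0 multipath/disk1 multipath/disk2 \
         multipath/disk3 multipath/disk4 multipath/disk5 \
  raidz2 multipath/disk6 multipath/disk7 multipath/disk8 \
         multipath/disk9 multipath/disk10 multipath/disk11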
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Assuming your goal is lots of storage, not a VM datastore, eight 6-disk RAIDZ2 vdevs seem reasonable and are OCD-approved. If you're planning this for a VM datastore, then striped mirrors (2-way or 3-way, depending on your risk tolerance) would be the way to go.

/official Badger Claw of Approval

Use case will determine your desired pool layout; unless it's highly sequential in nature, I'd lean towards the mirrors just from an IOPS perspective.

I'd also like to see the specs on the R720 as far as CPU/RAM/availability of SSD.
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
Oh, and if this is for a VM datastore, don't forget the need for a SLOG.

You should also confirm that FreeNAS is able to see the drives' SMART data (smartctl -a /dev/...), or that you are monitoring the disks upstream in the MD3220 with some mechanism for alerting. Otherwise, someday down the road you'll have a very bad day when you discover that drives have silently failed and your whole pool is gone...
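
A quick way to spot-check that is a loop like the one below (a sketch; whether the MD3220 passes SMART through to the host at all is the open question):

Code:
# Sketch: ask each path's underlying device for its SMART health. If the
# MD3220 hides the physical disks, expect errors or empty output here,
# in which case the monitoring has to happen on the array itself.
for d in /dev/da[0-9]*; do
  echo "=== $d ==="
  smartctl -H "$d"
done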
 

mhampton

Dabbler
Joined
Dec 4, 2015
Messages
15
No, do NOT make a single VDEV out of 48 disks! The outcome won't be pretty. Read more about ZFS and the levels of redundancy... cyberjock has a great PPT presentation that breaks this stuff down very simply.

Assuming your goal is lots of storage, not a VM datastore, eight 6-disk RAIDZ2 vdevs seem reasonable and are OCD-approved. If you're planning this for a VM datastore, then striped mirrors (2-way or 3-way, depending on your risk tolerance) would be the way to go.

Thanks for the advice. I had 60 x 600GB SAS disks to create a pool from (2 x MD3220s plus the internal disks on the R720), so I ended up making six 9-disk RAIDZ1 vdevs in one zpool with 2 spares, and put the log on 2 mirrored 200GB SLC SSDs. That left 4 x 600GB SAS disks unused, which I will eventually use with another spare MD3220 I have sitting around.
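
For the record, the shape of the pool is roughly equivalent to the sketch below (device and label names are placeholders, not the exact commands I ran):

Code:
# Rough shape of the pool (placeholder names): six 9-disk RAIDZ1 vdevs,
# two hot spares, and a mirrored SLOG on the two SLC SSDs. Only the
# first vdev is spelled out; the other five follow the same pattern.
zpool create tank \
  raidz multipath/disk0 multipath/disk1 multipath/disk2 \
        multipath/disk3 multipath/disk4 multipath/disk5 \
        multipath/disk6 multipath/disk7 multipath/disk8 \
  spare multipath/disk54 multipath/disk55 \
  log mirror da96 da97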

So far, so good. I am using this for both backups and a VM datastore, and so far I don't see a performance penalty on the VMs I have migrated. Performance is surprisingly on par with our overpriced NetApp; in some cases this FreeNAS configuration is faster.

Looks like it will take around 6 hours to copy 2.8TB worth of backups from the NetApp to the FreeNAS box over NFS. I think the bottleneck is the NetApp box, though.

The R720 has 32GB of memory and 1 x 4-core CPU. It's definitely NOT CPU bound with this workload.

When I hammer a VM, the SSDs get very busy, with ~2,000 IOPS and 150MB/s. On the VM I made a 25GB file of random bytes (using /dev/urandom); copying that file around, I was getting around 90MB/s of throughput on the VM. Not stellar, but adequate.
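
Roughly what that test looks like inside the VM (a sketch; block sizes and paths are arbitrary):

Code:
# Sketch of the in-VM test: create a 25GB file of pseudo-random data,
# then copy it to gauge sustained read/write throughput.
dd if=/dev/urandom of=/data/random.bin bs=1M count=25600
dd if=/data/random.bin of=/data/random-copy.bin bs=1M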

The next test is to see how FreeNAS performs as an NFS server for hundreds of millions of small files. The NetApp sucked for that use-case.

This setup has me wondering how iXsystems bakes controller redundancy into their configurations. I haven't found an option for "clustering" two FreeNAS servers together on shared disks.
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
Good news on the SLOG. I'm concerned about the array configuration, however. Realize that the loss of any 2 drives within the same vdev will kill your pool. I'm assuming those drives were all bought at the same time... it's certainly not unheard of to lose more than one disk at the same time (you've probably seen this if you have big NetApps). From the time you lose the first drive, you have to know the drive has failed, swap it, and complete the resilvering process to restore redundancy... a second failure during that period and you're SOL. The spares help - at least the rebuild should start automatically - but rebuilding won't be terribly fast.

The IOPS available on the pool will be the sum of the IOPS of the slowest disk in each vdev. Since your drives are homogeneous, you're looking at roughly 150*6=900 IOPS. I could crush that with a single busy Splunk indexer or SQL server. If you're running a ton of very low-IO VMs, perhaps you'll be OK... until something weird happens, like the systems getting hit by SCCM/WSUS simultaneously and starting a patch cycle, consuming way more IOPS than they do normally. In contrast, going to 2-way mirrors would get you 30*150=4,500 IOPS... a much more respectable number. I suspect you'll also hit this wall if you start working with millions of small files... that workload pretty much sucks regardless of the number of spindles you throw at it. You'd be better off building an array of SSDs and putting your small files there if you want to change lots of them very quickly. Perhaps also consider creating multiple pools... keep the small files in their own pool, so when you hit them hard you don't slow down everything else.

From first-hand experience with a smaller system, everything will truck along nicely... until you either move a few beefier VMs over or something like a patch cycle starts... then it will get painful. You may even have VMs go catatonic, or outright die... vSphere can get really pissy when the underlying storage gets ridiculously slow.

Don't forget the 50% utilization rule for VM datastores...
 

mhampton

Dabbler
Joined
Dec 4, 2015
Messages
15
Good news on the SLOG. I'm concerned about the array configuration, however. Realize that the loss of any 2 drives within the same vdev will kill your pool. I'm assuming those drives were all bought at the same time... it's certainly not unheard of to lose more than one disk at the same time (you've probably seen this if you have big NetApps). From the time you lose the first drive, you have to know the drive has failed, swap it, and complete the resilvering process to restore redundancy... a second failure during that period and you're SOL. The spares help - at least the rebuild should start automatically - but rebuilding won't be terribly fast.

The IOPS available on the pool will be the sum of the IOPS of the slowest disk in each vdev. Since your drives are homogeneous, you're looking at roughly 150*6=900 IOPS. I could crush that with a single busy Splunk indexer or SQL server. If you're running a ton of very low-IO VMs, perhaps you'll be OK... until something weird happens, like the systems getting hit by SCCM/WSUS simultaneously and starting a patch cycle, consuming way more IOPS than they do normally. In contrast, going to 2-way mirrors would get you 30*150=4,500 IOPS... a much more respectable number. I suspect you'll also hit this wall if you start working with millions of small files... that workload pretty much sucks regardless of the number of spindles you throw at it. You'd be better off building an array of SSDs and putting your small files there if you want to change lots of them very quickly. Perhaps also consider creating multiple pools... keep the small files in their own pool, so when you hit them hard you don't slow down everything else.

From first-hand experience with a smaller system, everything will truck along nicely... until you either move a few beefier VMs over or something like a patch cycle starts... then it will get painful. You may even have VMs go catatonic, or outright die... vSphere can get really pissy when the underlying storage gets ridiculously slow.

Don't forget the 50% utilization rule for VM datastores...

Wouldn't the spares I set aside in the zpool kick in if a disk failed? That was my intention. Maybe I'm misunderstanding how spares work in a ZFS pool.

The VMs mostly just sit idle, occasionally picking up a task. When that happens, they are CPU bound -- very little I/O on those VMs.

We'll see how it goes with the small files. With the NetApp we had to distribute the files into 16 separate NFS mountpoints to get the required throughput - for some reason a single NFS mountpoint turned out to be a significant bottleneck. Not sure if the NFS protocol was the culprit or something else. Our NetApp is running in 7-mode, so pNFS wasn't an option.
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
They do - but the spares have to be built (resilvered) to restore redundancy. A second drive failure during this period (which will cause lots of disk IO... which could push an older drive over the edge...) will nuke your pool.
 

mhampton

Dabbler
Joined
Dec 4, 2015
Messages
15
They do - but the spares have to be built (resilvered) to restore redundancy. A second drive failure during this period (which will cause lots of disk IO... which could push an older drive over the edge...) will nuke your pool.

Ah, that makes sense. Fortunately this FreeNAS server is just going to serve as a backup target and as a VM datastore for VMs that can be rebuilt from scratch at any time and persist no data. I'll be building a second FreeNAS server shortly, and this time I'll try either striped mirrors (RAID10) or RAIDZ2, although I'm hesitant to take the performance penalty associated with the latter.
 