SubnetMask (Contributor) - Joined Jul 27, 2017 - Messages: 129
Something else to consider - while I almost never do huge numbers of storage vMotions, that was the one way I found to reliably duplicate the 'crash' I was getting when trying to suspend and shut down a bunch of machines in a hurry during a power failure, where I want to get everything down gracefully and preserve the running state of a handful of VMs. That said, the storage vMotions are NOT (or at least SHOULD not be) related to iSCSI. Since FreeNAS/TrueNAS (supposedly) supports VAAI, that traffic should not be passing over iSCSI - VMware should be sending a command that essentially says 'move this machine from this datastore to that datastore and report back when done', and the storage does the work itself (an oversimplified description, but that's the gist).
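For anyone who wants to sanity-check that the offload is actually engaged on their own host, ESXi will tell you - a rough sketch, assuming an iSCSI LUN (the naa. ID below is just a placeholder, substitute your own device):

# Is the XCOPY/clone offload enabled on the host?
esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove

# Does the LUN itself claim VAAI support? 'Clone Status' is the one that matters for storage vMotion offload.
esxcli storage core device vaai status get -d naa.6589cfc000000xxxxxxxxxxxxxxxxx

If Clone Status comes back 'unsupported', the data mover falls back to reading and writing every block through the host, which would put all of that traffic on the iSCSI path after all.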
Now, all that being said, to comment on what WI_Hedgehog posted: these drives are all SAS (not SATA) drives, and the ones in my main FreeNAS that make up the striped mirror my VMs boot off originally came out of IBM/Lenovo servers, if memory serves (I'm not pulling one to check, lol). The larger disks that make up my RAIDz3 pool are HGST helium SAS disks. It's Dell hardware through and through, so it's not garden-variety cheap consumer nonsense - the least 'enterprise' components in the entire chain (and even calling them that is debatable) would be the Supermicro enclosures.
One of my main complaints about FreeNAS/TrueNAS, or maybe it's just ZFS in general, is the absolutely ridiculous waste of disk space - saying that one should only be using 10-20% of the disk space is utterly absurd. While I get that the nature of the filesystem can leave holes and such, making it harder and harder to write data properly as it starts to fill up, I'm rather surprised there isn't some sort of background defrag patrol built in that runs in times of lower usage to streamline disk usage and preserve or open up large swaths of free sectors.
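To be fair, ZFS does at least expose how chopped up the free space is getting - the FRAG column is fragmentation of the remaining free space, not file fragmentation (these are standard zpool properties; 'tank' is just a stand-in for your pool name):

zpool list -o name,size,allocated,free,capacity,fragmentation tank

My understanding is that a traditional defrag is basically off the table under copy-on-write, since snapshots can pin old blocks in place - which is presumably why the official answer is 'leave more free space' rather than 'run a defrag patrol'.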
It's also crazy to suggest that a volume becoming inaccessible because it's being hit hard is 'normal'. Individual things slowing down, or the overall system plateauing - sure, absolutely understandable and, honestly, expected. But no matter how hard you hit it, it should never go offline or become inaccessible like this. I've dealt with quite a few storage systems - from IBM/Lenovo, HP, and Dell RAID controllers with internal arrays to EqualLogic, Compellent, PowerVault, Promise, and probably one or two I can't recall - and I've NEVER had one just push a volume offline because it was being hammered. If iXsystems is selling TrueNAS as an 'Enterprise Product' (which they are), this CAN'T be right. Like I said, I can understand performance tanking or plateauing when you start really thrashing the storage, but it should NEVER go offline like this. To be fair, I haven't tried TrueNAS yet in my testing to see whether the problem still exists there these days, but that goes back to the block size and VMFS issues from my other thread that I got no input on.
I should also note that while I haven't gone crazy testing on my main FreeNAS - because, well, I really don't want to crash it or push any of its volumes offline - I have moved multiple VMs simultaneously to my RAIDz3 volume and it hasn't blown up (only powered-off test VMs are on my z3 volume; its purpose in life is two large VMDKs that contain my media library and general data volumes). It SEEMS to affect only the striped mirrors. Maybe I'll have to try that on the test machine.
One other funny side note: the claim that RAID10/striped mirrors are the king of all configurations and will always outperform RAID5/6/RAIDz is not universally true. Years ago, I was using a Promise VTrak (which I'd actually still be using now if it weren't for the fact that it only supported disks up to 4TB, and only if they were VERY SPECIFIC SAS disks), and I was going to use a RAID10 setup for my VMs because 'conventional wisdom' always said RAID10 is the fastest. Well, it wasn't. Not even close. My benchmarks on the RAID10 setup SUCKED. So I reached out to Promise, and they told me no, the unit is optimized for RAID5/6, so I created RAID5 volumes to test with, and those absolutely blew the doors off the RAID10 setup. With the MASSIVE amount of processing power and memory available to FreeNAS/TrueNAS compared to that Promise array, I'm rather surprised that RAIDz provides 'no performance benefit beyond one disk's worth of performance'.
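If anyone wants to put actual numbers behind the mirrors-vs-RAIDz argument on their own hardware, something like fio run from a directory on each pool layout will do it - the flags below are just a starting point for a VM-ish random write load, not a tuned benchmark:

fio --name=vmtest --ioengine=posixaio --rw=randwrite --bs=16k --size=8g --iodepth=8 --numjobs=4 --runtime=60 --time_based --group_reporting

The usual fine print on the 'one disk worth of performance' claim is that it's about random IOPS per vdev, not sequential throughput - a RAIDz vdev can still stream sequential data across all of its data disks, which may well be the kind of workload that Promise box was optimized for.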
I guess the TLDR is if the 'issue' was "I'm trying to copy/move a lot of data and it's going real slow", and the evidence showed that the disks were being hammered, causing the performance to slow down (Let's say one vMotion runs at 150MB/s, but then when three are run at the same time, they each only get 50MB/s), saying 'you need more disks to increase performance' would be totally acceptable. But hammering the disks shouldn't cause the volumes to become inaccessible.