SMART model# mismatch?


Chris Moore

Hall of Famer
I doubt there is any tangible benefit to switching to the H310 mini.
If you switch to the H310, you can use ZFS to mirror the boot drives. It is a good safeguard. I use a mirrored pair of spinning disks for the boot pool in the servers I run at home. It probably doesn't qualify as a big benefit, but it is something.
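If you ever want to do it from the shell instead of the GUI (System > Boot can attach a mirror for you), something like this should work; freenas-boot is the default boot pool name, but the device names here are only examples:
[CODE]
# Show the current layout and health of the boot pool
zpool status freenas-boot

# Attach a second device to turn the existing boot device into a mirror.
# da1p2 is only an example; partition the new disk like the old one first.
zpool attach freenas-boot da0p2 da1p2
[/CODE]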
Thanks for the links, I will give them a read. Just to confirm, do you recommend against the "4x striped mirror + 2x 10-disk RAIDZ2" setup, or did you just happen not to include it in the quote?
I think the higher vdev count of the second option (the one I included) will give you more IOPS / better performance. It is the way I would suggest going.
Now back to the original topic, do you think it is possible that this Sun firmware (assuming that is what it is) will give trouble with FreeNAS?
The only way you might have trouble with them is if they were formatted for a non-standard sector size. NetApp is bad about doing that, and it makes you have to use a special utility to do a low-level reformat of the drive to change the sector size. It is a terrible pain, but I don't think you have that problem with these. The SMART data that you shared is pretty standard for a SAS drive. I don't exactly like it, because I feel like it could be more informative, the way SATA drives are, but it works. If you can, you might want to buy a couple more to have as spares. I try to keep at least two of each kind of drive I am using on the shelf as cold spares.
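If you want to double-check the sector size yourself, and see what the fix would look like if one of them ever did turn out to be a 520/528-byte NetApp-style drive, something along these lines should do it (da0 is a placeholder, and sg_format comes from the sg3_utils package, which I have only used from a Linux box):
[CODE]
# Check the logical block size the drive reports (should be 512 or 4096)
smartctl -i /dev/da0 | grep -i "block size"

# If a drive ever shows up as 520 or 528 bytes, a low-level reformat back
# to 512 fixes it, but it takes hours per drive:
#   sg_format --format --size=512 /dev/sdX   (on Linux, from sg3_utils)
[/CODE]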
How do I test for that? Maybe I can do a pool performance test, but I am unsure what to watch for.
If there were something (like the sector size) that was out of spec with these, you wouldn't be able to create a pool and write data to the pool. If that is working, you should be good. The burn-in testing is about looking for mechanical defects that would cause an early failure. You want to know if the drive will fail while it is still (hopefully) in some kind of warranty. One of the servers at work, when we set it up in Jan 2017, had three drives out of 60 fail in the first six months. Now, 21 months in, no additional failures. By stressing the drives at the start, the hope is that any early failures will show themselves before data goes into the system. As for what to watch for,

solnet-array-test (for drive / array speed), a non-destructive test
https://forums.freenas.org/index.php?resources/solnet-array-test.1/

If you use the solnet-array-test, it will show you if any of the drives is running slower than the rest, which would drag down the performance of the entire pool.
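A long SMART self-test on each drive is also worth running alongside it during burn-in; something like this (da0 is just an example device):
[CODE]
# Start a long (full surface scan) self-test on one drive
smartctl -t long /dev/da0

# Check progress and results later; look for a failed self-test or any
# reallocated / pending sector counts creeping up
smartctl -a /dev/da0
[/CODE]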
Correct me if I am wrong, but I believe ZFS can work on top of a LUN presented by a RAID card; you just lose SMART monitoring and other optimizations.
If it has two disks presented, it can use the redundancy to correct errors. If you present a hardware RAID set as a single disk, it doesn't have redundancy to correct errors, but it can still tell you that there was an error.
If I do two single-drive RAID0s and mirror them in FreeNAS, I suppose redundancy and checksumming/scrubbing/integrity should still work?
It should, and you might only be without the SMART data.
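Depending on the controller, you may even be able to get at the SMART data anyway. smartctl has a MegaRAID passthrough mode that works with a lot of PERC/LSI cards; I can't promise it works on the H710p, but it costs nothing to try (the device node and disk index here are guesses for your setup):
[CODE]
# Ask for SMART data from the first physical disk behind the MegaRAID-based
# controller; the number after the comma may differ on your system
smartctl -a -d megaraid,0 /dev/mfid0
[/CODE]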
In the end I feel this is not much worse than a dual USB boot stick setup.
I don't like the USB boot option. I use mechanical disks and keep an eye on the SMART data. SSD is also an option and very reliable.
With that said, I still have 16 DIMM slots, so I can add more RAM if the situation calls for it.
RAM is an option. If you have a moment, you might want to look at this very informative video regarding L2ARC:
https://www.youtube.com/watch?v=oDbGj4YJXDw
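When you do get to the point of adding an L2ARC, it is just a cache vdev on the pool, and it can be added or removed at any time without risk to the data; for example (pool and device names are only examples, and the GUI can do the same thing):
[CODE]
# Add an NVMe SSD to an existing pool as L2ARC (cache)
zpool add tank cache nvd0

# Remove it again later if it turns out not to help
zpool remove tank nvd0
[/CODE]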
However, without official documentation, and having seen nobody ever test this,
There are test results in the referenced thread regarding benchmarks of SLOG devices.
Besides, I found a good deal on a used P3700
I did too. Don't feel bad.
Yeah I am going to test it and record it, for science [read: fun]!:rolleyes:
Thanks
 

Ender117

Patron
It's not that simple. For starters, SMART is essential to keeping things under control. There's also the problem of all the trickery, to put it charitably, that HW RAID controllers are known to do. You can end up with data corruption out of the blue because of that man in the middle screwing things up.
Can you be more specific on this? I just cannot think of how a hardware RAID LUN would cause data corruption in a way a single spinning-rust drive won't. I know you will lose SMART monitoring and warnings, but many RAID cards provide the same feature, at least partially.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Correct me if I am wrong, but I believe ZFS can work on top of a LUN presented by a RAID card; you just lose SMART monitoring and other optimizations. If I do two single-drive RAID0s and mirror them in FreeNAS, I suppose redundancy and checksumming/scrubbing/integrity should still work? In the end I feel this is not much worse than a dual USB boot stick setup.

To be perfectly honest, I'd probably trust a ZFS mirror on top of two hardware RAID0 over a USB stick mirror. You could still have corruption on the RAID0 disks, but you'd at least have ZFS on top to warn you of it.
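A regular scrub is what actually surfaces that kind of corruption, so if you go that route, keep scrubs on a schedule and glance at the error counters afterwards; for example (the pool name is just a placeholder):
[CODE]
# Walk every block in the pool and verify checksums
zpool scrub tank

# Any READ/WRITE/CKSUM errors on the RAID0-backed devices will show up here
zpool status -v tank
[/CODE]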

Sorry if I wasn't clear, but I mentioned 1TB as the upper limit. I will for sure start with smaller SSDs and add more along the way, as they are still quite expensive in the TB range :(. With that said, I still have 16 DIMM slots, so I can add more RAM if the situation calls for it.

I understand. Expand your RAM and L2ARC in parallel, since although SSD is many times faster than platter drives, RAM is many times faster than SSD. ;)
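A quick way to sanity-check where the next dollar should go is the ARC hit rate; if the ARC is already hitting almost all the time, more L2ARC buys you less than more RAM would. On FreeBSD/FreeNAS the counters are exposed as sysctls (a rough check, not a benchmark):
[CODE]
# Raw ARC hit/miss counters; a large miss fraction suggests more RAM
# (or L2ARC) could actually help
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses

# Current and maximum ARC size in bytes
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max
[/CODE]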

I have been debating between the P3700 and the 900p. I know that the 900p is even higher performance than the P3700. What held me back was its resilience to power loss. I know that the 900p was initially listed by Intel with PLP but now is not, and that architecturally it should persist data through power loss. However, without official documentation, and having seen nobody ever test this, I decided to err on the safe side, because I have no idea how to test this either. Besides, I found a good deal on a used P3700;)

Intel's statement is that since the 3D XPoint NAND used in all Optane products is DRAMless, there is no "volatile cache" component, and that all writes to devices using it (including the super-cheap 16GB Optane Memory) should be considered "safe by default" similar to the old Pliant/Sandisk "direct to NAND" Lightning SSDs. For testing that - well, anyone want to pull the plug on an Optane-powered box and tell us?

But if you got a good deal on the P3700, there's no harm in that. The SLOG benchmark thread shows that it only falls behind at larger recordsizes, and if you're hosting VMs you'll probably be sticking to 16K records at maximum.
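For reference, that 16K figure is just the dataset or zvol block size, and this is roughly how you would pin it (names are placeholders; volblocksize can only be set when the zvol is created):
[CODE]
# File-based VM storage (e.g. NFS): cap the record size at 16K
zfs set recordsize=16K tank/vms

# Block-based VM storage (e.g. iSCSI): set volblocksize at creation time
zfs create -V 500G -o volblocksize=16K tank/vm-disk1
[/CODE]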

Yeah I am going to test it and record it, for science [read: fun]!:rolleyes:

Woo, science!
 

Ericloewe

Server Wrangler
Moderator
Can you be more specific on this?
Scenario one: RAID controller starts returning the wrong blocks. This will quickly ruin your day if it happens on several disks.
Scenario two: RAID controller starts pretending that a write was successful even though it was not. Why would they do that? Probably because they expect to just write it to a new disk "soon".

Really, if there's one thing RAID controllers are not known for, it's behaving sanely. Sure, disks could do all this stuff, but they're not as actively evil as RAID controllers.
 

Ender117

Patron
If you switch to the H310, you can use ZFS to mirror the boot drives. It is a good safeguard. I use a mirrored pair of spinning disks for the boot pool in the servers I run at home. It probably doesn't qualify as a big benefit, but it is something.
Yeah, I recognize mirroring the boot drive as a benefit. But you can also do it on the H710p if you present ZFS with two single-drive RAID0 LUNs. Yes, it still lacks SMART monitoring and other things, but for this use case I would say the benefit is intangible.

If it has two disks presented, it can use the redundancy to correct errors. If you present a hardware RAID set as a single disk, it doesn't have redundancy to correct errors, but it can still tell you that there was an error.

It should, and you might only be without the SMART data.

I don't like the USB boot option. I use mechanical disks and keep an eye on the SMART data. SSD is also an option and very reliable.
Yeah, that's what I thought of :) I know you can also do copies=2 on a single drive so that you can still correct errors on a single disk (which is what I did in my pfSense box). But for simplicity I would just do two single-drive RAID0s and mirror them in ZFS (for the boot device). The H710p will also monitor SMART data for me; not as efficient, but it would work.
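For anyone curious, the copies=2 trick from the pfSense box is just a dataset property; note it only applies to data written after it is set (the dataset name here is only an example):
[CODE]
# Store two copies of every block on the single device, so checksum
# errors can still be repaired; affects new writes only
zfs set copies=2 tank/important
[/CODE]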


I think the higher vdev count of the second option (the one I included) will give you more IOPS / better performance. It is the way I would suggest going.
Got you. So it would be a balance between performance and space, but neither option is terrible? I might put in some 15k/10k drives for the striped mirrors if I end up doing the first option.

The only way you might have trouble with them is if they were formatted for a non-standard sector size. NetApp is bad about doing that, and it makes you have to use a special utility to do a low-level reformat of the drive to change the sector size. It is a terrible pain, but I don't think you have that problem with these. The SMART data that you shared is pretty standard for a SAS drive. I don't exactly like it, because I feel like it could be more informative, the way SATA drives are, but it works. If you can, you might want to buy a couple more to have as spares. I try to keep at least two of each kind of drive I am using on the shelf as cold spares.

If there were something (like the sector size) that was out of spec with these, you wouldn't be able to create a pool and write data to the pool. If that is working, you should be good. The burn-in testing is about looking for mechanical defects that would cause an early failure. You want to know if the drive will fail while it is still (hopefully) in some kind of warranty. One of the servers at work, when we set it up in Jan 2017, had three drives out of 60 fail in the first six months. Now, 21 months in, no additional failures. By stressing the drives at the start, the hope is that any early failures will show themselves before data goes into the system. As for what to watch for,

solnet-array-test (for drive / array speed), a non-destructive test
https://forums.freenas.org/index.php?resources/solnet-array-test.1/

If you use the solnet-array-test, it will show you if any of the drives is running slower than the rest, which would drag down the performance of the entire pool.
That is good to know, will test as you suggested.


RAM is an option. If you have a moment, you might want to look at this very informative video regarding L2ARC:
https://www.youtube.com/watch?v=oDbGj4YJXDw
Your link goes to using a 900p as a SLOG. But my understanding is that whatever gets kicked out of ARC as RAM fills up goes into L2ARC, and an index of the L2ARC is stored in RAM as well; I think those are the two key points.
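The RAM cost of that L2ARC index is even visible directly; on FreeBSD/FreeNAS you can see how much memory the L2ARC headers are taking with something like:
[CODE]
# Bytes of RAM used just to index what is currently sitting in L2ARC
sysctl kstat.zfs.misc.arcstats.l2_hdr_size
[/CODE]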

There are test results in the referenced thread regarding benchmarks of SLOG devices.
Sorry, I mean Intel does not officially document that the 900p has PLP, and nobody has ever tested its behavior on power loss, so I steered away from it. Just a bit of paranoia. If there are such tests I would love to see them.
 

Ender117

Patron
Intel's statement is that since the 3D XPoint NAND used in all Optane products is DRAMless, there is no "volatile cache" component, and that all writes to devices using it (including the super-cheap 16GB Optane Memory) should be considered "safe by default" similar to the old Pliant/Sandisk "direct to NAND" Lightning SSDs. For testing that - well, anyone want to pull the plug on an Optane-powered box and tell us?

But if you got a good deal on the P3700, there's no harm in that. The SLOG benchmark thread shows that it only falls behind at larger recordsizes, and if you're hosting VMs you'll probably be sticking to 16K records at maximum.
Yeah, I know architecturally it should preserve data through power loss, but the P4800X is listed with PLP and the 900p is not. Well, it is probably a marketing strategy and I fell for it; I chose to buy neither of them.

And for those who are interested, Newegg now has the 900p 280GB new for $260. That's what I would buy if I didn't have the P3700.
 

Ender117

Patron
Scenario one: RAID controller starts returning the wrong blocks. This will quickly ruin your day if it happens on several disks.
Scenario two: RAID controller starts pretending that a write was successful even though it was not. Why would they do that? Probably because they expect to just write it to a new disk "soon".

Really, if there's one thing RAID controllers are not known for, it's behaving sanely. Sure, disks could do all this stuff, but they're not as actively evil as RAID controllers.
I agree #1 is very horrible, and #2 as well if there is no battery-backed cache. But these are examples of poor design. Single disks are not super reliable either.
 

Ericloewe

Server Wrangler
Moderator