SLOG SSD died

mrstevemosher

Dabbler
Joined
Dec 21, 2020
Messages
49
Hi team,
We have a pool that had a failed SLOG disk, and it killed the whole pool. The drive was partitioned, and one of those partitions was the SLOG for pool DISK2.
I'm trying to pull that disk out of the system, but the system will not boot. It's like it's locked in some mount/unmount loop during boot.

The pool consists of 8 × 4 TB HGST drives in RAIDZ2, a 1 TB Sabrent NVMe for cache (L2ARC), and a partitioned 128 GB SSD for SLOG.

See if I try this again.

Any chance of going in via single-user mode to fix this? The server just won't boot.
 

mrstevemosher

Dabbler
Joined
Dec 21, 2020
Messages
49
Got it!

Managed to log in via single-user mode:

zpool import -f -m DISK1
zpool remove DISK1 log da31p1
zpool import -f -m DISK2
zpool remove DISK2 log da31p2
zpool import -f -m DISK3
zpool remove DISK3 log da31p3

reboot.
Success.
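If anyone else ends up here with the same problem, a quick sanity check after the removal (just a sketch, using the same pool names as above) would be something like:

# confirm the failed log vdevs are gone and the pools report healthy
zpool status -v DISK1
zpool status -v DISK2
zpool status -v DISK3
zpool get health DISK1 DISK2 DISK3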

Lesson learned for me: don't use a single partitioned SSD as a SLOG device shared across pools.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
a partitioned 128 GB SSD for SLOG.
Why partitioned? What was the other partition for?
a 1 TB Sabrent NVMe for cache (L2ARC)
You had an L2ARC drive and ...
a partitioned 128 GB SSD for SLOG.
And what kind of hardware was this that you were using for SLOG?

Normally, SLOG does not need to be redundant because if the device fails, the ZIL just goes to the pool where it would be without SLOG. It makes me wonder why you have it partitioned and what else may have happened.

For reference: https://www.truenas.com/community/threads/terminology-and-abbreviations-primer.28174/
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The server just won't boot.
You can always disconnect all the drives, boot the server, then plug the drives back in (hot-plug), and then import the pool using the GUI's import-pool options.
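If you prefer the command line once you can get a shell, the equivalent (a sketch; -m allows the import even with a missing log device) would be along the lines of:

# list pools that are available for import
zpool import
# force-import a pool whose log device is missing or failed
zpool import -f -m DISK2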

The failure to boot and the pool crashing are unusual, and I still think we need to look more closely at the other use of that SSD, because I doubt it was the SLOG function that caused the problem.

Here is a guide for telling us what we need to know to answer your questions better:
 

mrstevemosher

Dabbler
Joined
Dec 21, 2020
Messages
49
Why partitioned? What was the other partition for?

You had an L2ARC drive and ...

And what kind of hardware was this that you were using for SLOG?

Normally, SLOG does not need to be redundant because if the device fails, the ZIL just goes to the pool where it would be without SLOG. It makes me wonder why you have it partitioned and what else may have happened.

For reference: https://www.truenas.com/community/threads/terminology-and-abbreviations-primer.28174/

The partitioning idea came from this post:
https://clinta.github.io/freenas-multipurpose-ssd/
The only thing we were trying to do with the partitioning ideas from the URL above was to use a single SSD as a SLOG for two pools.

The SLOG device we chose was a cheap 128 GB Crucial BX drive.

I appreciate you folks replying. Thank you.
 

Herr_Merlin

Patron
Joined
Oct 25, 2019
Messages
200
SLOG without redundancy.
SLOG totally not suited for the purpose.
SLOG without PLP.
SLOG with slow access times.
SLOG with laughably low TBW.

So many mistakes.

There are great topics within this forum explaining SLOG. Please check those.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

mrstevemosher

Dabbler
Joined
Dec 21, 2020
Messages
49
Got it!

Managed to log in via single-user mode:

zpool import -f -m DISK1
zpool remove DISK1 log da31p1
zpool import -f -m DISK2
zpool remove DISK2 log da31p2
zpool import -f -m DISK3
zpool remove DISK3 log da31p3

reboot.
Success.

Lesson learned for me: don't use a single partitioned SSD as a SLOG device shared across pools.



Were you able to restore access to the pool?

Sir, yes we were. I was able to run the above commands, and that at least got rid of the failed disk. Still no idea why it took out the pools. We were able to add a Supermicro NVMe card with two Sabrent 256 GB NVMe drives; we don't have anything smaller. We think this will work as a SLOG device for our iSCSI VMware pools. The primary function is lots of VMware. If it helps, the specs are below.

Supermicro X9DRD-7LN4F-JBOD
128 GB DDR3 ECC
2 × E5-2643 CPUs
1 × Intel X520-DA2 Ethernet
1 of 3 Supermicro CSE-836 w/BPN-SAS2-EL1 cases holds the system board
Onboard LSI SAS controller
36 × 4.3 TB HGST drives
3 × Supermicro AOC-SLG3-2M2
1 × 2 TB Samsung NVMe
1 × 512 GB Samsung NVMe
2 × 1 TB Sabrent NVMe
2 × 256 GB Sabrent NVMe
1 × 4.3 TB IBM SAS SSD
 

Herr_Merlin

Patron
Joined
Oct 25, 2019
Messages
200
Again, the wrong SSD for SLOG. Please, please read those topics.

Redundancy is fixed this time, but all the other issues persist. Still not an option to go with.
 

Herr_Merlin

Patron
Joined
Oct 25, 2019
Messages
200
Well, the point is that the SSDs you chose are not designed to withstand the writes a SLOG has to absorb.
Additionally, they have really high latency compared to Optane, RMS-200/300 NVDIMM, or ZeusRAM devices.
Additionally, your pool could be damaged in the case of a power outage etc., which is exactly what a SLOG should protect you against, because those SSDs don't have power loss protection.

Thus the redundancy won't help you, and the use of a SLOG is pointless.
 
Last edited:

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I'm sorry you don't seem happy with our decision here. I appreciate you replying.

Thank you

Hi Steve,

While @Herr_Merlin isn't exactly being the most gentle with his words, he's not wrong here. Both your previous and new SLOG devices are not fit for purpose, and I strongly suggest you change the solution especially given the rest of your setup.

We think this will work as a SLOG device for our iSCSI VMware pools. The primary function is lots of VMware.

If you were previously running "lots of VMware" on a BX100 SLOG, I can 99% guarantee that you aren't running proper sync writes right now, and your data is at risk. Please show the output of the console command zfs get sync for your pool. I expect everything will say "standard" which for iSCSI means "not being used" and your in-flight data is not touching that SLOG device.

Given the rest of your system, you need to either invest in a pair of Optane NVMe cards - suggested would be the DC P4801X M.2 card in 100G/200G capacity - or if this is a very heavy write workload, a pair of used Radiant Memory Systems RMS-200 NVRAM cards. Then you need to zfs set sync=always on your VMware datasets and zvols. You should expect a decrease in performance, as once that value is set, you will actually be performing synchronous (safe) writes to your datastore.
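As a sketch (the device names below are placeholders, not your actual ones), that would look something like:

# add the new pair of devices as a mirrored log vdev
zpool add DISK2 log mirror nvd2 nvd3
# force synchronous writes on the pool (children inherit it)
zfs set sync=always DISK2
# verify
zfs get -r sync DISK2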

Since the intention here is "sharing an SLOG across pools", you will have to consider limiting the amount of outstanding dirty data as well if you use a smaller device like the RMS-200. By default each pool will try to claim up to 4G, and the RMS-200 is only 8G total. You would have to cap this at roughly 2.5G per pool.
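On FreeBSD-based FreeNAS that dirty-data limit is exposed as a sysctl; a sketch of capping it at roughly 2.5 GiB (value in bytes, and assuming the vfs.zfs.dirty_data_max name on your build) would be:

# cap outstanding dirty data at ~2.5 GiB (2.5 * 1024^3 bytes)
sysctl vfs.zfs.dirty_data_max=2684354560
# or add the same name/value as a sysctl entry under System -> Tunables so it persists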
 

mrstevemosher

Dabbler
Joined
Dec 21, 2020
Messages
49
Hi Steve,

While @Herr_Merlin isn't exactly being the most gentle with his words, he's not wrong here. Both your previous and new SLOG devices are not fit for purpose, and I strongly suggest you change the solution especially given the rest of your setup.



If you were previously running "lots of VMware" on a BX100 SLOG, I can 99% guarantee that you aren't running proper sync writes right now, and your data is at risk. Please show the output of the console command zfs get sync for your pool. I expect everything will say "standard" which for iSCSI means "not being used" and your in-flight data is not touching that SLOG device.

Given the rest of your system, you need to either invest in a pair of Optane NVMe cards - suggested would be the DC P4801X M.2 card in 100G/200G capacity - or if this is a very heavy write workload, a pair of used Radiant Memory Systems RMS-200 NVRAM cards. Then you need to zfs set sync=always on your VMware datasets and zvols. You should expect a decrease in performance, as once that value is set, you will actually be performing synchronous (safe) writes to your datastore.

Since the intention here is "sharing an SLOG across pools", you will have to consider limiting the amount of outstanding dirty data as well if you use a smaller device like the RMS-200. By default each pool will try to claim up to 4G, and the RMS-200 is only 8G total. You would have to cap this at roughly 2.5G per pool.


Hello and thank you for your input.

The original intent behind our use of the SLOG was completely misguided and misunderstood. We are pretty stupid and we get that. Thank you. We are going to get a couple of the Optane cards.
We did not understand Power Loss Protection until now. That is exactly what we are after.

Side note:
We found a ZeusRAM DDR3 device on eBay, also 8G capacity. Thoughts?

Here is the current zfs get sync.

NAME                 PROPERTY  VALUE   SOURCE
DISK2                sync      always  local
DISK2/iscsi219disk3  sync      always  inherited from DISK2
DISK2/iscsi222disk2  sync      always  inherited from DISK2
DISK2/iscsi222disk3  sync      always  inherited from DISK2
DISK2/iscsi224disk1  sync      always  local
DISK2/iscsi239disk1  sync      always  inherited from DISK2

I'm sorry that my questions or posts seem to anger some. That is not my goal. My team and I are trying to learn what we can do to better our infrastructure. We are not at genius level with FreeNAS and TrueNAS. I have one guy here who is at best an EMC storage guy from the late '90s, and the technology has changed so much he couldn't keep up. We do read and read, and in some cases our eyes just roll back in our heads; there is a ton of content here. Also, the chest-thumping "MY WAY OR NO WAY" attitude just hurts my ability to learn. I didn't even make it to 10 posts here and already got some 'angry' tag. I thank you folks for your replies, because you took the time to respond.

Again, I am truly sorry to have angered you folks. That was not our goal here.
 

mrstevemosher

Dabbler
Joined
Dec 21, 2020
Messages
49
Found my answer on the ZeusRAM device.

Thank you.

 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hello and thank you for your input.

The original intent behind our use of the SLOG was completely misguided and misunderstood. We are pretty stupid and we get that. Thank you. We are going to get a couple of the Optane cards.
We did not understand Power Loss Protection until now. That is exactly what we are after.

Side note:
We found a ZeusRAM DDR3 device on eBay, also 8G capacity. Thoughts?

You're not stupid, you're learning. These things take time, and you're on the right path now.

The Optane cards will do a good job. They use 3D XPoint rather than NAND, and while that media still wears out over time, the Optane cards will take petabytes of writes before running out of steam.

The ZeusRAM is a good legacy SAS option with unlimited write endurance, but it will be slower due to the SAS overhead. If you can spare the pair of PCIe slots, the RMS-200 is a higher performance option as it supports the newer NVMe transport protocol. But if hot-swap is critical, then you're back to SAS.

Here is the current zfs get sync.

NAME                 PROPERTY  VALUE   SOURCE
DISK2                sync      always  local
DISK2/iscsi219disk3  sync      always  inherited from DISK2
DISK2/iscsi222disk2  sync      always  inherited from DISK2
DISK2/iscsi222disk3  sync      always  inherited from DISK2
DISK2/iscsi224disk1  sync      always  local
DISK2/iscsi239disk1  sync      always  inherited from DISK2

This honestly surprises me. The BX100 is definitely not a fast SLOG device, and as such it's surprising to me that your VMware performance would be anything less than "appallingly slow" with sync writes forced on.

I'm sorry that my questions or posts seem to anger some. That is not my goal. My team and I are trying to learn what we can do to better our infrastructure. We are not at genius level with FreeNAS and TrueNAS. I have one guy here who is at best an EMC storage guy from the late '90s, and the technology has changed so much he couldn't keep up. We do read and read, and in some cases our eyes just roll back in our heads; there is a ton of content here. Also, the chest-thumping "MY WAY OR NO WAY" attitude just hurts my ability to learn. I didn't even make it to 10 posts here and already got some 'angry' tag. I thank you folks for your replies, because you took the time to respond.

Again, I am truly sorry to have angered you folks. That was not our goal here.

I can't speak for others, but I'm not angry. I'm a bit concerned for the safety of your data and your solution and don't want to see you repeating the same mistakes as before, but not angry. There are few situations where it truly is "right way or regret it" and VMware tends to touch on a lot of those though (don't use LACP with iSCSI, do use MPIO, do use sparse ZVOLs with VMFS6 for space reclamation, don't overprovision your zpool until you're 100% sure about your compression, don't use RAIDZ-anything, do use a fast SLOG with sync=always if you care about data safety ... I could write a long post on this.)

Check this link from my signature as well regarding the SLOG devices and their write workload:

In short, there are three things that you need for a "great" SLOG:

1. Fast throughput
2. Long lifespan with regards to 100% write workloads
3. Fast cache flushes
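A quick way to sanity-check a candidate device on points 1 and 3 under FreeBSD-based FreeNAS/TrueNAS is diskinfo's synchronous-write test (a sketch; the device name is only an example, and the test writes to the device, so only run it against a blank drive):

# flush-heavy synchronous write latency test, roughly what a SLOG workload looks like
diskinfo -wS /dev/nvd0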

 

Herr_Merlin

Patron
Joined
Oct 25, 2019
Messages
200
If you are using a single SAS HBA for all those disks, think about adding some more and splitting the disks between them.

As you are running an older setup, regarding SLOG:
a) Intel Optane (those with 10+ PB endurance/TBW), either as a card or as U.2 on a PCIe x4 adapter, which is sometimes cheaper
b) RMS-200/300
c) STEC ZeusRAM via SAS
d) STEC ZeusIOPS SSDs via SAS

All other drives aren't really an option.
Those drives were developed as cache devices for exactly this purpose, not as bulk storage drives.
I personally would go for either a), b), or c).

We are running ZeusIOPS as SLOG; they are faster than run-of-the-mill consumer or non-heavy-write enterprise NVMe disks.
They might be the cheapest option, as you can get them used for about 60 Euro for the 800 GB version with less than 1 PB written... so more than 10 to go.
 
Last edited:

mrstevemosher

Dabbler
Joined
Dec 21, 2020
Messages
49
Thank you.

We are picking up a pair of ZeusRAM devices and dumping the consumer NVMe drives. Thank you @Herr_Merlin.
Now we're wondering if we should push the consumer NVMe drives out and use something like the ZeusIOPS 800s (or smaller) for cache? Thoughts?

The pool is RAIDZ2 with 8 × 4.3 TB HGST drives. It holds 3 iSCSI-mounted VMware datastores, 2 iSCSI-mounted Windows drives, and what appears to be an SMB share that my team uses as an ISO dumping ground, serving 34 Win 10 Horizon clients and 8 Win 2016 servers. The datastores are zvols and the SMB share is a dataset.

There are other pools on the FreeNAS machine built the same way for engineering and our research departments. I figure once the DISK2 pool gets sorted, we'll cookie-cut the other pools with the same concepts.

We are completely open to suggestions. We still feel that we are in over our heads here after this thread.

Appreciate you folks.
 

mrstevemosher

Dabbler
Joined
Dec 21, 2020
Messages
49
You're not stupid, you're learning. These things take time, and you're on the right path now.

The Optane cards will do a good job. They use 3D XPoint rather than NAND, and while that media still wears out over time, the Optane cards will take petabytes of writes before running out of steam.

The ZeusRAM is a good legacy SAS option with unlimited write endurance, but it will be slower due to the SAS overhead. If you can spare the pair of PCIe slots, the RMS-200 is a higher performance option as it supports the newer NVMe transport protocol. But if hot-swap is critical, then you're back to SAS.



This honestly surprises me. The BX100 is definitely not a fast SLOG device, and as such it's surprising to me that your VMware performance would be anything less than "appallingly slow" with sync writes forced on.



I can't speak for others, but I'm not angry. I'm a bit concerned for the safety of your data and your solution and don't want to see you repeating the same mistakes as before, but not angry. There are few situations where it truly is "right way or regret it" and VMware tends to touch on a lot of those though (don't use LACP with iSCSI, do use MPIO, do use sparse ZVOLs with VMFS6 for space reclamation, don't overprovision your zpool until you're 100% sure about your compression, don't use RAIDZ-anything, do use a fast SLOG with sync=always if you care about data safety ... I could write a long post on this.)

Check this link from my signature as well regarding the SLOG devices and their write workload:






Thank you. There are a couple of you folks I need to follow around on the boards. We are not sure we are provisioning the datastores correctly; some people do this, some do that.

After having a read of the link you posted, we found that we do have a couple of IBM 12G SAS SSDs here that were listed as heavy-write drives.

Thanks again.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Since FreeNAS is so flexible and use case is very important, it's great that you're getting good advice here. I have no experience with data stores, and little re: VM, so I'll leave it to others who have been down your path.

However, I'd like to add that ServeTheHome has a nice primer on SLOGs as well as a buyer's guide. ZeusRAM certainly has a mythical place in our collective history, but there are other RAM-based solutions, like the RMS-200 or RMS-375 systems mentioned above, that allow more PLP-protected RAM to be shared than a ZeusRAM can. The trade-off is that you may lose a PCIe slot or two, as the stuff is too fast to be used over SATA effectively.

The L2ARC drive is not as important - the data in it is a redundant copy of what is inside your pool, so if you lose it, the server can recreate it. As of TrueNAS 12, you can make this data persistent via a flag in the tunables (otherwise the cache has to get "hot" again after every reboot, since a reboot normally clears it).
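For what it's worth, a sketch of that persistence flag (assuming the OpenZFS 2.0 sysctl name that TrueNAS 12 exposes) would be:

# allow L2ARC contents to be rebuilt from the cache device after a reboot
sysctl vfs.zfs.l2arc.rebuild_enabled=1
# or set the same name/value as a sysctl-type entry under System -> Tunables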

There are also some helpful articles here to help with CIFS performance that may or may not help. Cyberjock put together a nice summary of how to speed up directory performance on SMB, for example.
 