mirror/swap0 DEGRADED, bug or hardware problem?

orjan-

Dabbler
Joined
Apr 17, 2018
Messages
20
I finished setting up my main server yesterday with a fresh install of FreeNAS-11.2-U7. This server has two pools:
ssd0:
A single Samsung 970 EVO Plus with SED encryption enabled but auto-locking disabled, due to a bug in sedhelper that keeps it from unlocking NVMe drives. It has the default "MBRDone=N" and "MBREnabled=Y" set, and this seems to block changes to the partitions on the drive.
Ashift 13.

volume0
8x wd red 10TB in raidz2 with geli encryption.
Connected to onboard intel C246 sata controller.
Ashift 12.

I installed FreeNAS, created the pools in the web GUI, and created the datasets and ACLs on the CLI. I created 4 jails on ssd0 that were running with almost no activity.
18:00 I started a 13TB SMB transfer over 1 Gbit to volume0.
23:50 I attached 2 USB 3.0 NTFS disks and started copying their contents to volume0 on the CLI (1TB and 5TB of data).

03:02 I got this email:
Checking status of gmirror(8) devices:
        Name    Status  Components
mirror/swap0  DEGRADED  ada7p1 (ACTIVE)
mirror/swap1  COMPLETE  ada6p1 (ACTIVE)
                        ada5p1 (ACTIVE)
mirror/swap2  COMPLETE  ada4p1 (ACTIVE)
                        ada3p1 (ACTIVE)
mirror/swap3  COMPLETE  ada2p1 (ACTIVE)
                        ada1p1 (ACTIVE)
-- End of daily output --
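For anyone who wants to check this kind of report programmatically, here is a minimal Python sketch that parses status text in the layout shown above and lists the components each mirror still reports. The function name `parse_gmirror_status` and the `SAMPLE` text are made up for illustration; the parser only assumes the three-column layout from the email and is not an official tool.

```python
def parse_gmirror_status(text):
    """Parse gmirror-status-style text into
    {mirror_name: {"status": str, "components": [str, ...]}}.

    Assumes the layout from the email: a line starting with "mirror/"
    opens a new mirror, and following lines list further components.
    """
    mirrors = {}
    current = None
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, the column header, and the "-- End ..." footer.
        if not tokens or tokens[0] == "Name" or tokens[0].startswith("--"):
            continue
        if tokens[0].startswith("mirror/") and len(tokens) >= 2:
            current = tokens[0]
            mirrors[current] = {"status": tokens[1], "components": []}
            tokens = tokens[2:]
        # Anything left on the line starts with a component name.
        if current is not None and tokens:
            mirrors[current]["components"].append(tokens[0])
    return mirrors


# Hypothetical sample, shortened from the email above.
SAMPLE = """\
Checking status of gmirror(8) devices:
Name Status Components
mirror/swap0 DEGRADED ada7p1 (ACTIVE)
mirror/swap1 COMPLETE ada6p1 (ACTIVE)
ada5p1 (ACTIVE)
-- End of daily output --
"""

status = parse_gmirror_status(SAMPLE)
for name, info in status.items():
    if info["status"] != "COMPLETE":
        print(name, info["status"], "still lists:", info["components"])
```

The components printed for a degraded mirror are the ones the report still lists for it, which may help tell whether ada7p1 is the member that failed or the member that remains.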

When I logged into the web GUI to see what was going on, there was no alert on the bell icon. However, the console showed this error from 01:45, saying that the swap on the NVMe SSD had problems, not ada7 (a SATA WD Red HDD) as stated in the system warning email.
This is all the info from the console:
Nov 21 00:00:00 freenas syslog-ng[2944]: Configuration reload finished;
Nov 21 01:45:21 freenas nvme0: WRITE sqid:11 cid:89 nsid:1 lba:144 len:16
Nov 21 01:45:21 freenas nvme0: ACCESS DENIED (02/86) sqid:11 cid:89 cdw0:0
Nov 21 01:45:21 freenas GEOM_MIRROR: Request failed (error=5). nvd0p1[WRITE(offset=8192, length=8192)]
Nov 21 01:45:21 freenas GEOM_MIRROR: Device swap0: provider nvd0p1 disconnected.
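To pull the same information out of a longer log, a small Python sketch like the following could extract which provider was disconnected from which mirror. The regular expression is written against the one GEOM_MIRROR line shown above, and the names `DISCONNECT`, `disconnected_providers` and `LOG` are invented for this example, so treat it as an assumption about the message format rather than a complete parser.

```python
import re

# Pattern modelled on the single "provider ... disconnected" line in the
# console output above; other GEOM_MIRROR message variants are not handled.
DISCONNECT = re.compile(
    r"GEOM_MIRROR: Device (\S+): provider (\S+) disconnected"
)


def disconnected_providers(log_text):
    """Return a list of (mirror_device, provider) pairs found in the log."""
    return DISCONNECT.findall(log_text)


# Log lines copied from the console output above.
LOG = """\
Nov 21 01:45:21 freenas nvme0: ACCESS DENIED (02/86) sqid:11 cid:89 cdw0:0
Nov 21 01:45:21 freenas GEOM_MIRROR: Request failed (error=5). nvd0p1[WRITE(offset=8192, length=8192)]
Nov 21 01:45:21 freenas GEOM_MIRROR: Device swap0: provider nvd0p1 disconnected.
"""

events = disconnected_providers(LOG)
for device, provider in events:
    print(f"mirror {device} lost provider {provider}")
```

Comparing the provider reported here with the components the status email still lists may help tell which member actually dropped out of the mirror.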

So my question is:
Do I have a problem with swap on ada7, or is the FreeNAS mail notification reporting nvme0 as ada7 due to a bug along the lines of "if unknown, report highest adaX as problem source"?

I'm new to FreeNAS, FreeBSD and ZFS, but I have some general experience with Linux and the CLI. Is there anything I can check to see if there ever was a problem with ada7 or any of the drives in volume0 (ada0-ada7)?

The MBRDone and MBREnabled variables (sedutil-cli) on SED drives are not very well documented, either in the sedutil-cli documentation or by FreeNAS. They seem to block changes to a drive, so that it cannot be repartitioned or secure erased, once they change to their normal values "MBRDone=N" and "MBREnabled=Y". They change to these values after the first power cycle after enabling SED. If FreeNAS tried to do something partition-related at 01:45, that could explain the error at 01:45, as "MBRDone=N" and "MBREnabled=Y" might have blocked a partition change from happening.

Any suggestions?
I have not rebooted or done anything other than log into the web GUI since it happened. The 13TB SMB transfer still has 24 hours left, so I was hoping to avoid a reboot until it finishes.
 

orjan-

I have done some testing on my test server. My test server has a Samsung SATA 850 EVO that supports the same SED encryption as my 970 EVO Plus NVMe SSD (Opal2).
The mysterious, poorly documented MBRDone and MBREnabled are, as far as I can understand, there to protect a shadow partition that is created if the drive is a boot drive, so that the drive can be unlocked and booted from. However, the guides on using sedutil-cli with a non-boot drive also set the same values for MBRDone and MBREnabled. Their second use is probably to protect the drive from tampering with partitions and from secure erase. Whether they are really needed at their default values is not possible to tell from the current sedutil-cli documentation.
I have tested the default values and they are identical on both my 850evo sata on the test server and my 970 evo nvme on main server.

Before enabling SED:
MBRDone = N, MBREnabled = N

After enabling SED:
MBRDone = Y, MBREnabled = Y

After enabling SED and performing the first power cycle (and forever, unless manually changed):

MBRDone = N, MBREnabled = Y

I'm interested in testing whether the error on the nvme0 drive in my first post can be recreated with the same settings as when the error happened (MBRDone = N, MBREnabled = Y), and whether it still happens if I disable the partition locking by using the values that are default when SED is disabled (MBRDone = N, MBREnabled = N).

Is there a way I can cause the system to write to its swap partition?
 

orjan-

I think the swap degrade happened because the MBRDone and MBREnabled variables in sedutil-cli block changes to the partitions by default. Creating datasets, folders and files is OK with these variables at their default values, but creating or deleting partitions is not. I'm guessing that the write error that degraded the swap was due to some activity that counts as a partition change and was blocked by MBRDone and MBREnabled. 18-24 hours after the swap was degraded I got another access denied write error on the NVMe SSD. I had not rebooted the system, so the swap was already degraded and I got no new degraded warning. I then changed the variables to MBRDone = N, MBREnabled = N and rebooted the system, and I have had no problems since. It has been running 24/7 for about a week since the reboot.

So apparently MBRDone and MBREnabled need to be set to N (off) for the drive to work in a ZFS pool. These options are the same for both SATA and NVMe, and should be the same for SSD and HDD as well.

As for the email warning that ada7 had been degraded, I think that is a bug in FreeNAS. The only places I checked were the console messages and the email about the swap degrade. The console showed a problem with the nvme0 swap and nothing about ada7, which the email said was the problem. ada7 is also the last SATA device I have (ada0-ada7), so I think the email warning script picks up the last ada device instead of the NVMe device for some strange reason. Maybe because NVMe devices use both /dev/nvme0 and /dev/nvd0 as device names.
 