ibrennan
Dabbler
- Joined
- Mar 31, 2022
- Messages
- 12
Forgive me, as I'm new to TrueNAS. I recently built a system with 14 regular SAS HDDs and several M.2 NVMe drives on PCIe-to-M.2 adapters using bifurcation. At first everything seemed to work great: I can see all of the NVMe drives (4 in one x16 slot and 3 in another) in TrueNAS. The version is TrueNAS-12.0-U8.
All of the NVMe drives are WD_BLACK SN770: four 2TB and three 200GB. One of the 200GB drives is used to boot the system.
When the NVMe drives are under load, the entire system eventually crashes. I see the following in the logs and on the main console. It seems to happen to any of the NVMe drives; I've seen nvme3, nvme4 and nvme5 so far. I tried setting "hw.nvme.use_nvd=0" in loader.conf, but that doesn't stop the crashes; it only gives a slightly different result in the logs. When the issue happens the system locks up completely and needs a forced reset to continue.
If someone can maybe point me in the right direction I would very much appreciate it. It's been years since I played with FreeBSD.
I did find this bug report, which seems very similar: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713 I assume the fix is already in the version I'm using. Can someone help confirm this?
Code:
Mar 29 21:42:25 truenas nvme5: Resetting controller due to a timeout and possible hot unplug.
Mar 29 21:42:25 truenas nvme5: resetting controller
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:12 cid:120 nsid:1 lba:1497544880 len:16
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:12 cid:120 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:12 cid:123 nsid:1 lba:198272936 len:16
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:12 cid:123 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:13 cid:121 nsid:1 lba:431014528 len:24
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:13 cid:121 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:15 cid:127 nsid:1 lba:864636432 len:8
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:15 cid:127 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:16 cid:126 nsid:1 lba:2445612184 len:8
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:16 cid:126 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:16 cid:120 nsid:1 lba:430503600 len:8
Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:16 cid:120 cdw0:0
Mar 29 21:42:25 truenas nvme5: failing outstanding i/o
Mar 29 21:42:25 truenas nvme5: READ sqid:18 cid:123 nsid:1 lba:1499051024 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:18 cid:123 cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: WRITE sqid:18 cid:124 nsid:1 lba:1990077368 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:18 cid:124 cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: READ sqid:19 cid:122 nsid:1 lba:1237765696 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:19 cid:122 cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: READ sqid:19 cid:125 nsid:1 lba:180758264 len:16
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:19 cid:125 cdw0:0
Mar 29 21:42:26 truenas nvme5: failing outstanding i/o
Mar 29 21:42:26 truenas nvme5: READ sqid:20 cid:121 nsid:1 lba:2445612192 len:8
Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:20 cid:121 cdw0:0
Mar 29 21:42:26 truenas nvd5: detached
This is what I see when "hw.nvme.use_nvd=0" is set:
Code:
nvme3: Resetting controller due to a timeout and possible hot unplug.
nvme3: resetting controller
nvme3: failing outstanding i/o
nvme3: READ sqid:7 cid:127 nsid:1 lba:419546528 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:7 cid:127 cdw0:0
nvme3: (nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=1901c5a0 0 7 0 0 0 failing outstanding i/o
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
nvme3: READ sqid:11 cid:127 nsid:1 lba:782841288 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:11 cid:127 cdw0:0
nvme3: (nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=2ea935c8 0 7 0 0 0 failing outstanding i/o
nvme3: READ sqid:11 cid:123 nsid:1 lba:704576056 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:11 cid:123 cdw0:0
nvme3: failing outstanding i/o
nvme3: WRITE sqid:12 cid:127 nsid:1 lba:1016402352 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:12 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:12 cid:125 nsid:1 lba:1824854760 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:12 cid:125 cdw0:0
nvme3: failing outstanding i/o
nvme3: (nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted WRITE sqid:13 cid:124 nsid:1 lba:1008638008 len:64
nvme3: ABORTED - BY REQUEST (00/07) sqid:13 cid:124 cdw0:0
nvme3: failing outstanding i/o
nvme3: WRITE sqid:13 cid:125 nsid:1 lba:1008638152 len:56
nvme3: ABORTED - BY REQUEST (00/07) sqid:13 cid:125 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:15 cid:127 nsid:1 lba:783188688 len:8
nvme3: (nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=29fefa38 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c9511b0 0 7 0 0 0 ABORTED - BY REQUEST (00/07) sqid:15 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: WRITE sqid:15 cid:123 nsid:1 lba:1008553080 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:15 cid:123 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:16 cid:124 nsid:1 lba:147012776 len:8
nvme3: (nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=6cc512e8 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c1e9838 0 3f 0 0 0 ABORTED - BY REQUEST (00/07) sqid:16 cid:124 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:16 cid:127 nsid:1 lba:2881895592 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:16 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:17 cid:127 nsid:1 lba:2574392744 len:16
nvme3: ABORTED - BY REQUEST (00/07) sqid:17 cid:127 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:18 cid:126 nsid:1 lba:155895056 len:8
nvme3: (nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c1e98c8 0 37 0 0 0 ABORTED - BY REQUEST (00/07) sqid:18 cid:126 cdw0:0
nvme3: failing outstanding i/o
nvme3: READ sqid:19 cid:125 nsid:1 lba:151377120 len:8
nvme3: ABORTED - BY REQUEST (00/07) sqid:19 cid:125 cdw0:0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=2eae82d0 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=3c1d4c78 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=8c33ca8 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=abc63ca8 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=99721da8 0 f 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=94ac510 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
(nda3:nvme3:0:0:1): READ. NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=905d4e0 0 7 0 0 0
(nda3:nvme3:0:0:1): CAM status: CCB request completed with an error
(nda3:nvme3:0:0:1): Error 5, Retries exhausted
nda3 at nvme3 bus 0 scbus13 target 0 lun 1
nda3: <WD_BLACK SN770 2TB 731030WD 21513C800057> s/n 21513C800057 detached
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
xptioctl: pass driver is not in the kernel
xptioctl: put "device pass" in your kernel config file
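In case it helps anyone comparing the two captures, here is a rough Python sketch that tallies the failed commands in a pasted log instead of reading them by eye. The function name `summarize` and both regular expressions are my own assumptions, written only against the message format shown in the logs above; this is not an official tool and it may miss lines in other driver versions.

```python
import re
from collections import Counter

# Assumed patterns, derived only from the log lines pasted in this post.
# A failed command looks like: "READ sqid:12 cid:120 nsid:1 lba:1497544880 len:16"
CMD_RE = re.compile(
    r"\b(READ|WRITE) sqid:(\d+) cid:(\d+) nsid:(\d+) lba:(\d+) len:(\d+)"
)
# The matching completion looks like: "ABORTED - BY REQUEST (00/07) sqid:12 cid:120 cdw0:0"
ABORT_RE = re.compile(r"ABORTED - BY REQUEST \(00/07\) sqid:(\d+) cid:(\d+)")


def summarize(log_text):
    """Count failed READ/WRITE commands, per-queue totals, and abort completions."""
    ops = Counter()
    queues = Counter()
    for op, sqid, _cid, _nsid, _lba, _length in CMD_RE.findall(log_text):
        ops[op] += 1
        queues[int(sqid)] += 1
    aborted = len(ABORT_RE.findall(log_text))
    return {"ops": ops, "queues": queues, "aborted": aborted}


if __name__ == "__main__":
    # Four lines copied from the first capture above.
    sample = (
        "Mar 29 21:42:25 truenas nvme5: READ sqid:12 cid:120 nsid:1 lba:1497544880 len:16\n"
        "Mar 29 21:42:25 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:12 cid:120 cdw0:0\n"
        "Mar 29 21:42:26 truenas nvme5: WRITE sqid:18 cid:124 nsid:1 lba:1990077368 len:8\n"
        "Mar 29 21:42:26 truenas nvme5: ABORTED - BY REQUEST (00/07) sqid:18 cid:124 cdw0:0\n"
    )
    print(summarize(sample))
```

Because the patterns are searched across the whole text rather than line by line, the sketch should also cope with console output where messages from the nvme and nda drivers are interleaved, as in the second capture.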