TrueNAS Core unable to recognize WD SN550 as NVMe storage device.

selveskii

Cadet
Joined
Apr 1, 2021
Messages
1
I am virtualizing TrueNAS Core using Proxmox and trying to build a pool from an array of NVMe drives. I have seven drives in total, from three different brands. Two of them are WD Blue SN550. When building a new pool using the web interface, I found that neither WD SN550 showed up as a storage device:

root@truenas[~]# geom disk list
Geom name: vtbd0
Providers:
1. Name: vtbd0
Mediasize: 34359738368 (32G)
Sectorsize: 512
Mode: r1w1e2
descr: (null)
ident: (null)
rotationrate: unknown
fwsectors: 63
fwheads: 16

Geom name: nvd0
Providers:
1. Name: nvd0
Mediasize: 1024209543168 (954G)
Sectorsize: 512
Mode: r0w0e0
descr: ZHITAI PC005 Active 1TB
lunid: a428b70016ba0042
ident: ZTA11T0JA2050000EE
rotationrate: 0
fwsectors: 0
fwheads: 0

Geom name: nvd1
Providers:
1. Name: nvd1
Mediasize: 1024209543168 (954G)
Sectorsize: 512
Mode: r0w0e0
descr: ZHITAI PC005 Active 1TB
lunid: a428b70015550042
ident: ZTA11T0JA205000123
rotationrate: 0
fwsectors: 0
fwheads: 0

Geom name: nvd2
Providers:
1. Name: nvd2
Mediasize: 1000204886016 (932G)
Sectorsize: 512
Mode: r0w0e0
descr: KINGSTON SA2000M81000G
lunid: 0026b76843b0e425
ident: 50026B76843B0E42
rotationrate: 0
fwsectors: 0
fwheads: 0

Geom name: nvd3
Providers:
1. Name: nvd3
Mediasize: 1000204886016 (932G)
Sectorsize: 512
Mode: r0w0e0
descr: KINGSTON SA2000M81000G
lunid: 0026b76843b07be5
ident: 50026B76843B07BE
rotationrate: 0
fwsectors: 0
fwheads: 0

Geom name: nvd4
Providers:
1. Name: nvd4
Mediasize: 1024209543168 (954G)
Sectorsize: 512
Mode: r0w0e0
descr: ZHITAI PC005 Active 1TB
lunid: a428b70016b90042
ident: ZTA11T0JA2050000ED
rotationrate: 0
fwsectors: 0
fwheads: 0

All seven devices are successfully passed through to the guest. Output from dmesg indicates that all of them are recognized as NVMe devices:

root@truenas[~]# dmesg | grep nvme
nvme0: <Generic NVMe Device> mem 0xfe800000-0xfe803fff,0xfe804000-0xfe8040ff irq 20 at device 0.0 on pci1
nvme1: <Generic NVMe Device> mem 0xfe600000-0xfe603fff irq 20 at device 0.0 on pci2
nvme2: <Generic NVMe Device> mem 0xfe400000-0xfe403fff irq 20 at device 0.0 on pci3
nvme3: <Generic NVMe Device> mem 0xfe200000-0xfe203fff irq 16 at device 0.0 on pci4
nvme4: <Generic NVMe Device> mem 0xfe000000-0xfe003fff irq 16 at device 0.0 on pci5
nvme5: <Generic NVMe Device> mem 0xfde00000-0xfde03fff,0xfde04000-0xfde040ff irq 16 at device 0.0 on pci6
nvme6: <Generic NVMe Device> mem 0xfdc00000-0xfdc03fff irq 16 at device 0.0 on pci7

When I tried to identify them, I got some errors:

root@truenas[~]# nvmecontrol identify nvme0
nvmecontrol: identify request returned error
root@truenas[~]# nvmecontrol identify nvme5
nvmecontrol: identify request returned error
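A couple of follow-up probes that may narrow this down (a sketch, assuming a stock TrueNAS Core shell; device names follow the output above):

```shell
# List the NVMe controllers and any namespaces FreeBSD attached to them.
# A controller that probed but never attached a namespace should show up
# here without a corresponding nvmeXnsY entry.
nvmecontrol devlist

# Dump the raw identify data in hex. This sometimes returns data even
# when the formatted identify fails, and shows what the controller
# actually answered.
nvmecontrol identify -x nvme0
```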

I noticed that the two faulty drives, and only those two, have two memory address ranges each. No idea why.

lspci from FreeBSD:

01:00.0 Class 0108: Device 15b7:5009 (rev 01)
02:00.0 Class 0108: Device 1e49:0001 (rev 03)
03:00.0 Class 0108: Device 1e49:0001 (rev 03)
04:00.0 Class 0108: Device 2646:2263 (rev 03)
05:00.0 Class 0108: Device 2646:2263 (rev 03)
06:00.0 Class 0108: Device 15b7:5009 (rev 01)
07:00.0 Class 0108: Device 1e49:0001 (rev 03)

15b7:5009, according to Google, is the WD Blue SN550. These correspond to nvme0 and nvme5.

I am using Proxmox 6.3, which is their latest release, along with TrueNAS-12.0-U2.1. Both are installed freshly.

Hardware:
- CPU: AMD Threadripper 3960X
- Mobo: Gigabyte Aorus TRX40 Xtreme
- RAM: 128GB DDR4, out of which 32GB are dedicated to TrueNAS.
Note that there is a known timeout bug in Proxmox when dedicating too much RAM to a VM. I googled it, turned off memory ballooning, and used huge pages of size 2M.
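For reference, the workaround amounts to something like the following in the Proxmox VM config (a sketch; the VM ID 100 and the exact values are just examples):

```
# /etc/pve/qemu-server/100.conf (excerpt)
balloon: 0        # disable memory ballooning
hugepages: 2      # back guest RAM with 2 MiB huge pages
memory: 32768     # 32 GB dedicated to TrueNAS
```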

On the host, I have 4 M.2 slots from the mobo and an additional 4 from a PCI-E x16 adapter card. Both SN550s are in the adapter card. On the host Linux, I have the following lspci output:

01:00.0 Non-Volatile memory controller: Kingston Technologies Device 2263 (rev 03)
02:00.0 Non-Volatile memory controller: Kingston Technologies Device 2263 (rev 03)

21:00.0 Non-Volatile memory controller: Sandisk Corp Device 5009 (rev 01)
22:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
23:00.0 Non-Volatile memory controller: Device 1e49:0001 (rev 03)
24:00.0 Non-Volatile memory controller: Sandisk Corp Device 5009 (rev 01)

43:00.0 Non-Volatile memory controller: Device 1e49:0001 (rev 03)
44:00.0 Non-Volatile memory controller: Device 1e49:0001 (rev 03)

The middle four are on the PCI-E adapter card. 22 is used by the host. 23, 43, and 44 are the Zhitai drives and are correctly identified; 01 and 02 are the Kingston drives, also correctly identified. Since 23 is identified, I doubt the issue lies in the adapter card. Plus, everything is virtualized anyway.

All seven drives use the vfio-pci driver in Linux.
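One way to double-check that binding on the Proxmox host (addresses taken from the lspci output above) is:

```shell
# Confirm which kernel driver each NVMe controller is bound to;
# passed-through devices should report "Kernel driver in use: vfio-pci".
lspci -nnk -s 21:00.0
lspci -nnk -s 24:00.0
```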

I have googled around and found no useful info on the SN550's compatibility with FreeBSD. In fact, some people in this community have mentioned this particular drive in their posts. Still, I wonder if there is something wrong with WD NVMe drives. Does anybody have any clue about that?

Many thanks!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
What happens when you do it on bare metal?

Proxmox is not known to be particularly compatible with FreeNAS, and isn't a recommended hypervisor. Features such as PCI passthru are new to it, and it is not clear how well it can do even basic passthru, much less complicated configurations like yours. You need to remember that you are placing an extremely complex house of cards, that is, ZFS and FreeBSD, on top of another extremely complex house of cards, a hypervisor host. Practical experience over the last decade suggests that this only works when ALL the details are correct and ALL the hardware is entirely bug-free and ALL the software is just so. Devices that don't work generally suggest that at least one area of the trifecta is failing.

It would be best to test this on bare metal first, and then see what changes when virtualized. You may wish to review

https://www.truenas.com/community/t...ide-to-not-completely-losing-your-data.12714/

this virtualization sticky which discusses how one might virtualize things safely, but is actually implicitly talking about how to validate a virtualized instance of FreeNAS.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The SN550s have been used by other forum members, and aren't known to have any specific issues.

But you've got a complex and uncommon setup - a Threadripper (AMD) system, on Proxmox, with virtualized TrueNAS.

Questions -

What model is the adaptor card, and is it using a PLX/PCIe switch?
If you switch the SN550s to a slot outside of the adaptor card, does the fault follow the M.2 device or the card itself?

Can you post the vm.config file for your TrueNAS machine (use the CODE tag option in the forums for formatting) and possibly lspci -tv from the host itself, to see if there are issues with the PCI passthrough?
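On the Proxmox host, something like this should collect both (assuming your VM ID is 100; adjust as needed):

```shell
# Dump the VM configuration (same content as the config file under
# /etc/pve/qemu-server/), then show the PCI topology as a tree.
qm config 100
lspci -tv
```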
 
Top