SpaceBass (Cadet)
Joined: Jul 4, 2023 | Messages: 5
TL;DR: the machine worked for a year and recently started dropping off the network. The logs don't seem to show anything. The NIC is an HP QLogic NC523SFP.
hardware:
- Hyve Zeus with some sort of Supermicro board
- CPU: Intel Xeon E3-1220
- RAM: 16 GB (checked with Memcheck, seems fine)
- System installed on a Samsung SSD
- 6 × 10 TB spinning drives
- NIC: HP NC523SFP (QLogic), which uses the qlxgb kernel driver
details
Hey folks,
I'm new here, but I've been running TrueNAS Core for some time (although I'm by no means an expert). The box in question ran for a year with no real issues, but in the last week it keeps dropping off the network. I've checked the logs (debug, console, messages, system, and middleware) and I don't see much, with one exception:
I have a kernel "ARP moved" message. It's a strange one: the MAC address in one of the messages is _almost_ identical to my router's WAN NIC, but the last two bits are different. The other MAC address doesn't appear anywhere in my ARP table. I doubt this message is relevant, but you experts will know better than I do :)
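To see exactly how close the two MAC addresses are, a small Python sketch can pull them out of the log line and list which bits differ. It assumes the FreeBSD kernel's usual "arp: <ip> moved from <old MAC> to <new MAC> on <interface>" wording; the regex, the function names, and the sample addresses are illustrative only, not copied from my logs.

```python
import re
from typing import List, Optional, Tuple

# Assumed wording of the FreeBSD kernel message, e.g.
#   arp: 192.168.1.1 moved from 00:11:22:33:44:55 to 00:11:22:33:44:56 on ql0
# Adjust the pattern if your log lines read differently.
ARP_MOVED = re.compile(
    r"arp: (?P<ip>\S+) moved from (?P<old>[0-9a-fA-F:]+) "
    r"to (?P<new>[0-9a-fA-F:]+) on (?P<iface>\S+)"
)

def mac_to_int(mac: str) -> int:
    """Turn 'aa:bb:cc:dd:ee:ff' into a 48-bit integer."""
    octets = mac.split(":")
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | int(octet, 16)
    return value

def parse_arp_moved(line: str) -> Optional[Tuple[str, str, str, str]]:
    """Return (ip, old_mac, new_mac, interface), or None if the line doesn't match."""
    m = ARP_MOVED.search(line)
    if m is None:
        return None
    return m["ip"], m["old"].lower(), m["new"].lower(), m["iface"]

def differing_bits(mac_a: str, mac_b: str) -> List[int]:
    """Bit positions that differ; bit 0 is the lowest bit of the last octet."""
    diff = mac_to_int(mac_a) ^ mac_to_int(mac_b)
    return [i for i in range(48) if (diff >> i) & 1]
```

For example, with the made-up addresses 00:11:22:33:44:55 and 00:11:22:33:44:56, differing_bits reports only bits 0 and 1. Routers often give their ports consecutive MAC addresses, so a near-match with the WAN NIC may simply be another port on the same router.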
After the device loses network, I get to the console as fast as I can to check the logs. Again, not much.
If I try to take the interface down with ifconfig ql0 down, I get a lot of driver errors, including some suggesting the queues are full. That's the only reason I also mention the ARP message... might that be filling up the queues? ¯\_(ツ)_/¯ I've also tried reloading the kernel driver; no joy.
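Rather than reading the whole log by eye after each drop, a short filter can keep only the most recent lines that mention the NIC. This is a sketch under assumptions: the keyword list (ql0, qlxgb, arp:) is a guess at what the driver and kernel print, and /var/log/messages is simply FreeBSD's usual default syslog file.

```python
from collections import deque
from typing import Iterable, List, Sequence

# Hypothetical keywords: the interface name, the driver name, and the kernel's ARP prefix.
# Swap in whatever strings actually show up around the time of a drop.
DEFAULT_KEYWORDS = ("ql0", "qlxgb", "arp:")

def relevant_tail(lines: Iterable[str],
                  keywords: Sequence[str] = DEFAULT_KEYWORDS,
                  keep: int = 50) -> List[str]:
    """Return the last `keep` lines that contain any of `keywords`, oldest first."""
    tail = deque(maxlen=keep)  # older matches fall off the front automatically
    for line in lines:
        if any(k in line for k in keywords):
            tail.append(line.rstrip("\n"))
    return list(tail)
```

After a drop, opening the log file (with errors="replace" to survive odd bytes) and passing the file object to relevant_tail returns the last 50 matching lines, which is quicker to scan at the console than the whole file.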
The NIC is connected to my switch via a DAC cable. I've also tried four different SFP modules with two different fiber patch cables... no change.
From there, my only solution is a full reboot. Sometimes I get 24 hours, but most of the time I get 3-5 minutes before the thing drops off again.
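Since the drops are intermittent, it may help to timestamp them precisely from a second machine on the LAN and then line those times up with the NAS logs. The sketch below shells out to the system ping (one echo request per probe, capped by a subprocess timeout instead of platform-specific ping flags) and prints only the moments reachability changes. The function names and the 10-second interval are arbitrary choices for illustration.

```python
import subprocess
import time
from datetime import datetime
from typing import Callable, Iterable, Iterator, Tuple

def ping_once(host: str, timeout_s: float = 5.0) -> bool:
    """Send one echo request via the system ping; True if it got a reply."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # bound the wait without platform-specific ping flags
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

def probe_forever(host: str, interval_s: float,
                  probe: Callable[[str], bool]) -> Iterator[Tuple[float, bool]]:
    """Yield (unix_time, reachable) samples, one every interval_s seconds."""
    while True:
        yield time.time(), probe(host)
        time.sleep(interval_s)

def transitions(readings: Iterable[Tuple[float, bool]]) -> Iterator[Tuple[float, str]]:
    """Keep only the samples where reachability changed (the first one always counts)."""
    previous = None
    for ts, up in readings:
        if up != previous:
            yield ts, "UP" if up else "DOWN"
            previous = up

def watch(host: str, interval_s: float = 10.0,
          probe: Callable[[str], bool] = ping_once) -> None:
    """Print a timestamped line each time `host` goes up or down; runs until Ctrl-C."""
    for ts, state in transitions(probe_forever(host, interval_s, probe)):
        stamp = datetime.fromtimestamp(ts).isoformat(timespec="seconds")
        print(f"{stamp} {host} {state}", flush=True)
```

For example, watch("192.168.1.10"), with the NAS's real address in place of that placeholder, prints one line per state change until interrupted. Keeping transitions separate from the probing makes the change-detection logic easy to check without a network.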
I had enough issues with the NC523SFP in pfSense to know that while it is supported in FreeBSD, it is a bit of a flaky card. I have no problem replacing it, but I also want to make sure it's actually what's causing my issue.
I've ordered an Intel X520 NIC but it's a week away. Also, I'd just like to learn how to troubleshoot this system better. So, what would you do, test, or check? Any advice on tunables or settings?