GUI, S3 and SMB Shares are not accessible

boostedn

Dabbler
Joined
Mar 9, 2023
Messages
14
Hi All,

Hardware in my signature.

The system runs perfectly fine for days, or sometimes as little as an hour, and then all of a sudden I am not able to connect to the Web GUI, my SMB shares, or my S3 storage. The fix I found when this happens is to unplug the Ethernet cable and plug it back in. With a monitor hooked up to my TrueNAS machine, I see it show the connection again, and everything works until the next time it happens.

On this board there are four Intel i226-V 2.5G NICs. I tried every one of them, and the same issue crops up each time. I then ran Windows 11 on this machine for 2 weeks, and it never dropped off once.

When the issue occurs with TrueNAS, the machine is still running correctly and responding to pings. There is nothing else using this IP on my network (192.168.1.186). I've swapped out the network cable, network switch and even physically moved the machine to another part of my home with a separate network drop that I know works to try and rule out a network issue. The issue follows the machine itself.
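Since the network side is unreachable when this happens, the link state has to be checked from the local console. A minimal sketch of that check follows; the interface name `enp2s0` is a placeholder (the thread never names the interface), and the helper `link_state` is hypothetical.

```shell
#!/bin/sh
# Print the operational state column for one interface,
# reading `ip -br link` output from stdin.
# $1 = interface name (placeholder in the usage below).
link_state() {
    awk -v dev="$1" '$1 == dev { print $2 }'
}

# Usage at the local console while the GUI and shares are unreachable:
#   ip -br link | link_state enp2s0
#   dmesg | tail -n 50    # recent kernel messages, for comparison after replugging
```

Comparing the state before and after replugging the cable may help show whether the kernel ever saw the link go down.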


Anyone stumble upon this? I found other threads where the Realtek NICs were blamed, however, I'm using Intel NICs. I also have limited the connection speed to 1Gbit on my switch to see if that would help, but the issue crops up no matter if it's 2.5G or 1G.

In the meantime I've set a cron job to reboot the machine every evening to see if I can avoid noticing the issue lol... not a great workaround, I know.



Thanks!


Scott
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
This sounds like the PHYs are overheating, which would make sense with this passively-cooled platform.
 

boostedn

Dabbler
Joined
Mar 9, 2023
Messages
14
Thanks Samuel. I have this in a large case with ample cooling. Obviously, though, that is a moot point if the NICs themselves are overheating. Hmm... I have some Noctua fans that I can mount near the NICs to see if it makes a difference. I have the same board in another case (one of those cases/boards meant for pfSense, etc.), and that one doesn't overheat with a bunch of Docker containers, etc. running on it.
I'll give it a shot and report back anyway.


Thanks for the idea, I never thought of that being a possibility.


Here's a pic of the current setup.

tn.png
 

boostedn

Dabbler
Joined
Mar 9, 2023
Messages
14
Side note, I just went and touched the chips for the NICs, up alongside the NVMe SSD on the very left. They are not even warm to the touch. I then transferred 20 GB to TrueNAS via SMB and they still didn't get warm. I'm not entirely sure this is the issue, but next time the GUI is not accessible, I'll see what their temp is. Failing that, I have a USB NIC that I can plug in to test with as well.
 

boostedn

Dabbler
Joined
Mar 9, 2023
Messages
14
Update to this: it happened again around 5:30am when nothing was being used on the network, no backups, no SMB transfers, nothing. The highest temp reported by TrueNAS is 35°C on the CPU. Unplugging the network cable this morning and plugging it back in resolved the issue, like it did before. No restarts required.

I'm beginning to think this is a software bug/timeout somewhere. The connection seems to drop for no reason at all. I've found various forum threads, Reddit threads, etc. complaining of the same with no fix. I also tested my Realtek TP-Link USB Ethernet adapter. The same issue occurs. Since there is no resolution, I have no choice but to ditch TrueNAS and go with something else.


Thanks for the help, but this one I think is a lost cause unless I buy a box from TN directly.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
What driver is your system actually using for the 2.5G NICs? What shows up in lsmod?
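A rough sketch of how to answer that without reading the whole module list. The set of module names in the filter is an illustrative assumption and not exhaustive, `enp2s0` is a placeholder interface name, and both helper functions are hypothetical.

```shell
#!/bin/sh
# Reduce `lsmod` output (read from stdin) to common Ethernet driver modules.
# The list of module names is an assumption chosen for illustration.
nic_modules() {
    awk '$1 ~ /^(igc|igb|e1000e|ixgbe|r8169|r8152)$/ { print $1 }'
}

# Print the kernel driver bound to a single interface, via sysfs.
# $1 = interface name.
iface_driver() {
    basename "$(readlink -f "/sys/class/net/$1/device/driver")"
}

# Usage on the affected machine:
#   lsmod | nic_modules
#   iface_driver enp2s0
```

The sysfs lookup is the more direct of the two, since it reports the driver actually bound to the interface and not just the modules that happen to be loaded.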
 

boostedn

Dabbler
Joined
Mar 9, 2023
Messages
14
Hi Samuel,

I gave up on this and tried building another machine but ran into other issues with a SAS card... lol, but this morning I decided to give it another shot with the original hardware:

lsmod output:


admin@truenas[~]$ lsmod
Module Size Used by
binfmt_misc 24576 1
essiv 16384 1
authenc 16384 1 essiv
ntb_netdev 20480 0
ntb_transport 53248 1 ntb_netdev
ntb_split 20480 0
ntb 24576 2 ntb_transport,ntb_split
ioatdma 65536 0
dca 16384 1 ioatdma
dm_crypt 61440 1
dm_mod 180224 4 dm_crypt
snd_hda_codec_hdmi 73728 1
snd_sof_pci_intel_icl 16384 0
snd_sof_intel_hda_common 106496 1 snd_sof_pci_intel_icl
soundwire_intel 49152 1 snd_sof_intel_hda_common
soundwire_generic_allocation 16384 1 soundwire_intel
soundwire_cadence 40960 1 soundwire_intel
snd_sof_intel_hda 20480 1 snd_sof_intel_hda_common
snd_sof_pci 20480 2 snd_sof_pci_intel_icl,snd_sof_intel_hda_common
snd_sof_xtensa_dsp 16384 1 snd_sof_intel_hda_common
snd_sof 147456 2 snd_sof_pci,snd_sof_intel_hda_common
snd_soc_hdac_hda 24576 1 snd_sof_intel_hda_common
snd_hda_ext_core 36864 3 snd_sof_intel_hda_common,snd_soc_hdac_hda,snd_sof_intel_hda
snd_soc_acpi_intel_match 57344 2 snd_sof_pci_intel_icl,snd_sof_intel_hda_common
snd_soc_acpi 16384 2 snd_soc_acpi_intel_match,snd_sof_intel_hda_common
snd_soc_core 331776 4 soundwire_intel,snd_sof,snd_sof_intel_hda_common,snd_soc_hdac_hda
snd_compress 28672 1 snd_soc_core
soundwire_bus 94208 3 soundwire_intel,soundwire_generic_allocation,soundwire_cadence
ledtrig_audio 16384 1 snd_sof
snd_hda_intel 57344 0
snd_intel_dspcfg 28672 2 snd_hda_intel,snd_sof_intel_hda_common
snd_intel_sdw_acpi 20480 2 snd_sof_intel_hda_common,snd_intel_dspcfg
snd_hda_codec 176128 3 snd_hda_codec_hdmi,snd_hda_intel,snd_soc_hdac_hda
snd_hda_core 110592 7 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_ext_core,snd_hda_codec,snd_sof_intel_hda_common,snd_soc_hdac_hda,snd_sof_intel_hda
snd_hwdep 16384 1 snd_hda_codec
x86_pkg_temp_thermal 20480 0
intel_powerclamp 20480 0
snd_pcm 143360 9 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,soundwire_intel,snd_sof,snd_sof_intel_hda_common,snd_compress,snd_soc_core,snd_hda_core
iTCO_wdt 16384 0
snd_timer 49152 1 snd_pcm
intel_pmc_bxt 16384 1 iTCO_wdt
snd 118784 8 snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_hda_codec,snd_timer,snd_compress,snd_soc_core,snd_pcm
coretemp 20480 0
iTCO_vendor_support 16384 1 iTCO_wdt
mei_hdcp 24576 0
intel_rapl_msr 20480 0
kvm_intel 339968 0
kvm 1052672 1 kvm_intel
irqbypass 16384 1 kvm
intel_cstate 20480 0
pcspkr 16384 0
wmi_bmof 16384 0
ee1004 20480 0
watchdog 32768 1 iTCO_wdt
soundcore 16384 1 snd
mei_me 45056 1
mei 155648 3 mei_hdcp,mei_me
sg 36864 0
evdev 28672 5
i915 3080192 2
joydev 28672 0
ttm 86016 1 i915
drm_kms_helper 315392 1 i915
processor_thermal_device_pci_legacy 16384 0
cec 61440 2 drm_kms_helper,i915
processor_thermal_device 20480 1 processor_thermal_device_pci_legacy
processor_thermal_rfim 16384 1 processor_thermal_device
processor_thermal_mbox 16384 2 processor_thermal_rfim,processor_thermal_device
rc_core 65536 1 cec
processor_thermal_rapl 20480 1 processor_thermal_device
intel_rapl_common 28672 2 intel_rapl_msr,processor_thermal_rapl
int340x_thermal_zone 20480 1 processor_thermal_device
i2c_algo_bit 16384 1 i915
intel_soc_dts_iosf 20480 1 processor_thermal_device_pci_legacy
intel_pmc_core 53248 0
acpi_tad 20480 0
button 24576 0
acpi_pad 184320 0
configfs 57344 1
fuse 172032 1
drm 643072 4 drm_kms_helper,i915,ttm
sunrpc 667648 1
efivarfs 16384 1
ip_tables 36864 0
x_tables 57344 1 ip_tables
autofs4 53248 2
sr_mod 28672 0
cdrom 73728 1 sr_mod
zfs 4161536 26
zunicode 335872 1 zfs
zzstd 499712 1 zfs
zlua 204800 1 zfs
zcommon 106496 1 zfs
znvpair 106496 2 zfs,zcommon
zavl 16384 1 zfs
icp 323584 1 zfs
spl 139264 6 zfs,icp,zzstd,znvpair,zcommon,zavl
hid_logitech_hidpp 53248 0
raid10 69632 0
raid456 184320 0
async_raid6_recov 24576 1 raid456
async_memcpy 20480 2 raid456,async_raid6_recov
async_pq 20480 2 raid456,async_raid6_recov
async_xor 20480 3 async_pq,raid456,async_raid6_recov
async_tx 20480 5 async_pq,async_memcpy,async_xor,raid456,async_raid6_recov
xor 24576 1 async_xor
uas 32768 0
usb_storage 81920 1 uas
hid_logitech_dj 28672 0
raid6_pq 122880 3 async_pq,raid456,async_raid6_recov
libcrc32c 16384 1 raid456
crc32c_generic 16384 0
raid1 53248 1
hid_generic 16384 0
usbhid 65536 1 hid_logitech_dj
hid 151552 4 usbhid,hid_generic,hid_logitech_dj,hid_logitech_hidpp
multipath 20480 0
linear 20480 0
raid0 24576 0
crc32_pclmul 16384 0
crc32c_intel 24576 1
md_mod 188416 7 raid1,raid10,raid0,linear,raid456,multipath
sd_mod 61440 4
spi_pxa2xx_platform 32768 0
i2c_i801 32768 0
ghash_clmulni_intel 16384 0
xhci_pci 20480 0
dw_dmac 16384 0
dw_dmac_core 36864 1 dw_dmac
aesni_intel 380928 3
crypto_simd 16384 1 aesni_intel
cryptd 24576 3 crypto_simd,ghash_clmulni_intel
i2c_smbus 20480 1 i2c_i801
igc 159744 0
ptp 32768 1 igc
ahci 45056 4
pps_core 24576 1 ptp
nvme 49152 2
ahciem 16384 1 ahci
libahci 45056 1 ahci
xhci_hcd 315392 1 xhci_pci
nvme_core 143360 3 nvme
t10_pi 16384 2 sd_mod,nvme_core
crc_t10dif 20480 1 t10_pi
crct10dif_generic 16384 0
crct10dif_pclmul 16384 1
crct10dif_common 16384 3 crct10dif_generic,crc_t10dif,crct10dif_pclmul
libata 299008 3 libahci,ahci,ahciem
intel_lpss_pci 28672 2
intel_lpss 16384 1 intel_lpss_pci
idma64 20480 0
scsi_mod 270336 7 sd_mod,usb_storage,uas,libata,sg,ahciem,sr_mod
scsi_common 16384 7 scsi_mod,usb_storage,uas,libata,sg,ahciem,sr_mod
usbcore 331776 5 xhci_hcd,usbhid,usb_storage,xhci_pci,uas
usb_common 16384 2 xhci_hcd,usbcore
fan 20480 0
wmi 36864 1 wmi_bmof
video 61440 1 i915
admin@truenas[~]$


Edit: Was doing some digging and seems it is the "igc" driver. Is this the correct one?
Edit 2: Hmm.... https://www.reddit.com/r/buildapc/comments/xypn1m/network_card_intel_ethernet_controller_i225v_igc/ might be relevant. I'll give this a try.

Anyone have an idea how to implement this in TrueNAS? For UEFI and CSM boot, the files are not there to edit.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
midclt call system.advanced.update '{ "kernel_extra_options": "pcie_port_pm=off" }'
 

boostedn

Dabbler
Joined
Mar 9, 2023
Messages
14
Thank you Samuel!

I've entered this and will monitor over the next couple days to see if I run into the same issue.


admin@truenas[~]$ midclt call system.advanced.update '{ "kernel_extra_options": "pcie_port_pm=off" }'
{"id": 1, "consolemenu": true, "serialconsole": false, "serialport": "ttyS0", "serialspeed": "9600", "powerdaemon": false, "swapondrive": 2, "overprovision": null, "traceback": true, "advancedmode": false, "autotune": false, "debugkernel": false, "uploadcrash": true, "anonstats": true, "anonstats_token": "", "motd": "Welcome to TrueNAS", "boot_scrub": 7, "fqdn_syslog": false, "sed_user": "USER", "sysloglevel": "F_INFO", "syslogserver": "", "syslog_transport": "UDP", "kdump_enabled": false, "isolated_gpu_pci_ids": [], "kernel_extra_options": "pcie_port_pm=off", "syslog_tls_certificate": null, "syslog_tls_certificate_authority": null, "consolemsg": false}
admin@truenas[~]$
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Note, the kernel options only take effect on the next boot.
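One way to confirm after the reboot that the option is really active is to look for it on the running kernel's command line. This is a minimal sketch; the helper name `has_kernel_opt` is hypothetical.

```shell
#!/bin/sh
# Succeed if a kernel command line contains the given option as a whole word.
# $1 = option (e.g. pcie_port_pm=off), $2 = command line text.
has_kernel_opt() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after the reboot:
#   has_kernel_opt pcie_port_pm=off "$(cat /proc/cmdline)" && echo "active"
```

If the option needs to be removed again, sending the same `midclt` call with an empty string for `kernel_extra_options` should clear it, though that is an assumption and is not confirmed anywhere in this thread.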
 

boostedn

Dabbler
Joined
Mar 9, 2023
Messages
14
I figured. Already rebooted. :) Crossing my fingers haha.
 

boostedn

Dabbler
Joined
Mar 9, 2023
Messages
14
Results are promising so far! No issues yet to report! I'll give it another day or so and provide another update. Thanks again Samuel!
 

boostedn

Dabbler
Joined
Mar 9, 2023
Messages
14
Update: Still running today, have had no network dropouts/webui issues since running that command. Marking this one as resolved! Thanks so much!

- Scott
 

phospholipid

Dabbler
Joined
Mar 2, 2024
Messages
15
midclt call system.advanced.update '{ "kernel_extra_options": "pcie_port_pm=off" }'
Hey there, good evening. I feel a little out of turn necro'ing a thread, but after reading this thread, linked threads, and a pretty extensive Reddit and google search... I have a feeling this might help me.

I oversee a unit that was behaving wonderfully for many years, but it was running FreeNAS 11.1. We upgraded through 11.2 and then 11.3. Then when we upgraded to TrueNAS 12, we hit an issue. It reads almost exactly like this one. We would lose the GUI and SSH access to the Shell. We would lose any new SMB connections, though for some reason already-connected ones would hang on. We upgraded to 13.0-U6.1 and we're still having the issue. The only solution is rebooting, and that's looking like twice a day or so. Or, it's just leaving everybody connected and their SMB untouched.

Every thread I read anywhere, about TrueNAS or Linux, where the symptoms line up, this bit of code has been the workaround.

If I reboot, enter that line into Shell, and then reboot again... I am assuming it's non-destructive. Like, it won't break anything. It'll either help or not, but it doesn't look like it will ruin the device or the pools or anything. And if it does somehow create a worse situation, I can undo it with =on, right? Like... I might as well try it, yeah?

To rephrase your quote: "I know just enough to be dangerous." Yeah, that's me.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
@phospholipid, that call is for Scale, not Core. The equivalent for Core is to set a LOADER tunable:

hw.pci.enable_aspm = 0

However, you haven't provided enough details of your hardware to help troubleshoot.
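For anyone trying the Core tunable, here is a small sketch of checking it afterwards. It assumes the value is readable through `sysctl` under the same name once the system has rebooted with the LOADER tunable in place; the helper `aspm_state` is hypothetical.

```shell
#!/bin/sh
# Interpret the output of `sysctl hw.pci.enable_aspm` read from stdin.
# Prints "disabled" when the value is 0, "enabled" otherwise.
aspm_state() {
    awk -F': *' '$1 == "hw.pci.enable_aspm" { print ($2 == "0" ? "disabled" : "enabled") }'
}

# Usage on Core after rebooting with the tunable set:
#   sysctl hw.pci.enable_aspm | aspm_state
```

Because a LOADER tunable is read at boot, the check only says something useful after a reboot.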
 

phospholipid

Dabbler
Joined
Mar 2, 2024
Messages
15
Yeah, I figured out this morning that the command doesn't work in CORE. I added a little more to my config signature. I'm not sure if that helps. Currently the GUI and Shell are down again. I'm trying to gather my wits as I gather a plan. As Flash Gordon says, "I'm flying blind on a rocket cycle."

I have a debug file and I have the results of dmesg, but I'm not sure what's salient.

I'm about 87% sure, based on this thread and a bunch of others I've read, and some advice from Reddit, that the problem we're experiencing is the same and that the command in question will help. My next step was to upgrade to SCALE/Bluefin and go from there. But at this point I'm apprehensive. I went from a no-fuss FreeNAS 11 to a temperamental little vixen.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
The 11->12 upgrade was a big jump. In my case, I just clean-installed 12, imported my pool, and rebuilt my config by hand, off screenshots of my 11 config. I encountered no problems.

You can always revert to 11, by rebooting into the 11 boot environment to restore stability.
 

phospholipid

Dabbler
Joined
Mar 2, 2024
Messages
15
Yeah, I've thought about it. The whole reason we moved to 12 was under the recommendation of OWC because we wanted easier control over rsync and replication tasks and some other features. After 15 years of telling folks to do clean installs of major step-ups, here I am. Right now I'm waiting for 15 14TiB to copy before I do anything else. I can't interface with the unit at all because the active SMBs are the only thing live.
 