beralt
Dabbler
- Joined
- Jan 8, 2019
- Messages
- 18
Dear all,
My system seems to be in a rather broken state (even though most things still appear to work), and I would really appreciate input on what is going on and which steps I should take.
I am not even sure whether all the errors below are related, so I am grateful for any help.
Also, please let me know if I should post this in another sub-forum.
A little background on my situation:
After not actively managing my NAS for a while, I recently upgraded from FreeNAS 11.3-U2 to 11.3-U5 and then to TrueNAS 12.
I am not sure whether it was caused by the update, but at one point the system hung and I could no longer access it at all, neither via the web UI nor via SSH.
I then tried a graceful ("orderly") shutdown via IPMI, but this failed; only an "immediate" reboot did the job.
For more context on my system:
- Supermicro X10SDV-4C-7TP4F
- Intel® Xeon® processor D-1518, Single socket FCBGA 1667; 4-Core, 8 Threads, 35W
- 48 GB RAM
- 4x WD Red 2 TB for storage, 2x 32 GB SanDisk USB SSD drives for the system
In any case, I am experiencing the following errors:
The first one concerns my freenas-boot pool, which holds the system:
Code:
zpool status -v
  pool: freenas-boot
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:08:45 with 0 errors on Sat Mar 13 03:53:45 2021
config:

        NAME            STATE     READ WRITE CKSUM
        freenas-boot    DEGRADED     0     0     0
          mirror-0      DEGRADED     0     0     0
            da4p2       DEGRADED     0     0     0  too many errors
            da5p2       ONLINE       0     0     0

errors: No known data errors
Tracing it back, this problem was already there before the update to TrueNAS.
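Following the "action" hint in the output above, my assumption would be that the first step is something like the following (I have not run this yet, and I am not sure whether it is safe on the boot pool):

```shell
# Clear the error counters on the boot pool, then scrub and watch whether
# da4p2 (the degraded USB stick from the output above) accumulates errors again.
zpool clear freenas-boot
zpool scrub freenas-boot
zpool status -v freenas-boot
```

If the errors come back after the scrub, I suppose the stick itself is failing and would need to be swapped out with 'zpool replace'.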
Secondly, my main storage pool "vault" also has a problem:
Code:
  pool: vault
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 639M in 00:01:08 with 36622 errors on Fri Mar 19 13:27:23 2021
config:

        NAME                                                STATE     READ WRITE CKSUM
        vault                                               DEGRADED     0     0     0
          mirror-0                                          DEGRADED 35.8K     0     0
            gptid/e3a11d9e-a2e1-11e7-ad5e-0025905e1638.eli  REMOVED      0     0     0
            gptid/e4bf2724-a2e1-11e7-ad5e-0025905e1638.eli  ONLINE       0     0 71.5K
          mirror-1                                          ONLINE       0     0     0
            gptid/e60143a6-a2e1-11e7-ad5e-0025905e1638.eli  ONLINE       0     0     0
            gptid/e721c833-a2e1-11e7-ad5e-0025905e1638.eli  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /var/db/system/syslog-76c11d7f8a944b3d8e42fe35420dbaa3/log/utx.lastlogin
        /var/db/system/syslog-76c11d7f8a944b3d8e42fe35420dbaa3/log/maillog.0.bz2
        /var/db/system/syslog-76c11d7f8a944b3d8e42fe35420dbaa3/log/console.log.0.bz2
        /var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/localhost/zfs_arc_v2/gauge_arcstats_raw_mru-mfu_ghost_hits.rrd
        /var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/localhost/aggregation-cpu-sum/cpu-idle.rrd
        /var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/localhost/zfs_arc/memory_throttle_count.rrd
        /var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/localhost/df-mnt-vault-apps-transmission/df_complex-reserved.rrd
        /var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/localhost/zfs_arc_v2/gauge_arcstats_raw_counts-allocated.rrd
        /var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/localhost/df-mnt-vault-backups/df_complex-free.rrd
        /var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/localhost/aggregation-cpu-sum/cpu-interrupt.rrd
        /var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/localhost/df-mnt-vault-archive/df_complex-reserved.rrd
        [...]
        [200+ more *.rrd files in /var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/localhost/]
        [...]
        /var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/localhost/zfs_arc/mutex_operations-miss.rrd
        /var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/localhost/zfs_arc/hash_collisions.rrd
        /var/db/system/rrd-76c11d7f8a944b3d8e42fe35420dbaa3/localhost/df-mnt-vault-apps-tautulli/df_complex-reserved.rrd
        /var/db/system/configs-76c11d7f8a944b3d8e42fe35420dbaa3/TrueNAS-12.0-U2.1/20210319.db
        /mnt/vault/lingames/Steam/steamapps/shadercache/35720/mesa_shader_cache_sf/c1516fe0adc2164672ac79ffa0d26cd3/AMD RADV VEGA10 (ACO)/foz_cache_idx.foz
        /mnt/vault/lingames/Steam/steamapps/shadercache/512900/mesa_shader_cache_sf/c1516fe0adc2164672ac79ffa0d26cd3/AMD RADV VEGA10 (ACO)/foz_cache_idx.foz
        /mnt/vault/lingames/Steam/steamapps/shadercache/945360/mesa_shader_cache_sf/c1516fe0adc2164672ac79ffa0d26cd3/AMD RADV VEGA10 (ACO)/foz_cache_idx.foz
        vault/vm_images/valheim_lgsm-v4li08:<0x1>
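For the REMOVED disk in mirror-0, my guess (please correct me!) would be to try to bring it back online, roughly like below. Since the pool is GELI-encrypted, I assume the disk first has to be unlocked (e.g. via the GUI) before ZFS can see the .eli device again:

```shell
# My guess only: try to re-attach the removed mirror member
# (gptid copied from the status output above) and check whether a resilver starts.
zpool online vault gptid/e3a11d9e-a2e1-11e7-ad5e-0025905e1638.eli
zpool status -v vault
```

I have not dared to try this yet, since I do not understand why the disk was removed in the first place.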
I also noticed that something is wrong with "rrdcached" (sorry, I don't know what this is/does), since my terminal is endlessly flooded with this error:
Code:
Mar 19 15:22:14 heimii 1 2021-03-19T15:22:14.454128+01:00 heimii.lan collectd 5514 - - rrdcached plugin: Failed to connect to RRDCacheD at unix:/var/run/rrdcached.sock: Unable to connect to rrdcached: Connection refused (status=61)
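The only thing I have come up with myself is to try restarting the reporting service and checking whether the socket appears (the service name is my assumption for TrueNAS CORE 12):

```shell
# Restart collectd (which talks to rrdcached per the log line above)
# and check whether the socket it complains about exists afterwards.
service collectd restart
ls -l /var/run/rrdcached.sock
```

I suspect this is a symptom of the corrupted .rrd files in the system dataset listed above rather than a problem of its own, but I don't know.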
Further, mounting my NFS shares on my Linux desktop throws "duplicate file system cookie" errors like this:
Code:
kernel: FS-Cache: Duplicate cookie detected
kernel: FS-Cache: O-cookie c=000000001e72b895 [p=0000000089da8da7 fl=222 nc=0 na=1]
kernel: FS-Cache: O-cookie d=00000000c3a2cbed n=00000000f757123a
kernel: FS-Cache: O-key=[10] '040002000801c0a805c3'
kernel: FS-Cache: N-cookie c=00000000ea48db1d [p=0000000089da8da7 fl=2 nc=0 na=1]
kernel: FS-Cache: N-cookie d=00000000c3a2cbed n=000000000f72327e
kernel: FS-Cache: N-key=[10] '040002000801c0a805c3'
I am not sure if this is related.
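If it really is FS-Cache related, I was wondering whether remounting with local caching disabled would at least silence the messages; nfs(5) documents an fsc/nofsc mount option, so perhaps something like this (server name and paths are just placeholders from my setup):

```shell
# Remount one of the shares with FS-Cache explicitly disabled (nofsc)
# and watch whether the duplicate-cookie messages stop.
sudo umount /mnt/media
sudo mount -t nfs -o nofsc heimii.lan:/mnt/vault/media /mnt/media
```

Again, I am not sure this is the right approach or whether these messages are even harmful.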
Even though I have had my FreeNAS system running for over two years, I am a newbie. So far I have used it mostly for playing around with VMs and for storage on my LAN.
Honestly, I am quite overwhelmed by all of this, I am not sure what to do, and I would love to get any input.
I have also ordered two new 4 TB HDDs to use for backups, in case I need to completely recreate the old pool, which spans all of the physical drives.
If the system is beyond repair, I am considering backing up all data and simply starting over. However, I would be glad if that could be avoided.
Let me know if I can provide any more information.