New TrueNAS Core, critical error, all 8 drives in pool degraded, permanent file errors, noob needs advice

Polynikes

Dabbler
Joined
Sep 2, 2023
Messages
15
Hello everyone,

I just built my first TrueNAS system (specs below) with repurposed old gaming computer hardware and eight 16 TB IronWolf Pro drives in a single RAIDZ2 pool. This hardware had all been running smoothly for over a year as a Windows machine running some server software. The only changes I made when I converted it to TrueNAS were adding the eight HDDs, moving the HBA from my Linux machine onto this machine, adding a 10 Gbps NIC, and using new SATA power cables from the PSU to the HDDs; the chassis is new as well. All the HDDs are connected via the HBA.

Code:
OS: TrueNAS Core 13.0-U5.3
Mobo: Gigabyte GA-Z270XP-SLI (rev 1.0)
CPU: Intel i7-6700K
RAM: Corsair 32 GB (4 x 8 GB) (forget exact model)
OS drive: Samsung 980 Pro M.2
Pool drives: Seagate IronWolf Pro 16 TB x 8
HBA: Silverstone SST-ECS04 (LSI SAS2308 RAID-on-chip controller)
PSU: Corsair RM850x
NIC: TP-Link 10 Gb PCIe network card (TX401)


I migrated the data over from my old HDDs via SMB shares (from a Linux Mint computer I'm using to run various apps like Plex, Home Assistant, etc.), and when I checked on it I found the errors below:

Code:
WARNING
truenas.local had an unscheduled system reboot. The operating system successfully came back online at Fri Sep 1 23:40:23 2023.
2023-09-01 23:40:23 (America/Chicago)

CRITICAL
Pool RaptorDrive state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
The following devices are not healthy:
Disk ATA ST16000NT001-3LV ZR5BVCV0 is DEGRADED
Disk ATA ST16000NT001-3LV ZR5DY5JJ is DEGRADED
Disk ATA ST16000NT001-3LV ZR5BGTE4 is DEGRADED
Disk ATA ST16000NT001-3LV ZR60M2ME is DEGRADED
Disk ATA ST16000NT001-3LV ZR5BVNWB is DEGRADED
Disk ATA ST16000NT001-3LV ZR5CAPSJ is DEGRADED
Disk ATA ST16000NT001-3LV ZR5DPBJG is DEGRADED
Disk ATA ST16000NT001-3LV ZR70SQV0 is DEGRADED
2023-09-02 01:25:02 (America/Chicago)


I ran a scrub and zpool status revealed the following:

Code:
Warning: the supported mechanisms for making configuration changes
are the TrueNAS WebUI and API exclusively. ALL OTHERS ARE
NOT SUPPORTED AND WILL RESULT IN UNDEFINED BEHAVIOR AND MAY
RESULT IN SYSTEM FAILURE.

root@truenas[~]# zpool status -v RaptorDrive
  pool: RaptorDrive
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 668K in 02:01:41 with 92 errors on Sat Sep  2 01:30:19 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        RaptorDrive                                     DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/fb577ec1-4862-11ee-8024-40ed006fd9a0  DEGRADED     0     0   185  too many errors
            gptid/fc09d02d-4862-11ee-8024-40ed006fd9a0  DEGRADED     0     0   187  too many errors
            gptid/fc2c57b7-4862-11ee-8024-40ed006fd9a0  DEGRADED     0     0   190  too many errors
            gptid/fc383726-4862-11ee-8024-40ed006fd9a0  DEGRADED     0     0   187  too many errors
            gptid/fc229ad8-4862-11ee-8024-40ed006fd9a0  DEGRADED     0     0   188  too many errors
            gptid/fcc38e50-4862-11ee-8024-40ed006fd9a0  DEGRADED     0     0   188  too many errors
            gptid/fa5e8a52-4862-11ee-8024-40ed006fd9a0  DEGRADED     0     0   190  too many errors
            gptid/fbf72583-4862-11ee-8024-40ed006fd9a0  DEGRADED     0     0   190  too many errors

errors: Permanent errors have been detected in the following files:

        /mnt/RaptorDrive/Matt Personal/Radiology/Study Materials/AIRP/Gastrointestinal/AIRP - GI 114a - Celiac Disease.mp4
        /mnt/RaptorDrive/Matt Personal/My Videos/Safari Video.m2ts
        /mnt/RaptorDrive/Matt Personal/Radiology/Study Materials/AIRP/Pediatric/AIRP - PD 109 - Cystic Renal Disease.mp4
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 2/2x20 - Allies.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 3/3x11 - The Return (2).mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 3/3x13 - Irresponsible.mkv
        /mnt/RaptorDrive/Media/TV Shows/Star Wars - The Clone Wars/Season 3 - Secrets Revealed (2010-2011)/The Clone Wars S03E05 HDTV - Corruption.avi
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 3/3x15 - The Game.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 4/4x05 - Travelers.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 5/5x02 - The Seed.mkv
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 12 S12 [1080p H265][MP3 5.1 Ch]/South Park (1997) - S12E12 - About Last Night... [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/Stranger Things (2016)/Season 3/Stranger.Things.S03E04.iNTERNAL.1080p.WEB.X264-AMRAP.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stranger Things (2016)/Season 3/Stranger.Things.S03E08.iNTERNAL.1080p.WEB.X264-AMRAP.mkv
        /mnt/RaptorDrive/Media/TV Shows/Star Wars - The Clone Wars/Season 4 - Battle Lines (2011-2012)/The Clone Wars S04E06 HDTV - Nomad Droids.avi
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 2 S02 [1080p H265][MP3 5.1 Ch]/South Park (1997) - S02E10 - Chickenpox [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/The Deuce (2017)/Season 1/The Deuce (2017) - S01E06 - Why Me (1080p BluRay x265 RCVR).mkv
        /mnt/RaptorDrive/Media/TV Shows/The Office US (2005)/Season 4 (WEB)/TheOffice (US) (2005) - S04E01-E02 - Fun Run (1080p AMZN WEB-DL x265 LION).mkv
        /mnt/RaptorDrive/Media/TV Shows/The Mandalorian (2019)/The.Mandalorian.S01E06.REPACK.1080p.WEBRiP.x264-PETRiFiED.mkv
        /mnt/RaptorDrive/Media/TV Shows/The Office US (2005)/Season 7 (Bluray)/The Office (US) (2005) - S07E06 - Costume Contest (1080p BluRay x265 LION).mkv
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 7 S07 [1080p H265][MP3 5.1 Ch]/South Park (1997) - S07E06 - Lil' Crime Stoppers [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 12 S12 [1080p H265][MP3 5.1 Ch]/South Park (1997) - S12E04 - Canada on Strike [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 12 S12 [1080p H265][MP3 5.1 Ch]/South Park (1997) - S12E06 - Over Logging [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 1/1x18 - The Gift.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 4/4x13 - Quarantine.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 5/5x03 - Broken Ties.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 5/5x05 - GhostIn The Machine.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 5/5x12 - Outsiders.mkv
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 16 S16  [1080p H265][MP3 5.1 Ch]/South Park (1997) - S16E06 - I Should Have Never Gone Ziplining [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 4/4x06 - Tabula Rasa.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 4/4x11 - Be All My Sins Remember'd.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 4/4x14 - Harmony.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 4/4x16 - Trio.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 5/5x09 - Tracker.mkv
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 3 S03 [1080p H265][MP3 5.1 Ch]/South Park (1997) - S03E08 - Two Guys Naked in a Hot Tub (2) [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 4 S04 [1080p H265][MP3 5.1 Ch]/South Park (1997) - S04E17 - A Very Crappy Christmas [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 16 S16  [1080p H265][MP3 5.1 Ch]/South Park (1997) - S16E13 - A Scause for Applause [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 18 S18 [1080p H265][MP3 5.1 Ch]/South Park (1997) - S18E07 - Grounded Vindaloop [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 20 S20  [1080p H265][MP3 5.1 Ch]/South Park (1997) - S20E08 - Members Only [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 4 S04 [1080p H265][MP3 5.1 Ch]/South Park (1997) - S04E06 - Cherokee Hair Tampons [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 2/2x12 - Epiphany.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 2/2x05 - Condemned.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 2/2x15 - The Tower.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 3/3x06 - The Real World.mkv
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 6 S06 [1080p H265][MP3 5.1 Ch]/South Park (1997) - S06E15 - The Biggest Douche in the Universe [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/Scrubs (2001)/Season 9/Scrubs (2001) - S09E12 - Our Driving Issues (1080p WEB-DL x265 Panda).mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 3/3x14 Tao of Rodney.mkv
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 1 S01 [1080p H265][MP3 5.1 Ch]/South Park (1997) - S01E08 - Starvin' Marvin [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 4/4x19 - The Kindred (2).mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 5/5x01 - Search And Rescue.mkv
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 9 S09 [1080p H265][MP3 5.1 Ch]/South Park (1997) - S09E07 - Erection Day [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/Star Wars - The Clone Wars/Season 4 - Battle Lines (2011-2012)/The Clone Wars S04E01-E02 HDTV - Water War - Gungan Attack.avi
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 11 S11 [1080p H265][MP3 5.1 Ch]/South Park (1997) - S11E01 - With Apologies to Jesse Jackson [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 2/2x17 - Coup D etat.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 4/4x03 - Reunion.mkv
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 15 S15  [1080p H265][MP3 5.1 Ch]/South Park (1997) - S15E13 - A History Channel Thanksgiving [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/The Office US (2005)/Season 2 (WEB)/TheOffice (US) (2005) - S02E06 - The Fight (1080p AMZN WEB-DL x265 LION).mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 1/1x14 - Sanctuary.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 3/3x10 - The Return (1).mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 4/4x04 - Doppelganger.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 4/4x07 - Missing.mkv
        /mnt/RaptorDrive/Media/TV Shows/The Boys (2019)/Season 1/The Boys (2019) - S01E05 - Good for the Soul (1080p BluRay x265 Silence).mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 4/4x18 - The Kindred (1).mkv
        /mnt/RaptorDrive/Media/TV Shows/The Mandalorian (2019)/The.Mandalorian.S01E05.1080p.WEBRiP.x264-PETRiFiED.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 5/5x06 - The Shrine.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 2/2x07 - Instinct.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 2/2x09 - Aurora.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 5/5x13 - Inquisition.mkv
        /mnt/RaptorDrive/Media/TV Shows/South Park (1997)/South Park (1997) Season 5 S05 [1080p H265][MP3 5.1 Ch]/South Park (1997) - S05E06 - Cartmanland [1080p H265][MP3 5.1 Ch].mp4
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 1/1x04 - 38 minutes.mkv
        /mnt/RaptorDrive/Media/TV Shows/Scrubs (2001)/Season 9/Scrubs (2001) - S09E09 - Our Stuff Gets Real (1080p WEB-DL x265 Panda).mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 2/2x08 - Conversion.mkv
        /mnt/RaptorDrive/Media/TV Shows/Stargate Atlantis/Season 3/3x01 - No Man's Land.mkv


This is of course quite concerning to me because the whole point of this project was to better protect my data. The idea is to have this TrueNAS system for all files (including big media files), and to back up the really important documents (tax stuff, photos, etc.) to both a second local system and the cloud (Storj or Backblaze I think, but I haven't done this yet). I was loving how fast and seemingly reliable the system was at first, but now with these errors I've lost all my confidence in it and I'm really disappointed.

Specific questions:
1) How serious are these errors? Can I simply delete the affected files and copy them over again and carry on?

2) Is this indicative of an underlying problem? Or is it expected that out of like 12 TB transferred some files may have errors?

3) Does this just affect the files or also the operating system? (the part about "Applications may be affected." scares me, is this a default phrase?)

4) Could additional files be corrupted as well or does the system pretty reliably pick out the bad ones and I can assume the rest are OK?

5) What exactly does CKSUM "185 too many errors" even mean (from the scrub log)?

6) Does any of this indicate that a specific drive is faulty? I have a hard time believing all 8 brand new drives are faulty. I assume it must be something upstream?

7) What should I do now? I'm basically done transferring all data onto the pool, and next step was to set up a cloud backup, but if the whole thing needs to be rebuilt then obviously that comes first, however painful...

Thank you for any advice. I have read through many forum posts that are similar but don't quite answer my questions.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
1st recommendation - do not wipe the old hard disks. They contain your good data.

These errors are serious - they indicate something seriously wrong. As you say, it's unlikely to be the HDDs - so it is more likely to be the cables to the HBA or (even more likely) the HBA itself.

When you built the system, did you run any long-term tests to see if everything was good, e.g. Memtest for 24 hours (at least)? Look on this copy as a test exercise that has failed. Now fix the failure and repeat the copy.

What firmware is on the HBA? Did you ensure it was flashed with the correct version?

1) How serious are these errors? Can I simply delete the affected files and copy them over again and carry on?
Very Serious

2) Is this indicative of an underlying problem? Or is it expected that out of like 12 TB transferred some files may have errors?
Underlying problem

3) Does this just affect the files or also the operating system? (the part about "Applications may be affected." scares me, is this a default phrase?)
Default phrase. It's just your data at risk.

4) Could additional files be corrupted as well or does the system pretty reliably pick out the bad ones and I can assume the rest are OK?
Probably not

5) What exactly does CKSUM "185 too many errors" even mean (from the scrub log)?
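It is the number of checksum errors on that disk: blocks read back that did not match their stored checksum. ZFS corrects them from parity where it can. The data is being corrupted somewhere in the transfer process.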

6) Does any of this indicate that a specific drive is faulty? I have a hard time believing all 8 brand new drives are faulty. I assume it must be something upstream?
My suspicion is something upstream: memory, HBA or HBA cables (maybe).

7) What should I do now? I'm basically done transferring all data onto the pool, and next step was to set up a cloud backup, but if the whole thing needs to be rebuilt then obviously that comes first, however painful...
You need to run a whole load of diagnostics.
Test memory first - that's easy. It just takes time. Minimum 24 hours.
Then look into the HBA, cabling etc. Suggestion: you have 6 SATA ports on the motherboard. Connect 5 or 6 of the drives to those and create a new pool, then copy a large quantity of data to that pool, run several scrubs etc. If that pool is good, you know where the issue is. Also, there are scripts on the forum, like @jgreco's test script, that can be used to test HDDs. But I think you need to eliminate the HBA as a cause first.
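For anyone following along, here is a rough Python sketch of the reasoning behind "something upstream": it reads the per-disk counters out of `zpool status` text and checks whether the CKSUM errors are spread almost evenly across every disk. The sample lines are copied from the output posted above; the function names and the 25% spread threshold are made up for illustration and are not any ZFS rule.

```python
import re

# Three of the eight member lines from the `zpool status -v` output in this
# thread, kept short for the example.
SAMPLE = """\
        NAME                                            STATE     READ WRITE CKSUM
        RaptorDrive                                     DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/fb577ec1-4862-11ee-8024-40ed006fd9a0  DEGRADED     0     0   185  too many errors
            gptid/fc09d02d-4862-11ee-8024-40ed006fd9a0  DEGRADED     0     0   187  too many errors
            gptid/fc2c57b7-4862-11ee-8024-40ed006fd9a0  DEGRADED     0     0   190  too many errors
"""

# Name, state, then three plain integer counters. Abbreviated counters
# such as "1.2K" are deliberately not handled in this sketch.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_counters(text):
    """Return {disk_name: (read, write, cksum)} for the gptid/ disks only."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m and m.group(1).startswith("gptid/"):
            rows[m.group(1)] = tuple(int(m.group(i)) for i in (3, 4, 5))
    return rows


def looks_upstream(rows, spread=0.25):
    """Hypothetical heuristic: every disk has checksum errors and the counts
    are within `spread` of each other, which points at a shared component
    (RAM, HBA, cabling) rather than one bad disk."""
    cksums = [cksum for _, _, cksum in rows.values()]
    if len(cksums) < 2 or min(cksums) == 0:
        return False
    return (max(cksums) - min(cksums)) / max(cksums) <= spread


if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    for name, (rd, wr, ck) in counters.items():
        print(name, "read", rd, "write", wr, "cksum", ck)
    print("errors evenly spread across disks:", looks_upstream(counters))
```

With one failing disk you would expect one large counter and the rest at or near zero; near-identical counts on every member are what make the drives themselves an unlikely culprit here.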
 

Polynikes

NugentS,

Thank you so much for the detailed response.

I didn't run any burn-in tests because I figured it had been running well previously and I hadn't really changed much. I replaced the corrupted files in place on the pool and I'm running another scrub now. After that's done I'm going to run the memtest.

I haven't messed with the HBA firmware since I purchased it. From looking online, I'm getting conflicting reports about whether the firmware needs to be flashed. Do you have any thoughts on this? What is the point in flashing into "IT mode" that I keep reading about?
 

Polynikes

FYI, the HDDs do show up directly to the OS when using this HBA, so I don't think it is in any sort of RAID mode. If it isn't in RAID mode that means it must be in an appropriate mode like IT or HBA, yes?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
FYI, the HDDs do show up directly to the OS when using this HBA, so I don't think it is in any sort of RAID mode. If it isn't in RAID mode that means it must be in an appropriate mode like IT or HBA, yes?
Not strictly, no. Things are likely okay in your case, since it seems that Silverstone, despite the RAID-on-chip marketing, sells those things with IT mode. What's the output of sas2flash -listall?
 

Polynikes

Output of sas2flash -listall

Code:
root@truenas[~]# sas2flash -listall
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

        Adapter Selected is a LSI SAS: SAS2308_2(D1)

Num   Ctlr            FW Ve


Scrub finished; four media files were again found to have permanent errors. I replaced those files in place again and am running another scrub on the pool. Once everything checks out I'm going to run the 24-hour memtest. After that, if everything is looking OK, I may attempt to back up everything to a cloud storage provider unless y'all think I need to start totally fresh.

I ordered an LSI SAS 9300 HBA, which I may use either in my other PC or to replace the one in the TrueNAS system. Also ordered new mini-SAS to SATA connectors.
 

Ericloewe

That output seems to be truncated... what happened there?
 

Polynikes

Oh my bad it didn't copy over correctly and I didn't notice haha... here it is again.

Code:
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

        Adapter Selected is a LSI SAS: SAS2308_2(D1)

Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
----------------------------------------------------------------------------

0  SAS2308_2(D1)   20.00.07.00    14.01.00.06    07.39.02.00     00:01:00:00
 

Ericloewe

Ok, so that's the latest firmware, so one thing off the list.
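If you want to script that check rather than eyeball it, a minimal Python sketch could split the data row of `sas2flash -listall` into its columns and compare the firmware version against a reference. The row is the one posted above; the reference value `20.00.07.00` is simply the version called the latest in this thread, and the function names are hypothetical.

```python
# Data row copied from the `sas2flash -listall` output posted in this thread.
SAMPLE_ROW = "0 SAS2308_2(D1) 20.00.07.00 14.01.00.06 07.39.02.00 00:01:00:00"

# Assumption: this is the version described as the latest in the thread.
REFERENCE_FW = "20.00.07.00"


def parse_listall_row(row):
    """Split one data row into the six columns shown in the header."""
    num, ctlr, fw, nvdata, bios, pci = row.split()
    return {
        "num": int(num),
        "ctlr": ctlr,
        "fw": fw,
        "nvdata": nvdata,
        "x86_bios": bios,
        "pci_addr": pci,
    }


def version_tuple(version):
    """Turn '20.00.07.00' into (20, 0, 7, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    info = parse_listall_row(SAMPLE_ROW)
    current = version_tuple(info["fw"])
    print("controller:", info["ctlr"])
    print("firmware:", info["fw"])
    print("at or above reference:", current >= version_tuple(REFERENCE_FW))
```

Note that this row only carries version numbers; it does not say by itself whether the firmware is the IT or the IR (RAID) variant.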
 

Polynikes

So each time I scrub I'm finding files with errors, although definitely a lot fewer files now. Are these files that were missed on the first scrubs, or are files being actively messed up?
 

Polynikes

Well, it just had another unscheduled system reboot as I was sitting here working on it... I had been downloading a 500 GB file from cloud storage into it for like 16 hours and it was at 98%... gone now... I am so frustrated... what the heck is going on here?

Now it is also giving me this output regarding permanent file errors. This is different because these aren't even specific files I can reference. It's just some random numbers/letters. No idea what these correspond to.

errors: Permanent errors have been detected in the following files:

RaptorDrive/Media/TV Shows:<0x117>
RaptorDrive/Media/TV Shows:<0x13fe>
RaptorDrive/Matt Personal/Documents:<0x505>
RaptorDrive/Matt Personal/Documents:<0x7e5>
 

Polynikes

I've lost all confidence in this system. I think I'm going to buy a server mobo, processor, and some ECC RAM and rebuild everything on top of that. I've been googling around for hardware recommendations but most of them seem a little out of date. Any chance you could point me in the right direction?
 

NugentS

Well, it just had another unscheduled system reboot as I was sitting here working on it... I had been downloading a 500 GB file from cloud storage into it for like 16 hours and it was at 98%... gone now... I am so frustrated... what the heck is going on here?

Now it is also giving me this output regarding permanent file errors. This is different because these aren't even specific files I can reference. It's just some random numbers/letters. No idea what these correspond to.
That's metadata - you cannot fix that, I am afraid. Your pool is hosed.
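As a side note on reading those lines: each one pairs a dataset name with a hexadecimal object number that ZFS could not translate back into a file path. A small Python sketch, using the four lines quoted above, that splits them apart and converts the number to decimal (the function name is made up for illustration):

```python
import re

# The four entries from the `zpool status` output quoted above.
SAMPLE = """\
RaptorDrive/Media/TV Shows:<0x117>
RaptorDrive/Media/TV Shows:<0x13fe>
RaptorDrive/Matt Personal/Documents:<0x505>
RaptorDrive/Matt Personal/Documents:<0x7e5>
"""

# "<dataset>:<0xHEX>"; the dataset name may contain spaces.
ENTRY = re.compile(r"^\s*(.+):<0x([0-9a-fA-F]+)>\s*$")


def parse_unresolved(text):
    """Return a list of (dataset, object_number) tuples, number in decimal."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            entries.append((m.group(1), int(m.group(2), 16)))
    return entries


if __name__ == "__main__":
    for dataset, obj in parse_unresolved(SAMPLE):
        print(f"{dataset}: object {obj}")
```

This only decodes the notation. It does not tell you what the object was, and it cannot repair anything.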

For server hardware: it does not need to be modern. A NAS does not need much power. It's running things like VMs and containers that needs the CPU power.

Look for Supermicro X10 or better, I think. Look at the hardware recommendation guide in my sig (amongst other places).

Have you tested a subset of the disks on SATA ports, to see if the checksum errors appear there too? When you do, also remove the TP-Link 10Gb card and use (if it's recognised) the 1Gb onboard port. You need to strip back to minimums and test that. If that goes wrong, fair enough, but it ought to work. Also - have you tested the memory?
 

Polynikes

Thanks again for the troubleshooting help. Was getting very frustrated yesterday so had to take a break and clear my head haha. I was already a little nervous about using repurposed old gaming hardware and after this experience I'm inclined to just buy some server equipment and start over.

Any thoughts on this hardware?

Mobo: SUPERMICRO MBD-X11SSM-F-O

CPU: Intel Xeon E-2126G Coffee Lake 3.30 GHz LGA 1151 80W

RAM (x4 for 64 GB): Samsung M391A2K43BB1-CPB DDR4-2133 16GB 2Gx72 ECC CL15 server memory

HBA: LSI Broadcom SAS 9300-8i

NIC: TP-Link TX401 (10 gig)

And I'll use my existing 850 watt PSU (RM850x), unless it turns out to be the culprit...

Use case: Network storage to be actively used daily, plus the option to convert to TrueNAS SCALE and experiment with VMs in the future.
 

NugentS

If @jgreco is right - that it's the HBA overheating (finger touch test) - then that is potentially an easy fix: strap a fan to the HBA.

However, comments on your proposed hardware:

Motherboard: It's what I describe as a workstation board rather than a true server board. I call it this because it's designed for workstation-type CPUs. It has UDIMMs rather than RDIMMs, limited PCIe lanes and limited memory slots. Having said that, it's very close to a server board and should be very reliable. For TrueNAS it ticks almost every box apart from the somewhat limited expansion. You have 8 SATA ports (see HBA comments).

CPU: Check carefully here. That's an E-2126G - the 2 indicating dual socket (I believe). The motherboard says "E3-1200 v6/v5, Intel® 7th/6th Gen. Core™ i3 series" although the socket does match. I say check carefully here as I do not know.

RAM: Overspecced - the motherboard supports 2400MHz, your memory is 2666, so that's all good. The CPU may be able to manage 128GB, but according to SMC's specs the motherboard can only do 64GB. A BIOS update may fix that. But you only have 4 DIMM slots (another workstation board issue).

HBA: You could avoid the use of an HBA by using the 8 SATA ports on the board. However, if you like that idea you will need to arrange an alternative method of booting the NAS. Whilst a USB thumb drive is possible, it's not recommended. But a small SSD (or M.2) in a USB adapter plugged into a motherboard header, or into the back of the machine, or via some trickery (https://www.startech.com/en-gb/cables/usbmbadapt2) might save you some grief if you can't cool the HBA enough.

NIC: Nope - not that one. Yuk. eTrash. Get an Intel 540 (or better) or Chelsio for optimal performance and reliability. Seriously - do not get the TP-Link [shudders]. As you are in the USA - the land of cheap second-hand hardware - a second-hand proper 10Gb NIC will cost less than that trash TP-Link new.

PSU: Easily enough power there. Reasonable PSU according to the Tier List. Certainly not cheap trash.
 

Polynikes

I started the memtest a few minutes ago and within seconds was getting errors. Here it is after approximately 5 minutes. Does this indicate the memory is the problem?

As far as the HBA finger touch test, I guess I'll just move some large files for a few hours then feel it? Can do this today.

I'll buy this NIC instead: Ipolex X540-10G-2T-X8

Finally, you didn't seem too worried about the use of a gaming board with non-ECC memory (or at least didn't mention it!). From reading the forums I see a lot of comments about "playing with fire" by not using ECC memory. If the RAM or the HBA turns out to be the issue in my system and I can salvage the mobo and CPU, am I still doing myself a big disservice by not investing in a server mobo with ECC memory?



memtest.jpg
 

NugentS

Yup - you have memory problems - so now you get to do the memory dance.

Remove all but 1 DIMM and test. If it's good, then test it in all the other slots - hopefully still good. Put that DIMM aside in a box labelled good. Then test each other DIMM in one slot till you find which one(s) are faulty.

If the first DIMM tested fails, continue to test in other slots - see if it passes in other slots which may indicate that the first tested slot is bad.

Rinse/Repeat till you know what the situation is.
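The bookkeeping for that dance can be sketched in a few lines of Python. This is only an illustration: the DIMM labels, slot names and the `diagnose` function are hypothetical, and it assumes failures are repeatable, so a single passing run clears both the DIMM and the slot used in that run. Intermittent faults will break that assumption.

```python
def diagnose(results):
    """results: iterable of (dimm, slot, passed) from single-DIMM memtest runs.

    Returns (bad_dimms, bad_slots). A pairing that fails even though both the
    DIMM and the slot have passed elsewhere is left undecided.
    """
    results = list(results)
    # A passing run vouches for both the DIMM and the slot it was run in.
    good_dimms = {dimm for dimm, _, ok in results if ok}
    good_slots = {slot for _, slot, ok in results if ok}

    bad_dimms, bad_slots = set(), set()
    for dimm, slot, ok in results:
        if ok:
            continue
        if slot in good_slots and dimm not in good_dimms:
            bad_dimms.add(dimm)   # slot is known good, so blame the DIMM
        elif dimm in good_dimms and slot not in good_slots:
            bad_slots.add(slot)   # DIMM is known good, so blame the slot
    return bad_dimms, bad_slots


if __name__ == "__main__":
    # Made-up run log: DIMM "A" passes in slots 1 and 2 but fails in slot 3,
    # DIMM "B" fails in slot 1, DIMM "C" passes in slot 1.
    log = [
        ("A", "slot1", True),
        ("A", "slot2", True),
        ("A", "slot3", False),
        ("B", "slot1", False),
        ("C", "slot1", True),
    ]
    bad_dimms, bad_slots = diagnose(log)
    print("suspect DIMMs:", sorted(bad_dimms))
    print("suspect slots:", sorted(bad_slots))
```

Writing the runs down like this mainly stops you losing track of which stick went where after a day of swapping.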
 

NugentS

As for that NIC - really?

It's allegedly an X540 - but why not a genuine card?
 

NugentS

"Finally, you didn't seem too worried about the use of a gaming board with non-ECC memory (or at least didn't mention it!). From reading the forums I see a lot of comments about "playing with fire" by not using ECC memory. If the RAM or the HBA turns out to be the issue in my system and I can salvage the mobo and CPU, am I still doing myself a big disservice by not investing in a server mobo with ECC memory?"

Honestly I chose to ignore it. However - now you mention it.......

Gaming gear does work, mostly. It's not designed for 24x7 operation, contains crappy devices you don't need or want, probably won't last as long and isn't as careful with data as a board that supports ECC. TN does not need server gear with ECC - but it does like it. In the long term your experience will likely be better with server-grade gear and ECC than with gamer gear. If your data is important to you then you want as few roads to data corruption as possible. If it's just movies then perhaps gamer gear is cheaper and more available, and if you have it already it's vastly cheaper.

I don't understand why QNAP / Synology (et al) don't use ECC, even in their small NASes. It's borderline negligent (IMHO).

I didn't mention it because it was what you have. I could have ranted about "crappy gamer gear", but what would that have achieved? You were having issues with your current equipment and I tried to help with that. Note that the first thing I suggested you do is test the memory. I then suggested you (for testing purposes) eliminate the HBA (after testing the RAM). Testing without the HBA (in the manner suggested) would have destroyed the pool, but from my PoV the whole pool is suspect now anyhow. As you have the data available elsewhere, the pool is disposable.

Your memory is bad - and I suspect that @jgreco may be correct about a potentially overheating HBA (which may be toast as a result, or maybe not). They do need decent airflow to keep cool (proper rack-mounted cases, or strap a fan to the heatsink) as they are designed for enterprise-grade installations, which tend to be in racks with fan walls pushing air through the case.

It occurs to me you never told me / us what the case is. Rack mount, desktop, tower etc.
 