boot pool DEGRADED even after replacement

MarioW

Dabbler
Joined
Aug 25, 2019
Messages
18
I am still quite new to FreeNAS, but I think I am doing OK in general. My FreeNAS setup has been running for a few weeks, but my boot pool is behaving in a strange way that I don't understand.
I wanted one big mirrored 8 TB pool for my data and, additionally, a "fast" 500 GB SSD for my VM images.

For the boot pool, I decided to buy a small SSD (first I went with a SanDisk 120 GB drive, nothing special but new from the store).
I installed FreeNAS 11.2 U5 on it and upgraded to U6 later.

After activating email notifications, I got this message:

Code:
Checking status of zfs pools:
NAME              SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
IronWolfPro_8TB  7.25T   859G  6.41T        -         -     0%    11%  1.00x  ONLINE  /mnt
SSD_500GB         460G  3.82G   456G        -         -     0%     0%  1.00x  ONLINE  /mnt
freenas-boot      111G   762M   110G        -         -      -     0%  1.00x  DEGRADED  -

  pool: freenas-boot
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 00:00:02 with 29 errors on Fri Oct  4 03:45:02 2019
config:

    NAME        STATE     READ WRITE CKSUM
    freenas-boot  DEGRADED     0     0     1
      ada0p2    DEGRADED     0     0     5  too many errors

errors: 27 data errors, use '-v' for a list


I tried scrubbing the boot pool, but that didn't help. I was really surprised, because the SSD was new. But OK, I bought a new one, this time a WD SSD, again with 120 GB (see my signature).

I reinstalled FreeNAS 11.2 U6 from a USB stick, restored my previous configuration backup, and everything seemed to be OK. BUT after the first night / SMART test, I again got the message that the boot pool is degraded.
Here is the latest output:

Code:
Checking status of zfs pools:
NAME              SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
IronWolfPro_8TB  7.25T  2.33T  4.92T        -         -     0%    32%  1.00x  ONLINE  /mnt
SSD_500GB         460G  4.71G   455G        -         -     0%     1%  1.00x  ONLINE  /mnt
freenas-boot      111G   767M   110G        -         -      -     0%  1.00x  DEGRADED  -

  pool: freenas-boot
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 00:00:03 with 27 errors on Fri Oct 11 07:36:21 2019
config:

    NAME        STATE     READ WRITE CKSUM
    freenas-boot  DEGRADED     0     0    31
      ada0p2    DEGRADED     0     0    62  too many errors

errors: 25 data errors, use '-v' for a list
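
For anyone who wants to compare reports like the two above without reading the columns by eye, here is a minimal Python sketch that pulls the CKSUM counters out of the config table. The function name `parse_cksum` and the two string constants are made up for illustration, and the parser assumes plain integer counters (abbreviated values such as "1.2K" are not handled). The second report comes from a fresh install on a different SSD, so the two sets of counters are independent, not cumulative.

```python
def parse_cksum(status_text):
    """Return {device_name: cksum_count} from the config table of `zpool status` text."""
    counts = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        # The table starts after the NAME/STATE/READ/WRITE/CKSUM header row.
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True
            continue
        if in_config:
            # A blank line or the "errors:" line ends the table.
            if len(fields) < 5 or not fields[4].isdigit():
                break
            counts[fields[0]] = int(fields[4])
    return counts


# Config tables copied from the two reports in this post.
REPORT_OCT_4 = """\
    NAME        STATE     READ WRITE CKSUM
    freenas-boot  DEGRADED     0     0     1
      ada0p2    DEGRADED     0     0     5  too many errors

errors: 27 data errors, use '-v' for a list
"""

REPORT_OCT_11 = """\
    NAME        STATE     READ WRITE CKSUM
    freenas-boot  DEGRADED     0     0    31
      ada0p2    DEGRADED     0     0    62  too many errors

errors: 25 data errors, use '-v' for a list
"""

for label, report in (("Oct 4", REPORT_OCT_4), ("Oct 11", REPORT_OCT_11)):
    print(label, parse_cksum(report))
```

Both drives show checksum errors only (READ and WRITE stay at 0), which is why the same failure pattern on two different SSDs stands out.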

Again, scrubbing the boot pool doesn't help.
And I think the chance that I got two brand-new defective SSDs in a row, from different stores and different brands, is very low.
I don't have any problems during operation! It is only the report that scares me a bit, since I want to put the server into production soon. But this issue is strange.

Does anyone have an idea what this could be about? Where could I get more information about the issue?

Hoping for help

Best Regards
Mario
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Hey Mario,

Can you move your boot device to a different port? Any chance you could test your RAM with something like Memtest? Two back-to-back DOA drives would be strange indeed, but not impossible. Let's just rule out other causes before you buy more stuff you may not need.

Good luck troubleshooting
 

MarioW

Dabbler
Joined
Aug 25, 2019
Messages
18
Different SATA port: Next time I open up the server, I'll try that.
RAM test: Something like this? https://www.familybrown.org/dokuwiki/doku.php?id=fester:hvalid_ram
 

MarioW

Dabbler
Joined
Aug 25, 2019
Messages
18
[Attachment: 3.jpg]

FYI: Memtest passed without any errors.
 

MarioW

Dabbler
Joined
Aug 25, 2019
Messages
18
I totally forgot that cobrakiller58 suggested disabling TRIM for this drive; I haven't tried that yet.

I assume that once I have done that, I will have to reinstall the boot pool, and I'll do that. Or can I get around it this time and somehow reset the SMART values so that the pool goes back to ONLINE?

Coming back to TRIM, I still have two questions:
  1. I did a little research and figured out that I have to set vfs.zfs.trim.enabled to "0". How do I do that? Under System -> Tunables? Do I enter "vfs.zfs.trim.enabled" as the variable and "0" as the value? And which type: "loader", "rc.conf" or "sysctl"?
  2. Even if that works, I would probably be disabling TRIM for the whole system. The problem is that I have two SSDs attached: one containing the boot pool, and another, non-mirrored one that holds my VMs as a regular data pool. Disabling TRIM would probably disable it for that pool too, which would not be good, right? Is there a way to disable TRIM for the boot pool only and not for the data pool?
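
For reference on question 1, a sketch of what such a setting usually looks like. As far as I know, on FreeBSD 11.x (which FreeNAS 11.2 is based on) vfs.zfs.trim.enabled is a boot-time tunable rather than a runtime sysctl, so the matching entry under System -> Tunables would use the type "loader" and only take effect after a reboot. Treat the type and the reboot requirement as assumptions to verify on your own version. Expressed as a loader.conf line, the setting would be:

```
vfs.zfs.trim.enabled="0"
```

On question 2: to my knowledge this switch applies to ZFS as a whole and not to a single pool, so it would also turn off TRIM for the SSD data pool.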


I hope you guys can help me out with this; I am still lost.



My research so far (more for me to look up later).
 

MarioW

Dabbler
Joined
Aug 25, 2019
Messages
18
Hello everyone,
I just wanted to let you know that I solved the problem on my own.
Unfortunately by spending even more money...

I bought the SuperDOM "SSD-DM032-SMCMVN1" from Supermicro https://www.supermicro.com/products/nfo/SATADOM.cfm
for 65 €.

This seems to be the preferred solution anyway.
So now I have 2x 120 GB SSDs at home, one from SanDisk and one from WD. Hopefully I'll find some use for them one day.

Now I don't get any more errors! The boot pool is fine.
 

NASn00b

Cadet
Joined
Apr 3, 2012
Messages
9
I've just fallen foul of this. After 8 years of running FreeNAS 8/9 on a single USB stick with no issues, I started to build a replacement to run version 11 and, based on wisdom from the community, I went for a boot pool based on mirrored SSDs. I bought 2 x WD Green 3D NAND 120GB 2.5" SATA (WDS120G2G0A). Within a couple of days I was receiving alerts about a corrupted boot pool, and the logs showed the same problem as reported in NAS-100276 (sorry, I didn't keep the logs).

I couldn't figure out how to disable TRIM before FreeNAS booted. So I had to replace the shiny new WD Green SSDs with a five-year-old Kingston SSD I had 'in stock', reinstall from the FreeNAS ISO and recover my setup from the config files. Everything came back OK, but it was an unexpected and unwanted interlude. The old Kingston drive has been flawless since (a couple of weeks).

It would be good to have a solution or even a warning out there :cool:
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
NASn00b said: "It would be good to have a solution or even a warning out there"

That issue has been discussed in the forums for almost 2 years (and has been in my signature since I became part of it back at that time).

There are plenty of threads where I (and others) have helped people to understand their issue and I have told people posting builds to watch out for it...

A good example from 2018... https://www.ixsystems.com/community...de-discussion-thread.46494/page-9#post-461794
 

NASn00b

Cadet
Joined
Apr 3, 2012
Messages
9
IMHO the warning should find its way into the hardware build guides. Neither the official iXsystems guide nor Ericloewe's guide makes any reference to this. It would also be interesting to see if anyone has come up with a way to disable TRIM before the pool gets corrupted. I have seen a suggestion that this can be done from the GRUB prompt, but without any detailed instructions.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Here you go:
 