Hi Folks,
I have had 2 FreeNAS systems for 6 months now; they are identical except for the HDDs:
Supermicro X10DRH LN4
2x Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
64 GB RAM
Avago MPT SAS3 SAS controller (v8.25.00.00) with expander backplane (the controller seems to be an LSI3008-IR, FW revision 10.00.03.00-IR)
HDDs are all SATA
[edit: latest FreeNAS 11.1]
System 1 has 6x 6TB WD Black (WD6001FZWX);
system 2 (backup) has 12 older disks:
6x 3TB WD3000F9YZ and
6x 2TB (2x WD2003FYYS, 2x Hitachi HUS723020ALS640, 2x Hitachi HUA723020ALA640)
System 1 is used for NAS and SAN storage; system 2 is used for backup (zfs send).
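The replication is roughly like this (a minimal sketch only; in practice it runs as FreeNAS periodic snapshot/replication tasks, and the snapshot names and the target pool "backup" are placeholders, not the real names):
Code:
# on system 1: recursive snapshot, then incremental send to system 2
~ # zfs snapshot -r tank6x6@auto-20180110
~ # zfs send -R -i tank6x6@auto-20180109 tank6x6@auto-20180110 | ssh system2 zfs receive -dF backup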
Both systems run fine until there is some load on them. Then the zpools become degraded because some disks fault:
Code:
        NAME                                            STATE     READ WRITE CKSUM
        tank6x6                                         DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/36dc3ffc-9fbe-11e7-964a-ac1f6b2067f0  ONLINE       0     0     0
            gptid/3794a11f-9fbe-11e7-964a-ac1f6b2067f0  ONLINE       0     0     0
            gptid/384b3fed-9fbe-11e7-964a-ac1f6b2067f0  FAULTED      7     1     0  too many errors
            gptid/38fe5371-9fbe-11e7-964a-ac1f6b2067f0  ONLINE       0     0     0
            gptid/39b445ef-9fbe-11e7-964a-ac1f6b2067f0  FAULTED      6    81     0  too many errors
            gptid/3a6df18f-9fbe-11e7-964a-ac1f6b2067f0  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/3b5af8ea-9fbe-11e7-964a-ac1f6b2067f0  ONLINE       0     0     0
            gptid/3c221515-9fbe-11e7-964a-ac1f6b2067f0  ONLINE       0     0     0
            gptid/3d147697-9fbe-11e7-964a-ac1f6b2067f0  ONLINE       0     0     0
            gptid/3f147bb8-9fbe-11e7-964a-ac1f6b2067f0  ONLINE       0     0     0
            gptid/41e7d938-9fbe-11e7-964a-ac1f6b2067f0  ONLINE       0     0     0
            gptid/43896bdb-9fbe-11e7-964a-ac1f6b2067f0  ONLINE       0     0     0
...even with data corruption!
The disks that are faulting are SMART-OK:
Code:
~ # glabel status
...
gptid/384b3fed-9fbe-11e7-964a-ac1f6b2067f0     N/A  da2p2
...
gptid/39b445ef-9fbe-11e7-964a-ac1f6b2067f0     N/A  da4p2
...
Code:
~ # smartctl -a /dev/da2
...
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
...
# 1  Short offline       Completed without error       00%     24823         -
...
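If more detail helps, I can also post the extended SMART output and the device error logs for the faulted disks (just a sketch of the commands I would run; da2 and da4 are the faulted disks according to the glabel output above):
Code:
~ # smartctl -x /dev/da2        # extended output (attributes, error log, phy event counters)
~ # smartctl -l error /dev/da2  # device error log only
# same for /dev/da4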
Now the strange thing:
First I thought it was a problem with the WD Blacks, since they were in system 1 and at least 2 of them were constantly faulting under higher load. I ordered 2 new disks (WD Red Pro, I think; I don't know exactly), replaced the 2 faulting disks one after another, and resilvered. During resilvering it got worse, with data errors...
I moved the config to the boot HDDs and swapped the boot HDDs between system 1 and system 2. On system 2 (now logically system 1) I switched the (formerly backup) datasets to read/write and used it as the storage system. As you can see in the first example, it is now also faulting under higher load. There were no zpool errors while it was the backup system; since it became the primary system and got some load, HDDs in the pool have been faulting randomly, sometimes some of the 2TB drives. After I run
zpool clear
some other drives start faulting. Most of the time 2 disks are faulting. There are also errors during resilvering, probably because of the high(er) load.

Please let me know if you need additional data.
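For reference, the cycle on the primary system looks roughly like this (a minimal sketch; the pool name is the one from the status output above, and a scrub or resilver is just one way the load gets generated):
Code:
~ # zpool status -v tank6x6   # shows the FAULTED disks and any permanent errors
~ # zpool clear tank6x6       # reset the error counters, the disks come back ONLINE
~ # zpool scrub tank6x6       # as soon as the pool is under load again, other disks start faulting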
Thank you!
Bye!
Marco