Strange & moving I/O Error - Seagate IronWolf Drives

IdefixRC

Cadet
Joined
Aug 6, 2023
Messages
5
Hello everyone,

I have a very strange I/O error on my TrueNAS SCALE system.
Note: the system runs virtualized on ESXi 6.7 with a SAS controller on passthrough (yes, I know this is not ideal).

I run one Pool with 2 VDEVs.
VDEV 1: 4x 4TB Drives
VDEV 2: 4x 8TB Drives
Both are running RAIDZ1.

Initially, VDEV2 had 3 Seagate IronWolf drives and 1 WD drive.
I started receiving I/O errors (always around 248 errors) on one of the IronWolf drives some weeks into the operation of the NAS.
Swapping cables and moving the disk to a different backplane did not solve the issue, so I replaced it with a Toshiba N300 8TB drive.
As soon as this was done, the next IronWolf threw a similar I/O error. Again, swapping cables or bays made no difference; the error stayed with the drive.
Similarly, I replaced that IronWolf with an N300 8TB and... bang, the next IronWolf immediately showed the same issue.
VDEV2 now has no IronWolf left and the I/O error is gone.

BUT... VDEV1 has one 4TB IronWolf, and as soon as the last IronWolf was replaced in VDEV2, this drive threw the same error, with around 248 errors detected.

The IronWolf drives are not old; some of them were replaced after less than 3 months of operation, and I could not find any newer firmware (I understand there was some issue in the past with some of them).

Does anyone have an idea what the issue could be?
Memory? (But why are only the IronWolf drives affected?)
Issues with the SAS controller/MB/CPU? (The SAS controller, an HEC310 in IT mode, ran fine with these drives in the old Windows server; the MB and CPU are new, though.)

Thanks a lot !
 

IdefixRC

Update on the issue:
It looks like the 1U PSU I used for the NAS was not up to the task. It is rated for 600W, which is more than enough for the total power required, but it has only a single SATA/IDE power output port and cable. All 8 HDDs plus 2 SSDs therefore ran off that one cable, which may have overloaded that particular port/cable or caused a fault on the port, which in turn caused voltage drops.
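To illustrate why a single cable can be the weak point even when the PSU's total wattage is fine, here is a rough back-of-the-envelope sketch in Python. Every number in it is an assumption for illustration only (the per-drive spin-up current, the cable and connector resistance, and the 5% tolerance on the 12V rail); none of it was measured on this system. The SSDs are left out because 2.5" SATA SSDs typically draw from the 5V rail rather than 12V.

```python
# Back-of-the-envelope estimate of the 12 V load on one shared drive power cable.
# Every constant here is an illustrative assumption, not a measurement.

NOMINAL_12V = 12.0
MIN_12V = NOMINAL_12V * 0.95  # assumed +/-5 % tolerance on the 12 V rail


def cable_current(n_hdd, amps_per_hdd):
    """Total 12 V current through the cable if all HDDs draw at the same time."""
    return n_hdd * amps_per_hdd


def voltage_at_drives(current_a, cable_resistance_ohm):
    """Voltage left at the drive end of the cable after the I*R drop."""
    return NOMINAL_12V - current_a * cable_resistance_ohm


# Hypothetical inputs: 8 HDDs, 2.0 A each while spinning up,
# 0.05 ohm total for wire plus connector contacts.
current = cable_current(8, 2.0)
volts = voltage_at_drives(current, 0.05)

print(f"current through cable: {current:.1f} A")
print(f"voltage at drives:     {volts:.2f} V")
print("within tolerance" if volts >= MIN_12V else "below tolerance")
```

With these assumed values the worst case falls below the tolerance band, while the same calculation for 4 drives on the cable stays inside it. Real figures depend on the specific drives, wire gauge and connectors, so treat this as a way to reason about the problem, not as a diagnosis.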

After I replaced the IronWolf drives that threw I/O errors with Toshiba drives, the Toshiba drives kept randomly disconnecting and reconnecting, roughly one drive a day. This was likely caused by a difference in how these drives handle low or dropping supply voltage compared with the IronWolf drives.

I replaced the 1U PSU with a standard ATX PSU of similar wattage (now mounted externally), and the issue seems to be gone.
 