Seagate IronWolf 10TB (ST10000VN0004) vs LSI IT firmware controllers

Rajan

Cadet
Joined
Feb 10, 2021
Messages
2
I've been looking for the right place to post this and I believe this is it.

To be brief, I've been working on a YouTube series where I'm building a 100TB ZFS-based server. It's not all enterprise-grade hardware (it's for home use) and it doesn't run FreeBSD/FreeNAS (I have some different requirements), but I still hope this is the right place.

The Problem:
Since finishing the server and bringing my 8x Seagate IronWolf 10TB pool online, I've been seeing odd behavior. I have 2x LSI SAS2008-based IT firmware cards in my system, and each card has 4x IronWolf 10TB disks attached. While copying my data I would occasionally see a write error on one of the disks; the LSI card would reset the disk and it would continue happily as if nothing had happened.

Later, during more testing, I would see rare but recurring read, write or CRC errors on the pool. I tried rebuilding the pool, new cables, etc., but everything checked out and should have worked perfectly. Although a disk would sometimes give an error, the disk itself had no record of it in SMART; only ZFS would report that something went wrong, and "dmesg" would also fill up with errors.
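For anyone who wants to keep an eye on these counters, here is a minimal Python sketch that scans the text printed by `zpool status` and lists every row with a nonzero READ, WRITE or CKSUM value. The function name and the sample text are made up for illustration, and the sketch assumes the usual `NAME STATE READ WRITE CKSUM` column layout; it is not a replacement for reading the full output.

```python
def devices_with_errors(zpool_status_text):
    """Return {name: (read, write, cksum)} for rows with any nonzero counter.

    Counters are kept as strings because zpool may abbreviate large
    values (for example "1.2K"), so anything other than "0" is flagged.
    """
    flagged = {}
    in_config = False
    for line in zpool_status_text.splitlines():
        fields = line.split()
        # The column header marks the start of the device table.
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True
            continue
        if not in_config:
            continue
        # A blank line ends the device table.
        if not fields:
            in_config = False
            continue
        # Skip rows without all three counters (e.g. spares).
        if len(fields) < 5:
            continue
        counts = tuple(fields[2:5])
        if any(c != "0" for c in counts):
            flagged[fields[0]] = counts
    return flagged


# Hypothetical sample output, shortened for the example.
sample = """\
  pool: tank
 state: ONLINE
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    sda     ONLINE       0     1     0
	    sdb     ONLINE       0     0     2

errors: No known data errors
"""

print(devices_with_errors(sample))
```

The text can be captured with `zpool status` redirected to a file, or through `subprocess` if you prefer to run it from the script.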

Finally a solution?:
Now the point of this topic: digging deeper, it turns out Seagate released a firmware update for the ST10000VN0004 and ST10000NE0004 last month. They bumped the firmware from SC60 to SC61, and the topic about it states that this is because of a "flush cache timing out bug that was discovered during routine testing" in regard to Synology systems.

As it turns out, write cache (and, I believe, NCQ internally) had been turned off for these specific drives in Synology systems for a while because of "stability" issues. With this firmware update it gets turned on again and all is well.

That got me thinking: if a Synology is having this issue, maybe this is more related to the disk firmware than to anything else. So, since I still have all my data on other drives anyway, I went ahead and flashed all 8 of my ST10000VN0004 disks from SC60 to SC61. This worked without a problem, and a ZFS scrub found no issues with the data still on there.
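To check which drives are still on the old firmware before and after flashing, a small Python sketch along these lines could help. It parses the text printed by `smartctl -i /dev/sdX` and reads the "Device Model" and "Firmware Version" fields. The function names and the sample text (including the serial number) are invented for the example; only the two model numbers and the SC60/SC61 versions come from this post.

```python
# Models named in this post as receiving the SC61 update.
AFFECTED_MODELS = ("ST10000VN0004", "ST10000NE0004")


def firmware_info(smartctl_text):
    """Return (device_model, firmware_version) parsed from `smartctl -i` text."""
    info = {}
    for line in smartctl_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info.get("Device Model"), info.get("Firmware Version")


def needs_update(smartctl_text, old_fw="SC60"):
    """True if the drive is one of the affected models and still on old_fw."""
    model, firmware = firmware_info(smartctl_text)
    return bool(model) and model.startswith(AFFECTED_MODELS) and firmware == old_fw


# Hypothetical sample, shortened; real output contains more fields.
sample = """\
=== START OF INFORMATION SECTION ===
Device Model:     ST10000VN0004-1ZD101
Serial Number:    XXXXXXXX
Firmware Version: SC60
"""

print(firmware_info(sample))
print(needs_update(sample))
```

Running it once per disk gives a quick list of the drives that have not been flashed yet.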

Better still, I have now completed two scrubs of the 20TB on the pool without a single read, write or CRC error. I've been hitting the drives with TBs of dd and Bonnie++ and have not seen a single error since. So this might actually be a fix for topics like this one I found and this one.

Needs more testing:
I want to release a video on this issue and the potential fix for it (and show people how to apply it), but I need it tested more thoroughly to be sure this really fixes the issue.

Who here is still having this issue and is willing to test the firmware on their drives and report back? I have no idea whether this could void your warranty or anything like that, only that it may be a fix for the issues these drives have been having!

Firmware:
The original Synology topic where I found this information

Firmware for IronWolf 10TB ST10000VN0004
Firmware for IronWolf Pro 10TB ST10000NE0004

--update
Video and article released!

Still, I am unable to access my NAS.
 

Rajan

Cadet
Joined
Feb 10, 2021
Messages
2
I have a QNAP TS-x53B Series with 5x 10TB IronWolf drives, BIOS version Qy59AR09 (4.4.2(20200519). One of my HDDs is showing as failed, and when I try to access the NAS it throws the following error. How can I retrieve my old data? I need your advice.
 

Attachments

  • Capture.JPG (69.5 KB)

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,924
Hello Rajan. This forum is for TrueNAS/FreeNAS software, not QNAP devices. Perhaps you might have more luck with answers on the QNAP Forum at https://forum.qnap.com/

Good luck with getting to your data.
 