RAIDZ2 Resilvering crashes and is extremely slow

Status
Not open for further replies.

Gen8 Runner

Contributor
Joined
Aug 5, 2015
Messages
103
Hey everyone,
first, here is my FreeNAS configuration:
- RAIDZ2, encrypted
- 7× 4 TB HDDs (Seagate Desktop), to be replaced by 7× 8 TB IronWolf (standard, not Pro)
- Supermicro X9SCL-F
- Xeon E3-1220 v2
- IBM M1015 RAID controller, flashed to IT mode
- 32 GB ECC RAM
- FreeNAS-11.1-U1

Now the problem:
I wanted to increase the size of my RAIDZ2 pool by replacing the drives, one by one, with bigger ones.
I followed the guideline in the FreeNAS documentation, but unfortunately (my own silliness), because I was short on time and have a RAIDZ2 configuration, I set two 4 TB drives offline at once and replaced them both with Seagate IronWolf 8 TB drives.

Everything went fine at first, and the resilver had been running for about 4 days and reached ~80%. Then, without any warning, FreeNAS rebooted and I had to start the resilvering process again (it restarted at around 40% progress, for whatever reason). At ~55% FreeNAS crashed and rebooted again.

Consequently, resilvering started again by itself, but this time at 0%. The first 20% were very fast (around 4 hours), but the next 5% have already taken 8 hours. The CPU is almost idle, and so are the HDDs.

What is the best way to solve this problem? Are there logs somewhere that would tell me why it crashed and why it has become so slow?
Would it be smart to remove one of the two 8 TB Seagate drives and resilver only one disk at first?

The attached screenshot shows the zpool status with the current progress. Until now I have NEVER had a problem with any of the HDDs, nor with FreeNAS being unstable (everything has always worked perfectly fine).

Thanks in advance for your help.
 

Attachments

  • ZPool Status.JPG

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Well, more than 114 million data errors (I've never seen that many until now...); this pool is dead. I hope you have a backup.

It sounds like multiple drives are starting to die, or maybe the PSU is failing or undersized.

Can you please post the output (between code tags) of smartctl -a /dev/daX (X being the device number) for all your drives?
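If it helps, collecting that for all seven drives can be scripted. Here is a dry-run sketch that only prints the commands; the device numbers da0 through da6 are an assumption (check yours with camcontrol devlist), and you drop the leading echo to actually run them:

```shell
# Dry run: print the smartctl invocation for each assumed device da0..da6.
# Remove the 'echo' to execute for real.
for n in 0 1 2 3 4 5 6; do
  echo smartctl -a /dev/da"$n"
done
```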

As you've probably already figured out, it wasn't a good idea to replace two disks at the same time, because with no redundancy left, any error on the remaining disks means data loss. @All: whatever your RAID-Z level is, keep at least one disk of redundancy when you do this, even if it means the pool expansion takes more time.
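The one-disk-at-a-time cycle can be sketched like this. It is a dry run that only prints the steps; the pool name tank and the gptid labels are placeholders, and daX stands for whatever device the new disk appears as:

```shell
# Dry-run sketch of the safer replacement cycle: never give up more than
# one disk of redundancy at a time. Remove the 'echo's to run for real.
POOL=tank
for old in gptid/old-disk-a gptid/old-disk-b; do
  echo zpool offline "$POOL" "$old"
  echo "# physically swap in the new 8TB drive, then:"
  echo zpool replace "$POOL" "$old" /dev/daX
  echo "# wait until 'zpool status' reports the resilver complete before the next disk"
done
```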
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
I would guess a controller problem, or the PSU cannot provide enough power.

You could try putting the 4 TB drives back in, see if the pool operates, and then re-online the 4 TB drives. Once redundancy is restored, it should be possible to remove the 8 TB drives. If that doesn't work, then your controller or some other system component is clearly defective.
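As a rough sketch of that sequence (dry run that only prints the commands; the pool name and labels are hypothetical): re-online the old disks, and once zpool status shows the pool healthy again, a zpool detach cancels the unfinished replacement by an 8 TB disk:

```shell
# Dry run: print the recovery steps (pool/label/device names are placeholders).
POOL=tank
echo zpool online "$POOL" gptid/old-4tb-1
echo zpool online "$POOL" gptid/old-4tb-2
echo "# once 'zpool status' shows the pool resilvered and healthy:"
echo zpool detach "$POOL" /dev/da5
echo zpool detach "$POOL" /dev/da6
```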
 

Gen8 Runner

Contributor
Joined
Aug 5, 2015
Messages
103
Hey both,
thank you for the information so far. I will unplug the other pool (I have a second pool of 7 drives) to make sure the PSU delivers enough power.
I will post the SMART information later.

I still have the former 4 TB drives here, but how do I bring them back online in the pool, and which key do I use to decrypt them? When I replaced the two 4 TB disks, the GELI key changed as well. Which key would I now need to use to try to bring the former pool back to life?
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
I would try reducing the power load by removing the other pool. That is much less risky.
 

Gen8 Runner

Contributor
Joined
Aug 5, 2015
Messages
103
I would try reducing the power load by removing the other pool. That is much less risky.

I did so: I removed the second pool from the computer to ensure the HDDs get enough power.
But it's weird, something here seems to be completely wrong:
- HDD after HDD gets degraded
- FreeNAS now shows that the pool consists of 8 HDDs, but that pool never consisted of 8 HDDs (an eighth drive isn't even physically connected, so it is impossible)

I have now also connected the HDDs to the onboard SATA controller (6 of 7, as the mainboard only has 6 connectors) to rule out the IBM controller as part of the problem.
 

Attachments

  • Supermicro.JPG

rs225

Guru
Joined
Jun 28, 2014
Messages
878
How is the power load distributed? Are there multiple rails from the PSU, or is everything on one or two?
 

Gen8 Runner

Contributor
Joined
Aug 5, 2015
Messages
103
Power is delivered over two rails. For the small Desktop & Red HDDs it was always sufficient.
But now, with the IronWolf drives, it looks like it is too little. I haven't yet calculated the power needed at full load versus what the PSU provides.
When the IronWolf drives were connected, it was impossible to start the other HDDs (they just stayed switched off).

After searching for a solution for almost 24 hours, I have now found something that seems to work to save my data:
- The second RAIDZ2 pool is active again (I removed the IronWolf drives, plugged the HDDs back in, and imported the pool as usual)
- The old pool is in a degraded state but working with 5 HDDs (I had to swap disks and check roughly 500 times until FreeNAS accepted a working combination)

But it was a difficult birth... FreeNAS kept crashing in a panic loop whenever I wanted to mount the encrypted pool, so I had to use a trick to outsmart FreeNAS:
- In the web GUI, choose to import the volume and select all the HDDs of the broken pool (in my case 6; one drive is still shown as removed)
- Provide the GELI key & password, and go to the next step
- BUT: in step 3, where the dropdown menu shows "Mount as ShareXYZ", DO NOT click "Finish", otherwise the panic loop starts, FreeNAS reboots, and you start over

- Just leave this step open and don't click anything
- Open an SSH session (or IPMI, as in my case) and enter "zpool import -f -o readonly=on YOURPOOLNAME"
- After doing this, press F5 in the web GUI (just reload the page) and you can check with zpool status and zpool list whether everything is now mounted in the background

The pool is not visible in the "Dataset" tab, so you can just use the cp command to copy your data to another pool / HDD.
At the moment that is working quite smoothly: copying at 21 GB/minute, despite the encryption. :)
I just hope the HDDs stay alive; this is my last chance.
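For anyone following along, the command-line part of the steps above boils down to the following dry-run sketch (it only prints the commands; YOURPOOLNAME is the pool placeholder from above, and the rescue target path is hypothetical):

```shell
# Dry run: print the read-only rescue sequence. Remove the 'echo's to run it.
echo zpool import -f -o readonly=on YOURPOOLNAME
echo "zpool status YOURPOOLNAME   # confirm it imported read-only"
echo cp -Rpv /mnt/YOURPOOLNAME/ /mnt/rescuepool/backup/
```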
 