Resilvering Very Slowwwwwww

Status
Not open for further replies.

TrevInCarlton

Dabbler
Joined
Sep 18, 2018
Messages
39
I have no idea how long it should take to resilver a pool of 6 x 8TB Seagate drives. I know that when I had a Synology DiskStation it would take about a day, and I did think that after spending a lot of money on my new FreeNAS, things like this would be quicker. After a scrub it was clear that I had a failing drive, so I replaced it last night, but it soon became clear from the reporting graphs that it was going to take a very long time. The good drives are currently being read at a mere 12 MB/s, and the new drive is being written at the same speed. This morning I wondered if there was anything wrong, so I did a reboot. For about a minute things went fast (see screenshots) but then dropped back to 12 MB/s. I also noticed that the resilver percentage dropped back from about 54% to 43%. Earlier this afternoon I shut down the system and removed my other 2 pools and all associated HBAs and network cards, but on startup the slow speed persists, and once again the resilver percentage dropped back, this time from about 61% to 43%.

None of the disks, or the system in general, is under any great stress apart from the one being rebuilt (da21), which is showing 100% busy; the pending I/O requests for that drive appear to be maxed out.

We are constantly told that resilvering puts a lot of stress on the rest of the disks in the pool, but with each disk currently showing 12% busy and a 12 MB/s read speed, there doesn't appear to be any danger of another drive failing.

If this is the way FreeNAS works then it is something I will have to live with, but something just does not feel right, especially when a scrub can rattle along at over 130 MB/s for each disk.
 

Attachments

  • Screen Shot 2018-10-23 at 08.51.45.png
  • Screen Shot 2018-10-23 at 08.51.28.png
  • Screen Shot 2018-10-23 at 08.49.46.png
  • Screen Shot 2018-10-23 at 08.49.16.png

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
pool of 6 x 8tb seagate drives
The drive model makes a difference. You should list it.
It also depends on how much data is in the pool. I resilver a drive in about 6 hours on my system at home, but I have systems at work that have taken much longer.
I know when I had a Synology Diskstation
No comparison because the underlying software is completely different.
After a scrub it was clear that I had a failing drive
That is part of the problem but the other part is you only built your pool as RAIDz1. The guides recommend against that for this very reason.
Please read this material:

Why not to use RAID-5 or RAIDz1
https://www.zdnet.com/article/why-raid-5-stops-working-in-2009/

Slideshow explaining VDev, zpool, ZIL and L2ARC
https://forums.freenas.org/index.ph...ning-vdev-zpool-zil-and-l2arc-for-noobs.7775/

You should be using RAIDz2 or better.
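The zdnet article's argument comes down to the arithmetic of unrecoverable read errors (UREs) during a rebuild, when a degraded RAIDz1 vdev has no parity left to repair a bad read. A rough sketch, assuming a spec of one URE per 10^14 bits read (a common consumer-drive datasheet figure; check your own model) and independent errors. The function name is hypothetical. It also shows why a ZFS resilver, which copies only allocated blocks, is far less exposed on a mostly empty pool:

```python
import math


def p_clean_read(bytes_read: float, bits_per_ure: float = 1e14) -> float:
    """Chance of reading bytes_read bytes with no unrecoverable read error,
    assuming independent errors at one per bits_per_ure bits (assumed spec)."""
    bits = bytes_read * 8
    # (1 - 1/N) ** bits, evaluated in log space so huge exponents stay accurate
    return math.exp(bits * math.log1p(-1.0 / bits_per_ure))


TB = 1e12
# Classic RAID-5 style rebuild: read all five surviving 8 TB disks end to end
full = p_clean_read(5 * 8 * TB)
# ZFS resilver of a ~10% full pool: roughly 0.8 TB read from each survivor
partial = p_clean_read(5 * 0.8 * TB)
print(f"full-disk rebuild: {full:.3f}, 10% full pool: {partial:.3f}")
```

Under these assumptions the full-disk rebuild has only about a 4% chance of finishing without a URE, versus roughly 73% for the mostly empty pool. Treat it as the pessimistic model the article uses, not a prediction for any particular drive.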
I replaced it last night but it soon became clear from the reporting graphs that it was going to take a very long time.
You have not said, but looking at the graphs, I am guessing that da21 is the failing drive. It is taking a very long time to respond to I/O requests, and that is going to very significantly increase the time to resilver the pool. It would probably be MUCH faster to just remove that drive and rebuild its data from the other five drives. The problem with that is it leaves you with no parity (no protection) while the resilver is happening. That is why we don't build pools with RAIDz1: you need to be able to remove the failed drive and still have a parity drive.
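To make the parity point concrete, here is a deliberately simplified sketch (the function name is hypothetical) of how many further whole-disk failures a RAIDZ vdev can absorb once the bad disk has been pulled for replacement. It ignores partial failures such as read errors on the surviving disks:

```python
def failures_tolerated(parity_disks: int, missing_disks: int) -> int:
    """Further whole-disk failures a RAIDZ vdev survives while
    missing_disks members are absent; a negative value means data loss."""
    return parity_disks - missing_disks


for layout, parity in (("RAIDz1", 1), ("RAIDz2", 2), ("RAIDz3", 3)):
    # Remove the failing disk first, then resilver onto its replacement
    print(f"{layout}: {failures_tolerated(parity, 1)} more failure(s) tolerated")
```

With RAIDz1 the margin drops to zero the moment the failing disk is removed, while RAIDz2 still has one disk's worth of parity in hand during the resilver.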
The good drives are currently being read at a mere 12mb/s and the same speed for writing to the new drive.
That is being held back by the bad drive. The pool can only read at the rate of the slowest drive and it can't write to the replacement drive until the read is done. As long as that defective drive is still in the pool, the system will continue to try to read from it.
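The "slowest drive wins" argument can be written as a toy model. This is an assumption-laden simplification, not how ZFS actually schedules I/O: rebuilding each block needs a read from every surviving member, and the result then has to be written to the new disk, so whichever of those is slowest sets the pace. The function name is hypothetical and the figures echo speeds quoted in this thread:

```python
def resilver_rate(source_mb_s: list[float], target_mb_s: float) -> float:
    """Toy model: the slowest surviving disk paces the reads, and the new
    disk can never absorb data faster than its own write rate."""
    return min(min(source_mb_s), target_mb_s)


print(resilver_rate([130] * 5, 150))         # all healthy: 130
print(resilver_rate([130] * 4 + [12], 150))  # one sluggish source disk: 12
print(resilver_rate([130] * 5, 12))          # a slow replacement disk: 12
```

The last two cases look identical from the pool's overall throughput; the per-disk busy and pending I/O graphs are what tell them apart.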
This morning I did wonder if there was anything wrong so did a reboot.
Never reboot during a rebuild unless you just don't care at all about your data. That is pretty crazy.
For about a minute things went fast (see screenshots) but then dropped back to 12mb/s.
Because it was trying to recover to where it was before the reboot and part of that was reading the data on the new drive.
I also noticed that the resilver % dropped back from about 54% to 43%.
Because you lost any incomplete actions from before the reboot and it is probably having to do it over. You just made it take longer.
Earlier this afternoon I shut down the system and removed my other 2 pools and all associated HBA's, and Network cards but on start up the slow speed persists and also once again the resilver % dropped back from about 61% to 43%.
Again? Why? You are creating your own pain. "Doc, it hurts when I poke myself in the eye"... Don't poke yourself in the eye...
Non of the disks or system in general is under any great stress apart from the one being rebuilt which is showing 100% busy and the pending i/o requests for that drive appear to be maxed out (da21)
Finally, you tell us it is da21 that is defective, which I guessed from the graph, and that is what is holding the system back: just that one bad disk.
We are constantly told that resilvering puts a lot of stress on the rest of the disks in the pool
That is bad information; I don't even know where that idea came from. The disk that normally works hard during a resilver is the new drive, because it is writing data as fast as it can be shoved into the drive, while the other drives are usually working a bit less.
If this is the way Freenas works then it is something I will have to live with
No, it isn't because of FreeNAS, the problem in this pool is that defective drive slowing everything down.
but something just does not feel right especially when a scrub can rattle along at over 130mb/s for each disk.
That is how fast this would probably be going if not for, as I have said so many times now, that defective drive. I run RAIDz2, and when I have a bad drive, I remove it and then resilver onto a replacement drive. The resilver runs much faster with the bad drive out of the way. I have also done resilvers with the bad drive still in the pool; trust me, this is the voice of experience. I have replaced hundreds of drives. Literally.
 

TrevInCarlton

Dabbler
Joined
Sep 18, 2018
Messages
39
1 - The drives are new Seagate Barracudas that perform well apart from this resilver issue.
2 - There is not that much data in the pool. I was in the process of transferring from backups, so it is maybe no more than 10% full.
3 - I fully appreciate that Synology is very different, but with only 4 GB of RAM and an inadequate CPU in the Synology, I thought my new FreeNAS build should perform better.
4 - I also fully appreciate and accept the dangers of RAIDz1, but it is difficult getting the drives in the space, although I have considered RAIDz2 for the 2 x 8-drive pools.
5 - Drive da21 is actually the new replacement drive.
6 - The data is not important, which is why I tried a reboot.

7 AS I SAID - DRIVE DA21 IS A BRAND NEW DRIVE WHICH HAS PASSED A FULL SMART TEST RUN ON MY MAC.

I think I will either leave it to finish resilvering or take the drive out and run another SMART test on it from my Mac.
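A back-of-the-envelope estimate using the figures in this thread: assuming ZFS copies only allocated data, a ~10% full pool puts roughly 0.8 TB onto the replacement 8 TB disk. The sketch below (hypothetical function name, decimal units, ignoring metadata, fragmentation and RAIDZ padding) compares the observed 12 MB/s with the 130 MB/s seen during a scrub:

```python
def resilver_hours(disk_tb: float, fraction_full: float, rate_mb_s: float) -> float:
    """Rough resilver time: the new disk receives about fraction_full of
    its capacity, written at rate_mb_s (decimal TB and MB)."""
    bytes_to_write = disk_tb * 1e12 * fraction_full
    return bytes_to_write / (rate_mb_s * 1e6) / 3600


for rate in (12, 130):
    print(f"{rate} MB/s -> {resilver_hours(8, 0.10, rate):.1f} h")
# 12 MB/s -> 18.5 h
# 130 MB/s -> 1.7 h
```

So at the observed speed even a lightly filled pool needs most of a day, while the scrub speed would finish in under two hours.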
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Seagate Baracuda
Exactly what model number? It makes a difference or I wouldn't ask.
I thought my new Freenas build should perform Better.
It should.
7 AS I SAID - DRIVE DA21 IS A BRAND NEW DRIVE WHICH HAS PASSED A FULL SMART TEST RUN ON MY MAC.
The drive model number matters. Seagate makes some drives that use SMR (Shingled Magnetic Recording), and I found that using them in my NAS significantly slowed the operation of the pool. I disposed of the ones that I had purchased. That could be the problem here. We need a model number.
It could, alternatively, be that the new drive is defective or that reading from the old drive is slowing everything down. Either of those is still valid.
A new drive should receive a 'burn-in' test before being added to a pool.
Look here:

Github repository for FreeNAS scripts, including disk burnin
https://forums.freenas.org/index.ph...for-freenas-scripts-including-disk-burnin.28/
 

TrevInCarlton

Dabbler
Joined
Sep 18, 2018
Messages
39
Hi Chris, I am aware of SMR drives and would never use them in a NAS. I have already stated that the drives are Barracudas. These drives do not use SMR; Seagate's SMR drives are called "Archive". My 8TB drives have the model number ST8000DM004, whereas the SMR version has the part number ST8000AS0002.

I decided to pull the plug and am currently running a full SMART test using DriveDX on my Mac, so until that is finished we won't really know whether I have yet another failing drive, whether it is in fact something happening inside FreeNAS, or maybe something else in my build. As the remaining 5 drives in the pool are all reading at 12 MB/s, I doubt that they are all defective, especially as they all performed at over 130 MB/s during a scrub. I will take note of the burn-in advice, though.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
model number ST8000DM004
Sorry, that is one of the models where Seagate got tricky. They are SMR drives even though the documentation doesn't state it.
or it is in fact something happening inside Freenas
It is in fact not a problem with FreeNAS. I have a RAIDz1 pool that I use for backups and it should rebuild at just under the maximum write rate of the target drive. It is a problem with your drives. There is no doubt about that.
 

TrevInCarlton

Dabbler
Joined
Sep 18, 2018
Messages
39
Well, that really is total nonsense. I have just read the Seagate documentation for the drive, which also covers the 4TB model ST4000DM004, of which I have around 16. Are you saying that ALL those drives are also SMR? According to Seagate the recording technology used is TGMR, NOT SMR. So please explain how it is that I could write to the pool at up to 800 MB/s before the problem appeared, if in fact ALL of the drives are, as you claim, SMR. THE MODEL NUMBERS ARE TOTALLY DIFFERENT FOR THE SMR DRIVES. Or are you also telling me that Seagate are selling SMR drives without stating it? If you are, then please back up your claims so I can start a case against Seagate under the UK Trade Descriptions Act.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Look at this auction: https://www.ebay.com/itm/253941809869
That is the last of the drives I bought and tried to use that I am selling after less than a month of use. I don't care what the documentation says and I have read it. I know how the drives behaved in my system and I have a pile of Seagate drives (other models) that I have used and still use. This is as big a disappointment to me as it is to you.
 
Joined
May 10, 2017
Messages
838
Without a doubt, the ST8000DM004 and the related lower-capacity models are all SMR. For some reason they appear to perform worse than the older-generation SMR Archive drives, and they have a much lower workload rating.
 

TrevInCarlton

Dabbler
Joined
Sep 18, 2018
Messages
39
Well, after a lot of research and a phone call to Seagate, it has been confirmed that unless stated otherwise the drives are SMR and not PMR. To say that this is a big disappointment to me is a real understatement. I actually feel gutted, and somewhat cheated by Seagate. Not only do I have the 6 x 8TB for one of the volumes, I also bought 3 x 4TB ST4000DM004s to upgrade the 5-drive pool from my Synology to an 8-drive pool in the FreeNAS. It doesn't end there: I also bought another 3 x 4TB to replace 3 WD Green drives in the third 8 x 4TB pool.

My usage is simple, write once and read many, so I am going to live with the 8TB drives. SMR is fine until you need to delete and use the space again, so provided I build the volume from the ground up I should be OK. Clearly these drives are useless when it comes to resilvering, which explains the pending I/O graph. I simply do not have the finances to buy yet more new drives, but I will think about replacing one of the 8 x 4TB pools soon. That will at least give me plenty of backup capacity. This situation is far from perfect. I have tried hard to stick to the hardware requirements, but I can't do anything about the drives at this point.
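Why "SMR is fine until you need to delete and use the space again" can be seen with a toy model of one drive-managed SMR band. This is a simplification: real drives also stage writes in a media cache, and band sizes are rarely published, so the 256-track band below is purely an assumed figure and the function name is hypothetical. Each shingled track partly overlaps the next, so rewriting one track means rewriting every later track in the same band:

```python
def tracks_rewritten(band_tracks: int, track_index: int) -> int:
    """Tracks the drive must rewrite to update track track_index (0-based)
    in a shingled band: that track plus every later, overlapping track."""
    if not 0 <= track_index < band_tracks:
        raise ValueError("track_index is outside the band")
    return band_tracks - track_index


BAND = 256  # assumed band size in tracks, for illustration only
print(tracks_rewritten(BAND, BAND - 1))  # last track: 1
print(tracks_rewritten(BAND, 0))         # first track: 256
# Average cost of updating one randomly chosen track in a full band
avg = sum(tracks_rewritten(BAND, i) for i in range(BAND)) / BAND
print(f"{avg:.1f}")                      # 128.5
```

Filling an empty band front to back costs one track per track written, which is the write-once, read-many case; an in-place update to a full band costs about half a band.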

I am currently running full drive verification tests on 6 of the 8TB Barracudas to establish whether any more need returning to Seagate.
 

TrevInCarlton

Dabbler
Joined
Sep 18, 2018
Messages
39
Yes, it's a shame Seagate appears to try to hide that info for some models. I've been warning users about these drives since last year; at the time most didn't believe me, but they came to accept I was right.

You're quite right. Last night there was no way I was going to accept it, but today I see things in a different light. It should be clearly stated. Do any other manufacturers use SMR technology, or is it just Seagate? If so, do they keep it quiet just like Seagate? STILL FEELING GUTTED !!!!
 
Joined
May 10, 2017
Messages
838
Do any other manufacturer use the SMR technology or is it just Seagate ?

AFAIK, for 3.5" drives, besides Seagate there's the HGST HS14 14TB (now called Ultrastar DC HC620), but it's clearly identified as host-managed SMR. For 2.5" drives, all drives currently using 1TB platters from Seagate and WD are SMR; Seagate does identify most of these as SMR, though some only in later manual revisions (e.g. see this post). I believe WD doesn't mention SMR anywhere. Toshiba currently doesn't have any SMR drives, either 2.5" or 3.5".

EDIT: I've learned since that Toshiba L200 1 and 2TB 2.5" drives are also SMR, they can easily be recognized by the 128MB cache size.
 

TrevInCarlton

Dabbler
Joined
Sep 18, 2018
Messages
39
Thanks for the info. I can understand the use of SMR drives for archiving, but putting one in a laptop seems a bit crazy with data constantly changing. The performance must really suffer moving SMR blocks around.

Just had a slight lift: I can return 3 of the drives to Amazon, so that will help to soften the blow. IronWolfs or WD Reds from now on.
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
Either; I use both and am nothing but happy. Due to price, most of my drives are now IronWolfs.
 

TrevInCarlton

Dabbler
Joined
Sep 18, 2018
Messages
39
Yes, it's looking very much like IronWolfs.
 
Joined
May 10, 2017
Messages
838
The performance must really suffer moving SMR blocks around.

They have some measures to avoid hitting the SMR wall, like a multi-tier cache: first there's some DRAM, then some flash, then some PMR, and finally the main SMR layer. They feel confident "regular" users won't notice, since I've seen notebooks from at least Asus and Lenovo come with 1TB Seagate SMR drives from the factory, and I guess most users won't be doing large amounts of random writes, so maybe they don't notice. Anyone currently buying a laptop without an SSD doesn't really care about disk I/O performance anyway ;)
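The "SMR wall" can be illustrated with a toy model that lumps the DRAM, flash and PMR tiers described above into a single fast cache. All figures are hypothetical, not taken from any datasheet, and the function name is made up. A short burst fits in the cache and runs at full speed, while a long sustained stream, like a resilver, spends almost all of its time at the slow shingled rate:

```python
def write_seconds(total_gb: float, cache_gb: float,
                  cached_mb_s: float, shingled_mb_s: float) -> float:
    """Toy drive-managed SMR model: the first cache_gb of a sustained write
    lands in a fast cache area; the rest is throttled to the shingled rate."""
    fast_gb = min(total_gb, cache_gb)
    slow_gb = total_gb - fast_gb
    return fast_gb * 1000 / cached_mb_s + slow_gb * 1000 / shingled_mb_s


# Hypothetical figures, for illustration only
CACHE_GB, FAST, SLOW = 20, 150, 12
for gb in (5, 800):  # a laptop-sized burst vs a resilver-sized stream
    secs = write_seconds(gb, CACHE_GB, FAST, SLOW)
    print(f"{gb} GB: {gb * 1000 / secs:.0f} MB/s average")
```

The burst averages the full 150 MB/s, while the 800 GB stream averages about 12 MB/s because the cache is exhausted after the first 20 GB.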
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
https://www.bhphotovideo.com/c/product/1182644-REG/crucial_ct4k32g4rfd424a_128
so I can start a case against Seagate under the UK Trade Descriptions Act
If there is some legal recourse to force them to be truthful and clear in the product documentation, it would be good for us all. I didn't get burned as bad as you, but I bought three of the shingled drives and lost a decent amount of cash because I turned around and sold them again as they were no good to me.
I replaced them with this model which appears to work well:
https://www.newegg.com/Product/Product.aspx?Item=N82E16822178338
I don't think the ST4000DM000 is shingled and I know they are working well for me. I am currently running 24 of them between my two FreeNAS servers.
 

TrevInCarlton

Dabbler
Joined
Sep 18, 2018
Messages
39
Thanks for the info, it helps me understand a bit better how these all work. Good point about SSDs: I have replaced the drives in my 2 Macs and a Mac mini with Samsung EVOs and Crucial MX500s, and I also have a slightly older SanDisk in a Win10 machine. They are now fast !!!!! I don't think we are too many years away from the end of spinning disks; then we will start to worry whether SSDs will fail or not?
 

TrevInCarlton

Dabbler
Joined
Sep 18, 2018
Messages
39
Yes, I really have been burnt, but at least 3 will be going back to Amazon. Judging by your part number, they are the PRO version (ending in 000 and not 004?). Sad to say, the new Barracuda 4TB models are also SMR; they are much thinner and lighter. Valuable lessons learnt.
 