Warning: replacing ancient drive with new one keeps bad ashift

Status
Not open for further replies.

scwst

Explorer
Joined
Sep 23, 2016
Messages
59
This is from the "duh" department, but to let others learn from my mistakes, here's what I didn't think of with my current, "evolutionary" build.

tl;dr: When replacing mirrored ancient drives (default ashift=9) in place with a pair of new drives (default ashift=12), the new drives are given the old ashift, making ZFS unhappy, and you'll have to destroy and rebuild the pool, which will make you unhappy.

I started testing my first setup with two old 500 GB drives from 2007/2008 in a mirrored configuration, so if something went wrong, it would be no problem. That worked just fine, so I added two more ancient 640 GB WD Caviar Blues (2009/2010) as a second mirror in the same pool. Yay, still no problems, so then I got real: added a third vdev of two mirrored 3 TB WD Reds (2015/2016) to the pool. And verily, all was well.
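
In zpool terms, the build history boils down to something like this (the device names are just placeholders; FreeNAS actually works with gptid labels and does all of this through the GUI):

# create the pool with the first mirror, then grow it one mirror vdev at a time
zpool create tank mirror ada0 ada1      # the old 500 GB pair
zpool add tank mirror ada2 ada3         # the 640 GB Caviar Blues
zpool add tank mirror ada4 ada5         # the 3 TB WD Reds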

So I copy a bunch of stuff over from my old Synology, but not too much, as I don't want to completely fill the 3 TB drives yet. The rest goes to external hard drives (twice). I then proceed to take the Synology apart, which gives me two 2 TB WD Reds (both 2013), one 3 TB WD Red, and one 1 TB WD Red (yes, that was a stupid setup; no, I didn't know what I was doing at the time).

Now, the next step is easy, right? Replace the ancient 500 GB drives in place, one by one, with the two 2 TB WD Reds, which keeps redundancy throughout. It takes a bit longer than I had been led to expect (three hours), but whatever, it works fine. My pool has expanded!
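
For the record, the swap itself is the textbook in-place replacement; from the shell it would look roughly like this (device names are placeholders, and on FreeNAS the GUI takes care of the partitioning and gptid labels for you):

zpool set autoexpand=on tank        # let the vdev grow once both disks are bigger
zpool replace tank ada0 ada6        # swap the first old 500 GB disk for a 2 TB disk
zpool status tank                   # wait for the resilver to finish
zpool replace tank ada1 ada7        # only then swap the second disk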

But wait, zpool status tank is really unhappy now about the new mirrored vdev:

One or more devices are configured to use a non-native block size. Expect reduced performance.

and

block size: 512B configured, 4096B native

in the listing for the two 2 TB drives. Huh? This makes no sense to me at first, because diskinfo -v /dev/ada0 and friends give me the exact same sectorsize/stripesize combination for the 3 TB drives as for the 2 TB drives. After fighting the horrible documentation on zdb for a bit, I figure out that I need zdb -U /data/zfs/zpool.cache | grep ashift, which gives me this for the three mirrored vdevs:

ashift: 9
ashift: 9
ashift: 12


Oops. The first entry is the old 640 GB drives, which are plain 512-byte-sector disks, so their ashift=9 is legit. The last one is the new 3 TB drives, which are 512e (512-byte logical, 4096-byte physical sectors), so when their vdev was created it was set to ashift=12. Correct. And the one in the middle is the 2 TB WD Reds, which should be ashift=12 as well, but the step-by-step replacement forced ashift=9 on them, because a replacement disk simply inherits the ashift of the vdev it joins. Further research shows that you can't change or replace a whole vdev in place, and the only real solution is to destroy the pool and start over. Well, Scheibenkleister, as we say in German (roughly: darn it).
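
For next time: from what I've read, FreeBSD's vfs.zfs.min_auto_ashift sysctl sets the smallest ashift that newly created vdevs get, so setting it before building the replacement pool should (if I've understood it correctly) keep 512e drives from ever ending up at ashift=9 again. It only affects new vdevs, though; it can't change the ashift of an existing one, which is exactly why the rebuild is unavoidable.

sysctl vfs.zfs.min_auto_ashift        # what new vdevs would get right now
sysctl vfs.zfs.min_auto_ashift=12     # insist on 4K alignment before running zpool create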

All rather logical once you think of it, but I didn't. The good news is that my data survived all of this intact, the pool still works, and is fast enough for what I need it to do. At some point, I'll get some more drives, back everything up externally (twice), and then do it over, this time with new drives only.
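
When I get around to that, the backup step will probably be a plain snapshot-and-send to a pool on the external drives; roughly like this, with "backup" standing in for whatever the external pool ends up being called:

zfs snapshot -r tank@migrate                            # recursive snapshot of the whole pool
zfs send -R tank@migrate | zfs receive -F backup/tank   # replicate everything, properties included (do it twice, once per external drive)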
 

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,478
Thanks for posting. Good information and good investigation.

I did not know that about migrating from older drives with varying ashift values. Would have tripped me up for sure.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
This is from the "duh" department, but to let others learn from my mistakes, here's what I didn't think of with my current, "evolutionary" build.

tl;dr: When replacing mirrored ancient drives (default ashift=9) in place with a pair of new drives (default ashift=12), the new drives are given the old ashift, making ZFS unhappy, and you'll have to destroy and rebuild the pool, which will make you unhappy.

I started testing my first setup with two old 500 GB drives from 2007/2008 in a mirrored configuration, so if something went wrong, it would be no problem. That worked just fine, so I added two other ancient 640 GB WD Caviar Blue (2009/2010) in a mirrored configuration to the same pool. Yay, still no problems, so then I got real: Added a third vdev out of two mirrored 3 TB GB WD Red (2015/2016) to the pool. And verily, all was well.

So I copy a bunch of stuff over from my old Synology, but not too much, as I don't want to completely fill the 3 TB drives yet. The rest goes to external hard drives (twice). I then proceed to take the Synology apart, which gives me 2 TB WD Red (both 2013), one 3 TB WD Red and one 1 TB WD Red (yes, that was a stupid setup, no, I didn't know what I was doing at the time).

Now, next step is easy, right? Replace the ancient 500 GB drives in place one-by-one with the two 2 TB WD Reds, which keeps redundancy. Takes a bit longer than I had been led to expect (three hours), but whatever, works fine. My pool has expanded!

But wait, zpool status tank is really unhappy now about the new mirrored vdev:

One or more devices are configured to use a non-native block size. Expect reduced performance.

and

block size: 512B configured, 4096B native

in the listing for the two 2 TB drives. Huh? This makes no sense to me at first, because diskinfo -v /dev/ada0 and friends gives me the exact same sectorsize/stripesize combination for the 3 TB as the 2 TB drives. After fighting the horrible documentation on zdb for a bit, I figure out I need to use zdb -U /data/zfs/zpool.cache | grep ashift, which gives me this for the three mirrored vdevs:

ashift: 9
ashift: 9
ashift: 12


Oops. The first one are the old 640 GB drives, which are 512 sectors pure and simple. Their ashift=9 is legit. The last one are the new 3 TB drives, which are 512/4096, so when creating their vdev, they were set to ashift=12. Correct. And the ones in the middle are the 2 TB WD Reds, which should be ashift=12 as well, but the step-by-step replacement forced ashift=9 on them. Further research shows that you can't replace whole vdevs in place, and the only real solution is to destroy the pool and start over. Well, Scheibenkleister.

All rather logical once you think of it, but I didn't. The good news is that my data survived all of this intact, the pool still works, and is fast enough for what I need it to do. At some point, I'll get some more drives, back everything up externally (twice), and then do it over, this time with new drives only.

I've done this in the past, but I checked the pools first, before starting the drive dance, to confirm they were already ashift=12. Luckily they were ;)

Well. I guess I didn't do it, but I was aware of the problem :)

Thank you, and hopefully others will be aware of it now too!
 

scwst

Explorer
Joined
Sep 23, 2016
Messages
59
Remember the suggestion by Fred Brooks to "throw one away"? Now, I can't really recommend completely rebuilding your NAS like I just had to do, because backing up terabytes of data (twice) and then restoring it takes a really, really long time and is generally a pain in the rear. But it does give you time to think and gain some insights. Like, why a SLOG wouldn't help, how best to set up the datasets, and whether I should maybe join Data Hoarders Anonymous.

Anyway, the new machine is far better for the experience, with specialized record sizes, gzip-9 on the "cold storage" dataset, and of course the correct ashift. Everything works fine now.
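
For anyone curious, the dataset layout comes down to a handful of zfs create lines; the names are just examples, and recordsize=1M needs the large_blocks pool feature, so check that your pool supports it first:

zfs create -o compression=gzip-9 tank/cold                   # rarely-touched archive, squeeze as hard as possible
zfs create -o compression=off -o recordsize=1M tank/media    # already-compressed media, big records, no compression
zfs create tank/stuff                                        # everything else just inherits the pool defaults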
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
gzip is best done with the gzip program. It is too slow to embed in the file system.
 

scwst

Explorer
Joined
Sep 23, 2016
Messages
59
Usually true, but this is just data that I want to keep someplace safe from bitrot and probably won't access for years. Encoding it took forever, but now it's as small as it can be. Everything else either uses the default compression or (in the case of the media dataset) has compression turned off.
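
If anyone wants to judge whether the gzip-9 pain is worth it on their own data, the ratio is easy to read off afterwards (the dataset name is just my example):

zfs get compression,compressratio,used,logicalused tank/cold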
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Thanks for posting.
I'd consider this for a resource post, as it's definitely useful information, given how many contributors promote the idea of 'replacing drives by size to achieve growth'. @Ericloewe - do you think it merits one?

I wonder whether this applies to RAIDZ2 as well?
I've been test-running a setup with mismatched 1-3 TB drives in a 'junk pool', and I never investigated the potential underlying problems this deeply; I just assumed that growing it would be problem-free if I ever wanted to.

Edit: A slight worry is that searching the user guide for "ashift" returns nothing.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I'd consider this for a resource post, as it's definitely useful information, given how many contributors promote the idea of 'replacing drives by size to achieve growth'. @Ericloewe - do you think it merits one?
Sure, the more the merrier. The mod team will remove anything horrible and all users can like and review resources, so the best ones will hopefully float to the top.
 

TheWasher

Cadet
Joined
Dec 22, 2016
Messages
2
I read this thread, and I have just done the same: while playing with FreeNAS, I migrated from 4 x mirrored smaller drives to a 4 x 2 TB array. zpool status shows "block size: 512B configured, 4096B native". I have destroyed the pool and started over; however, diskinfo -v /dev/ada0 still shows 512B. What do I need to do to change the block size to 4096B, or is this just something that I am going to have to live with? I have 2 x 2 TB WD and 2 x 2 TB Seagate drives in my pool.
Many thanks.
 

TheWasher

Cadet
Joined
Dec 22, 2016
Messages
2
Many thanks for the response. Unfortunately, one of the Seagate drives has failed (it was a new drive that I bought and never used - not happy, as it's now out of warranty), and I have taken it out of the case and will start from scratch now. I will order a new WD Red, and might even stretch to 2 x WD Reds so I can replace the other Seagate in the array as well, and then rebuild. I'll post once I have this done.

Merry Xmas!
 