Warning: replacing ancient drive with new one keeps bad ashift

scwst · Oct 8, 2016

This is from the "duh" department, but to let others learn from my mistakes, here's what I didn't think of with my current, "evolutionary" build.

tl;dr: When replacing mirrored ancient drives (default ashift=9) in place with a pair of new drives (default ashift=12), the new drives are given the old ashift, making ZFS unhappy, and you'll have to destroy and rebuild the pool, which will make you unhappy.

I started testing my first setup with two old 500 GB drives from 2007/2008 in a mirrored configuration, so if something went wrong, it would be no problem. That worked just fine, so I added two other ancient 640 GB WD Caviar Blue (2009/2010) in a mirrored configuration to the same pool. Yay, still no problems, so then I got real: Added a third vdev out of two mirrored 3 TB GB WD Red (2015/2016) to the pool. And verily, all was well.

So I copy a bunch of stuff over from my old Synology, but not too much, as I don't want to completely fill the 3 TB drives yet. The rest goes to external hard drives (twice). I then proceed to take the Synology apart, which gives me 2 TB WD Red (both 2013), one 3 TB WD Red and one 1 TB WD Red (yes, that was a stupid setup, no, I didn't know what I was doing at the time).

Now, next step is easy, right? Replace the ancient 500 GB drives in place one-by-one with the two 2 TB WD Reds, which keeps redundancy. Takes a bit longer than I had been led to expect (three hours), but whatever, works fine. My pool has expanded!

But wait, zpool status tank is really unhappy now about the new mirrored vdev:

One or more devices are configured to use a non-native block size. Expect reduced performance.

and

block size: 512B configured, 4096B native

in the listing for the two 2 TB drives. Huh? This makes no sense to me at first, because diskinfo -v /dev/ada0 and friends gives me the exact same sectorsize/stripesize combination for the 3 TB as the 2 TB drives. After fighting the horrible documentation on zdb for a bit, I figure out I need to use zdb -U /data/zfs/zpool.cache | grep ashift, which gives me this for the three mirrored vdevs:

ashift: 9
ashift: 9
ashift: 12

Oops. The first one are the old 640 GB drives, which are 512 sectors pure and simple. Their ashift=9 is legit. The last one are the new 3 TB drives, which are 512/4096, so when creating their vdev, they were set to ashift=12. Correct. And the ones in the middle are the 2 TB WD Reds, which should be ashift=12 as well, but the step-by-step replacement forced ashift=9 on them. Further research shows that you can't replace whole vdevs in place, and the only real solution is to destroy the pool and start over. Well, Scheibenkleister.

All rather logical once you think of it, but I didn't. The good news is that my data survived all of this intact, the pool still works, and is fast enough for what I need it to do. At some point, I'll get some more drives, back everything up externally (twice), and then do it over, this time with new drives only.

nojohnny101 · Oct 8, 2016

Thanks for posting. Good information and good investigation.

I did not know that about migrating from older drives with varying ashift values. Would have tripped me up for sure.

Stux · Oct 9, 2016

scwst said:
This is from the "duh" department, but to let others learn from my mistakes, here's what I didn't think of with my current, "evolutionary" build.

tl;dr: When replacing mirrored ancient drives (default ashift=9) in place with a pair of new drives (default ashift=12), the new drives are given the old ashift, making ZFS unhappy, and you'll have to destroy and rebuild the pool, which will make you unhappy.

I started testing my first setup with two old 500 GB drives from 2007/2008 in a mirrored configuration, so if something went wrong, it would be no problem. That worked just fine, so I added two other ancient 640 GB WD Caviar Blue (2009/2010) in a mirrored configuration to the same pool. Yay, still no problems, so then I got real: Added a third vdev out of two mirrored 3 TB GB WD Red (2015/2016) to the pool. And verily, all was well.

So I copy a bunch of stuff over from my old Synology, but not too much, as I don't want to completely fill the 3 TB drives yet. The rest goes to external hard drives (twice). I then proceed to take the Synology apart, which gives me 2 TB WD Red (both 2013), one 3 TB WD Red and one 1 TB WD Red (yes, that was a stupid setup, no, I didn't know what I was doing at the time).

Now, next step is easy, right? Replace the ancient 500 GB drives in place one-by-one with the two 2 TB WD Reds, which keeps redundancy. Takes a bit longer than I had been led to expect (three hours), but whatever, works fine. My pool has expanded!

But wait, zpool status tank is really unhappy now about the new mirrored vdev:

One or more devices are configured to use a non-native block size. Expect reduced performance.

and

block size: 512B configured, 4096B native

in the listing for the two 2 TB drives. Huh? This makes no sense to me at first, because diskinfo -v /dev/ada0 and friends gives me the exact same sectorsize/stripesize combination for the 3 TB as the 2 TB drives. After fighting the horrible documentation on zdb for a bit, I figure out I need to use zdb -U /data/zfs/zpool.cache | grep ashift, which gives me this for the three mirrored vdevs:

ashift: 9
ashift: 9
ashift: 12

Oops. The first one are the old 640 GB drives, which are 512 sectors pure and simple. Their ashift=9 is legit. The last one are the new 3 TB drives, which are 512/4096, so when creating their vdev, they were set to ashift=12. Correct. And the ones in the middle are the 2 TB WD Reds, which should be ashift=12 as well, but the step-by-step replacement forced ashift=9 on them. Further research shows that you can't replace whole vdevs in place, and the only real solution is to destroy the pool and start over. Well, Scheibenkleister.

All rather logical once you think of it, but I didn't. The good news is that my data survived all of this intact, the pool still works, and is fast enough for what I need it to do. At some point, I'll get some more drives, back everything up externally (twice), and then do it over, this time with new drives only.

I've done this in the past, but I checked the pools first before dancing to check they were already ashift 12. Luckily they were ;)

Well. I guess I didn't do it, but I was aware of the problem :)

Thankyou, and hopefully others will be now too!

scwst · Oct 15, 2016

Remember the suggestion by Fred Brooks to "throw one away"? Now, I can't really recommend completely rebuilding your NAS like I just had to do, because backing up terabytes of data (twice) and then restoring it takes a really, really long time and is generally a pain in the rear. But it does give you time to think and gain some insights. Like, why a SLOG wouldn't help, how to set up the datasets best, and maybe I should join data hoarders anonymous?

Anyway, the new machine is far better for the experience, with specialized record sizes, gzip-9 on the "cold storage" dataset, and of course the correct ashift now. All works fine now.

rs225 · Oct 16, 2016

gzip is best done with the gzip program. It is too slow to embed in the file system.

scwst · Oct 16, 2016

Usually true, but this is just data that I want to keep someplace safe from bitrot and probably not access for years. Encoding it took forever, but now it's as small as it can be. Everything else is either the default compression or (in the case of the media dataset) has compression turned off.

Dice · Oct 17, 2016

thanks for posting.
I'd consider this for a resource post as this is definitely useful information in the light of many contributors boasting the idea of 'replacement of drives by size to achieve growth'. @Ericloewe - do you think it merits?

I wonder if this would apply to RAIDZ2 also?
Since I've been test running a setup with 1-3TB sized drives in a mismatch "junk-pool". Never did I investigate this deep on potential underlying problems, just assuming that growth would be free from problems in case desired.

edit: A slight worry is that upon searching the user guide for "ashift" - nothing is returned.

Ericloewe · Oct 17, 2016

Dice said:
I'd consider this for a resource post as this is definitely useful information in the light of many contributors boasting the idea of 'replacement of drives by size to achieve growth'. @Ericloewe - do you think it merits?

Sure, the more the merrier. The mod team will remove anything horrible and all users can like and review resources, so the best ones will hopefully float to the top.

TheWasher · Dec 22, 2016

I read this thread and I have just done the same and migrated from 4 x mirrored smaller drives whilst playing with FreeNAS to a 4 x 2Tb array. zpool status shows block size: 512B configured, 4096B native. I have destroyed the pool and then started over, however, diskinfo -v /dev/ada0 still shows 512B. What do I need to do to change block size to 4096B or is this just something that I am going to have to live with? I have 2 x 2TB WD and 2 x 2TB Seagate drives in my pool.
Many thanks.

Robert Trevellyan · Dec 22, 2016

TheWasher said:
diskinfo -v /dev/ada0 still shows 512B

The question is, what is ashift value is ZFS using?

TheWasher · Dec 23, 2016

Many thanks for the response. Unfortunately one of the Seagate drives has failed (was a new drive that i bought and never used - not happy as now out of warranty) and I have taken the drive out of the case and will start from scratch now. I will order a new WD Red, might even stretch to 2 x WD Reds to replace the other Seagate in the array and then rebuild. I'll post once I have this done.

Merry Xmas!

Important Announcement for the TrueNAS Community.

Warning: replacing ancient drive with new one keeps bad ashift

scwst

Explorer

nojohnny101

Wizard

Stux

MVP

scwst

Explorer

rs225

Guru

scwst

Explorer

Dice

Wizard

Ericloewe

Server Wrangler

TheWasher

Cadet

Robert Trevellyan

Pony Wrangler

TheWasher

Cadet

Similar threads