Help with a broken zpool due to upgrading firmware on drives.

Status
Not open for further replies.

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
Hi.

I need some help restoring my zpool. I have eight 3 TB Seagate SATA (ST3000DM001) drives in a raidz1 pool. Yeah, I know raidz2 would have been better. Too late now.

I noticed they were not all running the latest firmware: two of them had CC4H, the latest, and six had CC4C.

I shut down and booted a DOS USB flash disk to do the flashing.

Just before shutting down, I noticed an "adaX device lost" message. I thought it was one of my eSATA drives that I was playing with, but it turned out to be one of the 3 TB drives. The drive seems dead: it's not recognized by the BIOS of two different PCs, and different cables don't help, while other drives on the same PCs / controllers work fine. So I'm pretty sure I have a bad drive. I didn't know this going into the flashing.

I should have flashed one drive, then booted FreeNAS back up to make sure everything was OK. But because I already had a bad drive I didn't know about, that wouldn't have helped.

I flashed five of the six drives to CC4H. Trying to flash the sixth drive is when I realized I had a bad drive. No big deal, I thought; the pool will just come up degraded.

Booted FreeNAS back up, and five drives report an invalid or corrupt GPT. Only two of the 3 TB drives show up. I've never had a drive firmware update cause data loss before, so I was a bit surprised.

I booted a GParted live distro to investigate and found the problem. The five drives originally had a total size of 5,860,533,134 sectors (512-byte), and the last partition ended at sector 5,860,533,127. However, with CC4H, the drives now report their size as 5,856,034,936 sectors. Somehow the drives shrank by 4,498,198 sectors. What's weird is that the two drives that shipped with CC4H report their size 'correctly', the same as the CC4C ones did before the update.
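For reference, the same size check can be done from the FreeNAS shell without booting anything else (a sketch; ada1 here is just a placeholder for whichever adaX device is affected, and I happened to do mine from GParted):

Code:
diskinfo -v ada1    # mediasize in bytes and in 512-byte sectors
gpart show ada1     # partition layout, including where p2 ends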

I tried looking around for a way to downgrade back to CC4C, but couldn't find one. I've opened a support ticket with Seagate, but I'm not sure they'll be much help. My question is: if I 'fix' the partition tables to match the new, smaller drive size, will the drives still work in ZFS? Or will members with different partition sizes confuse it?

If I can get the five drives working that were broken by the firmware size change, I can resilver the pool back onto a good eighth drive. I've got about 6 TB of movies / TV shows on the pool. I can always re-download everything, but it would be a major pain. I hope there's a way to salvage the pool.

Thanks for any help.
 

leenux_tux

Patron
Joined
Sep 3, 2011
Messages
238
titan_rw,

Just so happens I was reading up on some ZFS stuff earlier on when I saw your message...

Not sure if this will help, as you didn't say whether you could run "zpool status" from the command line, but it might be worth a look. Also, the procedure is written for Solaris ZFS, so it may not all be available on FreeNAS (I'm at work at the moment, so I don't have access to my FreeNAS system to confirm).

Take a look at the following URL entitled "Repairing ZFS Storage Pool-Wide Damage".

http://docs.oracle.com/cd/E19253-01/819-5461/6n7ht6r7v/index.html#indexterm-714

leenux_tux
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
"zpool status" shows no pools.

Makes sense, since the GPT is being rejected as invalid on those drives.

I guess the only thing I can do is shrink the partitions to the new reported drive size and pray ZFS can deal with that.

Seagate confirmed there is no way to flash back to the old firmware.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
You should use a FreeBSD 8.3-based system or higher to attempt the recovery. The updated ZFS code is much better at disaster recovery, and I have a feeling you are going to need it.

I guess the only thing I can do is shrink the partitions to the new reported drive size and pray ZFS can deal with that.
I would hold off on shrinking anything if you haven't already done so. Both ZFS and GPT can deal with missing labels at the end of the drive, or rather at the end of the partition, which sits at the end of the drive. You may, however, need to fix the GPT before you can do anything else.
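If gpart is flagging the tables as corrupt because the backup GPT header now sits past the new end of the device, 'gpart recover' can rewrite the backup metadata at the actual end of the disk without touching any data. A sketch only (ada1 is a placeholder); it may refuse while a partition still extends past the device boundary, and I'd wait until we've seen the output below:

Code:
gpart recover ada1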

I just did the math, and if the drives are now ≈2 GB smaller, that's a significant problem. ZFS also prefers to use the outer areas of the disk first, as they are faster.

From an SSH session as root, paste the output of:
Code:
zpool import

camcontrol devlist

gpart show

glabel status
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
You should use a FreeBSD 8.3-based system or higher to attempt the recovery. The updated ZFS code is much better at disaster recovery, and I have a feeling you are going to need it.

I would hold off on shrinking anything if you haven't already done so. Both ZFS and GPT can deal with missing labels at the end of the drive, or rather at the end of the partition, which sits at the end of the drive. You may, however, need to fix the GPT before you can do anything else.


Thanks for the reply.


Short version:

I messed with it for about six hours today and got the pool online, albeit in a degraded state. I'm pulling data off it now.



Long version:

This was a strange one. At first I tried shrinking the partitions (adaXp2, etc.). Using gdisk, booted from a GParted live distro, I deleted the second partition and then recreated it within the new limits of the drive. I did the first one, then rebooted into FreeNAS. Instead of five drives complaining about an invalid GPT, there were just four. Good, I thought, so I booted back into GParted and continued. The last one I went to resize threw me for a loop, though. Instead of being only 2 GB short, that drive was showing as only 1.8 TB in size, instead of ~2.7 TB. At this point I knew something was wrong, so I tried booting Seagate's SeaTools. It also said the native drive capacity was ~1.8 TB. I tried the 'set max native capacity' command, but it wasn't working; it just came up with a manual size entry dialog, and even after entering the correct max capacity it still wouldn't continue.
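For anyone curious, the FreeBSD-side equivalent of the gdisk delete-and-recreate would be roughly the following. This is just a sketch (ada1 and the size are placeholders; I did mine from the GParted disc): the new size has to be worked out so the partition ends before the new last usable sector, and gpart may want a 'gpart recover' first if it has flagged the table as corrupt.

Code:
# shrink the data partition (p2) in place; the start sector is preserved
gpart resize -i 2 -s NEW_SIZE_IN_SECTORS ada1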

I took the five problem drives out of that computer and, one at a time, powered them up in an old Athlon 3500 machine. SeaTools initially said the same thing: max capacity was below what it should be. But this time, when I ran the 'set max native' command, it said it completed. I power cycled the computer and ran SeaTools again, and this time it showed the correct max capacity. I did this for all the drives in turn, hopefully restoring their max LBA.

After setting the capacity back, I went back into GParted and set the partition tables all back to the way they were, put the drives back into the FreeNAS machine, powered it up, and did a 'zpool import'. This time it showed my pool, but with only 6 of 8 drives. I was expecting at least one to be missing, as I still had the one that even the BIOS wouldn't recognize. I found the other bad one with 'gpart list adaX'. Booted back into GParted, and gdisk was still showing that drive as about 100 sectors smaller than its partition. I tried the SeaTools set max capacity again multiple times, but it was still about 100 sectors short, so I simply deleted the partition and recreated it within the boundaries.
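For what it's worth, the too-small drive is also visible straight from gpart on the FreeNAS side (ada5 here is just an example device); the kernel flags the table when the device has shrunk underneath its partitions:

Code:
gpart show ada5                            # prints [CORRUPT] next to the GPT on the too-small drive
gpart list ada5 | grep -iE 'state|Mediasize'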

Tried again, and 7 of 8 drives were showing up. Great. I brought the pool online and mounted it, but one of the seven drives was constantly head resetting, a click / bleep kind of thing, and it took about 10 minutes to get a recursive directory listing. At this point I had two bad drives. The one that wasn't showing up in the BIOS I figured probably had a bad drive PCB; the one that was head resetting I thought might be a spindle problem. So I swapped boards between the two drives. The drive that previously wasn't showing up in the BIOS now did show up, but it had the reset problem. The one that previously had a reset problem still had it, only worse. Neither drive was useful with the PCBs switched around, so I changed them back. Now the drive that originally wasn't showing in the BIOS still has a reset problem; I have no idea why. However, the drive that originally had the reset problem is working. It bleeped 3-4 times when I first powered it up, but that was it. I brought the pool back online and mounted it with six good drives and one questionable one.

So I'm backing up the data now. Then I'll RMA the bad drives and recreate the pool as raidz2. Having two-drive redundancy would have made this recovery process a whole lot simpler.


I just did the math, and if the drives are now ≈2 GB smaller, that's a significant problem. ZFS also prefers to use the outer areas of the disk first, as they are faster.

That's weird. As far as I was aware, the 'start' of a drive is the outer edge and the end is the inner edge. I remember graphing hard drive read speeds across the surface of a disk and seeing the performance drop off as you got further toward the inside (the end) of the disk. ZFS 'should' use the outer areas first, but that's actually the beginning of each disk, not the end. Having to shrink the partitions a bit at the end shouldn't matter too much if the drives weren't all that full to begin with; mine was only about 1/3 full. Plus, in the end I only had to shrink one of the partitions by about 100 sectors.
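That drop-off is easy enough to show with a couple of plain dd reads if anyone wants to check their own drive (read-only; ada1 and the skip value are just examples chosen for a 3 TB disk):

Code:
# outer edge (start of the disk)
dd if=/dev/ada1 of=/dev/null bs=1m count=2048
# inner edge (skip is counted in 1 MB blocks, so this starts roughly 2.8 TB in)
dd if=/dev/ada1 of=/dev/null bs=1m count=2048 skip=2700000

The bytes/sec figure dd prints at the end is noticeably lower for the second run on a spinning drive.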


Even though I have this solved (for now), I'll list my system specs, which I guess I should have done at the start.

FreeNAS-8.3.0-beta3-x64
Intel i5-3570K
Asus P8Z77-V LK
32 GB RAM
Integrated Realtek NIC
Off-board Intel CT NIC

8x Seagate 3 TB SATA as the primary pool.
Misc 1 TB and 2 TB drives for other pools.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
At this point I knew something was wrong, so I tried booting Seagate's SeaTools. It also said the native drive capacity was ~1.8 TB. I tried the 'set max native capacity' command, but it wasn't working.
I've read that the BIOS should be set to IDE mode when using SeaTools and other similar utilities.

That's weird. As far as I was aware, the 'start' of a drive is the outer edge and the end is the inner edge. I remember graphing hard drive read speeds across the surface of a disk and seeing the performance drop off as you got further toward the inside (the end) of the disk. ZFS 'should' use the outer areas first, but that's actually the beginning of each disk, not the end.
Yes, of course you are right, and that's what ZFS does; a little brain fart on my part. ZFS does like to be spatially diverse in storing metadata and such, so I don't think it's surprising that ZFS was using areas near the end of the disk.

So I'm backing up the data now. Then I'll RMA the bad drives and recreate the pool as raidz2. Having two-drive redundancy would have made this recovery process a whole lot simpler.
A smart move with 3 TB disks. While you were particularly unlucky, the only single-redundancy layout I would consider with 3 TB drives is mirrors, and an 8-drive raidz vdev is rather wide as well. Unrecoverable read errors are fairly common on drives this large, which means you can easily lose some data when you replace a failed drive and any of the 7 remaining drives fails to read some sectors during the resilver. Double-parity arrays make such a scenario much less likely.
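Rough numbers, assuming the commonly quoted consumer-drive spec of one unrecoverable read error per 10^14 bits read:

Code:
7 surviving drives x 3 TB  = 21 TB to read during a resilver (worst case, full pool)
21 x 10^12 bytes x 8       = 1.68 x 10^14 bits
1.68 x 10^14 / 10^14       = ~1.7 expected unrecoverable read errors

Real-world rates are usually better than the spec sheet, but even an order of magnitude better still leaves a very real chance of hitting one during a rebuild, and with raidz1 there is nothing left to repair it from.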
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
A smart move with 3 TB disks. While you were particularly unlucky, the only single-redundancy layout I would consider with 3 TB drives is mirrors, and an 8-drive raidz vdev is rather wide as well. Unrecoverable read errors are fairly common on drives this large, which means you can easily lose some data when you replace a failed drive and any of the 7 remaining drives fails to read some sectors during the resilver. Double-parity arrays make such a scenario much less likely.

I've got most of the data off. Some of the files must have been in the area I had to shrink: during a local rsync, I get "read errors mapping /mnt/path/to/file". I assume this is ZFS telling me there are unreadable portions of the files in question. The files are multimedia files, .avi and .mov. I'd like to recover as much of them as can be read, instead of nothing at all. How can you 'skip errors' and read only what is readable? I don't imagine there's much corruption, but it seems rsync won't copy any of a file if there's any problem reading it.

Thanks.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
I assume this is ZFS telling me there are unreadable portions of the files in question. The files are multimedia files, .avi and .mov. I'd like to recover as much of them as can be read, instead of nothing at all. How can you 'skip errors' and read only what is readable? I don't imagine there's much corruption, but it seems rsync won't copy any of a file if there's any problem reading it.
If your assumption is correct, then dd is your friend. See: Holy smokes! A holey file!, dd tricks for holey files, and more on holey files. At the very least, I imagine those files would be considered corrupt until you repaired them with something.
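A minimal sketch of the dd approach for one damaged file (the paths are just examples); 'noerror' keeps going past read errors and 'sync' pads the unreadable blocks with zeros so the rest of the file stays in place:

Code:
dd if=/mnt/thepool/movies/broken.avi of=/mnt/backup/broken.avi bs=128k conv=noerror,sync

bs=128k matches the default ZFS recordsize, so a single bad record only costs you 128k of zeros. Expect a glitch when a player hits the zeroed spots, but the rest of the file should play.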
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
If your assumption is correct, then dd is your friend. See: Holy smokes! A holey file!, dd tricks for holey files, and more on holey files. At the very least, I imagine those files would be considered corrupt until you repaired them with something.


Thanks. I'll look at those.

I managed to get the 8th drive added back into the pool. It's head resetting a lot, but you can read from it; it's just slow. I thought the extra redundancy would allow me to recover these files.

"zpool status -v thepool" shows the problematic files. I thought it would be able to work around this since it has one drive redundancy. Surely it can read from another drive for the small partition shrink I had to do.

I did notice ZFS checksums were turned off. I think I did that when I was trying in vain to get the pool online the first time.

I tried "zfs set checksum=on thepool", but it never returns; the command just hangs. The rest of the system is responsive, but I can't cancel the zfs command. Even a "kill PID" doesn't end it.

The "zpool status -v" does show checksum errors on one of the drives, though, so maybe it is doing checksums? Can I offline the drive with errors and retry with the pool degraded?
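In case it matters, this is what I'm using to see what the checksum property is currently set to across the pool and datasets:

Code:
zfs get -r checksum thepool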


Thanks.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
I managed to get the 8th drive added back into the pool. It's head resetting a lot, but you can read from it; it's just slow. I thought the extra redundancy would allow me to recover these files.
You may well be able to get more of the files if you can actually read from the 8th drive. Expect I/O timeouts and lots of I/O retries if it does work. A misbehaving disk that is still accepting commands is often worse than a missing or dead one.

I thought ZFS would be able to work around this, since it has one drive of redundancy. Surely it can reconstruct from the other drives the small area I had to shrink off the one partition.
And what if the other drives also stored the redundancy for those files at the ends of their partitions?

The "zpool status -v" does show checksum errors on one of the drives, though, so maybe it is doing checksums? Can I offline the drive with errors and retry with the pool degraded?
ZFS always checksums metadata and its copies, regardless of the checksum property, and you can certainly try that at this point.
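Something along these lines if you do try it; substitute the actual device or gptid label shown for that drive in zpool status (a sketch only):

Code:
zpool offline thepool gptid/XXXXXXXX     # take the flaky drive out of service
# ... redo the rsync / dd copies with the pool degraded ...
zpool online thepool gptid/XXXXXXXX      # bring it back afterwards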
 