I'm having a serious problem with ZFS/FreeNAS that is shaking my confidence in it. Can anyone please help?
Here are the details:
I have 5x 2TB drives in a raidz1 pool with about 5TB of data on it, serving as my data archive/backup; I regularly dump data to it from a Synology NAS (which has less storage). I keep the FreeNAS box offline most of the time, but when I powered it on last week I found that one of the disks had bad sectors (reported by smartd). I upgraded FreeNAS from 9.1.0 to 9.3.1 without upgrading the ZFS pool, and everything looked fine until I decided to replace the failing disk "ada3". At that point ada3 was still ONLINE and the pool zfs0 was still HEALTHY.
I took ada3 offline (the pool became DEGRADED), powered FreeNAS off, swapped in a good disk, and powered it back on. I could see the zpool for about two minutes, then the system suddenly hung: I lost the GUI and SSH connections, and the console stopped responding. I reset the box, but it wouldn't boot; it was stuck on the message "spa_load_impl waiting claims to sync", and waiting a long time didn't help. I reinstalled FreeNAS 9.3.1 so I could boot the system, and ran "zpool import" to see the pool, but "zpool import -f zfs0" failed: it ran for a long time (even a full day) with no output on the console, and every other zfs command hung, so I believe the import itself hung the system. I also tried FreeNAS 9.1.0 to rule out a ZFS version mismatch (even though I never upgraded the pool when moving from 9.1.0 to 9.3.1), with no luck.
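For reference, here is the replacement sequence I performed, expressed as CLI commands (a sketch only; I actually drove this through the FreeNAS GUI, and the new device name "ada3p2" is an assumption, not what the GUI showed me):

```shell
# Sketch of the disk-replacement steps, as CLI commands (not executed here).
# The gptid is the one my pool reports for the failed member.
replace_ada3() {
  # Mark the failing member offline; the pool drops to DEGRADED
  zpool offline zfs0 gptid/32ea34ce-1523-11e3-b638-406186f225d7

  # ...power off, physically swap ada3 for a good disk, power back on...

  # Attach the new disk in place of the old member and resilver
  # ("ada3p2" is a hypothetical partition name for the new disk)
  zpool replace zfs0 gptid/32ea34ce-1523-11e3-b638-406186f225d7 ada3p2

  # Watch resilver progress
  zpool status zfs0
}
```

The hang happened before any "zpool replace" step could run, so the pool still shows the old member as OFFLINE.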
I did a lot of Google searching and found quite a few people complaining that "zpool import" took a long time or hung the system, but I found no solution. I did discover that I could import the pool read-only (why does that work when a normal import doesn't?), and at least I can see my data inside the pool, though I'm not sure all of it is intact.
[root@freenas] ~# zpool import
   pool: zfs0
     id: 756149107281189722
  state: DEGRADED
 status: One or more devices are offlined.
 action: The pool can be imported despite missing or damaged devices.  The
         fault tolerance of the pool may be compromised if imported.
 config:

         zfs0                                            DEGRADED
           raidz1-0                                      DEGRADED
             gptid/31e69f5e-1523-11e3-b638-406186f225d7  ONLINE
             gptid/323b141e-1523-11e3-b638-406186f225d7  ONLINE
             gptid/32955024-1523-11e3-b638-406186f225d7  ONLINE
             7438062148680404053                         OFFLINE
             gptid/334be691-1523-11e3-b638-406186f225d7  ONLINE

[root@freenas] ~# zpool import -f -o readonly=on zfs0 mnt
[root@freenas] ~#
[root@freenas] ~# zpool status -v
  pool: mnt
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 10.8M in 0h0m with 0 errors on Sun Jan 3 08:17:02 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        mnt                                             DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            gptid/31e69f5e-1523-11e3-b638-406186f225d7  ONLINE       0     0     0
            gptid/323b141e-1523-11e3-b638-406186f225d7  ONLINE       0     0     0
            gptid/32955024-1523-11e3-b638-406186f225d7  ONLINE       0     0     0
            7438062148680404053                         OFFLINE      0     0     0  was /dev/gptid/32ea34ce-1523-11e3-b638-406186f225d7
            gptid/334be691-1523-11e3-b638-406186f225d7  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x95>
        <metadata>:<0xb7>
I could copy all 5TB+ of data off the NAS, rebuild it, and copy everything back, but I'd struggle to find another 5TB+ of free space to hold it all, and the transfer would take days in each direction. Even though I could go through all that to get the NAS working again, I wonder whether there's an easier way to fix the ZFS pool. Most importantly, I want to understand why FreeNAS/ZFS could fail like this, because I really don't want to hit the same problem in the future. I tried to be very careful when I chose ZFS years ago and with every single change I made to the system, so I don't know what I could have done wrong. Also, why did I get those metadata errors when all the other disks are still good and the single failing disk only has a few bad sectors? Why can the pool be imported read-only when a normal import fails? Without a good explanation for this issue, I wonder if I should pick another filesystem, such as Btrfs, for the new build.
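In case it helps anyone suggest something better, the fallback plan I'm considering would look roughly like this (a sketch only; the destination path "/backup" is hypothetical, standing in for whatever 5TB+ of space I can find):

```shell
# Sketch of the fallback: evacuate all data via a read-only import.
# "/backup" is a hypothetical destination with >5 TB free.
evacuate_zfs0() {
  # A read-only import guarantees nothing further is written to the pool
  zpool import -f -o readonly=on -R /mnt zfs0

  # Copy everything off; rsync preserves permissions/links and can be
  # resumed if the copy is interrupted
  rsync -aH --progress /mnt/zfs0/ /backup/zfs0/

  # Export cleanly once the copy has been verified
  zpool export zfs0
}
```

I'd then destroy and recreate the pool on fresh disks and rsync the data back, but that is exactly the days-long round trip I'm hoping to avoid.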
PLEASE HELP. THANK YOU IN ADVANCE!