SOLVED The Message ID: ZFS-8000-8A indicates corrupted data exists in the current pool

Status
Not open for further replies.

larencio88

Dabbler
Joined
May 5, 2016
Messages
13
Thank you. ;) The phenomenon is torn writes with write amplification. Simply put, if I write one 512-byte sector to an Advanced Format (AF) disk, the drive has to rewrite the whole 4K physical sector, so 8 logical sectors are at risk of loss if that write fails. Same if a 4K write is unaligned, except it straddles two physical sectors, putting 16 sectors at risk. Another scenario is a drive that automatically claims completion of such writes to avoid exposing the actual latency, in the expectation that more writes will follow to the same track. There are three drive makes in this pool.
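As a back-of-the-envelope check of that math (a minimal sketch; 4096-byte physical sectors are the standard AF geometry):
Code:
# 512-byte emulated sectors per 4K physical sector on an AF (512e) drive
echo $((4096 / 512))        # -> 8 logical sectors rewritten for one 512-byte write
# an unaligned 4K write straddles two physical sectors
echo $((2 * 4096 / 512))    # -> 16 logical sectors at risk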

Ashift should be viewable by zdb -e tank | grep ashift



I'm not sure that is a real ZFS I/O error. In any case, even if there were an I/O error, in theory there should be no corruption to the pool. But if there was both an I/O error and corruption, then there is probably a problem in the code, even if the window of exposure is merely 'during a shutdown'.
Interesting. I ran the command, and here is the output.
Code:
[root@freenas ~]# zdb -e tank | grep ashift
  ashift: 12   
  ashift: 12   
  ashift: 12   
  ashift: 12   
  ashift: 12   
  ashift: 12   
  ashift: 12   
  ashift: 12   
loading space map for vdev 3 of 4, metaslab 183 of 349 ...   
36.2M completed (  0MB/s) estimated time remaining: 11239hr 16min 27sec


The first two vdevs are on the onboard LSI 2308 controller; the third vdev is two-thirds on there and one-third on the Intel motherboard controller. So far it seems to be showing only the 8 disks on the LSI 2308 controller.
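To confirm which disk sits on which controller, camcontrol can list the bus each device is attached to; the sample output line below is illustrative, not from this system:
Code:
camcontrol devlist
# e.g. <ATA WDC WD40EFRX 0A82>  at scbus0 target 0 lun 0 (pass0,da0)
# the scbusN number identifies the controller each da device hangs off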
 

larencio88

Dabbler
Joined
May 5, 2016
Messages
13
Update: I am currently running an rsync task, copying the data off to another server. The CPU load was running around 30% but has increased to 80% while this zdb task continues to run. Disk load has increased, and the da7 disk has begun throwing errors again. Maybe this will push it over the edge and I can then replace it.
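Before replacing it, a SMART query can show whether da7 is actually accumulating defects (device name from the post; the grep pattern just picks out the usual warning attributes):
Code:
smartctl -a /dev/da7 | egrep -i 'reallocated|pending|uncorrect'
# rising Reallocated_Sector_Ct or Current_Pending_Sector counts
# usually mean the drive is on its way out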
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
You should Ctrl-C out of the zdb task. All you need to see is the ashift.

In this case, your ashift is correct for Advanced Format drives, so an ashift problem would not be an issue here.

Have you tried zpool clear tank to reset the error list, then run a scrub to see if the same 'permanent error' shows up?
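Something like the following, assuming the pool name tank from earlier in the thread:
Code:
zpool clear tank        # reset the error counters and the error list
zpool scrub tank        # re-read every block in the pool and verify checksums
zpool status -v tank    # see whether the 'permanent error' comes back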
 

larencio88

Dabbler
Joined
May 5, 2016
Messages
13
You should Ctrl-C out of the zdb task. All you need to see is the ashift.

In this case, your ashift is correct for Advanced Format drives, so an ashift problem would not be an issue here.

Have you tried zpool clear tank to reset the error list, then run a scrub to see if the same 'permanent error' shows up?
When I came back to check on it, the task had been killed.
Code:
Feb 23 17:19:25 freenas swap_pager_getswapspace(16): failed
Feb 23 17:19:25 freenas swap_pager_getswapspace(12): failed
Feb 23 17:19:25 freenas swap_pager_getswapspace(16): failed
Feb 23 17:19:25 freenas swap_pager_getswapspace(12): failed
Feb 23 17:19:25 freenas swap_pager_getswapspace(16): failed
Feb 23 17:19:25 freenas swap_pager_getswapspace(12): failed
Feb 23 17:19:25 freenas swap_pager_getswapspace(9): failed
Feb 23 17:19:25 freenas swap_pager_getswapspace(3): failed
Feb 23 17:19:25 freenas swap_pager_getswapspace(3): failed
Feb 23 17:19:25 freenas swap_pager_getswapspace(3): failed
Feb 23 17:19:25 freenas swap_pager_getswapspace(3): failed
Feb 23 17:19:25 freenas swap_pager_getswapspace(3): failed
Feb 23 17:19:25 freenas kernel: pid 83641 (zdb), uid 0, was killed: out of swap space
Feb 23 17:19:25 freenas kernel: pid 83641 (zdb), uid 0, was killed: out of swap space


I am going to let the rsync task finish and then try your suggestions. It may take over 24 hours to finish the transfer. I will update the thread then. Thanks for the help!
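If zdb gets re-run later, it is worth watching memory from a second shell, since this run was killed for exhausting swap (a minimal sketch):
Code:
swapinfo                # how much swap is left
vmstat 5                # memory and paging activity every 5 seconds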
 

larencio88

Dabbler
Joined
May 5, 2016
Messages
13
Success! After running a scrub of the pool, the error has gone away.
Code:
scan: scrub repaired 0 in 5h59m with 0 errors on Mon Feb 27 01:24:53 2017 

Code:
errors: No known data errors

The rsync transfer took much longer than I expected, about 3 days for 8 TB, but I thought it would be better to have another copy of the Plex media in case something went wrong. The disk errors on da7 have also stopped after a reboot, even under the load of the scrub.
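To gain some confidence that da7 is actually healthy rather than just quiet, a long SMART self-test can be started and its result checked once it completes (device name from the thread):
Code:
smartctl -t long /dev/da7        # kick off the drive's long self-test
smartctl -l selftest /dev/da7    # review the result when it finishes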
 