L2ARC io error problems

n1ko · Aug 27, 2013

I'm seeing IO errors and bad checksums reported by arc_summary.pl for my L2ARC device. The errors start showing up after hours of usage, usually with 100+ GB of data populated by that point. Another weird thing is that the size is reported wrong, e.g. larger than my SSD. That might be due to compression, but I don't have anything that compresses better than 1.3x.
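A quick sanity check on the compression theory, as a rough sketch: if the reported L2ARC size counts uncompressed data (an assumption, not something confirmed in this thread), the best compression ratio seen puts an upper bound on what a device of a given size could plausibly report. The function name and the use of the nominal 60 GB capacity are illustrative only.

```python
def max_logical_size_gb(device_gb: float, compress_ratio: float) -> float:
    """Upper bound on uncompressed data that could fit on the device.

    Assumes the reported size is the uncompressed (logical) size and
    that every block compresses at the best ratio seen.
    """
    return device_gb * compress_ratio


# Nominal 60 GB SSD, best observed compression ratio of 1.3x.
bound = max_logical_size_gb(60, 1.3)
print(f"Largest plausible reported size: {bound:.0f} GB")
```

A reported size well above that bound would suggest that compression alone does not explain the number.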

Hardware:

Generic build with an AMD 5800K, 32GB of non-ECC RAM (tested multiple times with memtest), and a Kingston V300 60GB SSD.

L2 ARC Summary: (DEGRADED)
Passed Headroom: 13.06m
Tried Lock Failures: 146.61k
IO In Progress: 465
Low Memory Aborts: 20
Free on Write: 6.55k
Writes While Full: 19.11k
R/W Clashes: 26
Bad Checksums: 11.12k
IO Errors: 4.35k
SPA Mismatch: 337.01m
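To see when the error counters start climbing (for instance, relative to the device filling up), the summary can be turned into plain numbers and compared between runs. This is a minimal sketch, not part of arc_summary.pl: `parse_count` and `parse_summary` are hypothetical helper names, and the `k`/`m`/`b` suffixes are assumed to be decimal thousands, millions, and billions.

```python
import re

# Assumed decimal multipliers for the suffixes shown in the summary.
SUFFIXES = {"": 1, "k": 1_000, "m": 1_000_000, "b": 1_000_000_000}


def parse_count(text):
    """Convert a value such as '11.12k' into a plain number."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([kmb]?)", text.strip().lower())
    if match is None:
        raise ValueError(f"not a counter value: {text!r}")
    return float(match.group(1)) * SUFFIXES[match.group(2)]


def parse_summary(block):
    """Map each 'Name: value' line to a number, skipping other lines."""
    counters = {}
    for line in block.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            counters[name.strip()] = parse_count(value)
        except ValueError:
            continue  # e.g. the 'L2 ARC Summary: (DEGRADED)' header
    return counters


sample = """\
L2 ARC Summary: (DEGRADED)
IO In Progress: 465
Bad Checksums: 11.12k
IO Errors: 4.35k
"""

counters = parse_summary(sample)
for name in ("Bad Checksums", "IO Errors"):
    print(name, round(counters[name]))
```

Saving the parsed counters at intervals and diffing them would show whether Bad Checksums and IO Errors stay at zero until a particular point and then grow.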

  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 27h41m with 0 errors on Mon Aug 26 02:21:53 2013
config:

NAME                                            STATE     READ WRITE CKSUM
tank                                            ONLINE       0     0     0
  raidz2-0                                      ONLINE       0     0     0
    gptid/38d32808-f0a3-11e2-bf20-0017087e7ad4  ONLINE       0     0     0
    gptid/39766b13-f0a3-11e2-bf20-0017087e7ad4  ONLINE       0     0     0
    gptid/3a2e1011-f0a3-11e2-bf20-0017087e7ad4  ONLINE       0     0     0
    gptid/3af0e894-f0a3-11e2-bf20-0017087e7ad4  ONLINE       0     0     0
    gptid/3bb4c993-f0a3-11e2-bf20-0017087e7ad4  ONLINE       0     0     0
    gptid/3c8c502e-f0a3-11e2-bf20-0017087e7ad4  ONLINE       0     0     0
    gptid/3d69c1ad-f0a3-11e2-bf20-0017087e7ad4  ONLINE       0     0     0
    gptid/3e4bb0f4-f0a3-11e2-bf20-0017087e7ad4  ONLINE       0     0     0
    gptid/3f26a89a-f0a3-11e2-bf20-0017087e7ad4  ONLINE       0     0     0
    gptid/400252f1-f0a3-11e2-bf20-0017087e7ad4  ONLINE       0     0     0
cache
  gptid/18550d61-0dae-11e3-a860-0017087e7ad4    ONLINE       0     0     0

Things I have tried out:

- Changing the SSD (OCZ Vertex 2 -> Kingston V300)
- Changing SATA cables
- Changing the PSU cable
- Changing the SATA controller
- Assigning the device to another pool

I have no problems running memtest86 for over 24h and the server runs fine otherwise. I haven't seen any errors reported in zpool status and scrubs have been error free.

cyberjock · Aug 27, 2013

You do have me a little stumped. When you say "changed SATA controller" can you specify what you were using and what you changed to? If onboard, did you have AHCI enabled?

You might want to try a totally different PSU. I've never seen an SSD have issues with voltage before lots of other problems had already made the issue evident, but it doesn't hurt to try it.

What motherboard in particular did you use?

Did you try underprovisioning the cache? For example, only partitioning 58GB of the 60GB SSD.

n1ko · Aug 27, 2013

Thanks for the quick reply! I'm baffled too..

The mobo is an Asus F2A85-V PRO. The SATA controllers I tried were the onboard controller on the mobo (hudson-something-something) and a SiL 3132. AHCI is enabled on the onboard controller; I don't know about the SiL. Neither is ideal, but then again the other pool I'm running works fine on these (12-disk mirror). I also have 2x IBM M1015, and the PSU is a 600W Corsair.

Underprovisioning seems worth a try, since the errors seem to start when the disk size is reached (as reported by arc_summary). It might be a coincidence of course, and like I said, I'm not sure how to interpret the size it reports... I will report back once I have tried it.

cyberjock · Aug 27, 2013

I'd try using one of your M1015s. Those are definitely well supported, while the others are less so. One thing I'm not a fan of with AMD motherboards is the lack of a well-supported SATA controller. Intel controllers are pretty good for plain SATA use in desktops and smaller servers. But the M1015 is where it's at. I have that in 2 of my FreeNAS servers.

eraser · Jul 1, 2014

Are you still having this problem after upgrading to 9.2.1.4 or later?
