shan81
Hi Everybody,
I recently built a new FreeNAS system that throws seemingly random errors during sustained copying or scrubbing of the pool.
The specs of the FreeNAS build are:
12 x 4 TB hard drives in two RAIDZ2 pools of six drives each, as shown below.
Code:
ada0--\
ada1---|
ada2---|
ada3---|
ada4---|--RAIDZ2\
ada5--/          \
                  [freenas]
ada6--\          /
ada7---|--RAIDZ2/
ada8---|
ada9---|
ada10- |
ada11-/
C2750D4I Motherboard
Transcend 32 GB DDR3-1866 ECC unbuffered DIMM (running at 1600 MHz)
550 W Cooler Master power supply
Before I put the server into production, I performed a burn-in of the RAM using memtest, which it passed.
The backup of my data wasn't stored at my location, so I had to relocate the FreeNAS server to copy my backup. After powering up the newly built server at the backup location, I noticed that zpool status was showing the pool as degraded. At the time, I thought this was due to one of the SATA cables coming loose. I powered off the server, checked all the cabling, powered it back on, cleared the zpool error and started copying all the files from my backup to FreeNAS.
After a couple of days of copying, I returned the server to its original location and powered it on again. I was greeted by another zpool degraded message and an error on my console stating that ada8 was showing checksum errors in the zpool, so again I checked all the cabling, cleared the zpool error and then ran a scrub of the pool. This caused the drive in question (ada8) to drop out of the array and no longer show up in the BIOS.
Following the FreeNAS guide, I shut down FreeNAS, removed the faulty drive and installed the spare. Using the GUI, I replaced the faulty drive, and after a few hours the replacement drive was successfully resilvered into the pool.
After the drive failure, I ran a long smartctl test on all drives attached to the system. The results can be seen at the link below:
http://pastebin.com/vBc2GyDy
Since then, I have had more strange errors when copying or scrubbing my volume, such as:
Code:
ahcich2: Timeout on slot 14 port 0
ahcich2: is 00000000 cs 00004000 ss 00000000 rs 00004000 tfd 50 serr 00000000 cmd 10008e17
ahcich4: Timeout on slot 24 port 0
ahcich4: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 50 serr 00000000 cmd 10009817
ahcich3: Timeout on slot 21 port 0
ahcich3: is 00000000 cs 00200000 ss 00000000 rs 00200000 tfd 50 serr 00000000 cmd 10009517
(ada10:ahcich14:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 80 06 69 40 a1 00 00 01 00 00
(ada10:ahcich14:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada10:ahcich14:0:0:0): Retrying command
ahcich2: Timeout on slot 13 port 0
ahcich2: is 00000000 cs 00002000 ss 00000000 rs 00002000 tfd 50 serr 00000000 cmd 10008d17
(ada10:ahcich14:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 d8 55 f2 40 a1 00 00 01 00 00
(ada10:ahcich14:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada10:ahcich14:0:0:0): Retrying command
ahcich3: Timeout on slot 2 port 0
ahcich3: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd 50 serr 00000000 cmd 10008217
ahcich3: Timeout on slot 18 port 0
ahcich3: is 00000000 cs 00040000 ss 00000000 rs 00040000 tfd 40 serr 00000000 cmd 10009217
ahcich5: Timeout on slot 19 port 0
ahcich5: is 00000000 cs 00080000 ss 00000000 rs 00080000 tfd 50 serr 00000000 cmd 10009317
ahcich2: Timeout on slot 11 port 0
ahcich2: is 00000000 cs 00000800 ss 00000000 rs 00000800 tfd 40 serr 00000000 cmd 10008b17
ahcich3: Timeout on slot 27 port 0
ahcich3: is 00000000 cs 08000000 ss 00000000 rs 08000000 tfd 50 serr 00000000 cmd 10009b17
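To see whether the errors cluster on particular channels, the console output can be tallied with a short script. The following is only a rough Python sketch, not an official FreeNAS tool: the function names `tally_errors` and `cs_matches_slot` are made up for illustration, and the regular expressions are written against the message layout shown above. The second helper rests on an observation from this log rather than on driver documentation: in each timeout pair, the `cs` value appears to be a bitmask with only the bit for the reported slot set (slot 14 gives 00004000, which is 1 << 14).

```python
import re
from collections import Counter

# Matches timeout lines such as "ahcich2: Timeout on slot 14 port 0".
TIMEOUT_RE = re.compile(r"(ahcich\d+): Timeout on slot (\d+) port \d+")
# Matches CAM status lines such as
# "(ada10:ahcich14:0:0:0): CAM status: Uncorrectable parity/CRC error".
CAM_RE = re.compile(r"\((ada\d+):(ahcich\d+):\d+:\d+:\d+\): CAM status:")


def tally_errors(log_text):
    """Count timeouts per AHCI channel and CAM status reports per (disk, channel)."""
    timeouts = Counter(m.group(1) for m in TIMEOUT_RE.finditer(log_text))
    cam_errors = Counter((m.group(1), m.group(2)) for m in CAM_RE.finditer(log_text))
    return timeouts, cam_errors


def cs_matches_slot(slot, cs_hex):
    """Check the assumption that cs is a bitmask with only the given slot's bit set."""
    return int(cs_hex, 16) == (1 << slot)


# A few lines copied from the console output above.
sample = """\
ahcich2: Timeout on slot 14 port 0
ahcich2: is 00000000 cs 00004000 ss 00000000 rs 00004000 tfd 50 serr 00000000 cmd 10008e17
(ada10:ahcich14:0:0:0): CAM status: Uncorrectable parity/CRC error
ahcich3: Timeout on slot 21 port 0
"""

timeouts, cam_errors = tally_errors(sample)
print(timeouts)
print(cam_errors)
print(cs_matches_slot(14, "00004000"))
```

Run against the full log, this would show the timeouts spread over ahcich2 to ahcich5, while the CRC errors all come from ada10 on ahcich14.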
ada10 seems to be the drive producing the CRC errors. According to other FreeNAS forum posts, this could indicate a faulty SATA cable.
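One way to watch for a cable problem is to track the drive's interface CRC counter over time. The sketch below assumes the drive reports the common SMART attribute `UDMA_CRC_Error_Count` (usually ID 199) and that the text comes from `smartctl -A /dev/ada10`, whose attribute rows have ten whitespace-separated columns ending in the raw value. The helper name `udma_crc_error_count` is hypothetical, and not every drive exposes this attribute.

```python
def udma_crc_error_count(smartctl_attr_text):
    """Return the raw UDMA_CRC_Error_Count from `smartctl -A` output, or None.

    Assumes the standard attribute table layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_attr_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            try:
                return int(fields[9])
            except ValueError:
                # Raw value was not a plain integer; treat as unknown.
                return None
    # Attribute not reported by this drive.
    return None
```

If the count keeps rising after the cable is reseated or swapped, the cable alone is probably not the cause; if it stays flat, the cable was the likely culprit.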
What about these ahcich timeout error messages? Are they related to smartd? I ask because the message below appears on my console but not in dmesg:
Code:
Oct 4 08:24:08 freenas smartd[12317]: Device: /dev/ada3, failed to read SMART Attribute Data.
Any help would be most appreciated!