So I had to RMA a backplane on my Norco RPC-4224. No biggie. However, 2 disks are now detected by my RAID controller but not found in FreeBSD. Thank goodness I have a RAIDZ3 :P Here's the story.
Before any changes, all was working fine. The zpool is 18x2TB in RAIDZ3. The first 4 drive slots of my case are connected to the onboard Intel controller. The next 20 drive slots are connected to my Areca 1280ML-24, on ports 1 through 16 and 21 through 24; port 20 doesn't seem to work right on the controller, so I have never used it. In other words, controller port 1 is drive slot 5, controller port 16 is drive slot 20, and the last 4 controller ports sit on the last 4 drive slots. Prior to the maintenance I did tonight I had 4 drives on the Intel controller and 14 drives on the Areca, on RAID ports 1 through 14. The controller is set to non-RAID mode and all of the drives are detected as JBOD (I'll prove that in a minute).
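If you want to check the JBOD claim from the FreeNAS shell yourself, the Areca CLI can dump each piece separately. The subcommand names below are from memory of areca-cli, so confirm them against its built-in help before trusting me:
Code:
areca-cli rsf info    # RaidSet Information: one single-disk raid set per JBOD drive
areca-cli vsf info    # VolumeSet Information: every volume should show Level JBOD
areca-cli disk info   # Physical Drive Information: what's plugged into each of the 24 ports
areca-cli hw info     # hardware monitor: per-port HDD temps (0 = empty port)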
So I pulled out the backplane for drive slots 17-20, only to realize the part I received is not the part I need (boo); it doesn't physically fit. So I figured I'd leave the backplane out, move the 2 drives that were in drive slots 17 and 18 (RAID ports 13 and 14) to drive slots 21 and 22 (RAID ports 21 and 22), and call it a day. After this shuffling of power and data cables I verified that the Areca controller sees 14 drives in JBOD and that the Intel controller sees 4 drives. All is good.
If all of this disk swapping has made you go cross-eyed, here's the simple version: I moved the hard drives that were connected to RAID controller ports 13 and 14 to ports 21 and 22, respectively, on a different backplane (which shouldn't matter).
So I boot up the machine and run a zpool status. Here's what I got:
Code:
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 46h34m with 0 errors on Tue Jan 15 22:34:20 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            DEGRADED     0     0     0
          raidz3-0                                      DEGRADED     0     0     0
            gptid/6fbb91d5-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/70448fd2-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/70c0c7b3-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/713de0d5-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/71e3eea1-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/728458d2-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/7326aebc-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/73c64f27-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/7468c69a-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/75045f96-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/75a0096a-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/763790a1-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/76d701fa-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/77759c5c-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/78190bd3-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/78bb9173-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            2647779287010833833                         UNAVAIL      0     0     0  was /dev/gptid/795a7052-4a95-11e2-bca4-0015171496ae
            12716287248176440471                        UNAVAIL      0     0     0  was /dev/gptid/79fbc7b0-4a95-11e2-bca4-0015171496ae

errors: No known data errors
So I check /dev to see what devices are there. I should expect ada0 through ada3 and da0 through da14 (one of the daXX devices is the boot USB key). Instead I get ada0-3 and da0-12. So clearly 2 disks are missing... hmm.
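For anyone following along at home, here's all I'm doing to check (both are stock FreeBSD tools); camcontrol is the more useful of the two because it also shows the SCSI target and LUN behind each daX device, which will matter in a minute:
Code:
ls /dev | egrep '^(ada|da)[0-9]+$'   # raw disk nodes only, no partitions
camcontrol devlist                   # CAM's view: <model> at scbusX target T lun L (passN,daN)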
So I check my Areca BIOS and sure enough the drives ARE connected. So I go into areca-cli again, dump the config, and save it to a file. Here's the output...
Code:
RaidSet Information
 #   Name             Disks TotalCap  FreeCap  DiskChannels State
===============================================================================
 1   Raid Set # 00        1 2000.4GB    0.0GB  1            Normal
 2   Raid Set # 01        1 2000.4GB    0.0GB  2            Normal
 3   Raid Set # 02        1 2000.4GB    0.0GB  3            Normal
 4   Raid Set # 03        1 2000.4GB    0.0GB  4            Normal
 5   Raid Set # 04        1 2000.4GB    0.0GB  5            Normal
 6   Raid Set # 05        1 2000.4GB    0.0GB  6            Normal
 7   Raid Set # 06        1 2000.4GB    0.0GB  7            Normal
 8   Raid Set # 07        1 2000.4GB    0.0GB  8            Normal
 9   Raid Set # 08        1 2000.4GB    0.0GB  9            Normal
10   Raid Set # 09        1 2000.4GB    0.0GB  A            Normal
11   Raid Set # 10        1 2000.4GB    0.0GB  B            Normal
12   Raid Set # 11        1 2000.4GB    0.0GB  C            Normal
13   Raid Set # 20        1 2000.4GB    0.0GB  L            Normal
14   Raid Set # 21        1 2000.4GB    0.0GB  M            Normal
===============================================================================
GuiErrMsg<0x00>: Success.

VolumeSet Information
 #   Name             Raid Name        Level   Capacity  Ch/Id/Lun  State
===============================================================================
 1   WD20EARS-00S8B1  Raid Set # 00    JBOD    2000.4GB  00/00/00   Normal
 2   WD20EARS-00S8B1  Raid Set # 01    JBOD    2000.4GB  00/00/01   Normal
 3   WD20EARS-00S8B1  Raid Set # 02    JBOD    2000.4GB  00/00/02   Normal
 4   WD20EARS-00S8B1  Raid Set # 03    JBOD    2000.4GB  00/00/03   Normal
 5   WD20EARS-00S8B1  Raid Set # 04    JBOD    2000.4GB  00/00/04   Normal
 6   WD20EARS-00S8B1  Raid Set # 05    JBOD    2000.4GB  00/00/05   Normal
 7   WD20EARS-00S8B1  Raid Set # 06    JBOD    2000.4GB  00/00/06   Normal
 8   WD20EARS-00S8B1  Raid Set # 07    JBOD    2000.4GB  00/00/07   Normal
 9   WD20EARS-00S8B1  Raid Set # 08    JBOD    2000.4GB  00/01/00   Normal
10   WD20EARS-00S8B1  Raid Set # 09    JBOD    2000.4GB  00/01/01   Normal
11   WD20EARS-00S8B1  Raid Set # 10    JBOD    2000.4GB  00/01/02   Normal
12   WD20EARS-00S8B1  Raid Set # 11    JBOD    2000.4GB  00/01/03   Normal
13   WD20EARS-00S8B1  Raid Set # 20    JBOD    2000.4GB  00/02/04   Normal
14   WD20EARS-00S8B1  Raid Set # 21    JBOD    2000.4GB  00/02/05   Normal
===============================================================================
GuiErrMsg<0x00>: Success.

Physical Drive Information
 #  Ch#  ModelName                 Capacity  Usage
===============================================================================
 1   1   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 2   2   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 3   3   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 4   4   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 5   5   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 6   6   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 7   7   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 8   8   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 9   9   WDC WD20EARS-00S8B1       2000.4GB  JBOD
10  10   WDC WD20EARS-00S8B1       2000.4GB  JBOD
11  11   WDC WD20EARS-00S8B1       2000.4GB  JBOD
12  12   WDC WD20EARS-00S8B1       2000.4GB  JBOD
13  13   N.A.                         0.0GB  N.A.
14  14   N.A.                         0.0GB  N.A.
15  15   N.A.                         0.0GB  N.A.
16  16   N.A.                         0.0GB  N.A.
17  17   N.A.                         0.0GB  N.A.
18  18   N.A.                         0.0GB  N.A.
19  19   N.A.                         0.0GB  N.A.
20  20   N.A.                         0.0GB  N.A.
21  21   WDC WD20EARS-00S8B1       2000.4GB  JBOD
22  22   WDC WD20EARS-00S8B1       2000.4GB  JBOD
23  23   N.A.                         0.0GB  N.A.
24  24   N.A.                         0.0GB  N.A.
===============================================================================
GuiErrMsg<0x00>: Success.

Physical Hardware Information
The Hardware Monitor Information
===========================================
Fan#1 Speed (RPM) : N.A.
Battery Status    : 96%
CPU Temperature   : 44 (Celsius)
Ctrl Temperature  : 39 (Celsius)
Power +12V        : 12.160
Power +5V         : 4.999
Power +3.3V       : 3.280
SATA PHY +2.5V    : 2.480
DDR-II +1.8V      : 1.824
PCI-E +1.8V       : 1.824
CPU +1.8V         : 1.824
CPU +1.2V         : 1.200
DDR-II +0.9V      : 0.896
HDD #1 Temp.      : 38
HDD #2 Temp.      : 40
HDD #3 Temp.      : 39
HDD #4 Temp.      : 38
HDD #5 Temp.      : 38
HDD #6 Temp.      : 39
HDD #7 Temp.      : 40
HDD #8 Temp.      : 37
HDD #9 Temp.      : 35
HDD #10 Temp.     : 38
HDD #11 Temp.     : 37
HDD #12 Temp.     : 34
HDD #13 Temp.     : 0
HDD #14 Temp.     : 0
HDD #15 Temp.     : 0
HDD #16 Temp.     : 0
HDD #17 Temp.     : 0
HDD #18 Temp.     : 0
HDD #19 Temp.     : 0
HDD #20 Temp.     : 0
HDD #21 Temp.     : 31
HDD #22 Temp.     : 31
HDD #23 Temp.     : 0
HDD #24 Temp.     : 0
===========================================
GuiErrMsg<0x00>: Success.
So the RAID controller absolutely, definitely sees 14 drives: RaidSet Information lists 14 single-disk raid sets, VolumeSet Information shows 14 JBOD volumes, Physical Drive Information shows 14 drives in JBOD, and 14 drives report temperatures in the hardware monitor.
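If you want the two counts side by side in one shot, something like this works (again, the areca-cli subcommand name is from memory, so adjust for your CLI version):
Code:
areca-cli disk info | grep -c 'JBOD'   # drives the controller exports as JBOD: 14
ls /dev | grep -cE '^da[0-9]+$'        # daX nodes FreeBSD actually created: 13 (one is the USB key)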
So now I'm thinking I need to fix this, and why not simply undo the changes I made? I shut down the server and swapped cables so the 2 drives that had changed RAID controller ports are back where they were (though on a different backplane, since the bad one is removed). Booted the system up and...
Code:
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 46h34m with 0 errors on Tue Jan 15 22:34:20 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz3-0                                      ONLINE       0     0     0
            gptid/6fbb91d5-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/70448fd2-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/70c0c7b3-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/713de0d5-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/71e3eea1-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/728458d2-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/7326aebc-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/73c64f27-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/7468c69a-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/75045f96-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/75a0096a-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/763790a1-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/76d701fa-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/77759c5c-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/78190bd3-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/78bb9173-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/795a7052-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
            gptid/79fbc7b0-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
And the config output from my RAID controller...
Code:
RaidSet Information
 #   Name             Disks TotalCap  FreeCap  DiskChannels State
===============================================================================
 1   Raid Set # 00        1 2000.4GB    0.0GB  1            Normal
 2   Raid Set # 01        1 2000.4GB    0.0GB  2            Normal
 3   Raid Set # 02        1 2000.4GB    0.0GB  3            Normal
 4   Raid Set # 03        1 2000.4GB    0.0GB  4            Normal
 5   Raid Set # 04        1 2000.4GB    0.0GB  5            Normal
 6   Raid Set # 05        1 2000.4GB    0.0GB  6            Normal
 7   Raid Set # 06        1 2000.4GB    0.0GB  7            Normal
 8   Raid Set # 07        1 2000.4GB    0.0GB  8            Normal
 9   Raid Set # 08        1 2000.4GB    0.0GB  9            Normal
10   Raid Set # 09        1 2000.4GB    0.0GB  A            Normal
11   Raid Set # 10        1 2000.4GB    0.0GB  B            Normal
12   Raid Set # 11        1 2000.4GB    0.0GB  C            Normal
13   Raid Set # 12        1 2000.4GB    0.0GB  D            Normal
14   Raid Set # 13        1 2000.4GB    0.0GB  E            Normal
===============================================================================
GuiErrMsg<0x00>: Success.

VolumeSet Information
 #   Name             Raid Name        Level   Capacity  Ch/Id/Lun  State
===============================================================================
 1   WD20EARS-00S8B1  Raid Set # 00    JBOD    2000.4GB  00/00/00   Normal
 2   WD20EARS-00S8B1  Raid Set # 01    JBOD    2000.4GB  00/00/01   Normal
 3   WD20EARS-00S8B1  Raid Set # 02    JBOD    2000.4GB  00/00/02   Normal
 4   WD20EARS-00S8B1  Raid Set # 03    JBOD    2000.4GB  00/00/03   Normal
 5   WD20EARS-00S8B1  Raid Set # 04    JBOD    2000.4GB  00/00/04   Normal
 6   WD20EARS-00S8B1  Raid Set # 05    JBOD    2000.4GB  00/00/05   Normal
 7   WD20EARS-00S8B1  Raid Set # 06    JBOD    2000.4GB  00/00/06   Normal
 8   WD20EARS-00S8B1  Raid Set # 07    JBOD    2000.4GB  00/00/07   Normal
 9   WD20EARS-00S8B1  Raid Set # 08    JBOD    2000.4GB  00/01/00   Normal
10   WD20EARS-00S8B1  Raid Set # 09    JBOD    2000.4GB  00/01/01   Normal
11   WD20EARS-00S8B1  Raid Set # 10    JBOD    2000.4GB  00/01/02   Normal
12   WD20EARS-00S8B1  Raid Set # 11    JBOD    2000.4GB  00/01/03   Normal
13   WD20EARS-00S8B1  Raid Set # 12    JBOD    2000.4GB  00/01/04   Normal
14   WD20EARS-00S8B1  Raid Set # 13    JBOD    2000.4GB  00/01/05   Normal
===============================================================================
GuiErrMsg<0x00>: Success.

Physical Drive Information
 #  Ch#  ModelName                 Capacity  Usage
===============================================================================
 1   1   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 2   2   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 3   3   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 4   4   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 5   5   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 6   6   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 7   7   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 8   8   WDC WD20EARS-00S8B1       2000.4GB  JBOD
 9   9   WDC WD20EARS-00S8B1       2000.4GB  JBOD
10  10   WDC WD20EARS-00S8B1       2000.4GB  JBOD
11  11   WDC WD20EARS-00S8B1       2000.4GB  JBOD
12  12   WDC WD20EARS-00S8B1       2000.4GB  JBOD
13  13   WDC WD20EARS-00S8B1       2000.4GB  JBOD
14  14   WDC WD20EARS-00S8B1       2000.4GB  JBOD
15  15   N.A.                         0.0GB  N.A.
16  16   N.A.                         0.0GB  N.A.
17  17   N.A.                         0.0GB  N.A.
18  18   N.A.                         0.0GB  N.A.
19  19   N.A.                         0.0GB  N.A.
20  20   N.A.                         0.0GB  N.A.
21  21   N.A.                         0.0GB  N.A.
22  22   N.A.                         0.0GB  N.A.
23  23   N.A.                         0.0GB  N.A.
24  24   N.A.                         0.0GB  N.A.
===============================================================================
GuiErrMsg<0x00>: Success.

Physical Hardware Information
The Hardware Monitor Information
===========================================
Fan#1 Speed (RPM) : N.A.
Battery Status    : 96%
CPU Temperature   : 44 (Celsius)
Ctrl Temperature  : 39 (Celsius)
Power +12V        : 12.160
Power +5V         : 4.999
Power +3.3V       : 3.280
SATA PHY +2.5V    : 2.480
DDR-II +1.8V      : 1.824
PCI-E +1.8V       : 1.824
CPU +1.8V         : 1.824
CPU +1.2V         : 1.200
DDR-II +0.9V      : 0.896
HDD #1 Temp.      : 39
HDD #2 Temp.      : 41
HDD #3 Temp.      : 40
HDD #4 Temp.      : 39
HDD #5 Temp.      : 39
HDD #6 Temp.      : 40
HDD #7 Temp.      : 41
HDD #8 Temp.      : 38
HDD #9 Temp.      : 37
HDD #10 Temp.     : 38
HDD #11 Temp.     : 38
HDD #12 Temp.     : 36
HDD #13 Temp.     : 32
HDD #14 Temp.     : 32
HDD #15 Temp.     : 0
HDD #16 Temp.     : 0
HDD #17 Temp.     : 0
HDD #18 Temp.     : 0
HDD #19 Temp.     : 0
HDD #20 Temp.     : 0
HDD #21 Temp.     : 0
HDD #22 Temp.     : 0
HDD #23 Temp.     : 0
HDD #24 Temp.     : 0
===========================================
GuiErrMsg<0x00>: Success.
This could get dangerous for anyone using an Areca controller. If they reboot their machine and the right disk disappears, they could find their zpool suddenly unavailable (only temporarily: if they reorder the disks to eliminate the gap, things would be fine) or they could lose all redundancy.
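To be clear, the recovery itself is just standard ZFS once the missing devices show up again; for example, using one of my actual gptids from the degraded status above:
Code:
zpool online tank gptid/795a7052-4a95-11e2-bca4-0015171496ae
zpool status tank    # should return to ONLINE once both devices are back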
So I did some tests, and the Ids are handed out in sets of 8 SATA ports each. See the chart below:
Code:
VolumeSet Information
 #   Name             Raid Name        Level   Capacity  Ch/Id/Lun  State
===============================================================================
 1   WD20EARS-00S8B1  Raid Set # 00    JBOD    2000.4GB  00/00/00   Normal
 2   WD20EARS-00S8B1  Raid Set # 01    JBOD    2000.4GB  00/00/01   Normal
 3   WD20EARS-00S8B1  Raid Set # 02    JBOD    2000.4GB  00/00/02   Normal
 4   WD20EARS-00S8B1  Raid Set # 03    JBOD    2000.4GB  00/00/03   Normal
 5   WD20EARS-00S8B1  Raid Set # 04    JBOD    2000.4GB  00/00/04   Normal
 6   WD20EARS-00S8B1  Raid Set # 05    JBOD    2000.4GB  00/00/05   Normal
 7   WD20EARS-00S8B1  Raid Set # 06    JBOD    2000.4GB  00/00/06   Normal
 8   WD20EARS-00S8B1  Raid Set # 07    JBOD    2000.4GB  00/00/07   Normal
 9   WD20EARS-00S8B1  Raid Set # 08    JBOD    2000.4GB  00/01/00   Normal
10   WD20EARS-00S8B1  Raid Set # 09    JBOD    2000.4GB  00/01/01   Normal
11   WD20EARS-00S8B1  Raid Set # 10    JBOD    2000.4GB  00/01/02   Normal
12   WD20EARS-00S8B1  Raid Set # 11    JBOD    2000.4GB  00/01/03   Normal
13   WD20EARS-00S8B1  Raid Set # 12    JBOD    2000.4GB  00/01/04   Normal
14   WD20EARS-00S8B1  Raid Set # 13    JBOD    2000.4GB  00/01/05   Normal
===============================================================================
If I were to remove the drive at port 9 (LUN 0 of Id 1 in the chart), then the disks in ports 9 through 14 would no longer function. That lines up with my two relocated drives: they landed at 00/02/04 and 00/02/05 while LUNs 0 through 3 of that Id were empty, and those were exactly the two disks FreeBSD never attached. My hunch is that the OS stops scanning a target's LUNs at the first gap, even though the controller happily exports the drives behind it.
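Based on the chart, the port-to-Id/LUN mapping looks fixed: Id = (port-1)/8 and LUN = (port-1) mod 8. That's purely my inference from the VolumeSet output, not anything Areca documents, but it predicts exactly where my two moved drives landed:
Code:
#!/bin/sh
# Where a given Areca port shows up in Ch/Id/Lun terms.
# (Mapping inferred from my VolumeSet output above; not official.)
for port in 9 14 21 22; do
    printf 'port %2d -> 00/%02d/%02d\n' "$port" $(( (port - 1) / 8 )) $(( (port - 1) % 8 ))
done
# Output:
# port  9 -> 00/01/00
# port 14 -> 00/01/05
# port 21 -> 00/02/04
# port 22 -> 00/02/05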
So where do I begin to troubleshoot this issue and find the cause? I'm thinking it's a driver issue, since the CLI seems to have no problem seeing the disks when FreeNAS won't use them. Should I just open a ticket?