How do I install and configure an LSI SAS 9300-4i4e HBA card in FreeNAS?


ashori

Dabbler
Joined
Jun 17, 2014
Messages
17
Hi all,

I recently purchased an LSI SAS 9300-4i4e HBA card and installed the hardware in my FreeNAS server. All I have done so far is flash the card in a Windows box using MegaRAID Storage Manager with 9300_4i4e_Package_P6_IR_IT_Firmware_BIOS_for_MSDOS_Windows.zip, which contained SAS9300_4i4e_IT_ACM.bin. I then popped the card into the NAS motherboard and connected it to my JBOD enclosure via an external mini-SAS 8088 connector. I put two 1TB test drives into the JBOD; they show up in the LSI BIOS and also in FreeNAS, so there's connectivity.

My questions are:
  • What do I do next?
  • Is there something I need to turn on in FreeNAS and how do I do that?
  • I know that being able to see the drives in "View Disks" (they show up as da0 and da1) is 90% of the battle, but it would be nice to get some information from the LSI card, such as throughput, version, temperature, or anything else it might report to the OS. When the LSI card was in the Windows 8.1 desktop, I was able to pull quite a bit of info from it using MegaRAID Storage Manager (I don't recall exactly what, as I only glanced at it while flashing the card). Is there something similar for FreeNAS?
  • Is there a driver for FreeNAS that I have to install?
  • How do I know that my LSI card is optimized? By optimized I mean that the card is set up and working well and I'm getting maximum IOPS. A file copy from the internal disks on the NAS to the JBOD yields about 90 MB/s, which seems slow to me given that both the HBA and the JBOD backplane are 12 Gb/s. Using the cp command from the FreeNAS shell, I copied a 135 GB file from an internal NAS HDD to the JBOD and it took more than 40 minutes! (See the quick throughput check after this list.)
  • Somewhat related: I'm using a Supermicro SC847E2C-R1K28JBOD enclosure as the JBOD. Is there any way to get hardware info from that chassis under FreeNAS as well? It would be nice to have FreeNAS blink the light on a specific HDD right from the GUI, so I know which disk has failed and needs replacing.
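For reference on the throughput question, this is roughly how I've been sanity-checking raw disk speed from the FreeNAS shell (just a sketch, not sure it's the right methodology; da0 here is one of the JBOD test drives):

Code:
# Raw sequential read test of a single JBOD disk (bypasses ZFS entirely)
diskinfo -t /dev/da0
# Or read a chunk of the disk and work out MB/s from the elapsed time
dd if=/dev/da0 of=/dev/null bs=1m count=4096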
Thanks all!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
a) Get rid of the boot ROM, it's useless unless you're booting from the card (I'm guessing you're not).
b) There's nothing to "turn on".
c) I'm guessing the driver provides the information you want, but you'll have to read the man page.
d) You can't install drivers. You get the ones that FreeNAS includes.
e) There's nothing to "optimize" on the controller, besides using IT firmware and matching it to the driver version (dunno what the LSI SAS 3 driver version is... Check the man page.)
f) Getting the right lights to blink requires very specific hardware combinations. Feel free to research, but it's hard. I do agree it's damn cool and useful.
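Regarding c), a starting point from the FreeNAS shell would be something like this (a rough sketch; the exact sysctl names depend on the driver version):

Code:
# See which SAS3 driver attached and what firmware/driver versions it reports
dmesg | grep mpr
# List the disks the controller is presenting to the OS
camcontrol devlist
# Dump whatever per-adapter knobs and stats the driver exposes (if any)
sysctl dev.mpr.0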
 

ashori

Dabbler
Joined
Jun 17, 2014
Messages
17
Hi Ericloewe,

a) Get rid of the boot ROM, it's useless unless you're booting from the card (I'm guessing you're not).

I did this by changing the boot option in the ROM to disabled. From my understanding, however, all that does is disable booting from any of the JBOD disks; I don't know whether it ignores the other settings in the BIOS.

b) There's nothing to "turn on".

Noted

c) I'm guessing the driver provides the information you want, but you'll have to read the man page.

I reviewed the Phase 6 Pre-Alpha Release Version 6.255.01.00 - FREEBSD_MPT_GEN3_PHASE6.0 (SCGCQ00701316) on the LSI site, but I'm not really certain which driver version comes bundled with FreeNAS now.

d) You can't install drivers. You get the ones that FreeNAS includes.

Noted. For the benefit of others, I did find out that FreeNAS added LSI SAS3 support six months ago, from FreeNAS 9.2.1.x onwards. To turn it on, add a tunable named mps3_load with a value of YES - more info here: https://bugs.pcbsd.org/issues/4779. I'm not sure whether newer drivers, i.e. FREEBSD_MPT_GEN3_P6, have been pulled into FreeNAS. Does anyone have detailed info on which version is now included?
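To check which driver version a given FreeNAS build actually ships, the driver announces itself at boot, so something like this should answer it (assuming the SAS3 driver shows up as mpr):

Code:
dmesg | grep mpr
# the "Firmware: xx.xx.xx.xx, Driver: xx.xxx.xx.xx-fbsd" line shows the bundled driver version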

e) There's nothing to "optimize" on the controller, besides using IT firmware and matching it to the driver version (dunno what the LSI SAS 3 driver version is... Check the man page.)

f) Getting the right lights to blink is the stuff you have to get very specific hardware combinations. Feel free to research, but it's hard. I do agree it's damn cool and useful.

I think this is a futile effort, because even the Supermicro manual for the chassis is outdated and wrong. If the manufacturer can't get the documentation right, I suspect a bigger effort like interacting with the chassis hardware is beyond them.

Thanks for the reply!
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
I have a 3008 which seems like the same family as the 9300. The FreeBSD driver version seems to be 5. Do you see that as well on your system? My firmware appears to be v6. I'm going to try a downgrade.

PS - I was getting around 170 MB/s on SAS2 drives connected to my SAS3 backplane and controller, so your ~90 MB/s definitely seems off.

Code:
[root@freenas] ~# dmesg | grep mp
mpr0: <LSI SAS3008> port 0xe000-0xe0ff mem 0xfb200000-0xfb20ffff irq 26 at device 0.0 on pci1
mpr0: IOCFacts  :
mpr0: Firmware: 06.00.00.00, Driver: 05.255.05.00-fbsd
mpr0: IOCCapabilities: 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I have a 3008 which seems like the same family as the 9300. The FreeBSD driver version seems to be 5. Do you see that as well on your system? My firmware appears to be v6. I'm going to try a downgrade.

PS - I was getting around 170 MB/s on SAS2 drives connected to my SAS3 backplane and controller, so your ~90 MB/s definitely seems off.

Code:
[root@freenas] ~# dmesg | grep mp
mpr0: <LSI SAS3008> port 0xe000-0xe0ff mem 0xfb200000-0xfb20ffff irq 26 at device 0.0 on pci1
mpr0: IOCFacts  :
mpr0: Firmware: 06.00.00.00, Driver: 05.255.05.00-fbsd
mpr0: IOCCapabilities: 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>

It's not the same family, it's the controller from the LSI SAS 9300 - so it's actually the exact same thing ;)
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Thanks for the confirmation Ericloewe! :smile:
 

ashori

Dabbler
Joined
Jun 17, 2014
Messages
17
Experienced weird stuff when I simulated a JBOD failure (yanked the cable) and saw a massive failure of epic proportions. What I have in the JBOD to test with now is two spare 1 TB WD SATA Blues. I created a stripe set and copied a few random 100+ GB files over (still getting the same slow transfer speed, but that pales in comparison with what happened next). When I disconnected the HBA cable (the disks were idle; I had disabled CIFS to make sure nothing was connected to them), the pool threw a zpool error as expected. I reconnected the cable and tried zpool clear, but kept getting an I/O error. I rebooted the server and it refused to boot back up; it only went to single-user mode. I had to reinstall FreeNAS before it would boot to the GUI. After booting up, the disks show up again like nothing ever happened to them, and I can now copy to and read from the array.

Wish I had the log files but I didn't think to save them before I rebooted... :(

I thought maybe it was because they were striped, so I deleted the pool, created a new mirror, and ran the same test: copied files (same speeds again) and yanked the cable. The pool threw another error and zpool clear hit the same or similar I/O errors. I rebooted with trepidation and FreeNAS again dropped me into single-user mode. I reinstalled the OS and it sees the pool once it comes up.

This may be a deal breaker for me, because if FreeNAS dies every time a disk dies or the JBOD loses power, I'll need to do a reinstall each time. Even with the config saved, that's a real pain. What I expected was for FreeNAS to throw an error about missing disks and then, once it detects them again, to carry on operating - but certainly not to die when I rebooted it :( :(.

Glad I'm on the right track with the driver, anyway.

Code:
[root@Medusa ~]# dmesg | grep mp
#<other stuff, deleted>
mpr0: <LSI SAS3008> port 0xde00-0xdeff mem 0xfdbf0000-0xfdbfffff irq 16 at device 0.0 on pci2
mpr0: IOCFacts :
mpr0: Firmware: 06.00.00.00, Driver: 05.255.05.00-fbsd
mpr0: IOCCapabilities: 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>						
#<other stuff, deleted>
[root@Medusa ~]# 


Oh, and LSI confirmed that in HBA mode there's nothing to configure on the card that will affect its operation. Here's what came back via the local supplier:
Here is what we got from LSI support. It seems there are no further settings you can change in IT mode.
LSI Support reply:

This is all we have
http://www.lsi.com/downloads/Public/Host Bus Adapters/SAS3_IR_UG.pdf

You can change some settings in the bios via Advanced Adapter Properties.

There is not a lot you can change with an HBA, especially in IT mode.

Sorry if this is a bit muddled; it's late after a long day, but I thought I'd share with everyone before I hit the sack. :)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
First, yanking the cable isn't a good test. It really doesn't do "real world" very well.

Second, disks removed from the system will be readded *only* if the hardware supports hotswap and/or hotplugging, as appropriate.

Third, ZFS will *not* automatically re-add disks to a pool. This is to protect the pool from disks that are intermittently failing and go offline, online, then offline regularly. So if you pull the plug you shouldn't expect them to come back online automatically. A reboot (or restoring the disks via the CLI, which isn't exactly recommended) is the proper way to deal with it.

FreeNAS doesn't "die every time a disk dies", but FreeNAS will die if the pool containing the .system dataset dies, which you did by creating a striped zpool and then pulling a disk. Your test was basically invalid because you put the server in a position it shouldn't be allowed to get into: one where the .system dataset is lost because the pool is lost. Well, the harsh reality is that your server doesn't do you much good if you've mismanaged the pool to the point that you lost it, and that's about all you demonstrated in your test.

Try the test with a RAIDZ2 and pulling one disk and you'll see the server happily continue on its merry way. But you need to make sure your tests reflect real-world scenarios, which this one didn't. ;)

But, to return to your comment that "What I expected was for FreeNAS to throw an error about missing disks, then when it detects them again, to carry on operations": that is NOT good practice. If a disk is dropped, you want it to stay dropped until you put it back in the pool (or replace it as necessary). You risk data loss and corruption by having disks randomly dropping out and coming back in automatically. ZFS is designed to manage itself conservatively and to let the server administrator decide how to proceed when a less-than-ideal situation develops. (I'm about to sound offensive, but I don't mean for it to be taken that way.) If this behavior is unacceptable to you, then you should probably reassess your priorities. If you've managed your server appropriately, you should be getting emails when things go wrong (such as a disk suddenly dropping from the server). And when they go wrong, the expectation is that you'll jump on them and handle them with due diligence, scheduling a maintenance window or whatever is necessary.
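For anyone who wants to exercise redundancy without yanking cables, the cleaner sequence is roughly this (a sketch; 'tank' and da2 are placeholders for your pool and the member disk):

Code:
zpool status tank          # confirm the pool is healthy first
zpool offline tank da2     # cleanly take the member out of service
# ...do whatever testing or maintenance you want...
zpool online tank da2      # bring it back; ZFS resilvers anything it missed
zpool clear tank           # reset the error counters once it's healthy again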
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Well, a JBOD dying (power or data cable yanked) is a real possibility. Sounds like we need a warning somewhere that the .system dataset should stay as close to the server as possible (meaning inside it).
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, a JBOD dying (power or data cable yanked) is a real possibility. Sounds like we need a warning somewhere that the .system dataset should stay as close to the server as possible (meaning inside it).

I don't really think that is necessary. If you've lost enough drives that you have an unmounted pool, who cares if the server is even on? It can't serve data since it has no pool. It's like being upset because the power is off on the server. Can't argue that a powered-off server isn't providing data fast enough, can you? ;)
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Did LSI say anything about downgrading the firmware on the card to Phase 5 to match the driver version?

Code:
mpr0: Firmware: 06.00.00.00, Driver: 05.255.05.00-fbsd
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Did LSI say anything about downgrading the firmware on the card to Phase 5 to match the driver version?

Code:
mpr0: Firmware: 06.00.00.00, Driver: 05.255.05.00-fbsd

Well, that's standard procedure on SAS2, so I'd expect it to be the same for SAS3.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I don't really think that is necessary. If you've lost enough drives that you have an unmounted pool, who cares if the server is even on? It can't serve data since it has no pool. It's like being upset because the power is off on the server. Can't argue that a powered-off server isn't providing data fast enough, can you? ;)

Makes sense, but it's not quite as clear-cut on servers with multiple pools.

Something like a big slow pool in a JBOD and a smaller SSD pool internally. The thought process might be "Hey, the .system dataset doesn't really require performance, so let's not trouble our SSD pool with it."

In retrospect, it's obvious - but it's one of those things that's best stated in order to avoid mistakes caused by distraction.
 

ashori

Dabbler
Joined
Jun 17, 2014
Messages
17
Hi all,

The FreeNAS OS runs off a USB stick, and I have the system dataset on a RAID 0 inside the server (it's not ideal for fault tolerance right now; the JBOD is part of the improvements I want to make to the NAS, because when I first set it all up ages ago I really needed the space). The JBOD enclosure is new and empty except for the two 1 TB test drives. This is why I didn't expect the OS to be affected when the JBOD went down.

I want to move the files off the internal drives to the JBOD and then look at redoing the system dataset as a mirror or RAIDZ2 to make it more robust.

Volumes: [screenshot: jbod view disks.JPG]

View disks: [screenshot: jbod 2.JPG]

Apologies if I wasn't clear earlier..
 

ashori

Dabbler
Joined
Jun 17, 2014
Messages
17
Hi all,

Been busy travelling halfway around the world over December, and then life got busy in January. I finally made some time to play with the JBOD and FreeNAS again.

Did a simple test again today by just pulling one of the two hard drives in the JBOD to simulate an HDD failure. Again, I'm not putting the JBOD into use until I can validate that it's fault tolerant and doesn't have any bugs.

FreeNAS threw the following error:
Code:
Feb  2 17:07:48 Medusa kernel: swap_pager: I/O error - pagein failed; blkno 2621986,size 4096, error 6
Feb  2 17:07:48 Medusa kernel: vm_fault: pager read error, pid 1 (init)
Feb  2 17:07:48 Medusa kernel: swap_pager: I/O error - pagein failed; blkno 2621986,size 4096, error 6
Feb  2 17:07:48 Medusa kernel: vm_fault: pager read error, pid 1 (init)
[...the same two lines repeat dozens of times...]


and it goes on and on. Had to walk over and hit the reset button on the server. This time, thankfully, it didn't need a reload of the OS.

I'm running FreeNAS-9.2.1.9-RELEASE-x64 (2bbba09) and am preparing to upgrade to 9.3. I'll test again after the upgrade, and if it still has issues I suppose I'll need to engage someone senior here and post a bug report.
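Side note in case it helps anyone else: I believe FreeNAS puts a small swap partition on every data disk by default, so pulling a disk that's backing active swap would produce exactly this kind of pagein failure. Something like this shows what's backing swap and how a data disk is partitioned (a sketch; da1 is a placeholder):

Code:
swapinfo          # list the devices currently backing swap
gpart show da1    # shows the freebsd-swap partition FreeNAS creates on each data disk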
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Nvm
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If you read the manual, it explicitly says NOT to pull disks that are in the system. I realize you're about to argue that you are just simulating disk failure (I did the same thing when I first started playing with FreeNAS), but the reality is that a failed disk does NOT behave exactly the same as a pulled disk. Since we already know you shouldn't pull disks without properly detaching them from the OS, the test is invalid as a result.
 

ashori

Dabbler
Joined
Jun 17, 2014
Messages
17
Cyberjock, how would you simulate, for example, a power failure in a JBOD enclosure, where all the disks in the enclosure go down at once? Or some of them going down due to a backplane failure? I want to entrust my critical, irreplaceable data to the larger-capacity JBOD enclosure attached to my NAS, and I need to know whether it will survive the above - or at least not kill the OS or require hitting the reset button on the server every time.

So far, after upgrading to 9.3, I've come a little further along. I pulled one of the test drives in the stripe and of course destroyed the stripe set (as expected), and when I plugged the drive back in, zpool clear and zpool reset didn't do anything; FreeNAS didn't see the disk at all. I rebooted the server via the console and it hung midway through. I went over and hit reset, and thankfully it all came back up with minimal fuss.

I deleted the test array and recreated it as a mirror, since pulling one drive in a mirror should not impact anything. FreeNAS reported the zpool as unhealthy. I plugged the drive back in and then the FreeNAS GUI went down - unable to connect. The server was still running and I was able to see my files over the network; Plex was still running as well. I went to the NAS and checked the console there, and it was working normally. The last message was about the drive going missing, nothing after that. I hit space to bring up the menu and chose reboot. It rebooted and is back up again.

IMHO, 9.3 is more robust against drive failures in that the OS doesn't need to be reinstalled like I experienced in 9.2. However, it still needs to be rebooted every time a disk goes down, which is a pain.

While typing this, I decided to also "offline and replace" a drive like everyone says I should be doing. This is the result:

Code:
Feb  4 13:20:20 Medusa manage.py: [middleware.exceptions:38] [MiddlewareError: Disk replacement failed: "invalid vdev specification, use '-f' to override the following errors:, /dev/gptid/8300ea9a-ac2d-11e4-8fc3-001fd0247f8f is part of active pool 'JBOD_TEST_MIRROR', "]
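If I'm reading that error right, the GUI is refusing because the disk still carries the old pool's label. I'm assuming the way around it is to wipe the disk first, something along these lines (da1 is a placeholder, and this destroys whatever is on the disk):

Code:
gpart destroy -F da1    # wipe the old partition table (and with it the stale gptid)
# then retry the replace from the GUI, or use Storage -> View Disks -> Wipe instead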
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The best way is to use an actual bad disk. That's what I used after I was flogged for pulling a drive under load. :P I keep a few bad disks around "just because".

If an entire JBOD enclosure were to lose power, you'd see the pool go offline (and I've seen exactly that, too). So "recovery" is simple: reboot, power up the JBOD enclosure, and hope all is well. Also, if the .system dataset were to go offline, the system would go horribly wrong (which I think you saw when you pulled a disk from a striped dataset). The CTO of iX has made it clear that striped pools are a fool's errand and really don't do you any good, except to make it more likely you'll lose the data.

Keep in mind that the second you say "irreplaceable data" you also said "with a solid backup", right? ;)

Some hardware supports hotswap and/or hotplug, some doesn't. I assume it doesn't unless you've proven it does, and it sounds like yours doesn't. Quite a bit of hardware claims to support hotswap and/or hotplugging, but it isn't supported on FreeBSD. This is one of the reasons why we recommend LSI controllers based on the 2008 chipset (they definitely support it just fine). It also takes action on your part to put a disconnected disk back in the pool; generally we recommend a reboot.

The 9300s definitely have potential (and I have no doubt there is work going on to optimize the driver and make it a feature-complete package like the 2008 chips already have), but they aren't there yet. In fact, I believe the driver was only added in 9.2.1 and the release notes said it was experimental and untested, so you have that to contend with. Bleeding edge sometimes means you'll bleed.
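If the hardware does turn out to support it, a sketch of nudging the OS to notice a reinserted disk without a full reboot would be (no guarantees the mpr driver cooperates yet):

Code:
camcontrol rescan all    # ask CAM to rescan all buses for added/removed devices
camcontrol devlist       # check whether the disk reappeared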
 

ashori

Dabbler
Joined
Jun 17, 2014
Messages
17
Hi Cyberjock, your advice is excellent and has really helped reset my expectations regarding how FreeNAS behaves. I see now that reboots of the NAS are going to be a big part of my life moving forward :p. What you said about hotswapping drives is spot on, too. Although the HBA card and JBOD enclosure see and recognize the drives while I'm in the HBA BIOS, that information doesn't seem to reach the OS, so FreeNAS can't see when a disk is replaced or added without a reboot for the moment. I will wait patiently for a new driver and hope for the best :).

In regards to solid backups, this is what I currently have (it's horrible, I know):

1. RAID6, 5 x 2 TB, on my desktop data drive
2. Nightly backup to the stripe set, 3 x 2 TB, on the NAS (internal drives), which is also the system dataset
3. Crash Central backup to the cloud running on my desktop

It's slightly better than nothing, and honestly I lose sleep over this config right now. The reliability of this is almost zilch :(.

What I am trying to do is setup the following:
1. RAID6, 5 x 2 TB, on my desktop data drive
2. Nightly backup to RAID60, 10 x 2 TB, on the JBOD -or- RAID6, 5 x 2 TB, with replication to a second RAID6, 5 x 2 TB, in the same JBOD (what would you recommend?)
3. Crash Central backup

The good news is that I currently have only 900 GB of data, due to my reluctance to grow it because of space constraints. The NAS stripe set is 75% full at the moment.
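For the replication half of option 2, I'm assuming something along these lines would do it, either via the GUI (Periodic Snapshot Tasks plus Replication Tasks) or manually from the shell (dataset names are placeholders):

Code:
zfs snapshot jbod/backup@nightly-20150204
zfs send jbod/backup@nightly-20150204 | zfs receive -F jbod2/backup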
 