iSCSI connections dropping, VMs read-only

chetan008

Dabbler
Joined
Jan 10, 2020
Messages
13
Hello All,

For the last couple of days our iSCSI connections have been dropping with the errors below. Our FreeNAS storage is connected to KVM hypervisors via iSCSI, and all the VMs have gone into read-only mode. Please suggest what the issue could be; we have checked and there is no network connectivity problem.

Code:
WARNING: 10.0.0.0.0 (iqn.1994-05): tasks terminated
WARNING: 10.xx.xx.xx. (iqn.1994-05.com.): tasks terminated
WARNING: 10.xx.xx.xx.xx (iqn.1994-05.com.r): tasks terminated
WARNING: 10.200.5.254 (iqn.1994-05.com.redhat:HV6): connection error; dropping connection
WARNING: 10.200.1.254 (iqn.1994-05.com.redhat:backup): connection error; dropping connection
ctl_datamove: tag 0xbc3c80e on (1:7:7) aborted
ctl_datamove: tag 0xbc3c80f on (1:7:7) aborted
ctl_datamove: tag 0xbc3c810 on (1:7:7) aborted
ctl_datamove: tag 0xbc3c811 on (1:7:7) aborted
ctl_datamove: tag 0xbc3c812 on (1:7:7) aborted
ctl_datamove: tag 0xbc3c813 on (1:7:7) aborted
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
What's logged on the initiator and target sides? Does the HV act as the initiator for all volumes, or does each VM initiate individually? There's not enough info yet to troubleshoot.
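
As a starting point, something like this gathers the relevant logs on both ends (a rough sketch, assuming the Red Hat KVM hosts run iscsid under systemd and the target is stock FreeNAS 11.x):
Code:
# On each KVM hypervisor (initiator side)
dmesg -T | grep -i iscsi | tail -n 50
journalctl -u iscsid --since "2 hours ago"

# On the FreeNAS box (target side)
grep -iE 'iscsi|ctl|WARNING' /var/log/messages | tail -n 100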
 
Last edited:

chetan008

Dabbler
Joined
Jan 10, 2020
Messages
13
Dear Samuel,

We have 7 KVM hosts acting as initiators to the FreeNAS iSCSI targets. We have mounted the iSCSI datastores on the KVM cluster to create vdisks.


Code:
ctl_datamove: tag 0x10000048 on (0:16:15) aborted
(0:8:6/6): WRITE(16). CDB: 8a 00 00 00 00 03 1b b0 56 00 00 00 00 10 00 00
ctl_datamove: tag 0x1000003e on (0:16:15) aborted
ctl_datamove: tag 0x10000068 on (0:16:15) aborted
ctl_datamove: tag 0x10000027 on (6:16:15) aborted
ctl_datamove: tag 0x1000001b on (5:16:15) aborted
(0:7:8/8): Tag: 0x106977b1, type 1
(0:7:8/8): ctl_process_done: 91 seconds
(7:8:6/6): ctl_datamove: 91 seconds
(5:8:6/6): ctl_datamove: 92 seconds
ctl_datamove: tag 0x20000061 on (5:8:6) aborted
(4:8:6/6): ctl_datamove: 94 seconds
ctl_datamove: tag 0x20000079 on (7:8:6) aborted

Please advise what the issue could be.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
The iSCSI volumes are very slow, and the disconnects may be simple timeouts. Can you describe how your pool is constructed, and where the zvols are located within the pool? Are you using SSDs for your pool data devices, or only as cache?

Odds are there's been a failure somewhere in your pool which is affecting write performance to the zvols, most likely within your SLOG devices.
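
A quick way to check (a sketch using stock FreeNAS tools, no assumptions about your layout) is to look at pool health and then watch per-disk latency while the VMs are writing:
Code:
zpool status -v    # degraded vdevs, or READ/WRITE/CKSUM counters climbing?
zpool list -v      # capacity and fragmentation per pool/vdev
gstat -p           # live per-disk %busy and ms/w under load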
 


HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Am I reading that summary right in that there is 1.5T of RAM in that machine? (Nice.)

Start at the bottom of the OSI model and work your way up - think physical. Since this is a new issue, look for a cable that may have been damaged by recent maintenance activity. Network or SAS since I see JBODs in use. Check your drive health and SMART reports as well.

Look at the switches being used for the iSCSI connections for signs of congestion (dropped packets, error counters rising).

A pool layout and description may also help. What hardware is in use for HBAs and NICs?
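
From the FreeNAS shell, something along these lines covers drive health and NIC error counters (just a sketch; da1 is an example device name):
Code:
smartctl -a /dev/da1    # SMART health: reallocated sectors, CRC/interface errors
camcontrol devlist      # are all expected disks and enclosures visible?
netstat -i              # Ierrs/Oerrs on the LACP members point at cabling or switch trouble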
 

chetan008

Dabbler
Joined
Jan 10, 2020
Messages
13
Hello Samuel and Honey,

We are using the hardware below:
Server model: Dell R730xd

1.9 TB SSD disks
The Dell R730xd is connected to two Dell MD1220 JBODs.
We are using an H830 card to pass the MD disks through to FreeNAS.
For the local disks inside the Dell R730xd we are using an H730P Mini RAID controller configured in non-RAID mode.
We are using 2x 10G Intel X540-T2 NICs in LACP. Please find the zpool status and volume information attached.







 

Attachments

  • pool-status.zip (1.2 KB)

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Code:
root@#[~]# zpool list
NAME                  SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
Dell-730-mini-local  27.8T  3.85T  23.9T        -         -    22%    13%  1.00x  ONLINE  /mnt
MD1220-0-one         41.8T  7.43T  34.3T        -         -    13%    17%  1.00x  ONLINE  /mnt
MD1220-1-TWO         41.8T  1.89T  39.9T        -         -     2%     4%  1.00x  ONLINE  /mnt
freenas-boot          444G   846M   443G        -         -      -     0%  1.00x  ONLINE  -

root@wwstorage-ssd[~]# zpool status -v
  pool: Dell-730-mini-local
state: ONLINE
  scan: scrub repaired 0 in 0 days 02:09:58 with 0 errors on Sun Dec 22 02:10:02 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        Dell-730-mini-local                             ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/d0f724b5-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/d220feec-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/d3311a03-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/d455b5c1-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/d5648f97-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/d6927610-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/d7dbd1f0-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/d8e9e7d2-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/da2dc26b-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/db3b6a52-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/dc7f005d-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/dda8bb0e-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/ded510d3-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/dffc99f1-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/e126512c-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/e24aae55-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0

errors: No known data errors

  pool: MD1220-0-one
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:51:29 with 0 errors on Sun Dec 22 00:51:31 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        MD1220-0-one                                    ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/7ee35028-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/80b79e7d-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/82729390-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/8450a9c6-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/8609845c-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/87ed9822-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/89ac2559-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/8b8cc6d6-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/8d45c970-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/8f2553f2-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/90e725a5-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/92abcbc0-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/94865547-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/963c8571-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/981757e5-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/99cf3a46-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/9bbcd89e-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/9d7ba6b9-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/9f5c8da8-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/a1193793-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/a2f6e4b8-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/a4b09fd3-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/a68d9bd2-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/a850641a-c356-11e9-a366-1418772dfffb  ONLINE       0     0     0

errors: No known data errors

  pool: MD1220-1-TWO
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:00 with 0 errors on Sun Dec 22 00:00:01 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        MD1220-1-TWO                                    ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/08fd4b9d-c41a-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/303e9bbf-c41a-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/59538798-c41a-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/81da21ab-c41a-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/aac7ccf9-c41a-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/d25a5767-c41a-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/0c3bee74-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/35e73cb8-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/4b936e71-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/4effd882-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/52836a90-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/56277616-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/59b4a494-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/5d6aad42-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/6116a07a-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/64ba7cae-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/687ce806-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/6c297e45-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/70605bad-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/7418a75f-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/77d1fbe6-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/7b9388bd-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/7f27d3d4-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0
            gptid/82f1aa03-c41b-11e9-b54e-1418772dfffb  ONLINE       0     0     0

errors: No known data errors


Yeesh. Your pools are constructed inappropriately for serving iSCSI zvols. First, zvols work best on mirror pools, not RAIDZx pools. You can mitigate some of the performance discrepancy by using SLOG and L2ARC, but none of your pools have either. Second, your RAIDZ2 vdevs are too wide; beyond 12 disks, you can actually start losing performance. In your case, per the logs, zvol writes are taking over 90 seconds to complete.

Since this setup was working before, it's likely your H830 card is going bad, or the cables from the H830 to the MD1220s need to be replaced.
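
If the HBA or cabling is flaking out, the console log will usually show CAM retries and timeouts around the time of the iSCSI drops. A simple way to watch for them (just a sketch):
Code:
tail -f /var/log/messages | grep -iE 'cam|aborted|timeout|retry'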
 
Last edited:

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
As Samuel points out those vdevs are way too wide. You are effectively getting the random I/O of a single SSD.

I also see you're running the H730 and H830 - as long as they are in true HBA mode that should be fine, but check your firmware revisions, there were several that were flagged as Critical from Dell. (While ZFS and vSAN are very different solutions, they have one thing in common which is that they're both very good at exposing bugs in HBA/controller firmware.)

http://poweredgec.dell.com/latest_poweredge-13g.html#R730XD SAS RAID

Check your backplane firmware as well, in both the head unit and the MD1220s. You may need to connect them to a non-HBA-mode controller temporarily to do this though.

Can you also do camcontrol identify daX on one of those drives? Want to ensure that TRIM is being exposed here.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
If you have a chance to reconfigure your pool, you'll be much happier with something like this:
Code:
Dell-730-mini-local
  mirror
    gptid/d0f724b5-c354-11e9-a366-1418772dfffb
    gptid/d220feec-c354-11e9-a366-1418772dfffb
    gptid/d3311a03-c354-11e9-a366-1418772dfffb
  mirror
    gptid/d455b5c1-c354-11e9-a366-1418772dfffb
    gptid/d5648f97-c354-11e9-a366-1418772dfffb
    gptid/d6927610-c354-11e9-a366-1418772dfffb
  mirror
    gptid/d7dbd1f0-c354-11e9-a366-1418772dfffb
    gptid/d8e9e7d2-c354-11e9-a366-1418772dfffb
    gptid/da2dc26b-c354-11e9-a366-1418772dfffb
  mirror
    gptid/db3b6a52-c354-11e9-a366-1418772dfffb
    gptid/dc7f005d-c354-11e9-a366-1418772dfffb
    gptid/dda8bb0e-c354-11e9-a366-1418772dfffb
  spare
    gptid/ded510d3-c354-11e9-a366-1418772dfffb
    gptid/dffc99f1-c354-11e9-a366-1418772dfffb
    gptid/e126512c-c354-11e9-a366-1418772dfffb
    gptid/e24aae55-c354-11e9-a366-1418772dfffb


Code:
MD1220-0-one                                 
  mirror
    gptid/7ee35028-c356-11e9-a366-1418772dfffb
    gptid/80b79e7d-c356-11e9-a366-1418772dfffb
    gptid/82729390-c356-11e9-a366-1418772dfffb
  mirror
    gptid/8450a9c6-c356-11e9-a366-1418772dfffb
    gptid/8609845c-c356-11e9-a366-1418772dfffb
    gptid/87ed9822-c356-11e9-a366-1418772dfffb
  mirror
    gptid/89ac2559-c356-11e9-a366-1418772dfffb
    gptid/8b8cc6d6-c356-11e9-a366-1418772dfffb
    gptid/8d45c970-c356-11e9-a366-1418772dfffb
  mirror
    gptid/8f2553f2-c356-11e9-a366-1418772dfffb
    gptid/90e725a5-c356-11e9-a366-1418772dfffb
    gptid/92abcbc0-c356-11e9-a366-1418772dfffb
  mirror
    gptid/94865547-c356-11e9-a366-1418772dfffb
    gptid/963c8571-c356-11e9-a366-1418772dfffb
    gptid/981757e5-c356-11e9-a366-1418772dfffb
  mirror
    gptid/99cf3a46-c356-11e9-a366-1418772dfffb
    gptid/9bbcd89e-c356-11e9-a366-1418772dfffb
    gptid/9d7ba6b9-c356-11e9-a366-1418772dfffb
  mirror
    gptid/9f5c8da8-c356-11e9-a366-1418772dfffb
    gptid/a1193793-c356-11e9-a366-1418772dfffb
    gptid/a2f6e4b8-c356-11e9-a366-1418772dfffb
  spare
    gptid/a4b09fd3-c356-11e9-a366-1418772dfffb
    gptid/a68d9bd2-c356-11e9-a366-1418772dfffb
    gptid/a850641a-c356-11e9-a366-1418772dfffb


Code:
MD1220-1-TWO         
  mirror
    gptid/08fd4b9d-c41a-11e9-b54e-1418772dfffb
    gptid/303e9bbf-c41a-11e9-b54e-1418772dfffb
    gptid/59538798-c41a-11e9-b54e-1418772dfffb
  mirror
    gptid/81da21ab-c41a-11e9-b54e-1418772dfffb
    gptid/aac7ccf9-c41a-11e9-b54e-1418772dfffb
    gptid/d25a5767-c41a-11e9-b54e-1418772dfffb
  mirror
    gptid/0c3bee74-c41b-11e9-b54e-1418772dfffb
    gptid/35e73cb8-c41b-11e9-b54e-1418772dfffb
    gptid/4b936e71-c41b-11e9-b54e-1418772dfffb
  mirror
    gptid/4effd882-c41b-11e9-b54e-1418772dfffb
    gptid/52836a90-c41b-11e9-b54e-1418772dfffb
    gptid/56277616-c41b-11e9-b54e-1418772dfffb
  mirror
    gptid/59b4a494-c41b-11e9-b54e-1418772dfffb
    gptid/5d6aad42-c41b-11e9-b54e-1418772dfffb
    gptid/6116a07a-c41b-11e9-b54e-1418772dfffb
  mirror
    gptid/64ba7cae-c41b-11e9-b54e-1418772dfffb
    gptid/687ce806-c41b-11e9-b54e-1418772dfffb
    gptid/6c297e45-c41b-11e9-b54e-1418772dfffb
  mirror
    gptid/70605bad-c41b-11e9-b54e-1418772dfffb
    gptid/7418a75f-c41b-11e9-b54e-1418772dfffb
    gptid/77d1fbe6-c41b-11e9-b54e-1418772dfffb
  spare
    gptid/7b9388bd-c41b-11e9-b54e-1418772dfffb
    gptid/7f27d3d4-c41b-11e9-b54e-1418772dfffb
    gptid/82f1aa03-c41b-11e9-b54e-1418772dfffb
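
For reference, building a layout like that from the shell would look roughly like the sketch below; normally you'd do it from the FreeNAS Storage > Pools screen, and the gptid names here are placeholders rather than your actual partitions:
Code:
# striped 3-way mirrors plus hot spares - placeholder device names only
zpool create MD1220-0-one \
  mirror gptid/AAAA gptid/BBBB gptid/CCCC \
  mirror gptid/DDDD gptid/EEEE gptid/FFFF \
  mirror gptid/GGGG gptid/HHHH gptid/IIII \
  spare  gptid/JJJJ gptid/KKKK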
 

chetan008

Dabbler
Joined
Jan 10, 2020
Messages
13
As Samuel points out those vdevs are way too wide. You are effectively getting the random I/O of a single SSD.

I also see you're running the H730 and H830 - as long as they are in true HBA mode that should be fine, but check your firmware revisions, there were several that were flagged as Critical from Dell. (While ZFS and vSAN are very different solutions, they have one thing in common which is that they're both very good at exposing bugs in HBA/controller firmware.)

http://poweredgec.dell.com/latest_poweredge-13g.html#R730XD SAS RAID

Check your backplane firmware as well, in both the head unit and the MD1220s. You may need to connect them to a non-HBA-mode controller temporarily to do this though.

Can you also do camcontrol identify daX on one of those drives? Want to ensure that TRIM is being exposed here.


root@wwstorage-ssd[~]# camcontrol identify da1
pass1: <VK1920GFLKL HPG0> ACS-2 ATA SATA 3.x device
pass1: 150.000MB/s transfers, Command Queueing Enabled

protocol ATA/ATAPI-9 SATA 3.x
device model VK1920GFLKL
firmware revision HPG0
serial number FI73N01081040390Y
WWN 5ace42e0200cf32c
cylinders 16383
heads 16
sectors/track 63
sector size logical 512, physical 4096, offset 0
LBA supported 268435455 sectors
LBA48 supported 3750748848 sectors
PIO supported PIO4
DMA supported WDMA2 UDMA5
media RPM non-rotating
Zoned-Device Commands no

Feature Support Enabled Value Vendor
read ahead yes yes
write cache yes no
flush cache yes yes
overlap no
Tagged Command Queuing (TCQ) no no
Native Command Queuing (NCQ) yes 32 tags
NCQ Queue Management no
NCQ Streaming no
Receive & Send FPDMA Queued no
SMART yes yes
microcode download yes yes
security no no
power management yes yes
advanced power management no no
automatic acoustic management no no
media status notification no no
power-up in Standby yes no
write-read-verify no no
unload yes yes
general purpose logging yes yes
free-fall no no
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks yes 1
DSM - deterministic read yes zeroed
Host Protected Area (HPA) no
 

chetan008

Dabbler
Joined
Jan 10, 2020
Messages
13
hi,

I can see high ARC usage; could that be the issue? Since we have 1.5 TB of RAM, we are not using SLOG or L2ARC devices.

 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
No, ARC consumption isn't the issue. As @HoneyBadger stated, your pool by construction is limited to the random IOPS of a single drive, and all your drives are thrashing with TRIMs. This is why it's taking over 90 seconds to complete a zvol write. All your pool drives have hit the fragmentation point of no return, and are constantly garbage collecting.

I'm afraid there's no easy fix. You'll have to shut down your VMs so you can back up the zvols. You'll need to destroy your pool, and then secure erase your SSDs to reset them back to factory condition. Then you'll have to completely rebuild your pool using stripes of mirrors instead of the single large RAIDZ2 you have currently. You may also want to upgrade to 11.3-U3.2, which allows you to underprovision your drives to leave additional space for TRIM to operate.
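
For the backup step, the zvols can be captured with snapshots and zfs send before destroying the pool. A minimal sketch (the zvol and destination names are placeholders):
Code:
zfs snapshot Dell-730-mini-local/somezvol@pre-rebuild
zfs send Dell-730-mini-local/somezvol@pre-rebuild | gzip > /path/to/backup/somezvol.zfs.gz
# or stream straight to another pool or host:
# zfs send Dell-730-mini-local/somezvol@pre-rebuild | ssh otherhost zfs recv backuppool/somezvol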
 
Last edited:

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
There's a few things of concern here in your camcontrol output (and in the future, please enclose it in CODE tags to preserve spacing for readability) - see comments, with my questions in bold.

Code:
root@wwstorage-ssd[~]# camcontrol identify da1
pass1: <VK1920GFLKL HPG0> ACS-2 ATA SATA 3.x device
pass1: 150.000MB/s transfers, Command Queueing Enabled

You're only negotiating at 150MB/s (SATA1) speeds, as opposed to the full-throttle 600MB/s (SATA3) that you should be. This makes me suspect that your "HBA" still thinks it's a RAID card, or perhaps has the wrong driver loaded. It might also be a firmware compatibility issue as that looks like an HP OEM device.

If you run dmesg | grep mrsas is it loaded there, or do you instead get results from dmesg | grep mfi?

Can you confirm that you have actually forced the card into full HBA mode and are not simply passing unconfigured disks through as "Non-RAID Disk" in RAID mode?


Code:
Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      no
flush cache                    yes      yes

Write cache isn't enabled on your SSDs; this is typical behavior for SSDs behind a RAID card. ZFS controls the write cache on your devices and will handle flushes for stability. Again, please confirm that you have fully disabled the RAID functionality on your controller.

Code:
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks       yes              1
DSM - deterministic read       yes              zeroed
TRIM is enabled but those devices don't seem particularly efficient at it - "max 512byte blocks" is only 1. But again, this might be the RAID/HBA mode.
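
To see whether the 150MB/s negotiation is specific to that one disk or applies across the board, a quick loop over every disk helps (a sketch in Bourne-shell syntax; run sh first if your login shell is csh, and camcontrol identify will simply skip non-ATA devices):
Code:
for d in $(sysctl -n kern.disks); do
  echo "== $d =="
  camcontrol identify $d 2>/dev/null | grep -i transfers
done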
 

chetan008

Dabbler
Joined
Jan 10, 2020
Messages
13
There's a few things of concern here in your camcontrol output (and in the future, please enclose it in CODE tags to preserve spacing for readability) - see comments, with my questions in bold.



You're only negotiating at 150MB/s (SATA1) speeds, as opposed to the full-throttle 600MB/s (SATA3) that you should be. This makes me suspect that your "HBA" still thinks it's a RAID card, or perhaps has the wrong driver loaded. It might also be a firmware compatibility issue as that looks like an HP OEM device.

If you run dmesg | grep mrsas is it loaded there, or do you instead get results from dmesg | grep mfi?

Can you confirm that you have actually forced the card into full HBA mode and are not simply passing unconfigured disks through as "Non-RAID Disk" in RAID mode?




Write cache isn't enabled on your SSDs; this is typical behavior for SSDs behind a RAID card. ZFS controls the write cache on your devices and will handle flushes for stability. Again, please confirm that you have fully disabled the RAID functionality on your controller.


TRIM is enabled but those devices don't seem particularly efficient at it - "max 512byte blocks" is only 1. But again, this might be the RAID/HBA mode.

Dear HoneyBadger,
There are two controllers. One is the H730P in RAID mode, with the local drives of the Dell R730xd connected to it; the data drives on that controller are configured in non-RAID mode. The 250 GB OS SSDs on this controller are configured as a RAID 1, which is why we left that controller in RAID mode.

Both MD1220 JBODs are attached to the H830 controller, which is set to HBA mode, and its disks are set to non-RAID mode.

I have run both commands; neither shows any output:
Code:
root@wwstorage-ssd[~]# dmesg | grep mrsas
root@wwstorage-ssd[~]# dmesg | grep mfi
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Neither mrsas nor mfi drivers are used? It should be one of those. Can you run dmesg and find out which driver is being used by your H730P/H830 HBAs?

The H730P being in mixed-mode may not be the best situation either as the controller may still be getting involved. There being no identified driver is very odd though.
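
pciconf should settle which kernel driver (if any) has attached to the PERC cards; the name at the start of each matching entry is the driver, and "none" means nothing attached (a sketch):
Code:
pciconf -lv | grep -B4 -iE 'perc|megaraid|lsi|raid'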
 

chetan008

Dabbler
Joined
Jan 10, 2020
Messages
13
Dear Honey Badger,

Code:
root@wwstorage-ssd[~]# zpool status -v
  pool: Dell-730-mini-local
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub canceled on Tue Jun 30 12:45:05 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        Dell-730-mini-local                             DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/d0f724b5-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/d220feec-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/d3311a03-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/d455b5c1-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/d5648f97-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/d6927610-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/d7dbd1f0-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/d8e9e7d2-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/da2dc26b-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            963311291589964445                          OFFLINE      0     0     0  was /dev/gptid/db3b6a52-c354-11e9-a366-1418772dfffb
            gptid/dc7f005d-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/dda8bb0e-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/ded510d3-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/dffc99f1-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/e126512c-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0
            gptid/e24aae55-c354-11e9-a366-1418772dfffb  ONLINE       0     0     0

We have replaced the disk, but the new disk is not detected in FreeNAS. What could be the issue?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Last edited:

chetan008

Dabbler
Joined
Jan 10, 2020
Messages
13
Dear Samuel Tai,

I have followed the same steps: offline the failed disk, then replace it with the new disk. But when we click Replace, the new disk does not show in the drop-down box.
The RAID controller is the H730P Mini. We have converted the disk to non-RAID and it is detected by the controller, but it is not showing up in FreeNAS; that's the issue.

 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Does the new drive appear when you run camcontrol devlist? Since the RAID controller is still involved, this may be something FreeNAS can only detect changes to on boot.
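
From the FreeNAS shell you can check and force a bus rescan without rebooting (a sketch):
Code:
camcontrol devlist     # is the new disk listed at all?
camcontrol rescan all  # ask CAM to rescan every bus
camcontrol devlist     # check again afterwards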
 