NFSv4 server intermittently stops responding

Status
Not open for further replies.

kimia

Cadet
Joined
Feb 27, 2015
Messages
8
Hi group,

We have a bare-metal Supermicro server with an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, 64 GB of RAM, and ZFS storage with three pools. In detail:

Code:
nas1# lspci
...
02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
03:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
04:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)
05:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)


Code:
nas1# camcontrol devlist
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 0 lun 0 (da0,pass0)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 1 lun 0 (da1,pass1)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 2 lun 0 (da2,pass2)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 3 lun 0 (da3,pass3)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 4 lun 0 (da4,pass4)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 5 lun 0 (da5,pass5)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 6 lun 0 (da6,pass6)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 7 lun 0 (da7,pass7)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 8 lun 0 (da8,pass8)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 9 lun 0 (da9,pass9)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 10 lun 0 (da10,pass10)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 11 lun 0 (da11,pass11)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 12 lun 0 (da12,pass12)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 13 lun 0 (da13,pass13)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 14 lun 0 (da14,pass14)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 15 lun 0 (da15,pass15)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 16 lun 0 (da16,pass16)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 17 lun 0 (da17,pass17)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 18 lun 0 (da18,pass18)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 19 lun 0 (da19,pass19)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 20 lun 0 (da20,pass20)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 21 lun 0 (da21,pass21)
<AMCC 9650SE-24M DISK 4.10>        at scbus0 target 22 lun 0 (da22,pass22)
<SAMSUNG MZ7WD120HAFV-00003 DXM87W3Q>  at scbus1 target 0 lun 0 (ada0,pass23)
<SAMSUNG MZ7WD120HAFV-00003 DXM87W3Q>  at scbus2 target 0 lun 0 (ada1,pass24)


Code:
nas1# zpool list
NAME           SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
Volume0       5.44T   658G  4.79T         -     2%    11%  1.00x  ONLINE  /mnt
Volume1       2.72T   916K  2.72T         -     0%     0%  1.00x  ONLINE  /mnt
Volume2       13.6T  7.88T  5.71T         -    21%    57%  1.00x  ONLINE  /mnt
freenas-boot   111G  1.50G   109G         -      -     1%  1.00x  ONLINE  -


Code:
nas1# zpool status Volumen0
cannot open 'Volumen0': no such pool
nas1# zpool status Volume0
  pool: Volume0
 state: ONLINE
  scan: scrub repaired 0 in 1h40m with 0 errors on Sun Feb  8 01:40:48 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        Volume0                                         ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/7a8e031c-336c-11e4-91df-0025902eda9c  ONLINE       0     0     0
            gptid/7c57b51f-336c-11e4-91df-0025902eda9c  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/7d89cdcf-336c-11e4-91df-0025902eda9c  ONLINE       0     0     0
            gptid/7eba40e0-336c-11e4-91df-0025902eda9c  ONLINE       0     0     0

errors: No known data errors


We sometimes have problems with our NFS clients, which are the nodes of an Elasticsearch cluster. The problem happens during backups, which are triggered with a curl request from any node of the cluster (three nodes in this case). When the backup starts, the clients (the Elasticsearch nodes) can't access the NFS server; it is as if the NFS server were blocked. This is what the clients report:

Code:
Feb 27 07:01:22 kernel: [7409875.902785] nfs: server not responding, still trying
Feb 27 07:01:23 kernel: [7409876.715406] nfs: server OK
Feb 27 07:01:24 kernel: [7409877.014304] nfs: server not responding, still trying
Feb 27 07:01:24 kernel: [7409877.452476] nfs: server OK
Feb 27 07:01:24 kernel: [7409877.650035] nfs: server not responding, still trying
Feb 27 07:01:24 kernel: [7409877.756992] nfs: server OK
Feb 27 07:01:27 kernel: [7409880.480818] nfs: server not responding, still trying


From what I have read, this can happen when the clients can't sync with the server. I would consider this normal if we were writing hundreds of GB, but it is only about a dozen GB because the Elasticsearch backup is incremental.
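
To put numbers on the client log above, the kernel timestamps in square brackets can be paired up to see how long each stall lasts. The following is a minimal sketch, not a diagnostic tool: the function name `stall_durations` is hypothetical, and the regular expression only fits the log lines exactly as they were posted (without a server hostname).

```python
import re

# Client log lines copied from the post above.
LOG = """\
Feb 27 07:01:22 kernel: [7409875.902785] nfs: server not responding, still trying
Feb 27 07:01:23 kernel: [7409876.715406] nfs: server OK
Feb 27 07:01:24 kernel: [7409877.014304] nfs: server not responding, still trying
Feb 27 07:01:24 kernel: [7409877.452476] nfs: server OK
Feb 27 07:01:24 kernel: [7409877.650035] nfs: server not responding, still trying
Feb 27 07:01:24 kernel: [7409877.756992] nfs: server OK
Feb 27 07:01:27 kernel: [7409880.480818] nfs: server not responding, still trying
"""

# Captures the kernel uptime stamp and the reported server state.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+nfs: server (not responding|OK)")


def stall_durations(log_text):
    """Return the length in seconds of each 'not responding' -> 'OK' interval."""
    durations = []
    stall_start = None
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        stamp, state = float(match.group(1)), match.group(2)
        if state == "not responding":
            # Keep the first stamp if several warnings arrive in a row.
            if stall_start is None:
                stall_start = stamp
        elif stall_start is not None:
            durations.append(stamp - stall_start)
            stall_start = None
    # A trailing 'not responding' without a matching 'OK' is not counted.
    return durations


if __name__ == "__main__":
    for seconds in stall_durations(LOG):
        print(f"stall lasted {seconds:.3f} s")
```

On the excerpt posted here, the three complete stalls each last under one second; the last warning has no matching "OK" line in the excerpt, so its length is unknown.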

We can control the bytes per second that the clients send to the NFS server and the chunk size of the data they send, and we have already adjusted both twice. So I would like to know whether anyone else has run into this situation, whether it is normal, and how to dig into it.
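
For readers unfamiliar with those two knobs, here is a sketch of what a throttled shared-filesystem snapshot repository definition could look like. It assumes the backup uses an Elasticsearch `fs` repository with the `max_snapshot_bytes_per_sec` and `chunk_size` settings; the helper name, the mount path, and the values are all hypothetical and are not the ones used on this cluster.

```python
import json


def fs_repository_body(location, max_bytes_per_sec, chunk_size):
    """Build the JSON body for registering a throttled 'fs' snapshot repository.

    Hypothetical helper: it only builds the request body and sends nothing.
    """
    return {
        "type": "fs",
        "settings": {
            # Directory on the NFS mount, as seen by every Elasticsearch node.
            "location": location,
            # Per-node cap on snapshot write throughput.
            "max_snapshot_bytes_per_sec": max_bytes_per_sec,
            # Large files are split into pieces of at most this size.
            "chunk_size": chunk_size,
        },
    }


if __name__ == "__main__":
    # Illustrative values only; the path and limits are assumptions.
    body = fs_repository_body("/mnt/nfs/es_backup", "20mb", "100mb")
    print(json.dumps(body, indent=2))
```

The printed JSON is the kind of body that would be sent with an HTTP PUT to the cluster's `_snapshot/<repository name>` endpoint, for example with curl.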

Cheers.
 

kimia

Cadet
Joined
Feb 27, 2015
Messages
8
I forgot to paste a test result:
Code:
nas1# iozone -a -s 24g -r 4096
        Iozone: Performance Test of File I/O
                Version $Revision: 3.420 $
                Compiled for 64 bit mode.
                Build: freebsd

        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                     Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
                     Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
                     Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
                     Vangel Bojaxhi, Ben England, Vikentsi Lapa.

        Run began: Wed Mar  4 10:55:31 2015

        Auto Mode
        File size set to 25165824 KB
        Record Size 4096 KB
        Command line used: iozone -a -s 24g -r 4096
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
        25165824    4096 1842606 2084398  5077793  5085736 5008946 2181345 4742047  7024324  5037538  2092264  1933465 3074501  3093942

iozone test complete.
nas1# ls
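
As a rough aid for reading the iozone figures above, the sketch below converts a few of them to MB/s and compares them with the raw line rate of a single gigabit port like the ones in the lspci listing. It assumes iozone's "Kbytes" are 1024 bytes and ignores all protocol overhead; whether the clients actually reach the server over a single gigabit port is also an assumption. Note too that a local run against a 24 GB file on a machine with 64 GB of RAM may be served largely from memory.

```python
# Selected results from the iozone run above, in Kbytes/sec as reported.
IOZONE_KB_PER_SEC = {
    "write": 1842606,
    "rewrite": 2084398,
    "read": 5077793,
    "reread": 5085736,
}

# Raw line rate of one 1 Gbit/s port in MB/s (10^6 bytes), with no overhead.
GIGABIT_LINK_MB_PER_SEC = 1_000_000_000 / 8 / 1_000_000


def kb_to_mb_per_sec(kb_per_sec):
    """Convert KB/s (assumed 1024-byte KB) to MB/s (10^6 bytes)."""
    return kb_per_sec * 1024 / 1_000_000


if __name__ == "__main__":
    for name, kb in IOZONE_KB_PER_SEC.items():
        mb = kb_to_mb_per_sec(kb)
        ratio = mb / GIGABIT_LINK_MB_PER_SEC
        print(f"{name:8s} {mb:9.1f} MB/s  ({ratio:.1f}x one gigabit port)")
```

Under those assumptions, every local figure is well above what one gigabit link can carry, so this test by itself does not point to local pool throughput as the limit.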


Cheers
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Code:
04:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)
Wow. Like living life on the edge, eh? You haven't read the posts about my friend having one of those controllers, have you? I'll let you read up on it, then I'll forgive you while you immediately order a proper card for your system.
 

kimia

Cadet
Joined
Feb 27, 2015
Messages
8
:).. Thanks for the fast answer. I'll order a replacement card immediately, but could you send me the link to that post, please?

Cheers
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Honestly, it's not even worth my effort. It's not on any of our hardware recommendations lists and that should have been enough evidence to question if it was appropriate. If I searched for my own posts every time someone asked, I'd be unemployed because I'd spend 24 hours a day doing just that.
 

kimia

Cadet
Joined
Feb 27, 2015
Messages
8
Sorry, you are right! Anyway, thanks for pointing me in the right direction. I will replace the 3ware controller with an M1015.

Cheers
 