Raidz2 + USB question


mo0p

Cadet
Joined
Oct 13, 2014
Messages
8
Hello,

A few weeks back I posted https://forums.freenas.org/index.php?threads/upgrade-question.24065/#post-147302 about upgrade options. Much appreciated for all the support and ideas!

I ended up transferring all my data off my array and rebuilding it as a Raidz2 of 6x 3TB disks to gain double parity and get away from Raidz1.

To accommodate this, I switched to running FreeNAS off USB 2.0, and it seems this change or the new array has hurt performance a little.

I know the CPU takes a hit when running off USB because of the extra IRQ load, but I have yet to see it go above 60% during transfers.

Issue: video playback gets a little choppy while I am transferring tons of data back. I will likely never be transferring 5TB of data back to an array again, but I am still very curious whether this is somewhat expected, or where I can look to determine what is causing the choppy playback.

Below are my specs and a snippet of "top" taken during a transfer, while I was seeing some chop in a video playing from the same CIFS share I am writing to. Average write speed starts at about 100MB/s, then slowly drops to around 60MB/s.

last pid: 19619; load averages: 0.83, 0.87, 1.03 up 0+08:16:33 11:31:50
40 processes: 1 running, 38 sleeping, 1 zombie
CPU: 0.1% user, 0.0% nice, 2.0% system, 13.0% interrupt, 84.9% idle
Mem: 56M Active, 107M Inact, 14G Wired, 120M Cache, 1634M Buf, 1437M Free
ARC: 11G Total, 252M MFU, 9269M MRU, 1671M Anon, 41M Header, 133M Other
Swap: 10G Total, 80M Used, 10G Free

  PID USERNAME THR PRI NICE    SIZE     RES STATE   C   TIME   WCPU COMMAND
18439 root       1  23    0    334M  15100K dmu_tx  0   7:33  7.76% smbd
19619 root       1  21    0  37572K   2196K zio->i  1   0:00  0.10% zpool
18440 root       1  20    0    330M  10860K select  2   0:57  0.00% smbd
 3730 root       6  20    0    339M     98M usem    3   0:30  0.00% python2.7
 8435 root       5  20    0  86680K   5052K uwait   0   0:26  0.00% istgt
 5484 root      12  20    0    144M   7412K uwait   3   0:25  0.00% collectd
 8020 root       1  20    0  22212K   1400K select  0   0:01  0.00% ntpd
19177 root       1  20    0    330M  11516K select  0   0:00  0.00% smbd
 5702 root       1  20    0    275M   4100K select  2   0:00  0.00% smbd
 4336 root       1  52    0    158M   3668K ttyin   2   0:00  0.00% python2.7
 5585 root       1  20    0  12032K    992K select  1   0:00  0.00% syslogd
18938 www        1  20    0  26036K   3076K kqread  3   0:00  0.00% nginx
 5698 root       1  20    0    208M   3060K select  3   0:00  0.00% nmbd
 3606 root       1  31   10  18588K    620K wait    0   0:00  0.00% sh
19141 root       1  20    0    315M   9160K select  0   0:00  0.00% smbd
19369 root       1  20    0  69516K   4616K select  3   0:00  0.00% sshd

Any help would be appreciated. I've poked around a bit and cannot find anything specific to my issue here.


Specs:
Build - 9.2.1.8
CPU - i3-2120
Mem - 16GB Corsair XMS2
Nic - Intel 4 port Pro/1000 (3 port LACP link with vlan for iSCSI traffic, 1 mgmt interface)
Disk:
6x 3TB WD Red - Media CIFS share
3x 500GB WD Red - iSCSI for ESXi host
8GB Patriot USB for OS
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
It's entirely possible you're hitting the IOPS limit of the zpool where your CIFS share is located, especially if you're writing tons of little files. See if the choppy playback persists in a normal usage situation.
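A common ZFS rule of thumb (an approximation, not a measurement of this pool) is that a single RAIDZ vdev delivers roughly the random IOPS of one member disk, since each block is spread across every disk in the vdev. A minimal sketch of that estimate follows; the 75 IOPS per-disk figure is an assumed ballpark, not a measured value, and the two-vdev layout is hypothetical, included only for comparison.

```python
from typing import List


def estimate_random_iops(vdev_widths: List[int], per_disk_iops: float) -> float:
    """Rule-of-thumb random IOPS for a pool built from RAIDZ vdevs.

    Each RAIDZ vdev is treated as delivering roughly one member disk's
    random IOPS, so vdev width (disks per vdev) drops out and only the
    number of vdevs matters.
    """
    if not vdev_widths:
        raise ValueError("pool must contain at least one vdev")
    return len(vdev_widths) * per_disk_iops


# Assumption: ~75 random IOPS per drive (ballpark only, not measured).
ASSUMED_DISK_IOPS = 75.0

# The 6-disk Raidz2 CIFS pool is a single vdev of width 6.
print(estimate_random_iops([6], ASSUMED_DISK_IOPS))
# Hypothetical comparison: the same 6 disks as two 3-disk RAIDZ1 vdevs.
print(estimate_random_iops([3, 3], ASSUMED_DISK_IOPS))
```

Under this rule, streaming reads and a bulk copy of many small files compete for the same small random-I/O budget of the single vdev, which is consistent with the suggestion above.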
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
FreeNAS doesn't run from USB. It runs from a RAMdisk and only touches the boot drive to update settings and stuff.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
If you stream video on a computer other than the one doing the data transfer, is the video still choppy? I think your bottleneck is the client machine.
 

krikboh

Patron
Joined
Sep 21, 2013
Messages
209
Is it just your OS drive connected via USB, or your data drives too?


Sent from my iPhone using Tapatalk
 

mo0p

Hello all, thanks for the responses.

Krikboh, it is just USB for the OS; all data drives are internal HDDs connected via SATA.

SweetAndLow, streaming is from the HTPC; uploading content to the array is being done from a desktop PC and a laptop (where I stashed all my stuff before blowing away the old Raidz1 array).

Content is still being transferred, so I have not yet tested a stream with absolutely nothing being transferred. However, I assume that if someone starts a file transfer while others are streaming content, I will hear complaints about choppy playback. I do see the LAGG interface taking in about 900-1100Mbps, which is a decent amount of traffic for my particular use case.
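For a sense of scale, the link rates and copy speeds in this thread can be converted between megabits and megabytes per second with simple arithmetic. A minimal sketch, assuming decimal units and ignoring protocol overhead:

```python
def mbps_to_mb_per_s(mbps: float) -> float:
    """Convert megabits/s to megabytes/s (decimal units, 8 bits per byte)."""
    return mbps / 8.0


def mb_per_s_to_mbps(mb_per_s: float) -> float:
    """Convert megabytes/s to megabits/s."""
    return mb_per_s * 8.0


# LAGG ingress range reported above: 900-1100 Mbps.
for rate in (900, 1100):
    print(f"{rate} Mbps = {mbps_to_mb_per_s(rate):.1f} MB/s")

# Reported copy speed: ~100 MB/s dropping to ~60 MB/s.
for speed in (100, 60):
    print(f"{speed} MB/s = {mb_per_s_to_mbps(speed):.0f} Mbps")

# One gigabit member link, before any protocol overhead.
print(f"1 GbE ceiling = {mbps_to_mb_per_s(1000):.1f} MB/s")
```

LACP normally hashes each flow onto a single member link, so an aggregate above 1000 Mbps suggests more than one client (here, the desktop and the laptop) is writing at the same time.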

I also noticed that at times the GUI is unresponsive, and SSH is now suddenly refusing connections.

When I was running FreeNAS off an HDD, did it also load everything onto a RAM disk, or is that just for the USB option? Something has really changed, but I do not know enough about the underlying system to figure it out.
 

mo0p

Code:
[root@storage] ~# smartctl -a -q noserial /dev/ada5
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red (AF)
Device Model:     WDC WD30EFRX-68AX9N0
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Nov 12 12:34:19 2014 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (39540) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 397) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x70bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   179   179   021    Pre-fail  Always       -       6008
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       57
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       12007
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       57
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       35
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       21
194 Temperature_Celsius     0x0022   114   109   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   185   000    Old_age   Always       -       17948309
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 7 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 7 occurred at disk power-on lifetime: 11940 hours (497 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 08 c0 01 40 e0  Error: IDNF at LBA = 0x004001c0 = 4194752

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 08 c0 01 40 e0 00  38d+09:02:51.476  WRITE DMA

Error 6 occurred at disk power-on lifetime: 9765 hours (406 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 47 ca 9c e5  Error: UNC at LBA = 0x059cca47 = 94161479

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 48 c9 9c e5 00  43d+04:44:02.340  READ DMA
  c8 00 00 48 c8 9c e5 00  43d+04:44:02.340  READ DMA
  c8 00 88 c0 c7 9c e5 00  43d+04:44:02.339  READ DMA
  c8 00 00 c0 c6 9c e5 00  43d+04:44:02.338  READ DMA
  c8 00 00 c0 c5 9c e5 00  43d+04:44:02.338  READ DMA

Error 5 occurred at disk power-on lifetime: 3727 hours (155 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 47 11 5a e8  Error: UNC at LBA = 0x085a1147 = 140120391

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 48 10 5a e8 00      11:47:39.433  READ DMA
  c8 00 00 48 0f 5a e8 00      11:47:39.431  READ DMA
  c8 00 00 48 0e 5a e8 00      11:47:39.430  READ DMA
  c8 00 00 48 0d 5a e8 00      11:47:39.429  READ DMA
  c8 00 00 48 0c 5a e8 00      11:47:39.429  READ DMA

Error 4 occurred at disk power-on lifetime: 2206 hours (91 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 0f 97 29 e3  Error: UNC at LBA = 0x0329970f = 53057295

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 10 96 29 e3 00  22d+20:52:01.241  READ DMA
  c8 00 00 10 95 29 e3 00  22d+20:52:01.239  READ DMA
  c8 00 00 10 94 29 e3 00  22d+20:52:01.238  READ DMA
  c8 00 00 10 93 29 e3 00  22d+20:52:01.238  READ DMA
  c8 00 00 10 92 29 e3 00  22d+20:52:01.237  READ DMA

Error 3 occurred at disk power-on lifetime: 1537 hours (64 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 0f 28 bf ef  Error: UNC at LBA = 0x0fbf280f = 264185871

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 10 27 bf ef 00  33d+18:42:51.936  READ DMA
  c8 00 00 10 26 bf ef 00  33d+18:42:51.936  READ DMA
  c8 00 00 10 25 bf ef 00  33d+18:42:51.935  READ DMA
  c8 00 00 10 24 bf ef 00  33d+18:42:51.934  READ DMA
  c8 00 00 10 23 bf ef 00  33d+18:42:51.934  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     11973         -
# 2  Short offline       Interrupted (host reset)      10%     11969         -
# 3  Short offline       Interrupted (host reset)      10%     11962         -
# 4  Short offline       Interrupted (host reset)      10%     11952         -
# 5  Short offline       Completed without error       00%     11942         -
# 6  Short offline       Completed without error       00%     11933         -
# 7  Short offline       Completed without error       00%     11924         -
# 8  Short offline       Completed without error       00%     11918         -
# 9  Short offline       Completed without error       00%     11909         -
#10  Short offline       Completed without error       00%     11900         -
#11  Short offline       Completed without error       00%     11895         -
#12  Short offline       Completed without error       00%     11894         -
#13  Short offline       Completed without error       00%     11885         -
#14  Short offline       Completed without error       00%     11876         -
#15  Short offline       Completed without error       00%     11870         -
#16  Short offline       Interrupted (host reset)      10%     11861         -
#17  Short offline       Completed without error       00%     11852         -
#18  Short offline       Completed without error       00%     11846         -
#19  Short offline       Completed without error       00%     11837         -
#20  Short offline       Completed without error       00%     11828         -
#21  Short offline       Completed without error       00%     11822         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
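The "LBA = ..." values in the error log above can be re-derived from the register dump. In 28-bit LBA mode the address is assembled from the low nibble of DH (bits 24-27), then CH, CL and SN. A minimal sketch, assuming every entry above uses 28-bit addressing (the DH values e0, e5, e8, e3 and ef are consistent with that):

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Rebuild a 28-bit LBA from ATA taskfile registers.

    Bits 0-7 come from SN, 8-15 from CL, 16-23 from CH,
    and bits 24-27 from the low nibble of DH.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def parse_register_row(row: str) -> dict:
    """Parse an 'ER ST SC SN CL CH DH' row of hex bytes from smartctl."""
    names = ["ER", "ST", "SC", "SN", "CL", "CH", "DH"]
    values = [int(tok, 16) for tok in row.split()[:7]]
    return dict(zip(names, values))


# Register rows copied from Error 7 (IDNF on WRITE DMA) and Error 6 (UNC on READ DMA).
for row in ("10 51 08 c0 01 40 e0", "40 51 00 47 ca 9c e5"):
    r = parse_register_row(row)
    lba = lba28_from_registers(r["SN"], r["CL"], r["CH"], r["DH"])
    print(f"{row} -> LBA 0x{lba:08x} = {lba}")
```

Both rows reproduce the addresses smartctl printed (0x004001c0 and 0x059cca47). The ER byte carries the error type: 0x40 is UNC and 0x10 is IDNF, matching the labels in the log.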

The console of the device was just spewing out pager I/O errors... I rebooted and can now get into the shell and web GUI.


One of my disks has checksum errors, though, as shown below:
gptid/af66849b-697c-11e4-a41c-0015175b85ac ONLINE 0 0 1.61K
This is ada5, one of my new Red drives. Should I begin a scrub?
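`zpool status` abbreviates large error counters, so "1.61K" in the CKSUM column means roughly sixteen hundred checksum errors. A minimal sketch for turning such a device line into numbers; treating the suffixes as powers of 1024 is an assumption here, so the base is a parameter:

```python
# Assumption: counter suffixes are 1024-based; pass base=1000 to compare.
SUFFIX_POWERS = {"K": 1, "M": 2, "G": 3, "T": 4}


def parse_count(token: str, base: int = 1024) -> int:
    """Expand an abbreviated counter such as '1.61K' into an approximate integer."""
    suffix = token[-1].upper()
    if suffix in SUFFIX_POWERS:
        return round(float(token[:-1]) * base ** SUFFIX_POWERS[suffix])
    return int(token)


def parse_device_line(line: str) -> dict:
    """Split a 'NAME STATE READ WRITE CKSUM' device line from zpool status."""
    name, state, read, write, cksum = line.split()[:5]
    return {
        "name": name,
        "state": state,
        "read": parse_count(read),
        "write": parse_count(write),
        "cksum": parse_count(cksum),
    }


dev = parse_device_line("gptid/af66849b-697c-11e4-a41c-0015175b85ac ONLINE 0 0 1.61K")
print(dev["state"], dev["read"], dev["write"], dev["cksum"])
```

A nonzero CKSUM count with zero READ and WRITE errors means the reads completed but the returned data did not match its ZFS checksum.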

I am also using a PCIe-to-SATA card, as I ran out of SATA ports on the motherboard. Has anyone had bad experiences with these?


Images of errors: http://imgur.com/ch8lRLR,2wpMTku#0
 

krikboh

Depends on the SATA card; it could definitely be an issue. What model?


 

mo0p

It's some cheap one :(
SYBA SY-PEX40039
I don't need it to do anything beyond providing more SATA ports. I found https://forums.freenas.org/index.php?threads/uncorrectable-parity-crc-error.14404/page-2

It sounds like he had a similar issue, which was resolved by plugging the drives into the motherboard SATA ports rather than the SATA controller. I do not have enough ports to do this, though, unless I remove my iSCSI disks, which are not really required but nice to have.

I guess my next question would be: what is a decent way to get more SATA ports, other than buying some fancy LSI card + SAS breakout cables?
 

krikboh

That's why the IBM M1015 is recommended as the lowest-cost reliable solution. I actually ordered one yesterday so I could move to a SATA DOM for 9.3.


 

mo0p

Thanks! So, in short, this card without any of its RAID features, plus SAS breakout cables, should yield an extra 8 SATA connections, correct?

Might be worth looking at. But is there any way to tell from the smartctl output above whether it was a write error or something related to the disk/connection?
 

krikboh

Hard to read a SMART report on my phone. Maybe someone else can help.


 

Ericloewe

UDMA CRC count is through the roof. I'd replace the cables ASAP, but I get the feeling the controller may be at fault.
 

mo0p

Thanks for the info! I will swap cables and see. If that doesn't fix it, I will put a different disk on that port and see if it starts dumping UDMA CRC errors like crazy too.

When I run a fresh smartctl -a /dev/ada5, I still see the massive number of UDMA CRC errors. Is it possible to clear this number, or do I just need to keep track of its growth?

Thanks again!
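SMART attribute 199 is a lifetime counter kept by the drive and generally cannot be reset from the host, so the practical approach is to record the raw value and watch whether it keeps growing after the cable or port change. A minimal sketch that pulls RAW_VALUE for UDMA_CRC_Error_Count out of `smartctl -a` text and compares it with a previously saved baseline; how the baseline is stored is left open, and the function names are hypothetical:

```python
import re
from typing import Optional


def smart_raw_value(smartctl_output: str, attribute: str) -> Optional[int]:
    """Return the RAW_VALUE of a named SMART attribute, or None if absent.

    Expects rows of the attribute table printed by `smartctl -a`:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == attribute:
            # Keep only the leading integer of the raw value column.
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group())
    return None


def crc_errors_grew(smartctl_output: str, baseline: int) -> bool:
    """True if UDMA_CRC_Error_Count has increased past the saved baseline."""
    current = smart_raw_value(smartctl_output, "UDMA_CRC_Error_Count")
    if current is None:
        raise ValueError("UDMA_CRC_Error_Count not found in smartctl output")
    return current > baseline


# Two rows copied from the ada5 report above.
sample = """\
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   185   000    Old_age   Always       -       17948309
"""
print(smart_raw_value(sample, "UDMA_CRC_Error_Count"))
print(crc_errors_grew(sample, baseline=17948309))
```

Run against fresh output periodically; a count that stays flat after the fix points to the old cable or controller rather than the disk itself.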
 

mo0p

Just an update: after swapping cables and moving all the disks for the CIFS array to the motherboard, it is working well now. I still see checksum errors in the GUI, though :S but the smartctl CRC error count has stabilized.

Thanks again everyone!
 