Sudden drop in network speeds

Status
Not open for further replies.

olle | Dabbler | Joined: Jun 17, 2013 | Messages: 17
I used to take care of these kinds of issues on my own: just pull an all-nighter with Google and eventually solve it. But this time I really need some guidance, because since an accident I have developed chronic neck pain and I just can't sit through long sessions in front of a computer screen.

A couple of days ago my media player started having problems playing Blu-ray rips from my FreeNAS. I suspect that either the network or the drives are at fault. I have tried switching network cables, restarting the router, etc., but that did not do the trick. A scrub found a corrupted file, which I deleted. I then started a new scrub, which was estimated to take 200 h for 5 TB (??), so I stopped it, and now the pool shows no errors. Simple smartctl checks pass on all drives.

The media player uses NFS to mount the server. I have also tested SMB to my desktop, which produces really low speeds as well (from 2 MB/s to 20 MB/s).
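Whether 2 to 20 MB/s is enough for playback can be estimated with a quick calculation. The Python sketch below assumes the rips are untouched remuxes that peak at roughly 48 Mbit/s, the Blu-ray maximum for audio and video combined; re-encoded rips will need less. The constant and function names are illustrative, not part of any tool.

```python
# Rough check: can a given throughput keep up with a Blu-ray stream?
# Assumption: untouched remuxes peaking at ~48 Mbit/s (the Blu-ray maximum).
BLURAY_PEAK_MBIT_S = 48.0


def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8.0


def can_stream(observed_mbyte_s: float, peak_mbit_s: float = BLURAY_PEAK_MBIT_S) -> bool:
    """Return True if the observed throughput covers the stream's peak bitrate."""
    return observed_mbyte_s >= mbit_to_mbyte(peak_mbit_s)


if __name__ == "__main__":
    print(f"Peak stream needs about {mbit_to_mbyte(BLURAY_PEAK_MBIT_S):.1f} MB/s")
    for observed in (2.0, 20.0):  # low and high ends of the speeds reported above
        verdict = "enough" if can_stream(observed) else "too slow"
        print(f"{observed:5.1f} MB/s: {verdict}")
```

At the low end the transfer rate cannot sustain a full-bitrate rip, which fits the playback problems; the arithmetic is the same whether the bottleneck turns out to be the network or the disks.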

This is as far as my knowledge goes if I'm to avoid long sessions.

I use FreeNAS-8.3.1-RELEASE-p2-x64
AMD E-350 Processor
7790MB memory
3 WD Green disks in ZFS

Just ask if there is some more information you need.
As I said, I would really appreciate help here due to my condition, and I ask for some understanding.
 
Joined: Mar 6, 2014 | Messages: 686
olle said:
Scrub found some corrupted file, but I deleted it, started a new scrub (which would take 200h for 5 TB??) I stopped that one and now it shows no error.
You answered your own question. You probably have a failing drive, which could also be the reason the scrub would take so long.
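The scrub estimate itself is a useful clue: converting it to an average rate shows how slowly the pool was being read. This is plain arithmetic, assuming about 5 TB of allocated data (ZFS scrubs only allocated blocks) and decimal units. The 100 MB/s comparison rate is a hypothetical figure, not a measurement from this system.

```python
# Convert a scrub time estimate into an average read rate, and back.
# Decimal units throughout: 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def implied_rate_mb_s(data_tb: float, hours: float) -> float:
    """Average rate (MB/s) needed to read data_tb terabytes in the given hours."""
    return data_tb * MB_PER_TB / (hours * 3600)


def hours_needed(data_tb: float, rate_mb_s: float) -> float:
    """Hours needed to read data_tb terabytes at a steady rate_mb_s."""
    return data_tb * MB_PER_TB / rate_mb_s / 3600


if __name__ == "__main__":
    print(f"200 h for 5 TB is about {implied_rate_mb_s(5, 200):.1f} MB/s")
    # Hypothetical comparison rate, not a measurement from this system.
    print(f"At 100 MB/s, 5 TB would take about {hours_needed(5, 100):.1f} h")
```

An average of under 7 MB/s is in the same range as the slow network transfers, which is consistent with one struggling disk holding back every read.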
 

gpsguy | Active Member | Joined: Jan 22, 2012 | Messages: 4,472
Yes. Which SMART tests did you run?

Run SMART tests on your hard disks.

smartctl -t short /dev/ada0 (and then smartctl -t long /dev/ada0)

The devices are typically ada0, ... but might be da0, ... depending on how they are connected.
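To run the same test on several disks, the commands can be generated in a loop. This is a minimal Python sketch, assuming the three pool disks appear as /dev/ada0 to /dev/ada2 (change them to da0 and so on if that is how they are attached). It only uses the `-t short`, `-t long` and `-l selftest` options mentioned in this thread, and by default it does a dry run that prints the commands instead of executing them. The helper names are hypothetical.

```python
import subprocess

# Assumption: the pool disks are ada0..ada2; change these to da* if that is how they attach.
DEVICES = ["/dev/ada0", "/dev/ada1", "/dev/ada2"]


def selftest_command(device: str, test: str = "short") -> list:
    """Build the smartctl command that starts a self-test on one device."""
    if test not in ("short", "long"):
        raise ValueError(f"unsupported test type: {test}")
    return ["smartctl", "-t", test, device]


def selftest_log_command(device: str) -> list:
    """Build the smartctl command that prints the self-test log for one device."""
    return ["smartctl", "-l", "selftest", device]


def start_tests(devices, test="short", dry_run=True):
    """Start a self-test on every device; with dry_run=True only print the commands."""
    commands = [selftest_command(dev, test) for dev in devices]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # smartctl returns as soon as the drive accepts the test; the test runs
            # inside the drive. Its exit status is a bit mask, so a non-zero code
            # is deliberately not turned into an exception here.
            subprocess.run(cmd, check=False)
    return commands


if __name__ == "__main__":
    start_tests(DEVICES, "short")
    print("When the tests are done, read the results with:")
    for dev in DEVICES:
        print(" ".join(selftest_log_command(dev)))
```

Because the test runs inside the drive, nothing is printed when it finishes; read the log after the recommended polling time that `smartctl -a` reports for each test type.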
 

olle | Dabbler | Joined: Jun 17, 2013 | Messages: 17
I did smartctl -H /dev/ada0 (and ada1,ada2)

I'm running smartctl -t short /dev/ada0 at the moment and will report back once I have run short and long tests on all drives.
Thank you!
 

olle | Dabbler | Joined: Jun 17, 2013 | Messages: 17
The short test, which is supposed to take 2 minutes, still hasn't completed...

Edit: Scratch that, I didn't realize that the test doesn't print any output when it is done.
 

olle | Dabbler | Joined: Jun 17, 2013 | Messages: 17
I did short tests and ada0 gave this:

Code:
[root@freenas] ~# smartctl -l selftest /dev/ada0
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     19043         376789177
# 2  Extended offline    Completed: read failure       90%     19043         376789177
# 3  Short offline       Completed: read failure       90%     19043         376789176
# 4  Extended offline    Completed: read failure       90%     19042         376789177
# 5  Short offline       Completed: read failure       90%     19041         376789177
# 6  Short offline       Completed: read failure       90%     19041         376789176
# 7  Short offline       Completed: read failure       90%     19041         376789176
# 8  Short offline       Completed: read failure       10%     19041         376789176
# 9  Short offline       Completed: read failure       90%      7602         782369472
#10  Short offline       Completed: read failure       90%      7600         782369472
#11  Short offline       Completed: read failure       90%      7600         782369472
#12  Short offline       Completed: read failure       90%      7599         782369472
#13  Short offline       Completed: read failure       90%      7598         782369472
#14  Short offline       Completed: read failure       90%      7597         782369472
#15  Short offline       Completed: read failure       90%      7596         782369472
#16  Short offline       Completed: read failure       90%      7595         782369472
#17  Short offline       Completed: read failure       90%      7594         782369472
#18  Short offline       Completed: read failure       90%      7593         782369472
#19  Short offline       Completed: read failure       90%      7592         782369472
#20  Short offline       Completed: read failure       90%      7591         782369472
#21  Short offline       Completed: read failure       90%      7590         782369472


I'm currently waiting for the long test on ada0 to finish.
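A log like the one above can also be read programmatically. The Python sketch below parses the numbered rows of `smartctl -l selftest` output, keeps the failed tests, and computes how many power-on hours (the LifeTime column) separate the oldest and newest failures. It assumes the column layout shown above, with columns separated by two or more spaces; the names `SelfTest`, `parse_selftest_log` and `failure_span_hours` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches one row of `smartctl -l selftest` output, e.g.
# "# 1  Short offline       Completed: read failure       90%     19043         376789177"
# Columns are separated by runs of two or more spaces; the LBA column may be "-".
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<test>.+?)\s{2,}(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)$"
)


@dataclass
class SelfTest:
    num: int
    test: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: Optional[int]  # None when smartctl prints "-"


def parse_selftest_log(text: str) -> List[SelfTest]:
    """Extract the numbered rows of a self-test log; other lines are ignored."""
    entries = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue
        lba = m["lba"]
        entries.append(SelfTest(
            num=int(m["num"]),
            test=m["test"],
            status=m["status"],
            remaining_pct=int(m["remaining"]),
            lifetime_hours=int(m["hours"]),
            first_error_lba=None if lba == "-" else int(lba),
        ))
    return entries


def failures(entries: List[SelfTest]) -> List[SelfTest]:
    """Keep only tests whose status reports a failure (read, servo, electrical, ...)."""
    return [e for e in entries if "failure" in e.status]


def failure_span_hours(entries: List[SelfTest]) -> int:
    """Power-on hours between the oldest and newest failed test in the log."""
    hours = [e.lifetime_hours for e in failures(entries)]
    return max(hours) - min(hours) if hours else 0


# A few rows copied from the log above.
SAMPLE = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     19043         376789177
# 4  Extended offline    Completed: read failure       90%     19042         376789177
#21  Short offline       Completed: read failure       90%      7590         782369472
"""

if __name__ == "__main__":
    log = parse_selftest_log(SAMPLE)
    print(f"{len(failures(log))} of {len(log)} tests failed")
    print(f"Failures span {failure_span_hours(log)} power-on hours")
    lbas = {e.first_error_lba for e in failures(log) if e.first_error_lba is not None}
    print("First-error LBAs:", sorted(lbas))
```

For these rows the failures span 11,453 power-on hours (19,043 minus 7,590), so the drive had been reporting read failures for a long time before the slowdown was noticed.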
 

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
Well, if the short test fails (which it did more than 10000 power-on hours ago), the disk has failed. You need to replace that disk. Simple as that.

The long test is inconsequential since the short test failed.
 

Fraoch | Patron | Joined: Aug 14, 2014 | Messages: 395
There's probably no point in a long test. The drive is failing. If the drive is under warranty, this failed SMART test is enough for the manufacturer to replace it for you.

Get a replacement drive ASAP. If the failed drive is under warranty, get an RMA started to obtain a replacement, but unless you're willing to risk losing your data, you may want to buy a replacement drive immediately and keep the drive the manufacturer sends back as a spare. Based on the number of disks you have, you're running RAIDZ1, and you won't have any redundancy once this drive is removed, so another drive error will immediately cause more file corruption in the best case and the loss of the entire pool in the worst case. You don't want to run in this configuration for long unless the data can be easily replaced.
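The redundancy argument can be summarized in a few lines of Python. This sketch assumes three identical 2 TB disks (using the "User Capacity" that smartctl reports for the WD20EARX in this thread) and ignores ZFS metadata, padding and reserved space, so real usable space is lower. The function names and state labels are illustrative, not ZFS terminology.

```python
# Sketch of RAIDZ fault tolerance and approximate usable space.
# Assumptions: identical 2 TB disks; ZFS metadata, padding and reserved space ignored.
DISK_BYTES = 2_000_398_934_016  # "User Capacity" smartctl reports for the WD20EARX
TIB = 2 ** 40


def approx_usable_bytes(disks: int, parity: int, disk_bytes: int = DISK_BYTES) -> int:
    """Raw capacity of the data (non-parity) share of a single RAIDZ vdev."""
    return (disks - parity) * disk_bytes


def vdev_state(parity: int, failed: int) -> str:
    """Describe a RAIDZ vdev with `parity` parity disks after `failed` disks are lost."""
    if failed == 0:
        return "healthy"
    if failed < parity:
        return "degraded, some redundancy left"
    if failed == parity:
        return "degraded, no redundancy left"
    return "too many failures, pool lost"


if __name__ == "__main__":
    print(f"RAIDZ1 of 3 disks: ~{approx_usable_bytes(3, 1) / TIB:.2f} TiB before overhead")
    for failed in range(3):
        print(f"{failed} failed disk(s): {vdev_state(1, failed)}")
```

With RAIDZ1 (one parity disk), removing the bad drive already puts the vdev in the "no redundancy left" state, which is why any further read error during the rebuild can corrupt files.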
 

olle | Dabbler | Joined: Jun 17, 2013 | Messages: 17
Oh, embarrassing.

Ok I will get a replacement one way or another.
Are you positive that this is what causes the slow performance? I realize I should prioritize the failed disk, but I just want to know whether I have one major problem, or two...
 

Fraoch | Patron | Joined: Aug 14, 2014 | Messages: 395
olle said:
Are you positive that this is what causes the slow performance? I realize I should prioritize the failed disk, but I just want to know whether I have one major problem, or two...

This would cause FreeNAS problems accessing the pool, which will almost surely lead to low performance. Even in the unlikely event this isn't the cause, this problem should be corrected anyway. You'll be getting more and more corrupted files and one day you may lose everything.

Any networking problems you have are minor compared to this one...

I don't know if FreeNAS 8.3.1 has this capability, but it should be configured to send you a warning e-mail in the event of a drive error. That might have caught the problem in time to prevent the file corruption, and while the drive was still under warranty; based on the number of hours, it looks like it's just out of warranty now. :(
 

Ericloewe | Server Wrangler, Moderator | Joined: Feb 15, 2014 | Messages: 20,194
We've found that ZFS performance takes a nose dive when drives start failing.

I imagine Greens (and others without TLER) are the worst, since they'll try to read data (nearly) forever before giving up, leaving ZFS waiting.
 

cyberjock | Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
Drives failing will kill ZFS performance. Whether that is the actual problem here, I don't know. But we have to rule out the disk being responsible first, so it needs to be replaced before any further troubleshooting occurs.
 

olle | Dabbler | Joined: Jun 17, 2013 | Messages: 17
Ok, I will go buy a new drive.
I guess most people have a way of knowing which physical drive is which, but do you have any hints on how to tell which drive I should remove (after putting it in offline mode, of course)?

Code:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format)
Device Model:     WDC WD20EARX-00PASB0
Serial Number:    WD-WCAZAC372042
LU WWN Device Id: 5 0014ee 25bb9b635
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue Jan 27 11:28:38 2015 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (39900) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 384) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   197   197   051    Pre-fail  Always       -       31036
  3 Spin_Up_Time            0x0027   162   160   021    Pre-fail  Always       -       6858
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       469
  5 Reallocated_Sector_Ct   0x0033   178   178   140    Pre-fail  Always       -       429
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       19062
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       412
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       329
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       1335262
194 Temperature_Celsius     0x0022   109   095   000    Old_age   Always       -       41
196 Reallocated_Event_Count 0x0032   009   009   000    Old_age   Always       -       191
197 Current_Pending_Sector  0x0032   198   198   000    Old_age   Always       -       676
198 Offline_Uncorrectable   0x0030   199   198   000    Old_age   Offline      -       328
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   126   126   000    Old_age   Offline      -       19821

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     19043         376789177
# 2  Short offline       Completed: read failure       90%     19043         376789177
# 3  Extended offline    Completed: read failure       90%     19043         376789177
# 4  Short offline       Completed: read failure       90%     19043         376789176
# 5  Extended offline    Completed: read failure       90%     19042         376789177
# 6  Short offline       Completed: read failure       90%     19041         376789177
# 7  Short offline       Completed: read failure       90%     19041         376789176
# 8  Short offline       Completed: read failure       90%     19041         376789176
# 9  Short offline       Completed: read failure       10%     19041         376789176
#10  Short offline       Completed: read failure       90%      7602         782369472
#11  Short offline       Completed: read failure       90%      7600         782369472
#12  Short offline       Completed: read failure       90%      7600         782369472
#13  Short offline       Completed: read failure       90%      7599         782369472
#14  Short offline       Completed: read failure       90%      7598         782369472
#15  Short offline       Completed: read failure       90%      7597         782369472
#16  Short offline       Completed: read failure       90%      7596         782369472
#17  Short offline       Completed: read failure       90%      7595         782369472
#18  Short offline       Completed: read failure       90%      7594         782369472
#19  Short offline       Completed: read failure       90%      7593         782369472
#20  Short offline       Completed: read failure       90%      7592         782369472
#21  Short offline       Completed: read failure       90%      7591         782369472

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The output above is the result of the long test.
 

gpsguy | Active Member | Joined: Jan 22, 2012 | Messages: 4,472
The one with this serial number: WD-WCAZAC372042

In the future, you might want to put a label with the serial number on the visible end of each hard disk, to help you locate the right drive.
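When several identical drives are installed, mapping device names to serial numbers before opening the case helps avoid pulling the wrong one. This is a minimal Python sketch, assuming smartctl is installed and run with sufficient privileges; it reads the "Serial Number:" line that `smartctl -i` prints, as in the output posted above. The device list and helper names are assumptions.

```python
import re
import subprocess
from typing import Dict, Optional

# Matches the "Serial Number:" line in `smartctl -i` (or `-a`) output.
SERIAL_RE = re.compile(r"^Serial Number:\s+(\S+)", re.MULTILINE)


def serial_from_info(info_text: str) -> Optional[str]:
    """Return the serial number found in smartctl identity output, or None."""
    match = SERIAL_RE.search(info_text)
    return match.group(1) if match else None


def serial_map(devices) -> Dict[str, Optional[str]]:
    """Run `smartctl -i` on each device and map device node -> serial number."""
    result = {}
    for dev in devices:
        proc = subprocess.run(["smartctl", "-i", dev], capture_output=True, text=True)
        result[dev] = serial_from_info(proc.stdout)
    return result


if __name__ == "__main__":
    # Assumption: adjust the device list to match your system.
    try:
        serials = serial_map(["/dev/ada0", "/dev/ada1", "/dev/ada2"])
    except FileNotFoundError:
        print("smartctl is not installed or not on PATH")
    else:
        for dev, serial in serials.items():
            print(f"{dev}: {serial or 'no serial found'}")
```

Writing the printed pairs down, or onto the drive labels, means the next replacement only needs the device name from the alert.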
 

enemy85 | Guru | Joined: Jun 10, 2011 | Messages: 757
To have a better understanding of why your disk is failing, just read these values:

  5 Reallocated_Sector_Ct   0x0033   178   178   140    Pre-fail  Always       -       429
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       1335262
197 Current_Pending_Sector  0x0032   198   198   000    Old_age   Always       -       676
198 Offline_Uncorrectable   0x0030   199   198   000    Old_age   Offline      -       328
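A small script can make those rows easier to interpret. The Python sketch below parses rows of the attribute table and reports two things: any normalized VALUE at or below its THRESH (the condition SMART itself treats as failing), and non-zero raw counts for attributes 5, 197 and 198. The choice of those three IDs is a common rule of thumb, not a vendor specification, and the names used are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Attributes whose raw counts are commonly watched as signs of media damage.
# This watch list is a rule of thumb, not a vendor specification.
WATCH = (5, 197, 198)


@dataclass
class Attribute:
    id: int
    name: str
    value: int   # normalized value (higher is better)
    worst: int
    thresh: int  # the attribute counts as failing when value <= thresh
    raw: int


def parse_attributes(text: str) -> Dict[int, Attribute]:
    """Parse rows of the smartctl attribute table, skipping anything else."""
    attrs = {}
    for line in text.splitlines():
        f = line.split()
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE [...]
        if len(f) < 10 or not f[0].isdigit() or not f[9].isdigit():
            continue
        attrs[int(f[0])] = Attribute(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), int(f[9]))
    return attrs


def assess(attrs: Dict[int, Attribute]) -> List[str]:
    """List findings: thresholds crossed, plus non-zero raw counts on watched IDs."""
    findings = []
    for a in attrs.values():
        if a.thresh > 0 and a.value <= a.thresh:
            findings.append(f"{a.name}: value {a.value} at or below threshold {a.thresh}")
    for attr_id in WATCH:
        a = attrs.get(attr_id)
        if a is not None and a.raw > 0:
            findings.append(f"{a.name}: raw count {a.raw}")
    return findings


# The four rows quoted above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   178   178   140    Pre-fail  Always       -       429
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       1335262
197 Current_Pending_Sector  0x0032   198   198   000    Old_age   Always       -       676
198 Offline_Uncorrectable   0x0030   199   198   000    Old_age   Offline      -       328
"""

if __name__ == "__main__":
    for finding in assess(parse_attributes(SAMPLE)):
        print(finding)
```

None of the normalized values has reached its threshold (178 is still above 140 for attribute 5), which is why the overall health check still reports PASSED, yet the raw counts show hundreds of reallocated, pending and uncorrectable sectors.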
 

Ericloewe | Server Wrangler, Moderator | Joined: Feb 15, 2014 | Messages: 20,194
gpsguy said:
The one with this serial number: WD-WCAZAC372042

In the future, you might want to put a label with the serial number on the visible end of each hard disk, to help you locate the right drive.

In most non-hotswap enclosures, it's possible to read the end label which has the serial number.
 

olle | Dabbler | Joined: Jun 17, 2013 | Messages: 17
enemy85 said:
To have a better understanding of why your disk is failing, just read these values

That does not really tell me much...

Either way, the volume is now resilvering.
For about a year, the system has been making an unnerving sound; I have just tried to ignore it and hoped it was the fan. Now that I have replaced the failing disk, I hoped the sound would be gone, but it is still there. I took the opportunity to check the fan when I opened the case, but it seems to spin freely without any resistance. The sound is a loud humming with some ticking. I fear it is one of the other drives. What actions should I take when resilvering is done? Scrubbing and a long test on every drive?

Ericloewe said:
In most non-hotswap enclosures, it's possible to read the end label which has the serial number.

Yes the serial number was visible. Thanks gpsguy!
 

gpsguy | Active Member | Joined: Jan 22, 2012 | Messages: 4,472
We can tell that you have a "green drive" just by seeing this line:

193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 1335262

You should have run "wdidle" on the drive before putting it in use.
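The normalized value of 001 already shows the counter is far beyond what the drive expects. A quick Python calculation from the raw values of attributes 193 and 9 in the long-test output shows how often the heads were being parked. The 300,000-cycle rating is an assumption (a figure often quoted for WD Green drives), not something stated in this thread; check the datasheet for your model.

```python
# Load/unload cycle rate implied by SMART attributes 193 (Load_Cycle_Count)
# and 9 (Power_On_Hours).
# Assumption: RATED_CYCLES is the commonly cited WD Green rating; check your datasheet.
RATED_CYCLES = 300_000


def cycles_per_hour(load_cycles: int, power_on_hours: int) -> float:
    """Average head load/unload cycles per power-on hour."""
    return load_cycles / power_on_hours


def seconds_between_cycles(load_cycles: int, power_on_hours: int) -> float:
    """Average seconds between two head load/unload cycles."""
    return power_on_hours * 3600 / load_cycles


if __name__ == "__main__":
    cycles, hours = 1_335_262, 19_062  # raw values from the smartctl output above
    print(f"{cycles_per_hour(cycles, hours):.1f} cycles per hour")
    print(f"one cycle every {seconds_between_cycles(cycles, hours):.1f} s on average")
    print(f"{cycles / RATED_CYCLES:.1f}x the assumed rating")
```

Roughly one park every 51 seconds around the clock is the aggressive idle behaviour that wdidle is used to change on these drives.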

Please post the smartctl output of the other 2 drives in code tags, so we can view their health.
 

olle | Dabbler | Joined: Jun 17, 2013 | Messages: 17
Damn, I had no idea about wdidle. I just bought another green drive!

Can I run smartctl while the volume is resilvering?
 