You'd best believe it, because what they say is true

ultimateon

Dabbler
Joined
Oct 22, 2014
Messages
14
So after 3 years of running FreeNAS on this machine with the current setup, I have finally started running into massive amounts of problems.
First off, let me post the builds of my FrankenNAS.

Original

HP Microserver N54L
4 GB of ECC RAM
1x 1 TB WD Green (pre-IntelliPark)
1x 2 TB WD Green (IntelliPark enabled)
1x 3 TB WD Green (IntelliPark enabled)
Kingston DataTraveler 16 GB (boot drive)

Current

HP Microserver N54L
8 GB of non-ECC RAM (G.Skill 1600 MHz Ripjaws Z) (need to find another cheap, compatible 4 GB stick of ECC RAM; damn the cursed economy, with the price of the dollar skyrocketing everything is expensive!)
1x 1 TB WD Green (pre-IntelliPark) (still working)
1x 2 TB WD Green (IntelliPark disabled) (died last month, along with the dataset and 320 GB of someone else's life that I said I might lose!)
1x 3 TB WD Green (IntelliPark enabled) (died before I could disable IntelliPark)
1x 4 TB WD Red
1x 1 TB Seagate Samsung 2.5" pulled from a portable hard drive adapter (soon to be replaced by a Toshiba Store-E HDD which has yet to be identified) (dead sectors at the moment: 46 and increasing)
1x 320 GB Seagate Samsung drive (working as a cache drive)
Kingston DataTraveler 16 GB (boot drive)

So I powered the system off last night because I had to do some cleaning around it, and in the morning, after a VERY slow boot, I got this warning:

"The volume home (ZFS) state is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected."

So I need to know whether this is a hard drive failure (forcing me to move away from FreeNAS for a time until I can acquire 2 more 4 TB Reds and a cheap SSD (damn this economy)), or a RAM error (which I will test further this afternoon with memtest after removi- no wait, I just made this dataset a week ago; besides a few movies I'm not losing anything important).

I'm uploading the debug logs; if you need anything else, let me know. I'm watching this thread, seeing if I can pump some more lifeblood into the dying machine.

And now for some comic relief
IfF5CMP.jpg
 

Attachments

  • debug-freenas-20150815111448.tgz
    374.8 KB

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Please run: zpool status from the console and upload the results in code tags (or just take a picture).
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
From the OP's other thread - https://forums.freenas.org/index.php?threads/n54l-sata-riser-icydock.35962/#post-221728 we know that he can't/doesn't want to replace the hardware. I suggested that he replace the current hardware, since it's underpowered and limits what he wants to accomplish.

The 320 GB Seagate Samsung Drive ("cache drive") is inappropriate for your system. First, you have far too little RAM for any kind of "cache drive". Second, to be of help, it would need to be an SSD.

The memory requirement for ZFS on FreeNAS, even back in the 8.0 days, was an 8 GB minimum. Running with just 4 GB was dangerous. One can put 16 GB of ECC RAM in an N54L (I did).

It sounds like you aren't using any form of RAIDZ or mirroring, just independent drives, each with its own volume. Most of us are using ZFS (software RAID) to give us some protection against drive failure and to ease the replacement of a drive. Note that RAID isn't a substitute for usable backups.

The message "The volume home (ZFS) state is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected." refers to a disk issue, possibly the one with the bad sectors. The zpool status should reveal the problematic drive. Running SMART tests on your drives - smartctl -a /dev/adaX (where X is a number) - will give us more information about the health of your drives.
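
For anyone following along, the smartctl check above can be narrowed to the attributes that most directly point at bad sectors. This is only a rough sketch, not part of the original advice: the function name smart_flags is made up for illustration, and it assumes the attribute table layout printed by smartmontools 6.x (ID in column 1, name in column 2, raw value in column 10).

```shell
# smart_flags (hypothetical helper): read `smartctl -A` or `smartctl -a`
# output on stdin and print the sector-health attributes (IDs 5, 197, 198)
# whose raw value is non-zero.
# Assumption: attribute rows carry the ID in field 1, the attribute name
# in field 2 and the raw value in field 10.
smart_flags() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[A-Za-z]/ &&
         ($1 == 5 || $1 == 197 || $1 == 198) && $10 + 0 > 0 {
             print $2 " raw=" $10
         }'
}

# Usage on a live system (as root):
#   smartctl -A /dev/ada2 | smart_flags
```

An empty result only means these three counters are zero; it does not prove the disk is healthy, so a long self-test is still worth running.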
 

ultimateon

Dabbler
Joined
Oct 22, 2014
Messages
14
For zpool status
Code:
[root@freenas] ~# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Fri Jul 17 11:45:46 2015
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors

  pool: home
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        home                                          ONLINE       0     0     0
          gptid/8b93b172-3b84-11e5-8174-9cb65407e523  ONLINE       0     0     0
          gptid/a8def395-3b84-11e5-8174-9cb65407e523  ONLINE       0     0     0
          gptid/aa95ed1d-3b84-11e5-8174-9cb65407e523  ONLINE       0     0     0
        cache
          gptid/ab7389bd-3b84-11e5-8174-9cb65407e523  ONLINE       0     0     0

errors: 27 data errors, use '-v' for a list
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
The 320 GB Seagate Samsung Drive ("cache drive") is inappropriate for your system. First, you have far too little RAM for any kind of "cache drive". Second, to be of help, it would need to be an SSD.
I agree, dump the cache drive, it's just never going to help you out and I believe it will actually add delay.
 

ultimateon

Dabbler
Joined
Oct 22, 2014
Messages
14
ada0
Code:
[root@freenas] ~# smartctl -a /dev/ada0
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p16 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Western Digital Green
Device Model:  WDC WD10EARX-00N0YB0
Serial Number:  WD-WMC0S0068596
LU WWN Device Id: 5 0014ee 2069025ac
Firmware Version: 51.0AB51
User Capacity:  1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:  Sat Aug 15 15:12:35 2015 WEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
  was completed without error.
  Auto Offline Data Collection: Enabled.
Self-test execution status:  (  0) The previous self-test routine completed
  without error or no self-test has ever
  been run.
Total time to complete Offline
data collection:  (17280) seconds.
Offline data collection
capabilities:  (0x7b) SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new
  command.
  Offline surface scan supported.
  Self-test supported.
  Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities:  (0x0003) Saves SMART data before entering
  power-saving mode.
  Supports SMART auto save timer.
Error logging capability:  (0x01) Error logging supported.
  General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  2) minutes.
Extended self-test routine
recommended polling time:  ( 170) minutes.
Conveyance self-test routine
recommended polling time:  (  5) minutes.
SCT capabilities:  (0x30b5) SCT Status supported.
  SCT Feature Control supported.
  SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x002f  200  200  051  Pre-fail  Always  -  0
  3 Spin_Up_Time  0x0027  115  104  021  Pre-fail  Always  -  7250
  4 Start_Stop_Count  0x0032  098  098  000  Old_age  Always  -  2294
  5 Reallocated_Sector_Ct  0x0033  200  200  140  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x002e  200  200  000  Old_age  Always  -  0
  9 Power_On_Hours  0x0032  085  085  000  Old_age  Always  -  11252
10 Spin_Retry_Count  0x0032  100  100  000  Old_age  Always  -  0
11 Calibration_Retry_Count 0x0032  100  100  000  Old_age  Always  -  0
12 Power_Cycle_Count  0x0032  099  099  000  Old_age  Always  -  1971
192 Power-Off_Retract_Count 0x0032  200  200  000  Old_age  Always  -  701
193 Load_Cycle_Count  0x0032  185  185  000  Old_age  Always  -  45749
194 Temperature_Celsius  0x0022  114  096  000  Old_age  Always  -  33
196 Reallocated_Event_Count 0x0032  200  200  000  Old_age  Always  -  0
197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0030  200  200  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x0032  200  200  000  Old_age  Always  -  0
200 Multi_Zone_Error_Rate  0x0008  200  200  000  Old_age  Offline  -  0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline  Completed without error  00%  10544  -
# 2  Short offline  Aborted by host  90%  7389  -
# 3  Short offline  Aborted by host  90%  7389  -
# 4  Short offline  Aborted by host  90%  7389  -
# 5  Short offline  Aborted by host  90%  7389  -
# 6  Short offline  Aborted by host  90%  7389  -
# 7  Short offline  Aborted by host  90%  7349  -
# 8  Short offline  Aborted by host  90%  7349  -
# 9  Short offline  Aborted by host  90%  7349  -
#10  Short offline  Aborted by host  90%  7349  -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


ada1
Code:
[root@freenas] ~# smartctl -a /dev/ada1
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p16 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Western Digital Red
Device Model:  WDC WD40EFRX-68WT0N0
Serial Number:  WD-WCC4E7HHC9SU
LU WWN Device Id: 5 0014ee 20b3fb28c
Firmware Version: 82.00A82
User Capacity:  4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  5400 rpm
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:  Sat Aug 15 15:14:08 2015 WEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
  was never started.
  Auto Offline Data Collection: Disabled.
Self-test execution status:  (  0) The previous self-test routine completed
  without error or no self-test has ever
  been run.
Total time to complete Offline
data collection:  (52380) seconds.
Offline data collection
capabilities:  (0x7b) SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new
  command.
  Offline surface scan supported.
  Self-test supported.
  Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities:  (0x0003) Saves SMART data before entering
  power-saving mode.
  Supports SMART auto save timer.
Error logging capability:  (0x01) Error logging supported.
  General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  2) minutes.
Extended self-test routine
recommended polling time:  ( 524) minutes.
Conveyance self-test routine
recommended polling time:  (  5) minutes.
SCT capabilities:  (0x703d) SCT Status supported.
  SCT Error Recovery Control supported.
  SCT Feature Control supported.
  SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x002f  200  200  051  Pre-fail  Always  -  0
  3 Spin_Up_Time  0x0027  176  171  021  Pre-fail  Always  -  8183
  4 Start_Stop_Count  0x0032  100  100  000  Old_age  Always  -  248
  5 Reallocated_Sector_Ct  0x0033  200  200  140  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x002e  100  253  000  Old_age  Always  -  0
  9 Power_On_Hours  0x0032  097  097  000  Old_age  Always  -  2467
10 Spin_Retry_Count  0x0032  100  100  000  Old_age  Always  -  0
11 Calibration_Retry_Count 0x0032  100  100  000  Old_age  Always  -  0
12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  106
192 Power-Off_Retract_Count 0x0032  200  200  000  Old_age  Always  -  59
193 Load_Cycle_Count  0x0032  200  200  000  Old_age  Always  -  619
194 Temperature_Celsius  0x0022  119  112  000  Old_age  Always  -  33
196 Reallocated_Event_Count 0x0032  200  200  000  Old_age  Always  -  0
197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0030  100  253  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x0032  200  200  000  Old_age  Always  -  0
200 Multi_Zone_Error_Rate  0x0008  100  253  000  Old_age  Offline  -  0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


ada2 (problematic Seagate drive)
Code:
[root@freenas] ~# smartctl -a /dev/ada2
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p16 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Seagate Samsung SpinPoint M8 (AF)
Device Model:  ST1000LM024 HN-M101MBB
Serial Number:  S2TPJAGCA13421
LU WWN Device Id: 5 0004cf 208c66368
Firmware Version: 2AR10001
User Capacity:  1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  5400 rpm
Form Factor:  2.5 inches
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:  Sat Aug 15 15:15:12 2015 WEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
  was never started.
  Auto Offline Data Collection: Disabled.
Self-test execution status:  (  0) The previous self-test routine completed
  without error or no self-test has ever
  been run.
Total time to complete Offline
data collection:  (13140) seconds.
Offline data collection
capabilities:  (0x5b) SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new
  command.
  Offline surface scan supported.
  Self-test supported.
  No Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities:  (0x0003) Saves SMART data before entering
  power-saving mode.
  Supports SMART auto save timer.
Error logging capability:  (0x01) Error logging supported.
  General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  2) minutes.
Extended self-test routine
recommended polling time:  ( 219) minutes.
SCT capabilities:  (0x003f) SCT Status supported.
  SCT Error Recovery Control supported.
  SCT Feature Control supported.
  SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x002f  100  100  051  Pre-fail  Always  -  15930
  2 Throughput_Performance  0x0026  252  252  000  Old_age  Always  -  0
  3 Spin_Up_Time  0x0023  086  086  025  Pre-fail  Always  -  4460
  4 Start_Stop_Count  0x0032  100  100  000  Old_age  Always  -  926
  5 Reallocated_Sector_Ct  0x0033  252  252  010  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x002e  252  252  051  Old_age  Always  -  0
  8 Seek_Time_Performance  0x0024  252  252  015  Old_age  Offline  -  0
  9 Power_On_Hours  0x0032  100  100  000  Old_age  Always  -  1942
10 Spin_Retry_Count  0x0032  252  252  051  Old_age  Always  -  0
11 Calibration_Retry_Count 0x0032  100  100  000  Old_age  Always  -  43
12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  946
191 G-Sense_Error_Rate  0x0022  100  100  000  Old_age  Always  -  17
192 Power-Off_Retract_Count 0x0022  252  252  000  Old_age  Always  -  0
194 Temperature_Celsius  0x0002  064  049  000  Old_age  Always  -  33 (Min/Max 14/52)
195 Hardware_ECC_Recovered  0x003a  100  100  000  Old_age  Always  -  0
196 Reallocated_Event_Count 0x0032  252  252  000  Old_age  Always  -  0
197 Current_Pending_Sector  0x0032  100  100  000  Old_age  Always  -  56
198 Offline_Uncorrectable  0x0030  252  252  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x0036  200  200  000  Old_age  Always  -  0
200 Multi_Zone_Error_Rate  0x002a  100  100  000  Old_age  Always  -  1333
223 Load_Retry_Count  0x0032  100  100  000  Old_age  Always  -  43
225 Load_Cycle_Count  0x0032  094  094  000  Old_age  Always  -  62606

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Completed [00% left] (0-65535)
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ada3 (cache)
Code:
[root@freenas] ~# smartctl -a /dev/ada3
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p16 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  SAMSUNG SpinPoint M7E (AF)
Device Model:  SAMSUNG HM321HI
Serial Number:  S24PJDRZ239294
LU WWN Device Id: 5 0024e9 00305e71f
Firmware Version: 2AJ10001
User Capacity:  320,072,933,376 bytes [320 GB]
Sector Size:  512 bytes logical/physical
Rotation Rate:  5400 rpm
Form Factor:  2.5 inches
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:  Sat Aug 15 15:19:46 2015 WEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
  was never started.
  Auto Offline Data Collection: Disabled.
Self-test execution status:  (  25) The self-test routine was aborted by
  the host.
Total time to complete Offline
data collection:  ( 5520) seconds.
Offline data collection
capabilities:  (0x5b) SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new
  command.
  Offline surface scan supported.
  Self-test supported.
  No Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities:  (0x0003) Saves SMART data before entering
  power-saving mode.
  Supports SMART auto save timer.
Error logging capability:  (0x01) Error logging supported.
  General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  2) minutes.
Extended self-test routine
recommended polling time:  (  92) minutes.
SCT capabilities:  (0x003f) SCT Status supported.
  SCT Error Recovery Control supported.
  SCT Feature Control supported.
  SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x002f  100  100  051  Pre-fail  Always  -  43
  2 Throughput_Performance  0x0026  252  252  000  Old_age  Always  -  0
  3 Spin_Up_Time  0x0023  092  069  025  Pre-fail  Always  -  2700
  4 Start_Stop_Count  0x0032  098  098  000  Old_age  Always  -  2630
  5 Reallocated_Sector_Ct  0x0033  252  252  010  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x002e  252  252  051  Old_age  Always  -  0
  8 Seek_Time_Performance  0x0024  252  252  015  Old_age  Offline  -  0
  9 Power_On_Hours  0x0032  100  100  000  Old_age  Always  -  8685
10 Spin_Retry_Count  0x0032  252  252  051  Old_age  Always  -  0
11 Calibration_Retry_Count 0x0032  098  098  000  Old_age  Always  -  2995
12 Power_Cycle_Count  0x0032  098  098  000  Old_age  Always  -  3015
191 G-Sense_Error_Rate  0x0022  100  100  000  Old_age  Always  -  2767
192 Power-Off_Retract_Count 0x0022  252  252  000  Old_age  Always  -  0
194 Temperature_Celsius  0x0002  064  048  000  Old_age  Always  -  30 (Min/Max 10/54)
195 Hardware_ECC_Recovered  0x003a  100  100  000  Old_age  Always  -  0
196 Reallocated_Event_Count 0x0032  252  252  000  Old_age  Always  -  0
197 Current_Pending_Sector  0x0032  252  100  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0030  252  252  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x0036  100  100  000  Old_age  Always  -  1
200 Multi_Zone_Error_Rate  0x002a  100  100  000  Old_age  Always  -  2631
223 Load_Retry_Count  0x0032  098  098  000  Old_age  Always  -  2995
225 Load_Cycle_Count  0x0032  080  080  000  Old_age  Always  -  203176

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline  Aborted by host  90%  4836  -
# 2  Short offline  Aborted by host  90%  4836  -
# 3  Short offline  Aborted by host  90%  4836  -
# 4  Short offline  Aborted by host  90%  4836  -
# 5  Short offline  Aborted by host  90%  4836  -
# 6  Short offline  Aborted by host  90%  4797  -
# 7  Short offline  Aborted by host  90%  4796  -
# 8  Short offline  Aborted by host  90%  4796  -
# 9  Short offline  Aborted by host  90%  4796  -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Aborted_by_host [90% left] (0-65535)
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



I agree, dump the cache drive, it's just never going to help you out and I believe it will actually add delay.
Guess I'll replace it with the SSD I have in one of my older laptops...
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Do not put a cache drive in your system. Here's an excerpt from section 1.4 of the documentation.

"However, adding an L2ARC is not a substitute for insufficient RAM as L2ARC needs RAM in order to function. If you do not have enough RAM for a good sized ARC, you will not be increasing performance, and in most cases you will actually hurt performance and could potentially cause system instability."
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Can you re-run this command: zpool status -v and post the results again? It'll show us the list of the bad files. Hopefully, they are recognizable filenames and not metadata.
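
If the list turns out to be long, it can help to pull just the damaged paths out of the output. A minimal sketch, assuming the file list follows the "Permanent errors have been detected" line as it does for this kind of error; damaged_files is a hypothetical helper name, not a ZFS command.

```shell
# damaged_files (hypothetical helper): read `zpool status -v` output on
# stdin and print one damaged path per line, with leading whitespace removed.
# Assumption: the list starts after the "Permanent errors have been detected"
# line and ends at the next "pool:" line or at end of input.
damaged_files() {
    awk '/^ *pool:/ { in_list = 0 }
         in_list && NF { sub(/^[ \t]+/, ""); print }
         /Permanent errors have been detected/ { in_list = 1 }'
}

# Usage on a live system, e.g. to count the affected files:
#   zpool status -v home | damaged_files | wc -l
```

Entries that are not plain paths (metadata objects, for example) are printed as-is, so the result still needs a quick read before anything is deleted.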
 

ultimateon

Dabbler
Joined
Oct 22, 2014
Messages
14
Do not put a cache drive in your system. Here's an excerpt from section 1.4 of the documentation.

"However, adding an L2ARC is not a substitute for insufficient RAM as L2ARC needs RAM in order to function. If you do not have enough RAM for a good sized ARC, you will not be increasing performance, and in most cases you will actually hurt performance and could potentially cause system instability."

Alright, removing it through the GUI... Once it's done I'll run the command zpool status -v
 

ultimateon

Dabbler
Joined
Oct 22, 2014
Messages
14
Code:
[root@freenas] ~# zpool status -v
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Fri Jul 17 11:45:46 2015
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors

  pool: home
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        home                                          ONLINE       0     0     0
          gptid/8b93b172-3b84-11e5-8174-9cb65407e523  ONLINE       0     0     0
          gptid/a8def395-3b84-11e5-8174-9cb65407e523  ONLINE       0     0     0
          gptid/aa95ed1d-3b84-11e5-8174-9cb65407e523  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /mnt/home/Storage/Storage.Plugins/Torrents/Django Desencadenado [BluRay 1080p][AC3 5.1 Castellano DTS Englsih+Subs.ES-EN][2013]/Django.1080p - www.newpct.com.mkv.part
        /mnt/home/Storage/Storage.Plugins/Torrents/The.Intouchables.2011.LIMITED.DVDRip.XviD-VH-PROD[rarbg]/CD1/intouch.cd1-vh-prod.avi.part
        /mnt/home/Storage/Storage.Plugins/Torrents/Oldboy.2003.1080p.BluRay.x264-FSiHD/fsi-oldboy.1080p.mkv.part
        /mnt/home/Storage/Storage.Plugins/Torrents/Mad Max Fury Road 2015 1080p WEB-DL x264 AC3-JYK/Mad Max Fury Road 2015 1080p WEB-DL x264 AC3-JYK.mkv.part
        /mnt/home/Storage/Storage.Plugins/Torrents/Far from the Madding Crowd (2015) NL/Far.from.the.Madding.Crowd.2015.1080p.BluRay.x264.YIFY.mp4.part
        /mnt/home/Storage/Storage.Plugins/Torrents/The Departed (2006)/The.Departed.2006.720p.BrRip.x264.YIFY.mp4.part
        /mnt/home/Storage/Storage.Plugins/Torrents/Inglourious Bastards (2009) [1080p]/Inglourious Bastards.2009.1080p.BrRip.x264.YIFY.mp4.part
        /mnt/home/Storage/Storage.Plugins/Torrents/Ip Man 2008 1080p.BluRay.5.1.x264 . NVEE/Ip Man 2008.824p.BluRay.5.1.x264 . NVEE.mp4.part
        /mnt/home/Storage/Storage.Plugins/Torrents/Batman Begins [BDremux 1080p][AC3 5.1 Castellano DTS-5.1-Ingles+Subs][ES-EN]/BatmanBeginsBDR1080.www.newpct.com.mkv.part
        /mnt/home/Storage/Storage.Plugins/Torrents/Kingsman.The.Secret.Service.2014.1080p.BluRay.x264-SPARKS[rarbg]/Kingsman.The.Secret.Service.2014.1080p.BluRay.x264-SPARKS.mkv
        /mnt/home/Storage/Storage.Plugins/Torrents/Up (2009) [1080p]/Up.2009.1080p.BluRay.x264.YIFY.mp4.part
        /mnt/home/Storage/Storage.Plugins/Torrents/Whiplash (2014)/Whiplash.2014.720p.BluRay.x264.YIFY.mp4
        /mnt/home/Storage/Storage.Plugins/Torrents/Run All Night (2015)/Run.All.Night.2015.720p.BluRay.x264.YIFY.mp4.part
        /mnt/home/Storage/Storage.Plugins/Torrents/Pans.Labyrinth.(El.Laberinto.del.Fauno).2006.US.Bluray.1080p.DTS-HD-7.1.x264-Grym/Pans.Labyrinth.(El.Laberinto.del.Fauno).2006.US.Bluray.1080p.DTS-HD-7.1.x264-Grym.mkv.part
        /mnt/home/Storage/Storage.Plugins/Torrents/The.Wolf.of.Wall.Street.2013.1080p.BluRay.X264-AMIABLE [PublicHD]/The.Wolf.of.Wall.Street.2013.1080p.BluRay.X264-AMIABLE.mkv.part
        /mnt/home/Storage/Storage.Plugins/Torrents/X-Men Days of Future Past - The Rogue Extended Cut (2014) 1080p ENG-ITA-Comm MultiSub x264 BluRay - Giorni di un Futuro Passato -Shiv@.mkv.part



Loads of files, but nothing important lost. I mostly just want to track down what's causing the problems, not what was lost, though...
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
You have a number of problems:
  • Your old system was simply inadequate--as mentioned above, the minimum RAM requirement for ZFS has been 8 GB for a long time.
  • Your new system is only barely adequate, and lacks ECC. For 9.3, 8 GB really is a bare minimum.
  • The cache drive was doing nothing at all for you (and almost certainly hurting performance), but it's now removed, so no longer an issue.
  • You've never run a long SMART self-test on any of your drives.
  • You've never run a SMART self-test of any kind on ada1 or ada2.
  • ada0 and ada2 are showing awfully high load cycle counts.
  • ada2 is showing quite a few bad sectors (which you've seen already), and also has a history of getting too hot.
  • And finally, and most significantly, you have no redundancy on your pool--you just have three disks striped together. When (not if) one of your disks fails, you will irretrievably lose all your data.
So, here are some recommendations:
  • Fix the RAM situation. Rather than another 4 GB stick of ECC, I'd strongly recommend an 8 GB stick. That will give you 12 GB, which is reasonable for 9.3.
  • Run long SMART self-tests on all of your drives. From the CLI, as the root user, do 'smartctl -t long /dev/adaX', replacing X with 0, 1, and 2. You can run them simultaneously. Expect them to take several hours to complete. Post the 'smartctl -a' output again once they've finished.
  • Schedule SMART tests for all your drives. I'd recommend short tests every 1-3 days and long tests every 1-3 weeks.
  • Research WDIDLE3.EXE and run it on ada0 (I believe it only works on Western Digital disks, so won't do anything for ada2).
  • You can, and probably should, replace ada2 while it's still in the pool, but that still leaves you with no redundancy.
  • Ideally, recreate the pool in a redundant configuration (even RAIDZ1 would be better than what you have now), and move the data there.
Edit: I also see you've never run a scrub on your pool. You should change that as well, though wait until the long SMART tests finish. Then schedule a scrub every 2-4 weeks.
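
To put the load cycle point above into numbers, one rough yardstick is load cycles per power-on hour. The sketch below is an illustration only: load_cycle_rate is a made-up helper name, and it assumes the same smartctl attribute table layout as the output posted in this thread (attribute name in column 2, raw value in column 10, power-on hours reported as a plain number).

```shell
# load_cycle_rate (hypothetical helper): read `smartctl -A` or `smartctl -a`
# output on stdin and print load cycles per power-on hour.
# Attributes are matched by name because the Load_Cycle_Count ID differs
# between vendors (193 on the WD drives here, 225 on the Samsung ones).
load_cycle_rate() {
    awk '$2 == "Power_On_Hours"   { hours  = $10 + 0 }
         $2 == "Load_Cycle_Count" { cycles = $10 + 0 }
         END {
             if (hours > 0)
                 printf "%.1f load cycles per power-on hour\n", cycles / hours
             else
                 print "power-on hours not found"
         }'
}

# Usage on a live system (as root):
#   smartctl -A /dev/ada0 | load_cycle_rate
```

With the figures posted earlier in the thread, ada0 (45749 cycles over 11252 hours) works out to about 4.1 per hour and ada2 (62606 cycles over 1942 hours) to about 32.2 per hour, while the Red on ada1 (619 cycles over 2467 hours) stays well under one per hour.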
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
^^ what he said. I had to run a couple of errands and was in the process of composing a similar response.
 

ultimateon

Dabbler
Joined
Oct 22, 2014
Messages
14
Guess I'll wipe the dataset; no point in letting it keep running with problems.
I'll replace ada2 with another generic 1 TB Seagate/Toshiba drive (dunno which till I open it) that I have around here.
As for the RAM, I'm currently running the 8 GB as 4x2 and can't upgrade atm, so between running an extremely low 4 GB of ECC or 8 GB of non-ECC for the next few months until Christmas, well, I'll take more RAM.
Which also raises the following question: I also happen to have a 120 GB Kingston Now 300 SSD that could replace the cache drive. Should I use it or not?
Should I really run WDIDLE3 on the Red drive as well? I'm pretty sure the Green in there is pre-Intellipark, but I'll run it anyway.
As for redundancy, most of the other drives are dead or dying, and until I get proper replacements I don't really see the point. I'm not storing critical data at this time, mostly just downloading and reseeding files via Transmission.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Please run: zpool status from the console and upload the results in code tags (or just take a picture).
The output of zpool status is already in the debug...

The message "The volume home (ZFS) state is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected." refers to a disk issue, possibly the one with the bad sectors. The zpool status should reveal the problematic drive. Running SMART tests on your drives - smartctl -a /dev/adaX (where X is a number) - will give us more information about the health of your drives.

The output of smartctl -a /dev/ada(x) is already in the debug...

Have you guys figured out that the reason the forum rules ask for a debug is that it tells you literally 99.9% of everything you'll ever want to know about a system with a problem?
 
Joined
Oct 2, 2014
Messages
925
As already stated, I wouldn't use a cache drive. Even some users with 96 GB and 128 GB of RAM still run into issues with a cache and not enough RAM. The cache isn't doing what you think it is because you have so little RAM.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The output of zpool status is already in the debug...
Sure. And I get to download the debug file, unzip it, and dig through its contents to find the piece of information I'm looking for. And then delete it, or keep it littering my hard drive. Or I can ask the poster for the information I'm particularly interested in, and he can post it inline on the forum, where it's easier for everyone to see.

Debug files combine a lot of information in one file, which is great, but they make for a bit more work for the reader to make use of them. And of course, they're pretty much impossible to use if you're reading the forums on Tapatalk.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Ditto what @danb35 said, while I was composing a reply.

I had to run some errands this morning, and asking for this information would probably get a faster response than waiting for someone to download the .tgz file, extract it, and analyze the results.

 

ultimateon

Dabbler
Joined
Oct 22, 2014
Messages
14
I don't mind reposting the info, but can we seriously get back on track and discuss the information problems on the PM system?
In the meantime, mind giving me input on my last post?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
No, as you've already been told, you should not use a cache drive. Not spinning rust, not an SSD, nothing. You just don't have enough RAM for it to benefit you. No, there's no need to run WDIDLE on the red drive--it has a reasonable load cycle count.

The information problems are mostly a result of the bad sectors on the disk. Since you have no redundancy, ZFS has no way to fix the problem. Do the long SMART tests, run the scrub, and then we can figure out how serious the problem is.
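To illustrate why a pool without redundancy can detect corruption but not repair it, here is a toy model in Python. It is not how ZFS is actually implemented: the `read_block` helper and the use of SHA-256 are assumptions made purely for the example. The only point is that a checksum mismatch tells you a copy is bad, and you need a second, good copy (from a mirror or parity) to recover the data.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Stand-in checksum for the toy model (SHA-256 chosen for convenience)."""
    return hashlib.sha256(data).hexdigest()


def read_block(copies, expected):
    """Return the first copy whose checksum matches, or None if every copy is damaged."""
    for data in copies:
        if checksum(data) == expected:
            return data
    return None


good = b"movie chunk"
expected = checksum(good)           # checksum stored when the block was written
bad = good[:-1] + b"?"              # same block after a bad sector mangled it

striped = [bad]                     # stripe: the only copy is the damaged one
mirrored = [bad, good]              # mirror: a second, intact copy exists

print(read_block(striped, expected))    # damage detected, nothing to repair from
print(read_block(mirrored, expected))   # damage detected, good copy returned
```

With the striped layout the function can only report that the block is bad, which is the situation the pool is in now.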

Have you seriously considered whether FreeNAS is a good match for your needs? Because right now, it isn't sounding like it is.
 

ultimateon

Dabbler
Joined
Oct 22, 2014
Messages
14
Objectively speaking, it was: it delivered a simple and effective management WebUI with all the plugins I need for the time being, where others have failed.
My current problem is really not having the funds I need to get the system into decent running condition (I originally intended to get 16 GB of ECC RAM plus 5x4 TB Reds so I could have parity).
Considering I only have the limited hardware I posted, I'm considering temporarily moving to Windows Server and going with RAID 0. I have been running striped volumes because I knew the drives were eventually going to fail.
I was attempting to mitigate the current damage because it's nearly December, when I'll have enough money to get the extra RAM and drives, if not an HP MicroServer Gen8 with a Xeon and 32 GB of RAM.
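As a rough sketch of the capacity trade-off behind the planned 5x4 TB layout: a stripe uses every disk for data, while RAIDZ1 and RAIDZ2 give up roughly one or two disks' worth of space to parity, sized by the smallest disk in the vdev. The function name and this rule of thumb are assumptions for illustration, and real usable space will be lower once ZFS overhead is counted.

```python
def usable_tb(disk_sizes_tb, parity_disks=0):
    """Approximate usable capacity in TB, ignoring ZFS metadata and padding overhead."""
    if parity_disks == 0:
        # Plain stripe: every disk holds data, and any single failure loses the pool.
        return sum(disk_sizes_tb)
    if parity_disks >= len(disk_sizes_tb):
        raise ValueError("need more disks than parity disks")
    # RAIDZ-style vdev: each disk contributes only as much as the smallest one.
    return (len(disk_sizes_tb) - parity_disks) * min(disk_sizes_tb)


planned = [4] * 5                  # the 5x4 TB Reds mentioned above
print(usable_tb(planned))          # stripe  -> 20
print(usable_tb(planned, 1))       # RAIDZ1  -> 16
print(usable_tb(planned, 2))       # RAIDZ2  -> 12
```

So parity on that build would cost about 4 TB for RAIDZ1 or 8 TB for RAIDZ2 compared with striping the same disks.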
 