System locking up

Status
Not open for further replies.

legacyofbob

Cadet
Joined
Nov 26, 2013
Messages
6
Hello,

I have been running an older version of FreeNAS (FreeNAS-9.1.1-RELEASE-x64 (a752d35)) for a few years now, and it has been incredibly stable and reliable. I've been having some problems in the past month or so, and I plan to upgrade once I get a few things in order. The problem is that the system locks up periodically (about once per week). The UI becomes unresponsive and I can no longer access the storage from the clients. A press of the reset button and I'm back up and running in a minute or two. The SMART tests on my volume disks all pass, so I'm not worried that my storage pool is going bad. I did realize, however, that I don't have any tests running on the OS disk, which is a small 60GB SSD. So my questions are as follows:

  1. Is there a way to run health checks on the OS disk?
  2. Is there a log file or otherwise that might tell me why the system locked up?
Code:
System Information

Hostname    freenas.local
Build    FreeNAS-9.1.1-RELEASE-x64 (a752d35)
Platform    Intel(R) Core(TM) i5-3350P CPU @ 3.10GHz
Memory    8107MB
System Time    Tue Apr 19 10:32:34 PDT 2016
Uptime    10:32AM up 1 day, 18:47, 0 users
Load Average    0.00, 0.00, 0.00
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
If your SSD supports SMART, you can check it with smartctl:
Code:
[root@freenas] ~# smartctl -a /dev/ada0
smartctl 6.4 2015-06-04 r4109 [FreeBSD 10.3-RELEASE amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Intel X25-E SSDs
Device Model:  SSDSA2SH032G1GN INTEL
Serial Number:  xxxxxxxxxxxxxxxxxx
LU WWN Device Id: 5 001517 95948fa3d
Firmware Version: 045C8862
User Capacity:  32,000,000,000 bytes [32.0 GB]
Sector Size:  512 bytes logical/physical
Rotation Rate:  Solid State Device
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ATA/ATAPI-7 T13/1532D revision 1
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:  Tue Apr 19 12:45:14 2016 CDT
>>>>>>SMART support is: Available - device has SMART capability.
>>>>>>SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

I don't practice what I preach, though (5000 hrs. since last tested) ;)
Code:
[root@freenas] ~# smartctl -a /dev/ada0
smartctl 6.4 2015-06-04 r4109 [FreeBSD 10.3-RELEASE amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Intel X25-E SSDs
Device Model:  SSDSA2SH032G1GN INTEL
Serial Number:  CVEM048400ET032HGN
LU WWN Device Id: 5 001517 95948fa3d
Firmware Version: 045C8862
User Capacity:  32,000,000,000 bytes [32.0 GB]
Sector Size:  512 bytes logical/physical
Rotation Rate:  Solid State Device
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ATA/ATAPI-7 T13/1532D revision 1
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:  Tue Apr 19 12:45:14 2016 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
  was never started.
  Auto Offline Data Collection: Disabled.
Self-test execution status:  (  0) The previous self-test routine completed
  without error or no self-test has ever
  been run.
Total time to complete Offline
data collection:  (  1) seconds.
Offline data collection
capabilities:  (0x75) SMART execute Offline immediate.
  No Auto Offline data collection support.
  Abort Offline collection upon new
  command.
  No Offline surface scan supported.
  Self-test supported.
  Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities:  (0x0003) Saves SMART data before entering
  power-saving mode.
  Supports SMART auto save timer.
Error logging capability:  (0x01) Error logging supported.
  General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  2) minutes.
Extended self-test routine
recommended polling time:  (  2) minutes.
Conveyance self-test routine
recommended polling time:  (  1) minutes.

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time  0x0000  100  000  000  Old_age  Offline  -  0
  4 Start_Stop_Count  0x0000  100  000  000  Old_age  Offline  -  0
  5 Reallocated_Sector_Ct  0x0002  100  100  000  Old_age  Always  -  0
  9 Power_On_Hours  0x0002  100  100  000  Old_age  Always  -  5472
 12 Power_Cycle_Count  0x0002  100  100  000  Old_age  Always  -  281
192 Unsafe_Shutdown_Count  0x0002  100  100  000  Old_age  Always  -  63
232 Available_Reservd_Space 0x0003  100  100  010  Pre-fail  Always  -  0
233 Media_Wearout_Indicator 0x0002  099  099  000  Old_age  Always  -  0
225 Host_Writes_32MiB  0x0000  200  200  000  Old_age  Offline  -  1514
226 Intel_Internal  0x0002  255  000  000  Old_age  Always  -  4294967295
227 Intel_Internal  0x0002  000  000  000  Old_age  Always  -  281474976710655
228 Intel_Internal  0x0002  000  000  000  Old_age  Always  -  4294967295

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed without error  00%  5472  -
# 2  Extended offline  Completed without error  00%  34  -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@freenas] ~#
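For anyone who wants to keep track of how stale the last self-test is, here is a rough sketch that works on the output above. It subtracts the LifeTime(hours) of the newest self-test log entry from Power_On_Hours. In the log above the two tests sit at 34 and 5472 hours, which is where the roughly 5000-hour gap comes from. The function name `hours_since_selftest` is made up for illustration, and the parsing assumes the column layout shown in this thread.

```shell
# Sketch: hours elapsed since the most recent SMART self-test.
# Reads full `smartctl -a` output on stdin.
hours_since_selftest() {
    awk '
        # Attribute row: ID 9 is Power_On_Hours; the raw value is the last field.
        $1 == "9" && $2 == "Power_On_Hours" { now = $NF + 0 }
        # Self-test log row "# 1" is the newest entry; LifeTime(hours) is
        # the second-to-last field (the last is LBA_of_first_error).
        $1 == "#" && $2 == "1" { last = $(NF - 1) + 0; seen = 1 }
        END {
            if (seen) print now - last
            else print "no self-test logged"
        }
    '
}

# Typical use on the NAS itself (not run here):
#   smartctl -t short /dev/ada0      # start a short self-test
#   smartctl -a /dev/ada0 | hours_since_selftest
```

Treat this as a convenience only; the self-test log and error log in the full output are still the things to read.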
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
  • Is there a log file or otherwise that might tell me why the system locked up?
@legacyofbob
I see you are running FreeNAS on desktop hardware and non-ECC memory.
Do your upgrade plans include just software or hardware upgrades? Or both?
After a long time of uneventful FreeNAS operation, I would first check my
hardware (this is what you seem to be starting with).
If the small SSD checks out, I would MemTest the heck outta the RAM next.

While waiting for our more senior members to respond, let me help you
provide more information for those who might respond.

Please post results of:

[root@freenas] ~# camcontrol devlist

System -> Advanced -> Save Debug (save the file to client and upload it in a post)
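On the log question, a minimal sketch of how one might skim a log or the saved debug file for lines that often show up around hangs or disk trouble. The keyword list is a guess rather than an official set, `scan_log` is a made-up helper name, and both paths in the comments are assumptions for illustration. A hard lockup may also leave nothing in the logs at all, so an empty result does not rule anything out.

```shell
# Sketch: list log lines that often accompany hangs or disk trouble.
# The keyword list is a guess, not an official set; adjust to taste.
scan_log() {
    grep -inE 'panic|watchdog|timeout|error|fail' "$1"
}

# Typical use (paths are assumptions for illustration, not run here):
#   scan_log /var/log/messages
#   scan_log debug.txt
```

Each match is printed with its line number, so it is easy to jump to the surrounding context in the file.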
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Please tell us more about your usage and environment.

For example, are you using jails? Plex? ... Is the server exposed to the Internet?


 

legacyofbob

Cadet
Joined
Nov 26, 2013
Messages
6
Thank you to all who have responded.

Yes, I am using leftover desktop-type hardware with non-ECC memory for this machine. I had thought about upgrading the storage from 2x4TB in RAID1 to 4x4TB or 6x4TB in RAIDZ2 at some point, but I was not planning on upgrading any of the other hardware.

I use this only at home, mainly for storing photos and videos. I do not use any plugins or jails. I do use it for Plex, but I run the Plex server on my main desktop computer with a network drive mapped to the FreeNAS machine. I know that sounds goofy, but I remember trying to set up the Plex plugin when I started out; I couldn't figure it out, so I gave up. I may try again if I upgrade the FreeNAS version. I also have a couple of apps on my family's smartphones that automatically upload new pictures and videos to folders on the NAS. The NAS is not accessible from the internet.

Thank you for the instructions on how to check SMART on the OS disk. Here is the output; it looks good to me:
Code:
[root@freenas] ~# smartctl -a /dev/ada0
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     MK0060EAVDR
Serial Number:    S0CGNEAZC02479
LU WWN Device Id: 5 002538 050018b76
Firmware Version: 3C32HPG6
User Capacity:    60,022,480,896 bytes [60.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Wed Apr 20 05:47:39 2016 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  328) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (   6) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002b   100   100   010    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       26312
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       30
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0033   099   099   001    Pre-fail  Always       -       4
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   100   100   010    Pre-fail  Always       -       1208
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   080   080   000    Old_age   Always       -       26
195 Hardware_ECC_Recovered  0x0032   100   100   000    Old_age   Always       -       1208
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   253   253   000    Old_age   Always       -       0
201 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       100
202 Unknown_SSD_Attribute   0x0033   100   099   090    Pre-fail  Always       -       0
203 Run_Out_Cancel          0x0033   079   079   010    Pre-fail  Always       -       17

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     26312         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@freenas] ~#
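As a sanity check on the attribute table above, here is a rough sketch that prints each Pre-fail attribute with its headroom, meaning the normalized VALUE minus THRESH. The helper name `smart_headroom` is made up for illustration, and the parsing assumes the ten-column layout shown in this output. Vendors define the normalized values differently, so a small number is a hint to look closer, not a verdict.

```shell
# Sketch: headroom (VALUE - THRESH) for every Pre-fail SMART attribute.
# Reads `smartctl -A` output on stdin.
# Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_headroom() {
    awk '$1 ~ /^[0-9]+$/ && $7 == "Pre-fail" {
        # "+ 0" forces numeric treatment of zero-padded fields such as 010.
        printf "%s %s %d\n", $1, $2, ($4 + 0) - ($6 + 0)
    }'
}

# Typical use on the NAS itself (not run here):
#   smartctl -A /dev/ada0 | smart_headroom
```

For the table above, every Pre-fail attribute is still above its threshold, which matches the PASSED overall result.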


Here is the other output that was requested:
Code:
[root@freenas] ~# camcontrol devlist
<MK0060EAVDR 3C32HPG6>             at scbus0 target 0 lun 0 (ada0,pass0)
<ST4000DM000-1F2168 CC52>          at scbus1 target 0 lun 0 (ada1,pass1)
<ST4000DM000-1F2168 CC52>          at scbus2 target 0 lun 0 (ada2,pass2)
[root@freenas] ~# 


I uploaded the debug output to this post. Since the OS disk seems to be okay, I will run MemTest on the RAM next. Thanks again for the opinions.
 

Attachments

  • debug-freenas-20160420060331.txt
    328.6 KB · Views: 347