Checking new HDDs in RAID

Status
Not open for further replies.

anRossi

Dabbler
Joined
Feb 1, 2014
Messages
36
Yes that would work. And it would give you useful performance info on your raid.
But since you're using a single thread (-t 1), you probably don't want to use throughput mode (-T), which means you want -f instead of -F.
Also, I recommend you adjust the -s parameter to be close to the free space on mypool so you get full coverage.
 

panz

Guru
Joined
May 24, 2013
Messages
556
When I finish my hardware build, I'm going to study this in depth and do some experiments. Thanks! :)
 

scurrier

Patron
Joined
Jan 2, 2014
Messages
297
Good discussion here. I am taking copious notes. One thing I'm wondering is why no one mentioned using badblocks? Something like
Code:
badblocks -wv -b 4096 /dev/DEVICE
seems like it would be a good one to throw in the mix, perhaps in lieu of using dd. As presented, dd is only going to write a homogeneous pattern and read it back. If the data it reads back is not what was written, it will just continue on happily. Sure, you might force the disk to recognize that there's a problem and reallocate bad sectors, but why make do with writing homogeneous patterns when you could use badblocks? Badblocks' built-in write test can write different patterns like 0x00, 0xAA, 0x55, and 0xFF, then read them back and check that they match what was written. It runs all of these in a row, giving you a multi-day test of four complete writes and four complete reads. Seems like a better test to me.

Thoughts?
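
A minimal sketch of the write-then-verify idea described above, run against a scratch file rather than a disk so it is safe to try. It does not call badblocks; the function names are hypothetical and the four pattern bytes are the ones named in the post. It assumes `head -c`, `tr`, and `cmp` are available.

```shell
#!/bin/sh
# Sketch only: exercises a scratch FILE, never a raw disk device.
# Function names are hypothetical; badblocks itself is not invoked.

# write_pattern FILE SIZE_BYTES OCTAL_BYTE   (e.g. 252 is 0xAA)
write_pattern() {
    head -c "$2" /dev/zero | LC_ALL=C tr '\000' "\\$3" > "$1"
}

# verify_pattern FILE SIZE_BYTES OCTAL_BYTE
# Regenerates the expected stream and compares it byte for byte.
# Succeeds only if the file holds exactly what was written.
verify_pattern() {
    head -c "$2" /dev/zero | LC_ALL=C tr '\000' "\\$3" | cmp -s - "$1"
}

# run_patterns FILE SIZE_BYTES
# Writes and verifies 0xAA, 0x55, 0xFF, 0x00 in turn.
run_patterns() {
    for octal in 252 125 377 000; do
        write_pattern "$1" "$2" "$octal" || return 1
        if ! verify_pattern "$1" "$2" "$octal"; then
            printf 'mismatch on octal byte %s\n' "$octal"
            return 1
        fi
    done
    echo "all patterns verified"
}
```

The point of the verify step is that a silent mismatch is reported as a failure, which a plain dd write followed by a dd read would not do on its own.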
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
badblocks is installed by default as well. Just make sure you don't run a destructive command by accident. Any DOS-level hard drive tool should work too. They don't care about the OS, only about bad sectors and the like.

Code:
 badblocks
Usage: badblocks [-b block_size] [-i input_file] [-o output_file] [-svwnf]
      [-c blocks_at_once] [-d delay_factor_between_reads] [-e max_bad_blocks]
      [-p num_passes] [-t test_pattern [-t test_pattern [...]]]
      device [last_block [first_block]]
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
badblocks is a good way to test the drive. That's exactly what I use. But, most people don't want to do all these commands and wait about a day per disk. It's just about personal preference.
 

greeners

Cadet
Joined
Mar 15, 2014
Messages
6
Great info posted here ... great timing for me as I am just at the point of beginning the testing of hard drives.

Thank you.
 

panz

Guru
Joined
May 24, 2013
Messages
556
To check new drives (let's assume a bunch of 6 drives) I always first put in a spare (old) disk, create a filesystem on it (called e.g. "storage"), create a dataset to record results on, then use a (WARNING: data destructive!) badblocks command like:

Code:
for i in 0 1 2 3 4 5; do
badblocks -svw -b 4096 -t 0xFF -t 0x00 -t 0xFF -o /mnt/storage/data/badblocks_da${i}.txt /dev/da${i} &
done


This will run for maybe 72 hours on six 3TB drives :) Thanks to @cyberjock for providing a good example of the badblocks command.
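
Because the command above is destructive, it can help to print the exact command lines for your device names and read them before running anything. The following is a sketch under that assumption; `print_badblocks_cmds` is a hypothetical helper that only echoes text and never touches a disk. The badblocks options are copied unchanged from the loop above.

```shell
#!/bin/sh
# Dry run: prints one badblocks command per device, executes nothing.
# print_badblocks_cmds OUTDIR DEV...
print_badblocks_cmds() {
    outdir=$1
    shift
    for dev in "$@"; do
        # Same options as the loop in the post; only the device name varies.
        echo "badblocks -svw -b 4096 -t 0xFF -t 0x00 -t 0xFF -o $outdir/badblocks_$dev.txt /dev/$dev"
    done
}

# Example: disks named ada0..ada2 instead of da0..da2.
print_badblocks_cmds /mnt/storage/data ada0 ada1 ada2
```

Once the printed lines look right, they can be run by hand or appended with `&` to run in parallel as in the original loop.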
 

greeners

Cadet
Joined
Mar 15, 2014
Messages
6
I ran through this test as part of my disk test regime. I adjusted for my disks as they are ada0 through to ada5. The disks showed activity for about 72 hours. The test appears to be finished now (no disk activity) but the output files are all 0 bytes. Unless no news is good news, I must have messed up running the test.
 

panz

Guru
Joined
May 24, 2013
Messages
556
So, now you have some .txt files: if they're empty you are a lucky owner of some good disks! :)
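
A small sketch for checking all the output files at once, assuming they follow the `badblocks_*.txt` naming used earlier in the thread and that badblocks wrote one block number per line. `summarize_badblocks` is a hypothetical helper name.

```shell
#!/bin/sh
# summarize_badblocks DIR
# Reports, per output file, how many bad blocks badblocks recorded.
summarize_badblocks() {
    for f in "$1"/badblocks_*.txt; do
        [ -e "$f" ] || continue            # no matching files at all
        n=$(wc -l < "$f" | tr -d ' ')      # one recorded block per line
        if [ "$n" -eq 0 ]; then
            echo "$(basename "$f"): no bad blocks listed"
        else
            echo "$(basename "$f"): $n bad block(s) listed"
        fi
    done
}
```

Usage would be something like `summarize_badblocks /mnt/storage/data`. Note that an empty file only says no blocks were recorded; it cannot tell you on its own whether the run finished, so it is worth confirming the badblocks processes exited normally.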
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
I find badblocks really slow. If you are doing a non-destructive scan I guess it can be OK, if you are willing to leave the NAS for a long time. I still prefer to use hdat2.
 

greeners

Cadet
Joined
Mar 15, 2014
Messages
6
So, now you have some .txt files: if they're empty you are a lucky owner of some good disks! :)

That is what I was hoping. Thanks!
 

trionic

Explorer
Joined
May 1, 2014
Messages
98
This is one of the most useful threads on these forums and exactly what I was looking for.

One of my brand new drives (a WD Red 3TB) reports a "Completed: read failure" after a SMART long test:
Code:
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure        30%                4  3807195776
# 2  Conveyance offline  Completed without error        00%                0  -
# 3  Short offline       Completed without error        00%                0  -

Should I be relying on that drive?
Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       1
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       11
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   121   119   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       44
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
You got a bad drive, RMA it. The last parameter should be 0.
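
The two signs discussed here (a failed entry in the self-test log, and a raw value that should be zero but is not) can be picked out of saved smartctl output with a short filter. This is a sketch, not a diagnosis tool: `smart_flags` is a hypothetical name, and the attribute IDs it watches (5, 197, 198, 200) are an assumption chosen for illustration; the thread itself only points at the last row.

```shell
#!/bin/sh
# Sketch: reads smartctl text on stdin and prints anything suspicious.
# The attribute list (5, 197, 198, 200) is an assumption; adjust as needed.
smart_flags() {
    awk '
        # Self-test log rows start with "# <n>"; flag any reporting a failure.
        /^# *[0-9]+ / && /failure/ {
            print "self-test failure, first error LBA " $NF
            next
        }
        # Attribute rows have 10 columns; column 10 is RAW_VALUE.
        NF >= 10 && $1 ~ /^(5|197|198|200)$/ && $10 + 0 != 0 {
            print "non-zero raw value: " $2 "=" $10
        }
    '
}
```

It could be fed directly, for example `smartctl -a /dev/ada0 | smart_flags`, where the device name is a placeholder. No output means none of the watched conditions matched, which is not the same as a guarantee that the drive is healthy.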
 

trionic

Explorer
Joined
May 1, 2014
Messages
98
Thank you for the advice. RMA created. Do WD ship new or refurbished replacement drives? One would hope they're new...

As it happens, WD's UK RMA handler is a ten-minute walk from my place of work :eek: so I may trundle over there with the drive in my lunch hour ;)

I looked at all the other drives that I have for this ZFS build and none of them (apart from one I knew was knackered) showed a non-zero "Multi_Zone_Error_Rate".
 

Hugo Ochoa

Dabbler
Joined
Mar 20, 2014
Messages
47
Is SMART not supported on expander backplanes? I'm getting an error saying it's not when I run it on my JBOD HDDs.

 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
What's the hardware, specifically? SAS expanders are completely transparent to OS-level stuff. Are you perhaps using one of those dubious external enclosures that use eSATA and a SATA port multiplier? Not that they should interfere with S.M.A.R.T., since they're theoretically just as transparent.
 

Hugo Ochoa

Dabbler
Joined
Mar 20, 2014
Messages
47
The hardware is listed in my signature. I can run SMART commands on the drives attached to one of the M1115s through breakout cables, but the drives on the JBOD give the error message.


 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Are they otherwise functional? Everything should be working given your hardware.
 

Hugo Ochoa

Dabbler
Joined
Mar 20, 2014
Messages
47
Well, that's what I'm trying to find out. :) I'm in the testing stage of this build and want to test the HDDs in the JBOD. They show up fine in the GUI as available drives. I guess I'm going to run dd commands on them to see whether they handle writes and reads properly. I'm wondering if SMART monitoring will work once I build the zpools.

 

Hugo Ochoa

Dabbler
Joined
Mar 20, 2014
Messages
47
Looks like the conveyance test is the only one not supported on my JBOD. I just finished running the short test on all the drives and it worked fine on every one.
 