Checking new HDD's in RAID

anRossi · Feb 16, 2014

Yes, that would work, and it would give you useful performance info on your RAID.
But since you're using a single thread (-t 1), you probably don't want to use throughput mode (-T), which means you want -f instead of -F.
Also, I recommend you adjust the -s parameter to be close to the free space on mypool so you get full coverage.

panz · Feb 16, 2014

When I finish my hardware build I'm going to study this in depth and do some experiments. Thanks! :)

scurrier · Mar 13, 2014

Good discussion here. I am taking copious notes. One thing I'm wondering is why no one mentioned using badblocks? Something like

Code:

badblocks -wv -b 4096 /dev/DEVICE

seems like it would be a good one to throw in the mix, perhaps in lieu of using dd. As presented, dd is only going to write a homogeneous pattern and read it back. But if the data that it reads back is not what was written, it's just going to continue on happily. Sure, you might force the disk to recognize that there's a problem and reallocate bad sectors or something, but why make do with writing homogeneous patterns when you could use badblocks? Badblocks' built-in test can write different patterns like 0x00, 0xAA, 0x55, and 0xFF, then read them back and make sure they read the same as they were written. Additionally, it will run all of these in a row, giving you a multi-day test of four complete reads and four complete writes. Seems like a better test to me.

Thoughts?
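
To make the dd-versus-badblocks point concrete, here is a minimal sketch of the write-then-compare idea, run against a scratch file rather than a disk. The function names (`write_pattern`, `verify_pattern`, `pattern_pass`) are made up for illustration, and this is not how badblocks works internally; it only shows why comparing the read-back data matters.

```shell
# Illustration only: operates on an ordinary file, never on /dev/*.
# Patterns are given as octal byte values: 252=0xAA 125=0x55 377=0xFF 000=0x00.

# write_pattern FILE SIZE OCTAL: fill FILE with SIZE bytes of one value.
write_pattern() {
    dd if=/dev/zero bs="$2" count=1 2>/dev/null |
        LC_ALL=C tr '\000' "\\$3" > "$1"
}

# verify_pattern FILE SIZE OCTAL: succeed only if FILE still holds the pattern.
verify_pattern() {
    dd if=/dev/zero bs="$2" count=1 2>/dev/null |
        LC_ALL=C tr '\000' "\\$3" | cmp -s - "$1"
}

# pattern_pass FILE SIZE: write and verify each of the four patterns in turn.
pattern_pass() {
    for octal in 252 125 377 000; do
        write_pattern "$1" "$2" "$octal" || return 1
        verify_pattern "$1" "$2" "$octal" || return 1
    done
}
```

Something like `pattern_pass /tmp/scratch.bin 4096` should succeed on a healthy filesystem. A plain dd write followed by a dd read has no equivalent of the `cmp` step, which is the gap described above.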

Yatti420 · Mar 13, 2014

badblocks is installed by default as well. Just make sure you don't run a destructive command. Any DOS-level hard drive tool should work too. They don't care about the OS, only about bad sectors and the like.

Code:

 badblocks
Usage: badblocks [-b block_size] [-i input_file] [-o output_file] [-svwnf]
      [-c blocks_at_once] [-d delay_factor_between_reads] [-e max_bad_blocks]
      [-p num_passes] [-t test_pattern [-t test_pattern [...]]]
      device [last_block [first_block]]

cyberjock · Mar 14, 2014

badblocks is a good way to test the drive. That's exactly what I use. But, most people don't want to do all these commands and wait about a day per disk. It's just about personal preference.

greeners · Mar 17, 2014

Great info posted here ... great timing for me as I am just at the point of beginning the testing of hard drives.

Thank you.

panz · Mar 18, 2014

To check new drives (let's assume a bunch of 6 drives) I always first put in a spare (old) disk, create a filesystem on it (called e.g. "storage"), create a dataset to record results on, then use a (WARNING: data destructive!) badblocks command like:

Code:

# WARNING: -w overwrites every block on /dev/da0 through /dev/da5.
for i in 0 1 2 3 4 5; do
    badblocks -svw -b 4096 -t 0xFF -t 0x00 -t 0xFF -o /mnt/storage/data/badblocks_da${i}.txt /dev/da${i} &
done

This will run for maybe 72 hours on six 3TB drives :) Thanks to @cyberjock for providing a good example of a badblocks command.
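
If your devices are named differently (ada0, ada1, ... instead of da0, da1, ...), it may be safer to print the commands first and read them over before running anything destructive. A small sketch, where `badblocks_cmds` is a hypothetical helper name and the badblocks options are copied unchanged from the loop above:

```shell
# badblocks_cmds PREFIX COUNT OUTDIR
# Prints (does NOT run) one destructive badblocks command per device,
# e.g. PREFIX=ada COUNT=6 covers /dev/ada0 through /dev/ada5.
badblocks_cmds() {
    prefix=$1
    count=$2
    outdir=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "badblocks -svw -b 4096 -t 0xFF -t 0x00 -t 0xFF -o $outdir/badblocks_$prefix$i.txt /dev/$prefix$i"
        i=$((i + 1))
    done
}
```

Running `badblocks_cmds ada 6 /mnt/storage/data` only lists the six commands; nothing touches a disk until you paste or pipe them into a shell yourself.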

greeners · Mar 28, 2014

I ran through this test as part of my disk test regime. I adjusted for my disks as they are ada0 through to ada5. The disks showed activity for about 72 hours. The test appears to be finished now (no disk activity) but the output files are all 0 bytes. Unless no news is good news, I must have messed up running the test.

panz · Mar 28, 2014

So, now you have some .txt files: if they're empty you are a lucky owner of some good disks! :)
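
A quick way to check all the output files at once, as a sketch. It assumes the `badblocks_*.txt` naming from the loop earlier in the thread; `summarize_badblocks` is a made-up helper name. badblocks writes one bad block number per line to the `-o` file, so a line count is a bad-block count.

```shell
# summarize_badblocks DIR
# Reports, for each badblocks_*.txt in DIR, whether any bad blocks were listed.
summarize_badblocks() {
    for f in "$1"/badblocks_*.txt; do
        [ -e "$f" ] || continue          # no matching files at all
        if [ -s "$f" ]; then
            n=$(wc -l < "$f" | tr -d ' ')
            echo "$(basename "$f"): $n bad block(s) listed"
        else
            echo "$(basename "$f"): empty, no bad blocks recorded"
        fi
    done
}
```

For example, `summarize_badblocks /mnt/storage/data`. An empty file only tells you nothing was written to it, so it is worth also glancing at the `-v` console output of each job to confirm the passes actually completed.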

Yatti420 · Mar 28, 2014

I find badblocks really slow. If you are doing a non-destructive scan I guess it can be OK if you are willing to leave the NAS for a long time. I still prefer to use hdat2.

greeners · Mar 28, 2014

panz said:
So, now you have some .txt files: if they're empty you are a lucky owner of some good disks! :)

That is what I was hoping. Thanks!

trionic · Jun 1, 2014

This is one of the most useful threads on these forums and exactly what I was looking for.

One of my brand new drives (a WD Red 3TB) reports a "Completed: read failure" after a SMART long test:

Code:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure        30%                4  3807195776
# 2  Conveyance offline  Completed without error        00%                0  -
# 3  Short offline       Completed without error        00%                0  -

Should I be relying on that drive?

Code:

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       1
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       11
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   121   119   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       44
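
When there are several drives to go through, a small filter over the self-test log can save some squinting. This is only a sketch: `check_selftest_log` is a hypothetical helper, and it assumes the log layout shown above, where each entry starts with `#` and a clean run reads "Completed without error".

```shell
# check_selftest_log: reads `smartctl -l selftest` text on stdin.
# Prints every entry that is not "Completed without error" and
# returns 1 if it found any, 0 otherwise.
check_selftest_log() {
    bad=$(grep '^#' | grep -v 'Completed without error' || true)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    return 0
}
```

Usage would be along the lines of `smartctl -l selftest /dev/ada0 | check_selftest_log`, with the device name adjusted to your system.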

Ericloewe · Jun 1, 2014

You got a bad drive, RMA it. The last parameter (Multi_Zone_Error_Rate) should be 0.
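
The same check can be scripted against the attribute table. A rough sketch, assuming the ten-column layout shown in the post above with RAW_VALUE last; `check_smart_attrs` is a made-up name, and the list of attributes to watch is my own choice for illustration, not an official one.

```shell
# check_smart_attrs: reads `smartctl -A` text on stdin.
# Flags selected attributes whose RAW_VALUE (10th column) is not 0
# and returns 1 if any were flagged.
check_smart_attrs() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Multi_Zone_Error_Rate)$/ && $10 + 0 != 0 {
            printf "%s raw value is %s\n", $2, $10
            bad = 1
        }
        END { exit bad + 0 }
    '
}
```

Something like `smartctl -A /dev/ada0 | check_smart_attrs` would then print one line per suspicious attribute and stay silent on a clean drive.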

trionic · Jun 1, 2014

Thank you for the advice. RMA created. Do WD ship new or refurbished replacement drives? One would hope they're new...

As it happens, WD's UK RMA handler is ten minutes' walk from my place of work, so I may trundle over there with the drive in my lunch hour ;)

I looked at all the other drives that I have for this ZFS build and none of them (apart from one I knew was knackered) showed a non-zero "Multi_Zone_Error_Rate".

Hugo Ochoa · Jun 1, 2014

Is SMART not supported on expander backplanes? I'm getting an error saying it's not when I run it on my JBOD HDDs.

Ericloewe · Jun 1, 2014

What's the hardware, specifically? SAS expanders are completely transparent to OS-level stuff. Are you perhaps using one of those dubious external enclosures that use eSATA and a SATA port multiplier? Not that they should interfere with S.M.A.R.T., since they're theoretically just as transparent.

Hugo Ochoa · Jun 1, 2014

The hardware is listed in my signature. I can run SMART commands on the drives attached to one of the M1115s through breakout cables, but the drives on the JBOD give the error message.

Ericloewe · Jun 1, 2014

Are they otherwise functional? Everything should be working given your hardware.

Hugo Ochoa · Jun 1, 2014

Well, that's what I'm trying to find out. :) I'm in the testing stage of this build and want to test the HDDs in the JBOD. They show fine in the GUI as available drives. I guess I'm going to test dd commands on them to see if they perform writes and reads fine. I'm wondering if SMART monitoring will work once I build the zpools.

Hugo Ochoa · Jun 1, 2014

Looks like only the conveyance test is not supported on my jbod. Just finished running the short test on all the drives and those worked fine.
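
For anyone scripting this, one option is to look at the drive's reported capabilities before asking for a conveyance test. A sketch only: `pick_selftests` is a hypothetical helper, and it assumes `smartctl -c` reports support with a line containing "Conveyance Self-test supported." (and "No Conveyance Self-test supported." otherwise), which is worth confirming on your smartmontools version and drive type.

```shell
# pick_selftests: reads `smartctl -c` text on stdin and prints the
# self-test types to request with `smartctl -t TYPE`, one per line.
pick_selftests() {
    caps=$(cat)
    echo short
    # Skip conveyance unless the drive advertises it (assumed wording).
    if printf '%s\n' "$caps" | grep -v 'No Conveyance' |
            grep -q 'Conveyance Self-test supported'; then
        echo conveyance
    fi
    echo long
}
```

Used as `smartctl -c /dev/da0 | pick_selftests`, it just lists test names; starting the tests themselves is left to you.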
