Checksum errors on all drives in the pool

cmh

Explorer
Joined
Jan 7, 2013
Messages
75
Was on vacation, back now. Starting to work through the drives and just wanted to confirm this command line:

Code:
sudo badblocks -vsw -b 4096 "$drive"


Have a little extra stuff around that in the script but that's the key part. Have only started the first drive so if this isn't the best option, then lemme know.

Also, the test system is an old Intel Atom-based system with an eSATA port which I connected to an external eSATA drive bay. Also have a USB drive bay but since ZFS recommends against USB in pools, I figured the eSATA was better.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Not sure how well the eSATA connection will work, but it might be just fine. Are you using Ubuntu or similar? The command looks correct; you may or may not need 'sudo'. And of course "$drive" stands for the actual device, e.g. /dev/sda, depending on the drive you are testing.

Badblocks runs through four different test patterns (0xaa, 0x55, 0xff, 0x00) before it stops. It will take a long time, but wait it out.
 

cmh

This is the key line from a bigger script that I run as my login user, not root, so the sudo is necessary. Running on Debian - had trouble installing Rocky, which looks to have been related to hardware issues on the first test system.

Drive device file can be passed from the command line, thus the "$drive" variable. Couple other things going on that I omitted: smartctl is used to gather drive model, serial number and size, and that's used in the output file from badblocks (that part was added afterwards). Had to add the -b 4096 because with the default block size the number of blocks overflowed what badblocks can handle.
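
One way the smartctl step described here could look, as a minimal sketch. It assumes the drive details end up in the output file's name (the post does not say exactly how they are used) and that `smartctl -i` prints ATA-style labels such as "Device Model", "Serial Number" and "User Capacity". The helper name report_name and the sample values are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build a report file name from `smartctl -i` text on stdin.
# Assumes ATA-style field labels; SAS/NVMe drives label these fields differently.
report_name() {
    awk -F': *' '
        /^Device Model:/  { model = $2 }
        /^Serial Number:/ { serial = $2 }
        /^User Capacity:/ {
            size = $2
            sub(/.*\[/, "", size)   # keep only the bracketed size, e.g. "10.0 TB"
            sub(/\].*/, "", size)
        }
        END {
            name = model "_" serial "_" size
            gsub(/[^A-Za-z0-9._-]+/, "-", name)   # keep the name filesystem-friendly
            print "badblocks_" name ".txt"
        }'
}

# Real use would look something like:
#   outfile=$(sudo smartctl -i "$drive" | report_name)
# Demo with made-up values, so nothing touches a real drive:
printf '%s\n' \
    'Device Model:     EXAMPLE DISK-10T' \
    'Serial Number:    SN12345' \
    'User Capacity:    10,000,831,348,736 bytes [10.0 TB]' | report_name
```

Keeping the serial number in the file name makes it easy to match a report to a physical drive later, which matters if one has to go back within the return window.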

So far the eSATA is working - it's been almost a day and it's getting close to done on the first pass:

Code:
Checking for bad blocks in read-write mode
From block 0 to 2441609215
Testing with pattern 0xaa:  87.54% done, 22:31:07 elapsed. (0/0/0 errors)
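
The block range in that output also illustrates the -b 4096 remark. A rough sketch of the arithmetic, assuming badblocks keeps block numbers in an unsigned 32-bit value (an assumption here, not something stated in the thread); block_count and fits_32bit are hypothetical helper names.

```shell
#!/bin/sh
# Assumption: the largest block number badblocks can address is 2^32 - 1.
MAX_BLOCKS=4294967295

# Number of blocks for a device of $1 bytes at $2 bytes per block.
block_count() {
    echo $(( $1 / $2 ))
}

# Succeeds if that block count stays within the assumed 32-bit limit.
fits_32bit() {
    [ $(( $(block_count "$1" "$2") <= MAX_BLOCKS )) -eq 1 ]
}

# Device size implied by the output above: last block 2441609215 at -b 4096.
size_bytes=$(( (2441609215 + 1) * 4096 ))

for bs in 1024 4096; do
    if fits_32bit "$size_bytes" "$bs"; then
        echo "-b $bs: $(block_count "$size_bytes" "$bs") blocks, fits"
    else
        echo "-b $bs: $(block_count "$size_bytes" "$bs") blocks, too many"
    fi
done
```

For this drive, 1024-byte blocks would mean 9766436864 blocks, which is past the assumed limit, while 4096-byte blocks give 2441609216, which fits.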


Got error checking in the script too; once badblocks is done, it will send me a notification so I can get the next drive started ASAP.
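
The error checking and notification could be wired up roughly as follows. This is only a sketch: notify is a hypothetical stand-in that just prints, where a real script might send mail or call a webhook, and run_and_notify is likewise an invented name. The exit status only shows whether the command itself ran cleanly, so the error counts in the badblocks output still need to be read.

```shell
#!/bin/sh
# Hypothetical stand-in for a real notification (mail, webhook, etc.).
notify() {
    echo "NOTIFY: $*"
}

# Run a command and report whether it succeeded, passing its exit status on.
# Usage: run_and_notify LABEL COMMAND [ARGS...]
run_and_notify() {
    label=$1
    shift
    if "$@"; then
        notify "$label finished OK"
    else
        status=$?
        notify "$label FAILED (exit status $status)"
        return "$status"
    fi
}

# Real use would look something like:
#   run_and_notify "badblocks $drive" sudo badblocks -vsw -b 4096 "$drive"
```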

Thanks for the feedback!
 

joeschmuck

0/0/0 errors is nice to see. I hope it stays that way for all your testing.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Ideally you would be running the tests on the drives in parallel - all drives at once. You should also do this under tmux or screen so closing the window doesn't kill the process.
 

joeschmuck

He has an eSATA enclosure, and I made an assumption that it only handled a single drive. Bad me for an assumption.
 

cmh

NugentS said: "Ideally you would be running the tests on the drives in parallel - all drives at once. You should also do this under tmux or screen so closing the window doesn't kill the process"
Guess I should have mentioned I'm a Linux sysadmin, hence the wrapper script, and I'm already running it in screen.

@joeschmuck is correct that the eSATA bay is a single drive, but I do have a USB 3.0 single bay as well that I could probably use. Since I've got five drives total, I'm waiting until this drive is done before plugging in the USB 3 just to avoid the admittedly outside chance that it could cause confusion with the running badblocks. Once that's done I can run two at a time and still be done weeks before the hardware shows up, since shipment of the TrueNAS Mini box itself is predicted for the end of September, so I'm not in much of a rush. Key is that this way, if I find a drive with a bunch of errors, I'll still be within the return window.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222

cmh

Davvo said: "You don't want to use USB as a long term solution. Please read the USB of Doom."
Well aware of the risks of USB with ZFS, which is why I used the eSATA bay instead of the USB 3.0 one. I don't think the duration of a badblocks test (a couple days) counts as "long term", does it? I'm talking about using it for running the badblocks write test on the new drives so I can confirm they're all good before the TrueNAS Mini enclosure shows up, as it's back ordered.
 

joeschmuck

Actually badblocks is not ZFS so you might be just fine.
 

cmh

joeschmuck said: "Actually badblocks is not ZFS so you might be just fine."
Might all be moot - Mini chassis was supposed to arrive at the end of September but has apparently already shipped, so I might just be able to run the final four disks in parallel in the chassis before installing TN. Or I could just install TN and run badblocks on that before setting up the pool, which is probably even better!

Current status - on the second pass of the first drive and still 0/0/0 across the board. W00t!
 

joeschmuck

cmh said: "Or I could just install TN and run badblocks on that before setting up the pool, which is probably even better!"
My personal preference is to boot up an Ubuntu Live DVD and run it from there.
 

cmh

TrueNAS Mini showed up today. Seems like it comes with the OS installed internally, so I used the two SSDs for a small mirrored pool and installed the four drives that still need to be tested. Booted up into the installed OS, got it set up, copied over my "drivetest" script, fired up four separate `screen` sessions (named for each drive) and launched the script. Now all four drives are running, so hopefully by next week I'll be ready to set the thing up.
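
Launching one named screen session per drive could be scripted along these lines. This is a dry-run sketch that only prints the commands. It assumes the wrapper is invoked as ./drivetest with the device path as its only argument (the post says the device can be passed on the command line but does not show the exact interface), and the "bb-" session prefix and the ada0 to ada3 device names are just illustrative.

```shell
#!/bin/sh
# Dry run: prints one screen command per drive instead of executing it.
# ./drivetest and its single-argument interface are assumptions.
screen_commands() {
    for dev in "$@"; do
        name="bb-$(basename "$dev")"
        # -dmS NAME: start a detached screen session called NAME
        echo "screen -dmS $name ./drivetest $dev"
    done
}

screen_commands /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3
```

Piping the output to sh would start the sessions for real; `screen -ls` then lists them and `screen -r bb-ada0` attaches to one.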
 

cmh

Got the Mini running badblocks on the remaining four drives. Noticed ada0 and ada2 were running at higher throughput in the monitoring than ada1 and ada3. Couple days later, ada0 and ada2 finished about half a day before the other two, so apparently the speed difference stayed consistent throughout. Interesting, curious why two were faster and two were slower. The speed was consistent within the fast pair as well as within the slow pair, and their finish times lined up.

Moved over the fifth drive, which completed (taking far longer) - all five drives with 0 errors, that's super nice.

On the downside, the pool I've made using the two SSDs has now crashed twice, causing the system to need to be rebooted, so that's kinda suboptimal. Opened a case with iX to see what's up with that before I start moving data over.

Thanks again for all the help!
 

joeschmuck

I don't know why the drives would take longer, I have my theory but that is not fact. Glad to see it is completed, hopefully without errors.
cmh said: "On the downside, the pool I've made using the two SSDs has now crashed twice, causing the system to need to be rebooted, so that's kinda suboptimal. Opened a case with iX to see what's up with that before I start moving data over."
I would recommend you start a new thread if you need to troubleshoot this issue, or contact iXsystems.
 

cmh

joeschmuck said: "I would recommend you start a new thread if you need to troubleshoot this issue, or contact iXsystems."
From my message: "Opened a case with iX..." :wink: Figure since the system is brand new and purchased from them, seemed like the best course of action.
 

joeschmuck

cmh said: "From my message: 'Opened a case with iX...' :wink: Figure since the system is brand new and purchased from them, seemed like the best course of action."
That is why I recommended that option. You paid for a nice product and support service, you should use it. Paid users get priority. I hope they get to the bottom of it fast.
 