SAS disks

rvassar

Guru
Joined
May 2, 2018
Messages
972
Octopuss said:
Observation after a short time of runtime: the disks are HOT as hell! They aren't really doing anything (besides running long S.M.A.R.T. tests), they sit outside of the case, and I can barely keep my hand on them. That sounds bad, right? I am not sure what the optimal temperature for a disk is, and as far as I remember there are conflicting opinions about this, but I'll definitely have to add a fan blowing on the disk cage when I'm done testing and reassemble the server.

Those are 7200 RPM enterprise SAS drives. They're designed to live in front of 2U rack fans that can cut off the tip of your finger.
 

Octopuss

Patron
Joined
Jan 4, 2019
Messages
461
I know. I didn't expect them to get that hot though.
I'll just add an intake fan in front of the cage (pretty close, so good for cooling) and keep the exhaust one.
I'll keep researching temperatures, but if I can keep them around 50°C I guess that will suffice.
 

rvassar

Octopuss said:
I know. I didn't expect them to get that hot though.
I'll just add an intake fan in front of the cage (pretty close, so good for cooling) and keep the exhaust one.
I'll keep researching temperatures, but if I can keep them around 50°C I guess that will suffice.

It really doesn't take much. The 2U fans are more for the CPUs and 2.5" U.2 NVMe devices; the 3.5" drives are easy to keep cool. I run a couple of HGST HDN726040ALE614s in a gamer case with a 120 mm fan blowing across them. They stay around 38-40°C, or perhaps 42°C in July (I'm in Texas, room ambient is 26°C in July).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Octopuss said:
barely keep my hand on them. That sounds bad, right?

No, that sounds totally normal, truth be told. These kinds of drives all expect to be inside a chassis with forced air flowing around them. That's why server chassis are typically very noisy, as has been discussed in the past when warning people who try to build "quiet" servers not to re-fan them with hobbyist/gamer-grade fans.
 

Octopuss

I woke up, the tests were finished, so the "final" temperatures with the disk cage sitting outside the case are 59, 55, 60, 56°C. I'll start the badblocks test shortly and just put a fan next to this contraption. I hope the temperatures will drop noticeably. I bet with intensive surface tests the temperatures would jump way above the specced range.
edit:
After a while, temps fell to 36, 38, 38, 42°C. Hah.
Of course, the fan is running at 1200 RPM, which is too much, but I hope even under 1000 RPM will result in similar temps.
1642059913352.png


Very safe setup with four cats about :D
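
The temperatures above can also be read out of the drive report rather than by hand. A minimal Python sketch, assuming the report text contains the "Current Drive Temperature" and "Drive Trip Temperature" lines that smartctl usually prints for SAS drives. The function name and the sample values (including the 65 C trip point) are made up for illustration; check the labels against your own output.

```python
import re

# Labels as smartctl typically prints them for SAS/SCSI drives (assumption:
# verify against the output of your own drives).
CURRENT_TEMP = re.compile(r"Current Drive Temperature:\s*(\d+)\s*C")
TRIP_TEMP = re.compile(r"Drive Trip Temperature:\s*(\d+)\s*C")


def temperature_margin(smart_text):
    """Return (current, trip, margin) in degrees C, or None if a line is missing."""
    cur = CURRENT_TEMP.search(smart_text)
    trip = TRIP_TEMP.search(smart_text)
    if cur is None or trip is None:
        return None
    current, limit = int(cur.group(1)), int(trip.group(1))
    return current, limit, limit - current


# Hypothetical sample: 59 C is one of the readings quoted above,
# the trip temperature is a placeholder, not a value from this thread.
SAMPLE = """\
Current Drive Temperature:     59 C
Drive Trip Temperature:        65 C
"""

print(temperature_margin(SAMPLE))  # (59, 65, 6)
```

A small or negative margin is the case to worry about; the exact limit depends on the drive's datasheet.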
 

Octopuss

So I started badblocks, but it spams the living hell out of the terminal window with block numbers or something. Is there any way to hide that? It shows a progress summary from time to time, but it scrolls away before I can even notice it.
 

Octopuss

...as in, is this output normal?
1642063597950.png
 

Octopuss

After specifying an output file, the block numbers are written to that file and I see the normal progress summary, however...
There's something fishy going on: when I log into the GUI from a browser, it's borderline stuck even with one instance of badblocks running. Two basically kill the tab and the entire browser for some weird reason.
I allocated two more cores in ESXi (I usually run TrueNAS with two), but it didn't help. Running just one instance makes the dashboard report 40%

What's more, the console is spammed with this:
1642065908196.png


What's that error, and should I be concerned?
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Badblocks is best run inside a tmux session and/or from ssh, not from the GUI shell because it takes days to complete. With tmux, you can always disconnect, reconnect and reattach the ongoing session to see how it's going.
What you're seeing is not normal, but I don't know if it is yet another SAS issue or simply a failing drive.
 

Octopuss

I didn't say I ran it from the GUI.
I use Mobaxterm.

Also my browser (Pale Moon) seems to be the cause of the lockups. It works just fine in Firefox. I suspect the green console at the bottom is the problem, somehow. The GUI is somewhat ok-ish with just one instance of SSH/badblocks running.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Tmux is still recommended, so that the session can be safely disconnected without interrupting the process (intentionally or not).

In any case, you can have a look at the SMART data to see if anything sticks out as bad.
 

Octopuss

I ran short and long SMART tests before badblocks, no errors whatsoever.
 

Ericloewe

Yeah, but that's not the same as actually trying a workload on the disks. You don't need to run the SMART tests in the middle of badblocks, just read out the SMART data.
 

petersmith34

Cadet
Joined
Jan 13, 2022
Messages
1
Octopuss said:
I woke up, the tests were finished, so the "final" temperatures with the disk cage sitting outside the case are 59, 55, 60, 56°C. I'll start the badblocks test shortly and just put a fan next to this contraption. I hope the temperatures will drop noticeably. I bet with intensive surface tests the temperatures would jump way above the specced range.
edit:
After a while, temps fell to 36, 38, 38, 42°C. Hah.
Of course, the fan is running at 1200 RPM, which is too much, but I hope even under 1000 RPM will result in similar temps.
View attachment 52257

Very safe setup with four cats about :D

Are these four the SATA hard disks?

Picture for reference.
 

Octopuss

Did you not read my post?
I ran both SMART tests before running badblocks. When those reported no errors, I proceeded to badblocks (and if that's not a workload then I don't know what is).
 

Ericloewe

I think you misunderstood what I said. I suggested that you read the SMART data now, as you're being spammed with SCSI errors. This is orthogonal to the SMART testing.
What you did is perfectly valid and reasonable, but is not the full picture. Now that the disks are doing something (as far as the host system is concerned, anyway), it's a good time to see what they're saying in their error logs and other SMART parameters, since you're getting errors.

Good SMART data tends to point to a controller/cable/expander/backplane issue, bad SMART data points to a bad disk.
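
That rule of thumb can be sketched in a few lines of Python. This is only an illustration, assuming the report contains the "SMART Health Status" and "Elements in grown defect list" lines that smartctl usually prints for SAS drives. The function name, the sample text, and the choice of those two fields as the definition of "bad" are assumptions, not a complete diagnosis, and the result only means something while the host is actually logging errors.

```python
import re

# Labels as smartctl typically prints them for SAS/SCSI drives (assumption:
# verify against your own output before relying on them).
HEALTH = re.compile(r"SMART Health Status:\s*(.+)")
GROWN_DEFECTS = re.compile(r"Elements in grown defect list:\s*(\d+)")


def triage(smart_text):
    """First guess at where to look when the host logs SCSI errors."""
    health = HEALTH.search(smart_text)
    defects = GROWN_DEFECTS.search(smart_text)
    if health is None or defects is None:
        return "unknown: expected fields not found"
    if health.group(1).strip() != "OK" or int(defects.group(1)) > 0:
        # The drive itself reports trouble.
        return "suspect the disk"
    # The drive looks clean, so the path to it becomes the prime suspect.
    return "suspect controller/cable/expander/backplane"


# Hypothetical sample text, not taken from the drives in this thread.
SAMPLE = """\
SMART Health Status: OK
Elements in grown defect list: 0
"""

print(triage(SAMPLE))  # suspect controller/cable/expander/backplane
```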
 

Octopuss

Aha! I read you.
1642078591898.png


Looks ok, but I haven't figured out how to get all the specific SMART values.

edit:
It seems SMART values are only reported for SATA disks. Weird.
 

Ericloewe

Well, SAS disks are less uniform about this than SATA disks. Is that the output of smartctl -x?
 

Octopuss

No, I did the usual smartctl -a /dev/xxx
 

Ericloewe

Try -x instead, which gives more info.
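
For collecting that report from a script, a minimal Python sketch follows. The helper names and the device path /dev/da0 are placeholders, and it assumes smartctl is installed and the script runs with enough privileges to query the drive.

```python
import shutil
import subprocess


def smartctl_cmd(device, extended=True):
    """Build the argument list: -x for the extended report, -a for the usual one."""
    return ["smartctl", "-x" if extended else "-a", device]


def read_smart(device):
    """Return the text report for one drive."""
    if shutil.which("smartctl") is None:
        raise RuntimeError("smartctl not found in PATH")
    # smartctl uses its exit status as a bitmask of findings, so a non-zero
    # status is not treated as a failure to run here.
    result = subprocess.run(
        smartctl_cmd(device), capture_output=True, text=True, check=False
    )
    return result.stdout


# /dev/da0 is a placeholder device name.
print(" ".join(smartctl_cmd("/dev/da0")))  # smartctl -x /dev/da0
```

Calling read_smart("/dev/da0") on the NAS itself would return the same text you see in the terminal, which can then be saved for each drive.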
 