How to decode disk serial number

john60

Explorer
Joined
Nov 22, 2021
Messages
85
I unplugged 2 disks and got this error message.
disk not available.png

But these serial numbers do not match the serial numbers when all disks are attached.

Here is the image after I re-plugged in the 2 disks.
my disks.png


Also, an alert appeared after I plugged them back in, saying something did not recover.
what does this mean.png

Also my pools show an alert.
pool alert.png


pool status.png


Question 1: How do I convert the serial number in the disk failure alert to the serial numbers displayed in the disk window?
Question 2: How do I clear the red x in the pool window, or do I still have a problem? Are the checksum errors there because I disconnected the disks, or does it mean something really got messed up when I disconnected my disks for the experiment?
 

john60

Explorer
Joined
Nov 22, 2021
Messages
85
In another post, there was a response by joeschmuck, and his signature line linked to hard-drive-troubleshooting-guide-all-versions-of-freenas.
So I am running
smartctl -t long /dev/da1
and
smartctl -t long /dev/da2
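Once the long tests finish (they can take several hours on large drives), my understanding is that the results land in the drive's self-test log, which can be printed with:
smartctl -l selftest /dev/da1
or, for the full report (identity, attributes, and the self-test log):
smartctl -a /dev/da1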

I am confused as to whether there is even an issue. A red x in the Pools tab and a pool status showing checksum error counts of 2 and 1 are the only anomalous indications of something not right. I assume that if I were 1 disk failure away from total data loss there would be more "in your face" alarms.

I would be appreciative of some insight from someone with experience.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
glabel status
will get you a list of drive identifiers and their GPTIDs. From this you can match the affected disks to their serial numbers and find them in your system.
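As a sketch (the device name here is just an example): glabel status pairs each gptid/... label with its daX device, and then
smartctl -i /dev/da1 | grep -i serial
prints that drive's serial number, so you can go GPTID -> device -> serial -> physical disk.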
 

john60

Explorer
Joined
Nov 22, 2021
Messages
85
The web GUI reports both SMART tests as successful.
da1 smart test.png

da2 smart test.png


Pool still shows Unhealthy

pool still shows unhealthy.png


At this point, I am not sure whether the Unhealthy x is real. I would appreciate someone with experience commenting.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
In the other thread in which you posted, I suggested that you post here, in code tags, the full output of the long SMART tests that you ran.

Oh - I see I didn't - meant to but didn't. So, please post the full output here in code tags.
 

john60

Explorer
Joined
Nov 22, 2021
Messages
85
Tried a reboot and the unhealthy icon disappeared.

reboot and unhealthy gone.png


So my remaining questions are:
Q1: Is this how it is supposed to work, or am I missing something?
Q2: How do I map the serial number in the original alert to the serial number shown for the disk in the web GUI?
 

john60

Explorer
Joined
Nov 22, 2021
Messages
85
will get you a list of drive identifiers and their GPTIDs. From this you can match the affected disks to their serial numbers and find them in your system.
Wow, you are a trove of information. Thanks.

glabel.png


These numbers do not match the first alert dialog.
I tried converting 3dd917d2 from hex to decimal, but that does not match either.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
These numbers do not match the first alert dialog.
Yes, I noticed that after I posted - I have no idea what those numbers are that appeared in that first error message.

BTW - are you using TrueNAS Core or Scale - don't think you said and I didn't ask. I know nothing about Scale...

You didn't give us your system info yet - do you have a hot-plug drive arrangement? When you unplugged your drives (as your first post says) were they running or was the server powered down?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I have no idea what those numbers are that appeared in that first error message.
Those are the ZFS partition guids... I think you can look at them with zdb -l /dev/da1p2, but I would not recommend doing that if you didn't already know about it; it's unlikely to be helpful, as it's really only displayed when the disk is offline anyway.

If you run that on a pool member disk (data partition), you see that it has information about the other members of its VDEV and a little about the pool.

It's not really helpful to humans though.
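If you do want to peek, a minimal sketch (da1p2 assumes the data partition is the second partition, the usual TrueNAS layout):
zdb -l /dev/da1p2 | grep -i guid
which should pull out the pool guid plus the member guids - the kind of numbers that alert was showing.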
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Those are the ZFS partition guids...
Thanks for the info - "learning never ends"!

I wonder why the Error Message displays that info instead of the disk guid.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I unplugged 2 disks and got this error message.
First, I hope this was accidental. If not, then stop doing it.

Second, I feel that you should be running a SMART long test on every one of your drives, not just da1 & da2. Then inspect the results for any errors. Read that troubleshooting guide again; I think it's pretty clear on what to do. I doubt you have a drive failure related to it being unplugged, but you could have a drive failure for other reasons.

Lastly, I would run a SCRUB on your pools (you can do this first if you like). I suspect they will repair themselves, since you only removed 2 drives temporarily and you have a RAIDZ2. If you run a scrub and find nothing to repair, then you can clear the error message if you are still receiving it. I would shut down the system, wait a few minutes, then power it back on, run another scrub, and if all is good, then you are done.
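A sketch of those steps from the shell, assuming your pool is named tank (substitute your real pool name):
zpool scrub tank
zpool status -v tank
zpool clear tank
status shows scrub progress and the per-disk read/write/checksum counters; clear resets those counters once you're satisfied nothing is actually broken.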

That is what I'd do. You know why your pool went crazy: you removed two drives with the power on, which is not good, and the system yelled at you for it. If it had been 3 drives, odds are your data would be gone.
 

john60

Explorer
Joined
Nov 22, 2021
Messages
85
BTW - are you using TrueNAS Core or Scale - don't think you said and I didn't ask. I know nothing about Scale...
I have both, but this experiment was on core.
When you unplugged your drives (as your first post says) were they running or was the server powered down?
My goal was to prepare for what will happen when a real failure occurs in the future. Specifically, if I lose 2 drives, how do I figure out which drive failed so I can replace the right one?

I powered down the system, pulled the cables from 2 SATA drives, then powered the system back up.
I was hoping to (1) see the serial numbers of the drives I unplugged, and (2) confirm that RAIDZ2 meant the system still worked.
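One thing I plan to do before the next drill is record the device-to-serial mapping while everything is healthy. Something like this should work, assuming a Bourne-style shell (the da[0-7] glob is just an example; adjust it to your disks):
for d in /dev/da[0-7]; do echo $d; smartctl -i $d | grep -i serial; done
Then I can tape the serials to the drive bays so a real failure is less stressful.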
 

john60

Explorer
Joined
Nov 22, 2021
Messages
85
First, I hope this was accidental. If not, then stop doing it.

Second, I feel that you should be running a SMART long test on every one of your drives, not just da1 & da2. Then inspect the results for any errors. Read that troubleshooting guide again; I think it's pretty clear on what to do. I doubt you have a drive failure related to it being unplugged, but you could have a drive failure for other reasons.

Lastly, I would run a SCRUB on your pools (you can do this first if you like). I suspect they will repair themselves, since you only removed 2 drives temporarily and you have a RAIDZ2. If you run a scrub and find nothing to repair, then you can clear the error message if you are still receiving it. I would shut down the system, wait a few minutes, then power it back on, run another scrub, and if all is good, then you are done.

That is what I'd do. You know why your pool went crazy: you removed two drives with the power on, which is not good, and the system yelled at you for it. If it had been 3 drives, odds are your data would be gone.
I would like to do a fire drill so that I am ready when there is a real failure.
A real failure will be stressful, and learning while stressed does not work well for me.

I assume it is safe if I power down the system, unplug both the power and SATA cables from 2 drives, then power back up?

I repeated the experiment, but unplugged only 1 drive, and a different drive than before.
It now displays the serial number in the alert, unlike yesterday.
only 1 disk down, shows serial number.png


Now with 2 drives unplugged, again different drives from yesterday, I get serial numbers, unlike yesterday.
2 drives down, but different from yesterday.png



Now I powered down, re-inserted the 2 drives, powered back up, and am running a scrub to see if this will wipe the unhealthy flags from the GUI.
This will take some time, so until later.

scrub.png


Tomorrow I will try the same 2 drives as yesterday.
 

john60

Explorer
Joined
Nov 22, 2021
Messages
85
The GUI blanked out, so I logged back in, but the pool window does not show the scrub still running.
Maybe it completed far faster than predicted, so I started another and got this message.

stop the scrub.png


So obviously the scrub is still running.
Was there an indication somewhere in the GUI that I just missed, or is the user expected to try starting it again to see if it is still running?
Where will the scrub results appear?
 

Jessep

Patron
Joined
Aug 19, 2018
Messages
379
From the Shell, run:
zpool status
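While a scrub is running, the scan: line of that output shows "scrub in progress" with a percent-complete estimate, e.g.:
zpool status | grep -A 2 'scan:'
Once it finishes, the same line reports how much data was repaired and how long it took.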
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Where will the scrub results appear?

Well this is actually a very good question. I'm reasonably positive the GUI used to indicate this somewhere. I just checked one of our filers known for "the long scrubs" and while it is clearly doing a scrub as indicated from the CLI, I am not seeing an indication of this in the GUI. Perhaps I am just not yet fully awake this morning...?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I just checked one of our filers known for "the long scrubs" and while it is clearly doing a scrub as indicated from the CLI, I am not seeing an indication of this in the GUI. Perhaps I am just not yet fully awake this morning...?
It's there if it was the last run activity on that pool...
1641908432701.png
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yeesh, that's buried. But you're right, it's there if you dig.
 

john60

Explorer
Joined
Nov 22, 2021
Messages
85
It's there if it was the last run activity on that pool...

Yes, I found it now. For me it reports the scrub finished with zero errors.
scrub finished.png


But it is interesting that a critical alarm remains even though the disks were plugged back in
still critical alarm.png


and still the red x
still the x.png


I logged out of the GUI and logged back in but these alarms remain.


Yesterday a restart of TrueNAS Core cleared the red x, so I did a restart again today.
restart like yesterday.png


and now the red x is gone, just like yesterday
now the red x is gone.png


So it would appear that if you power down, remove a disk, power up (and see the alarms), then power down and put the disk back, you will get a red x; a scrub leaves the x in place, and another reboot is required to clear it.

Is this 2 bugs, or does TrueNAS really require a 2nd reboot to clear the 'red (unhealthy) x' in the Pool window while sometimes reporting funky serial numbers?
Recall that yesterday:
- smartctl -t long /dev/da1 failed to clear the 'red (unhealthy) x'.
- a funky serial number was reported for the removed disk.

I need to try unplugging the same 2 disks as yesterday to see if the funky serial numbers appear again.
 