Drives Suddenly Degraded and Unavailable

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
I was moving my server to a rack, so I had taken all the drives out beforehand and was careful with it, and after bringing it back online everything was fine for a few days.
Then suddenly I got an email while at work saying that 2 drives were unavailable and 1 was degraded. I logged in, looked at it, and didn't know what to do.
It's also weird that the two unavailable drives are both spares? And at the top of the picture it shows da14p2 listed under "SPARE", which is itself listed under "MIRROR". So I'm not sure if I'm reading it wrong. I'm very confused.

I don't think it's the drives.
3 drives at once? Can't be. They were bought at different times, used, from a seller I've had zero issues with across the other ~18 drives I bought from them.
Unless it's saying that one drive failed (degraded) and the other 2 are unavailable because it resilvered to them? But then why are 2 unavailable? Just very confused.

A few hours later I was heading home, and I could still access the server.
Then, in the 10 minutes between checking it and getting home, I walked into my room and found the server off. When I pulled it out of the rack, the HBA came loose a little because I hadn't screwed it down, and the back of the chassis slides out from the front because it has a rear backplane, so the wires got tugged a bit.
I put it all back together, reseated the wires and such, and it booted and works fine.
I reran a scrub, and the exact same drive shows degraded, plus 2 unavailable.

I am not very familiar with diagnostics and am not sure where to start or what the issue might be.
I ran SMART tests on the 3 drives and I think they all passed?
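(For reference, and purely as a sketch: re-checking those SMART results from the shell might look roughly like this; da11 is just an example device name, not necessarily one of the affected disks.)

Code:
# Show identity, health, error counters, and the self-test log for one suspect drive
smartctl -x /dev/da11

# Optionally kick off a fresh long self-test, then re-check the log once it finishes
smartctl -t long /dev/da11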

For reference, it's TrueNAS Core on a Supermicro server with an LSI HBA in IT mode.

Code:
# cat /var/log/messages
Sep 21 00:00:00 hinata newsyslog[7010]: logfile turned over due to size>200K
Sep 21 00:00:00 hinata syslog-ng[1663]: Configuration reload request received, reloading configuration;
Sep 21 00:00:00 hinata syslog-ng[1663]: Configuration reload finished;
Sep 21 00:08:10 hinata (da11:mpr0:0:26:0): READ(16). CDB: 88 00 00 00 00 01 79 5a 75 d0 00 00 08 00 00 00
Sep 21 00:08:10 hinata (da11:mpr0:0:26:0): CAM status: SCSI Status Error
Sep 21 00:08:10 hinata (da11:mpr0:0:26:0): SCSI status: Check Condition
Sep 21 00:08:10 hinata (da11:mpr0:0:26:0): SCSI sense: ABORTED COMMAND asc:4b,3 (ACK/NAK timeout)
Sep 21 00:08:10 hinata (da11:mpr0:0:26:0): Command Specific Info: 0
Sep 21 00:08:10 hinata (da11:mpr0:0:26:0): Descriptor 0x80: f5 51
Sep 21 00:08:10 hinata (da11:mpr0:0:26:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 00:08:10 hinata (da11:mpr0:0:26:0): Retrying command (per sense data)
Sep 21 00:21:44 hinata (da7:mpr0:0:22:0): READ(10). CDB: 28 00 a3 90 c8 d8 00 08 00 00
Sep 21 00:21:44 hinata (da7:mpr0:0:22:0): CAM status: SCSI Status Error
Sep 21 00:21:44 hinata (da7:mpr0:0:22:0): SCSI status: Check Condition
Sep 21 00:21:44 hinata (da7:mpr0:0:22:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 00:21:44 hinata (da7:mpr0:0:22:0): Command Specific Info: 0
Sep 21 00:21:44 hinata (da7:mpr0:0:22:0): Descriptor 0x80: f5 50
Sep 21 00:21:44 hinata (da7:mpr0:0:22:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 00:21:44 hinata (da7:mpr0:0:22:0): Retrying command (per sense data)
Sep 21 00:31:52 hinata (da10:mpr0:0:25:0): READ(16). CDB: 88 00 00 00 00 01 2c be bd 10 00 00 08 00 00 00
Sep 21 00:31:52 hinata (da10:mpr0:0:25:0): CAM status: SCSI Status Error
Sep 21 00:31:52 hinata (da10:mpr0:0:25:0): SCSI status: Check Condition
Sep 21 00:31:52 hinata (da10:mpr0:0:25:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 00:31:52 hinata (da10:mpr0:0:25:0): Command Specific Info: 0
Sep 21 00:31:52 hinata (da10:mpr0:0:25:0): Descriptor 0x80: f5 50
Sep 21 00:31:52 hinata (da10:mpr0:0:25:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 00:31:52 hinata (da10:mpr0:0:25:0): Retrying command (per sense data)
Sep 21 00:35:48 hinata (da8:mpr0:0:23:0): READ(10). CDB: 28 00 a1 0b 8b c0 00 01 00 00
Sep 21 00:35:48 hinata (da8:mpr0:0:23:0): CAM status: SCSI Status Error
Sep 21 00:35:48 hinata (da8:mpr0:0:23:0): SCSI status: Check Condition
Sep 21 00:35:48 hinata (da8:mpr0:0:23:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 00:35:48 hinata (da8:mpr0:0:23:0): Command Specific Info: 0
Sep 21 00:35:48 hinata (da8:mpr0:0:23:0): Descriptor 0x80: f5 50
Sep 21 00:35:48 hinata (da8:mpr0:0:23:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 00:35:48 hinata (da8:mpr0:0:23:0): Retrying command (per sense data)
Sep 21 00:39:05 hinata (da8:mpr0:0:23:0): READ(16). CDB: 88 00 00 00 00 01 01 a1 2b 98 00 00 08 00 00 00
Sep 21 00:39:05 hinata (da8:mpr0:0:23:0): CAM status: SCSI Status Error
Sep 21 00:39:05 hinata (da8:mpr0:0:23:0): SCSI status: Check Condition
Sep 21 00:39:05 hinata (da8:mpr0:0:23:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 00:39:05 hinata (da8:mpr0:0:23:0): Command Specific Info: 0
Sep 21 00:39:05 hinata (da8:mpr0:0:23:0): Descriptor 0x80: f5 50
Sep 21 00:39:05 hinata (da8:mpr0:0:23:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 00:39:05 hinata (da8:mpr0:0:23:0): Retrying command (per sense data)
Sep 21 01:36:32 hinata (da0:mpr0:0:15:0): READ(10). CDB: 28 00 4a c5 96 60 00 08 00 00
Sep 21 01:36:32 hinata (da0:mpr0:0:15:0): CAM status: SCSI Status Error
Sep 21 01:36:32 hinata (da0:mpr0:0:15:0): SCSI status: Check Condition
Sep 21 01:36:32 hinata (da0:mpr0:0:15:0): SCSI sense: ABORTED COMMAND asc:4b,3 (ACK/NAK timeout)
Sep 21 01:36:32 hinata (da0:mpr0:0:15:0): Command Specific Info: 0
Sep 21 01:36:32 hinata (da0:mpr0:0:15:0): Descriptor 0x80: f5 51
Sep 21 01:36:32 hinata (da0:mpr0:0:15:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 01:36:32 hinata (da0:mpr0:0:15:0): Retrying command (per sense data)
Sep 21 01:43:43 hinata (da10:mpr0:0:25:0): READ(10). CDB: 28 00 97 a8 fc d0 00 08 00 00
Sep 21 01:43:43 hinata (da10:mpr0:0:25:0): CAM status: SCSI Status Error
Sep 21 01:43:43 hinata (da10:mpr0:0:25:0): SCSI status: Check Condition
Sep 21 01:43:43 hinata (da10:mpr0:0:25:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 01:43:43 hinata (da10:mpr0:0:25:0): Command Specific Info: 0
Sep 21 01:43:43 hinata (da10:mpr0:0:25:0): Descriptor 0x80: f5 50
Sep 21 01:43:43 hinata (da10:mpr0:0:25:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 01:43:43 hinata (da10:mpr0:0:25:0): Retrying command (per sense data)
Sep 21 02:50:46 hinata (da7:mpr0:0:22:0): READ(16). CDB: 88 00 00 00 00 01 66 f8 f8 c0 00 00 08 00 00 00
Sep 21 02:50:46 hinata (da7:mpr0:0:22:0): CAM status: SCSI Status Error
Sep 21 02:50:46 hinata (da7:mpr0:0:22:0): SCSI status: Check Condition
Sep 21 02:50:46 hinata (da7:mpr0:0:22:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 02:50:46 hinata (da7:mpr0:0:22:0): Command Specific Info: 0
Sep 21 02:50:46 hinata (da7:mpr0:0:22:0): Descriptor 0x80: f5 50
Sep 21 02:50:46 hinata (da7:mpr0:0:22:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 02:50:46 hinata (da7:mpr0:0:22:0): Retrying command (per sense data)
Sep 21 02:58:49 hinata (da8:mpr0:0:23:0): READ(16). CDB: 88 00 00 00 00 01 2c fd 3a 80 00 00 08 00 00 00
Sep 21 02:58:49 hinata (da8:mpr0:0:23:0): CAM status: SCSI Status Error
Sep 21 02:58:49 hinata (da8:mpr0:0:23:0): SCSI status: Check Condition
Sep 21 02:58:49 hinata (da8:mpr0:0:23:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 02:58:49 hinata (da8:mpr0:0:23:0): Command Specific Info: 0
Sep 21 02:58:49 hinata (da8:mpr0:0:23:0): Descriptor 0x80: f5 50
Sep 21 02:58:49 hinata (da8:mpr0:0:23:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 02:58:49 hinata (da8:mpr0:0:23:0): Retrying command (per sense data)
Sep 21 03:16:12 hinata (da11:mpr0:0:26:0): READ(10). CDB: 28 00 82 e0 17 78 00 08 00 00
Sep 21 03:16:12 hinata (da11:mpr0:0:26:0): CAM status: SCSI Status Error
Sep 21 03:16:12 hinata (da11:mpr0:0:26:0): SCSI status: Check Condition
Sep 21 03:16:12 hinata (da11:mpr0:0:26:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 03:16:12 hinata (da11:mpr0:0:26:0): Command Specific Info: 0
Sep 21 03:16:12 hinata (da11:mpr0:0:26:0): Descriptor 0x80: f5 50
Sep 21 03:16:12 hinata (da11:mpr0:0:26:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 03:16:12 hinata (da11:mpr0:0:26:0): Retrying command (per sense data)
Sep 21 03:26:37 hinata (da7:mpr0:0:22:0): READ(16). CDB: 88 00 00 00 00 01 5d c1 b2 48 00 00 08 00 00 00
Sep 21 03:26:37 hinata (da7:mpr0:0:22:0): CAM status: SCSI Status Error
Sep 21 03:26:37 hinata (da7:mpr0:0:22:0): SCSI status: Check Condition
Sep 21 03:26:37 hinata (da7:mpr0:0:22:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 03:26:37 hinata (da7:mpr0:0:22:0): Command Specific Info: 0
Sep 21 03:26:37 hinata (da7:mpr0:0:22:0): Descriptor 0x80: f5 50
Sep 21 03:26:37 hinata (da7:mpr0:0:22:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 03:26:37 hinata (da7:mpr0:0:22:0): Retrying command (per sense data)
Sep 21 04:15:57 hinata (da6:mpr0:0:21:0): READ(10). CDB: 28 00 13 cc dc c8 00 08 00 00
Sep 21 04:15:57 hinata (da6:mpr0:0:21:0): CAM status: SCSI Status Error
Sep 21 04:15:57 hinata (da6:mpr0:0:21:0): SCSI status: Check Condition
Sep 21 04:15:57 hinata (da6:mpr0:0:21:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 04:15:57 hinata (da6:mpr0:0:21:0): Command Specific Info: 0
Sep 21 04:15:57 hinata (da6:mpr0:0:21:0): Descriptor 0x80: f5 50
Sep 21 04:15:57 hinata (da6:mpr0:0:21:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 04:15:57 hinata (da6:mpr0:0:21:0): Retrying command (per sense data)
Sep 21 04:30:42 hinata (da2:mpr0:0:17:0): READ(16). CDB: 88 00 00 00 00 01 67 f8 bf 30 00 00 08 00 00 00
Sep 21 04:30:42 hinata (da2:mpr0:0:17:0): CAM status: SCSI Status Error
Sep 21 04:30:42 hinata (da2:mpr0:0:17:0): SCSI status: Check Condition
Sep 21 04:30:42 hinata (da2:mpr0:0:17:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 04:30:42 hinata (da2:mpr0:0:17:0): Command Specific Info: 0
Sep 21 04:30:42 hinata (da2:mpr0:0:17:0): Descriptor 0x80: f5 50
Sep 21 04:30:42 hinata (da2:mpr0:0:17:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 04:30:42 hinata (da2:mpr0:0:17:0): Retrying command (per sense data)
Sep 21 05:16:36 hinata (da2:mpr0:0:17:0): READ(16). CDB: 88 00 00 00 00 01 0c 1c 7f a8 00 00 08 00 00 00
Sep 21 05:16:36 hinata (da2:mpr0:0:17:0): CAM status: SCSI Status Error
Sep 21 05:16:36 hinata (da2:mpr0:0:17:0): SCSI status: Check Condition
Sep 21 05:16:36 hinata (da2:mpr0:0:17:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 05:16:36 hinata (da2:mpr0:0:17:0): Command Specific Info: 0
Sep 21 05:16:36 hinata (da2:mpr0:0:17:0): Descriptor 0x80: f5 50
Sep 21 05:16:36 hinata (da2:mpr0:0:17:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 05:16:36 hinata (da2:mpr0:0:17:0): Retrying command (per sense data)
Sep 21 05:23:16 hinata (da1:mpr0:0:16:0): READ(16). CDB: 88 00 00 00 00 01 23 d4 74 40 00 00 08 00 00 00
Sep 21 05:23:16 hinata (da1:mpr0:0:16:0): CAM status: SCSI Status Error
Sep 21 05:23:16 hinata (da1:mpr0:0:16:0): SCSI status: Check Condition
Sep 21 05:23:16 hinata (da1:mpr0:0:16:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 05:23:16 hinata (da1:mpr0:0:16:0): Command Specific Info: 0
Sep 21 05:23:16 hinata (da1:mpr0:0:16:0): Descriptor 0x80: f5 50
Sep 21 05:23:16 hinata (da1:mpr0:0:16:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 05:23:16 hinata (da1:mpr0:0:16:0): Retrying command (per sense data)
Sep 21 05:40:37 hinata (da2:mpr0:0:17:0): READ(16). CDB: 88 00 00 00 00 01 70 dd 6f 38 00 00 08 00 00 00
Sep 21 05:40:37 hinata (da2:mpr0:0:17:0): CAM status: SCSI Status Error
Sep 21 05:40:37 hinata (da2:mpr0:0:17:0): SCSI status: Check Condition
Sep 21 05:40:37 hinata (da2:mpr0:0:17:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 05:40:37 hinata (da2:mpr0:0:17:0): Command Specific Info: 0
Sep 21 05:40:37 hinata (da2:mpr0:0:17:0): Descriptor 0x80: f5 50
Sep 21 05:40:37 hinata (da2:mpr0:0:17:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 05:40:37 hinata (da2:mpr0:0:17:0): Retrying command (per sense data)
Sep 21 05:56:20 hinata (da10:mpr0:0:25:0): READ(16). CDB: 88 00 00 00 00 01 a0 1e 91 98 00 00 08 00 00 00
Sep 21 05:56:20 hinata (da10:mpr0:0:25:0): CAM status: SCSI Status Error
Sep 21 05:56:20 hinata (da10:mpr0:0:25:0): SCSI status: Check Condition
Sep 21 05:56:20 hinata (da10:mpr0:0:25:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 05:56:20 hinata (da10:mpr0:0:25:0): Command Specific Info: 0
Sep 21 05:56:20 hinata (da10:mpr0:0:25:0): Descriptor 0x80: f5 50
Sep 21 05:56:20 hinata (da10:mpr0:0:25:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 05:56:20 hinata (da10:mpr0:0:25:0): Retrying command (per sense data)
Sep 21 06:25:12 hinata (da5:mpr0:0:20:0): READ(10). CDB: 28 00 d2 ae 6c 60 00 00 a0 00
Sep 21 06:25:12 hinata (da5:mpr0:0:20:0): CAM status: SCSI Status Error
Sep 21 06:25:12 hinata (da5:mpr0:0:20:0): SCSI status: Check Condition
Sep 21 06:25:12 hinata (da5:mpr0:0:20:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 06:25:12 hinata (da5:mpr0:0:20:0): Command Specific Info: 0
Sep 21 06:25:12 hinata (da5:mpr0:0:20:0): Descriptor 0x80: f5 50
Sep 21 06:25:12 hinata (da5:mpr0:0:20:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 06:25:12 hinata (da5:mpr0:0:20:0): Retrying command (per sense data)
Sep 21 07:35:44 hinata 1 2022-09-21T11:35:44.179287+00:00 hinata.lan ctld 14350 - - 10.0.10.10: invalid target name "andromeda"; should start with either "iqn.", "eui.", or "naa."
Sep 21 07:35:45 hinata 1 2022-09-21T11:35:45.182762+00:00 hinata.lan ctld 14351 - - 10.0.10.10: invalid target name "andromeda"; should start with either "iqn.", "eui.", or "naa."
Sep 21 07:35:46 hinata 1 2022-09-21T11:35:46.703310+00:00 hinata.lan ctld 14352 - - 10.0.10.10: invalid target name "andromeda"; should start with either "iqn.", "eui.", or "naa."
Sep 21 08:36:52 hinata (da7:mpr0:0:22:0): READ(10). CDB: 28 00 76 37 f5 b0 00 01 00 00
Sep 21 08:36:52 hinata (da7:mpr0:0:22:0): CAM status: SCSI Status Error
Sep 21 08:36:52 hinata (da7:mpr0:0:22:0): SCSI status: Check Condition
Sep 21 08:36:52 hinata (da7:mpr0:0:22:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 21 08:36:52 hinata (da7:mpr0:0:22:0): Command Specific Info: 0
Sep 21 08:36:52 hinata (da7:mpr0:0:22:0): Descriptor 0x80: f5 50
Sep 21 08:36:52 hinata (da7:mpr0:0:22:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 21 08:36:52 hinata (da7:mpr0:0:22:0): Retrying command (per sense data)
Sep 22 00:00:00 hinata syslog-ng[1663]: Configuration reload request received, reloading configuration;
Sep 22 00:00:00 hinata syslog-ng[1663]: Configuration reload finished;
Sep 23 00:00:00 hinata syslog-ng[1663]: Configuration reload request received, reloading configuration;
Sep 23 00:00:00 hinata syslog-ng[1663]: Configuration reload finished;
Sep 24 00:00:00 hinata syslog-ng[1663]: Configuration reload request received, reloading configuration;
Sep 24 00:00:00 hinata syslog-ng[1663]: Configuration reload finished;
Sep 25 00:00:00 hinata syslog-ng[1663]: Configuration reload request received, reloading configuration;
Sep 25 00:00:00 hinata syslog-ng[1663]: Configuration reload finished;
Sep 25 11:53:26 hinata WARNING: 10.0.10.10 (andromeda): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 25 11:54:25 hinata WARNING[1663]: Last message '10.0.10.10 (andromed' repeated 1 times, suppressed by syslog-ng on hinata.lan
Sep 25 12:18:28 hinata 1 2022-09-25T16:18:28.400386+00:00 hinata.lan ctld 7877 - - 10.0.10.10: invalid target name "andromeda"; should start with either "iqn.", "eui.", or "naa."
Sep 25 12:18:29 hinata 1 2022-09-25T16:18:29.403051+00:00 hinata.lan ctld 7878 - - 10.0.10.10: invalid target name "andromeda"; should start with either "iqn.", "eui.", or "naa."
Sep 25 12:18:30 hinata 1 2022-09-25T16:18:30.928443+00:00 hinata.lan ctld 7879 - - 10.0.10.10: invalid target name "andromeda"; should start with either "iqn.", "eui.", or "naa."
Sep 25 13:59:26 hinata WARNING: 10.0.10.10 (andromeda): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 25 14:00:25 hinata WARNING[1663]: Last message '10.0.10.10 (andromed' repeated 1 times, suppressed by syslog-ng on hinata.lan
Sep 25 14:42:19 hinata 1 2022-09-25T18:42:19.208664+00:00 hinata.lan ctld 10039 - - 10.0.10.10: invalid target name "andromeda"; should start with either "iqn.", "eui.", or "naa."
Sep 25 14:42:20 hinata 1 2022-09-25T18:42:20.216649+00:00 hinata.lan ctld 10040 - - 10.0.10.10: invalid target name "andromeda"; should start with either "iqn.", "eui.", or "naa."
Sep 25 14:42:20 hinata 1 2022-09-25T18:42:20.222651+00:00 hinata.lan ctld 10041 - - 10.0.10.10: invalid target name "andromeda"; should start with either "iqn.", "eui.", or "naa."
Sep 25 14:47:11 hinata WARNING: 10.0.10.10 (andromeda): no ping reply (NOP-Out) after 5 seconds; dropping connection
Sep 25 14:48:12 hinata WARNING[1663]: Last message '10.0.10.10 (andromed' repeated 1 times, suppressed by syslog-ng on hinata.lan
Sep 25 16:37:16 hinata 1 2022-09-25T20:37:16.641270+00:00 hinata.lan ctld 11771 - - 10.0.10.10: invalid target name "andromeda"; should start with either "iqn.", "eui.", or "naa."
Sep 25 16:37:17 hinata 1 2022-09-25T20:37:17.629774+00:00 hinata.lan ctld 11772 - - 10.0.10.10: invalid target name "andromeda"; should start with either "iqn.", "eui.", or "naa."
Sep 25 16:37:18 hinata 1 2022-09-25T20:37:18.650490+00:00 hinata.lan ctld 11773 - - 10.0.10.10: invalid target name "andromeda"; should start with either "iqn.", "eui.", or "naa."
Sep 25 16:51:07 hinata (da0:mpr0:0:15:0): READ(10). CDB: 28 00 32 93 b8 d8 00 01 00 00
Sep 25 16:51:07 hinata (da0:mpr0:0:15:0): CAM status: SCSI Status Error
Sep 25 16:51:07 hinata (da0:mpr0:0:15:0): SCSI status: Check Condition
Sep 25 16:51:07 hinata (da0:mpr0:0:15:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
Sep 25 16:51:07 hinata (da0:mpr0:0:15:0): Command Specific Info: 0
Sep 25 16:51:07 hinata (da0:mpr0:0:15:0): Descriptor 0x80: f5 50
Sep 25 16:51:07 hinata (da0:mpr0:0:15:0): Descriptor 0x81: 00 00 00 00 00 00
Sep 25 16:51:07 hinata (da0:mpr0:0:15:0): Retrying command (per sense data)
Sep 26 00:00:00 hinata syslog-ng[1663]: Configuration reload request received, reloading configuration;
Sep 26 00:00:00 hinata syslog-ng[1663]: Configuration reload finished;
Sep 26 03:14:46 hinata 1 2022-09-26T03:14:46.550382-04:00 hinata.lan smartd 2072 - - Device: /dev/da14, not capable of Long Self-Test
Sep 26 03:14:46 hinata 1 2022-09-26T03:14:46.556033-04:00 hinata.lan smartd 2072 - - Device: /dev/da15, not capable of Long Self-Test
Sep 26 03:14:46 hinata 1 2022-09-26T03:14:46.560808-04:00 hinata.lan smartd 2072 - - Device: /dev/da9, not capable of Long Self-Test
Sep 26 03:14:46 hinata 1 2022-09-26T03:14:46.565485-04:00 hinata.lan smartd 2072 - - Device: /dev/da13, not capable of Long Self-Test
Sep 26 03:14:46 hinata 1 2022-09-26T03:14:46.570851-04:00 hinata.lan smartd 2072 - - Device: /dev/da12, not capable of Long Self-Test
Sep 27 00:00:00 hinata syslog-ng[1663]: Configuration reload request received, reloading configuration;
Sep 27 00:00:00 hinata syslog-ng[1663]: Configuration reload finished;
 

Attachment: 1636585885953.jpg (the pool status screenshot referenced above)

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Code:
# zpool status -v
  pool: PrimaryPool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 128K in 07:53:38 with 0 errors on Wed Sep 21 07:16:01 2022
config:

        NAME                                              STATE     READ WRITE CKSUM
        PrimaryPool                                       DEGRADED     0     0   0
          mirror-0                                        ONLINE       0     0   0
            gptid/d7476d46-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/d8d6aa36-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-1                                        ONLINE       0     0   0
            gptid/d9a6f5dc-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/db71bcb5-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-2                                        ONLINE       0     0   0
            gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/d96847a9-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-3                                        ONLINE       0     0   0
            gptid/d9fb7757-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/da1e1121-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-4                                        ONLINE       0     0   0
            gptid/9fd0872d-8f64-11ec-8462-002590f52cc2    ONLINE       0     0   0
            gptid/9ff0f041-8f64-11ec-8462-002590f52cc2    ONLINE       0     0   0
          mirror-5                                        DEGRADED     0     0   0
            gptid/14811777-1b6d-11ed-8423-ac1f6be66d76    ONLINE       0     0   0
            spare-1                                       DEGRADED     0     0   0
              gptid/1471d6e5-1b6d-11ed-8423-ac1f6be66d76  DEGRADED     0     0   1  too many errors
              gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   1
          mirror-6                                        ONLINE       0     0   0
            gptid/47050bdd-1b6d-11ed-8423-ac1f6be66d76    ONLINE       0     0   0
            spare-1                                       ONLINE       0     0   0
              gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76  ONLINE       0     0   0
              gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   0
        spares
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:21 with 0 errors on Mon Sep 26 03:45:21 2022
config:
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
I am unfortunately still having trouble with this.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Please post your full hardware setup if you want some help
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Motherboard: H11SSLi
Chassis: Supermicro 6047R-E1R36L, w SAS3 Backplane
HBA: AOC-S3008L-L8E Rev2 (IT MODE)
AMD EPYC 7401P
64GB ECC DDR4
Mirrored VDEVs SAS3 HDD
Boot Drive SSD
TrueNAS-12.0-U8.1
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Hmmm, not sure what I would do with this. Possibly pull the degraded drives and test them in another machine.
1 chksum error shouldn't cause a drive to be kicked out.

I would suggest carefully reseating all cables (power and data) to the backplane, as chksum errors are often (but not always) caused by cabling issues.
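If it helps, after reseating everything, roughly this should clear the logged errors and re-test the pool (pool name taken from the zpool output above; a sketch, not an exact procedure):

Code:
# Clear the accumulated error counters, then scrub and watch the results
zpool clear PrimaryPool
zpool scrub PrimaryPool
zpool status -v PrimaryPool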
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
I unfortunately have no way to test these drives in another machine.
I guess I can try swapping one of the drives to a different bay. Will I have to rerun a scrub for it to see if it follows? Or will it automatically present itself when I move it?

Also, yeah, maybe one of the cables to the HBA got messed up.
I am curious though, wouldn't this cause more than one drive to have issues if it was an HBA cable?

And once more, I am still confused why the two spare drives show "UNAVAILABLE",
and then one drive shows "DEGRADED".
And why, in the picture, does it show the degraded one under a SPARE dropdown nested under an actual mirror vdev?
I only have 2 drives in my system that should be acting as spares, not 4.
So I'm very confused about how it's presenting itself.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
I guess I can try swapping one of the drives to a different bay. Will I have to rerun a scrub for it to see if it follows? Or will it automatically present itself when I move it?

Keep in mind that if you suspect the bay is bad, you only move the drive out of the suspect bay. If you move the good drive into a suspect bay as part of the "swap", you risk losing one more drive. So this only works if you have a spare unused drive bay. Then, if the spare bay is also bad (say, some failure common to the cabling), there is no risk of making things worse.
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Good point, I will swap to an unused bay.
Haven't had a lot of time lately; I'll check things out when I get some time to myself.
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Looking this over now that I have a little time, I'm still a bit confused about which drive I'm moving.
I don't understand, I only have 2 spares setup in this system.

It shows mirror 5, and it shows spare 1 is degraded.
But it lists 2 drives?
And it also shows 2 spares at the very bottom, and are listed as currently in use.

I'm just failing to understand how it is displaying this, and which drive I need to move.

If I had to take a guess, I'd say I have to move gptid/1471d6e5-1b6d-11ed-8423-ac1f6be66d76 (DEGRADED 0 0 1, too many errors).
But why is it listed as a spare? Is it saying one drive died, a spare was put into that slot, and then that spare is also having issues? That doesn't seem right to me.



Code:
gptid/da1e1121-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-4                                        ONLINE       0     0   0
            gptid/9fd0872d-8f64-11ec-8462-002590f52cc2    ONLINE       0     0   0
            gptid/9ff0f041-8f64-11ec-8462-002590f52cc2    ONLINE       0     0   0
          mirror-5                                        DEGRADED     0     0   0
            gptid/14811777-1b6d-11ed-8423-ac1f6be66d76    ONLINE       0     0   0
            spare-1                                       DEGRADED     0     0   0
              gptid/1471d6e5-1b6d-11ed-8423-ac1f6be66d76  DEGRADED     0     0   1  too many errors
              gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   1
          mirror-6                                        ONLINE       0     0   0
            gptid/47050bdd-1b6d-11ed-8423-ac1f6be66d76    ONLINE       0     0   0
            spare-1                                       ONLINE       0     0   0
              gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76  ONLINE       0     0   0
              gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   0

          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use

 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
It's more or less obvious. The 2 spares are "currently in use", so those 2 were mirrored against a failing drive. For some reason mirror-6 is fine, so I would ignore it for the moment.

Replace the DEGRADED item in "mirror-5" / "spare-1", which is NOT your spare.
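If it's unclear which physical disk that gptid is, something like this should map it to a daN device and a serial number on Core (daX below is a placeholder for whatever glabel reports):

Code:
# Map the gptid label to its daN device
glabel status | grep 1471d6e5

# Then read the model and serial so the physical drive can be identified
smartctl -i /dev/daX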
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Mirror implies 2 drives. Are you saying both drives in mirror-5 failed?
If 1 drive fails in a mirror, I would expect one spare to fill in for it. If another drive fails, in the same mirror or another one, I would expect the other spare to fill in for that one.
Is this not how it works?

Anyways, my question still remains because of my confusion.
In mirror-5, spare-1, it is listing TWO drives, plus one directly above it outside of "spare-1". Why? It makes it seem like there are 3 drives in mirror-5, when there are actually only 2.

And can you point out the gptid of the failed drive so I can maybe understand better?
Not sure why this is so confusing to me, thank you.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
No, I am NOT saying both drives in "mirror-5" failed. What ZFS does for hot sparing is create a temporary sub-mirror of the failing drive, in this case "spare-1":
Code:
          mirror-5                                        DEGRADED     0     0   0
            gptid/14811777-1b6d-11ed-8423-ac1f6be66d76    ONLINE       0     0   0
            spare-1                                       DEGRADED     0     0   0
              gptid/1471d6e5-1b6d-11ed-8423-ac1f6be66d76  DEGRADED     0     0   1  too many errors
              gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   1

The failing disk is the second to last line above. It is clearly listed as DEGRADED. Your spare is the bottom line, which I can tell because you gave me the GPTIDs of the 2 spares. And this spare is listed as ONLINE, meaning GOOD.

As I said, ZFS sparing creates a sub-mirror of the failing drive and the spare drive. It may appear confusing, but the intent is to allow you to either replace the failing disk (see the manual for how to do that), or remove the failed disk, causing the hot spare to become a (semi-)permanent part of your pool.
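Done from the shell, those two options would look roughly like this, using the gptids from your listing (the GUI's disk-replacement flow is still the safer route since it handles partitioning for you; <new-disk-gptid> is a placeholder):

Code:
# Option 1: replace the failing member with a new disk's data partition
zpool replace PrimaryPool gptid/1471d6e5-1b6d-11ed-8423-ac1f6be66d76 gptid/<new-disk-gptid>

# Option 2: detach the failing member, promoting the hot spare to a permanent member
zpool detach PrimaryPool gptid/1471d6e5-1b6d-11ed-8423-ac1f6be66d76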

You have the GPTID of the 2 spares from your prior listing:
Code:
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use

Those also show you which drives are the spares in "mirror-5" and "mirror-6". It also helps that they are ONLINE, meaning GOOD, in "mirror-5" and "mirror-6".
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Mirror implies 2 drives.

It most certainly does not. Mirrors are a minimum of two drives, but can be three or four (or more). Even a single disk shares characteristics with a mirror in that a second drive can be attached alongside it, converting it to a two way mirror.
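As a sketch of what that attach looks like (names here are placeholders, not devices from this pool):

Code:
# Attach a second device alongside an existing single-disk vdev member,
# converting it into a two-way mirror; ZFS resilvers the new device automatically.
zpool attach <pool> <existing-device> <new-device>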
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Ok, so I took a gander and presumed it was simply the drive that failed, and not an issue with the backplane or cables.
I powered down the server, took the old drive out, popped in a fresh drive, and booted up.
I do not see it resilvering. In the web GUI it shows that disk as N/A under the "Pool" column.

It does seem to notice that the gptid has changed, though.
What step did I miss to get it added into the pool and resilvering? Do I just need to run a scrub or something?

Code:
# zpool status -v
  pool: PrimaryPool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
  scan: scrub repaired 0B in 08:02:06 with 0 errors on Mon Oct 10 11:02:07 2022
config:

        NAME                                              STATE     READ WRITE CKSUM
        PrimaryPool                                       DEGRADED     0     0   0
          mirror-0                                        ONLINE       0     0   0
            gptid/d7476d46-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/d8d6aa36-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-1                                        ONLINE       0     0   0
            gptid/d9a6f5dc-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/db71bcb5-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-2                                        ONLINE       0     0   0
            gptid/d8b2f42f-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/d96847a9-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-3                                        ONLINE       0     0   0
            gptid/d9fb7757-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
            gptid/da1e1121-32ca-11ec-b815-002590f52cc2    ONLINE       0     0   0
          mirror-4                                        ONLINE       0     0   0
            gptid/9fd0872d-8f64-11ec-8462-002590f52cc2    ONLINE       0     0   0
            gptid/9ff0f041-8f64-11ec-8462-002590f52cc2    ONLINE       0     0   0
          mirror-5                                        DEGRADED     0     0   0
            gptid/14811777-1b6d-11ed-8423-ac1f6be66d76    ONLINE       0     0   0
            spare-1                                       DEGRADED     0     0   0
              495338994010974698                          UNAVAIL      0     0   0  was /dev/gptid/1471d6e5-1b6d-11ed-8423-ac1f6be66d76
              gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   0
          mirror-6                                        ONLINE       0     0   0
            gptid/47050bdd-1b6d-11ed-8423-ac1f6be66d76    ONLINE       0     0   0
            spare-1                                       ONLINE       0     0   0
              gptid/4710dd39-1b6d-11ed-8423-ac1f6be66d76  ONLINE       0     0   0
              gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76  ONLINE       0     0   0
        spares
          gptid/0d48d4ab-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use
          gptid/0d56b97d-1e91-11ed-a6aa-ac1f6be66d76      INUSE     currently in use

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:
 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
Also noticed new notification

Code:

CRITICAL

Pool PrimaryPool state is DEGRADED: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state. The following devices are not healthy:
  • Disk HITACHI HUS72604CLAR4000 K4K7KU5B is UNAVAIL
  • Disk HITACHI HUS72604CLAR4000 K4K6EM8B is UNAVAIL
  • Disk HITACHI HUS72604CLAR4000 K4K9SNYB is UNAVAIL

 

isopropyl

Contributor
Joined
Jan 29, 2022
Messages
159
It most certainly does not. Mirrors are a minimum of two drives, but can be three or four (or more). Even a single disk shares characteristics with a mirror in that a second drive can be attached alongside it, converting it to a two way mirror.
Yeah, I understand that; I meant it implies a minimum of two drives, sorry. I had typed it up quickly.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
TrueNAS and its implementation of ZFS do not magically start resilvering when a disk in a slot is changed out. Use the instructions in the manual for disk replacement.


There can be other reasons why a new disk is inserted. Further, ZFS does not really care which slot a pool member is in. In fact, a disk can change both slot and even controller.

To be fair, this does make things clumsy at times. I've worked with some hardware RAID controllers that would just recognize a replacement disk and, if it was big enough, start re-syncing immediately.
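In this case that means telling ZFS explicitly to replace the UNAVAIL member (shown as 495338994010974698 in your latest status) with the new disk. The web UI's Replace action on the pool status page does this for you, including partitioning the new disk; the raw CLI equivalent would be roughly as follows, where <new-disk-gptid> is a placeholder for the gptid of the data partition created on the new drive:

Code:
# Replace the missing member (by its numeric guid) with the freshly partitioned disk
zpool replace PrimaryPool 495338994010974698 gptid/<new-disk-gptid>

# Watch the resilver progress
zpool status -v PrimaryPool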
 