
Troubleshooting disk format warnings in TrueNAS SCALE 7.4.5

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The R740 uses the Gen 13/14 proprietary mezzanine, rather than the internal, locked-down PCIe slot. The HBA330 mini is the go-to solution for all Gen 13 and Gen 14 servers, as well as Gen 15 1S Epyc servers (R6515 and R7515 - the 2S R6525 and R7525 use a newer style).
 

CoolWolf

Dabbler
Joined
Mar 2, 2023
Messages
10
Hello there!
First of all, thanks a lot for setting this article up (especially to @Daisuke). I have four HGST 8TB He SAS drives; all are T10-PI Type 2 protected and formatted with 512-byte sectors.

I tried to remove the protection and reformat one of them to 4096-byte sectors.
While I was running the following command:
Code:
sg_format -v -F /dev/sda

the whole server shut down for no apparent reason at about 10% progress. I feared, and still fear, that I bricked the drive: afterwards I saw several messages about bad sectors and I/O errors, and booting the Debian server took about 15 minutes, apparently because it kept trying to access the disk and ran into timeouts caused by those I/O errors.

Anyhow, I was able to install HUGO, and this disk is currently being formatted with HUGO to 4096-byte sectors. Let's see if that helps.

Now my question:
For HGST disks that are T10-PI protected, is it possible to remove the protection and format the disk to 4096-byte sectors in a single step/command?

HUGO's format command has a -p option, described as follows:
Code:
-p <protection type>, --protection <protection type> Specify a type of Protection Information (0,1,2,3).

So the command could look like this:
Code:
hugo format --danger-zone --simple-progress -b 4096 -p 0 -g /dev/sda

Can someone confirm this? That would be great.

THANKS a lot in advance ... and have a great day!
 

CoolWolf

Dabbler
Joined
Mar 2, 2023
Messages
10
Just for the record: the HUGO command above works like a charm!

I was able to remove the protection and reformat all four HGST drives to native 4096-byte sectors with that single command (it took about 14-15 hours).

THANKS and have a great weekend!
 

R1CH

Cadet
Joined
Jun 4, 2021
Messages
2
I upgraded a Dell R740xd2 today and ran into these warnings; indeed, the drives are formatted with Type 2 protection.

It runs a PERC H730P Mini (the HBA330 was out of stock) configured in HBA mode. So far everything I've tried suggests it is true HBA mode: I've been able to access the drives directly with every tool I've used.

What are the downsides or risks of keeping the drives in their current mode? I'm also curious what "Disabling DIF" means in the kernel log:

Code:
[   10.722578] sd 0:0:0:0: [sda] Disabling DIF Type 2 protection
[   10.723384] sd 0:0:0:0: [sda] 22961717248 512-byte logical blocks: (11.8 TB/10.7 TiB)
[   10.723386] sd 0:0:0:0: [sda] 4096-byte physical blocks
[   10.723633] sd 0:0:0:0: [sda] Write Protect is off
[   10.723635] sd 0:0:0:0: [sda] Mode Sense: d3 00 10 08
[   10.723963] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
I'm going to assume by HUGO, you mean this (sorry I could only find the web archive and I'm too lazy to scan through the whole thread)
I updated the resource with the correct install instructions; I had presumed this was common knowledge:
Code:
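# Download and unpack the HUGO archive from the resource page
$ curl -JLO https://www.truenas.com/community/resources/hugo.203/download
$ unzip ./hugo-7.4.5.zip
# Temporarily make the apt tools executable
$ sudo chmod 0755 /usr/bin/apt*
# Install the ncurses libraries, then the HUGO package itself
$ sudo apt-get update
$ sudo apt-get -y install libncurses5 libncurses5-dev
$ sudo dpkg -i ./hugo/v7.4.5/HUGO-7.4.5.x86_64.deb
# Put the apt permissions back to non-executable
$ sudo chmod 0644 /usr/bin/apt*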
 

Dudenell

Cadet
Joined
Aug 12, 2016
Messages
9
So I ended up running sg_format in a web UI shell window, not realizing it would time out after 5 minutes. Is there a command to see the current format progress?
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
I don't think it's possible, but you can check whether it's still running by logging in to another shell session as the same user and typing ps | grep sg_format. It will be listed there if it's still in progress.
 

Dudenell

Cadet
Joined
Aug 12, 2016
Messages
9
Yeah, that doesn't show anything. However, if I run

Code:
time sg_format -v -F /dev/sdb


it outputs the following:

Code:
    SEAGATE   ST10000NM0226     KTB5   peripheral_type: disk [0x0]
      PROTECT=1
      << supports protection information>>
      Unit serial number: ZA25GGF10000C8276VLF
      LU name: 5000c50095447da7
    mode sense(10) cdb: [5a 00 01 00 00 00 00 00 fc 00]
mode sense(10):
Descriptor format, current; Sense key: Not Ready
Additional sense: Logical unit not ready, format in progress
  Descriptor type: Sense key specific: Progress indication: 32.38%
  Descriptor type: Field replaceable unit code: 0x0
  Descriptor type: Vendor specific [0x80]
    00 00 00 00 00 00 00 00 00 00 00 00 00 00
MODE SENSE (10) command: Device not ready, type: sense key


The "Progress indication" value under "Descriptor type: Sense key specific" keeps increasing each time I run it.

ps output shows nothing but zsh and ps.
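
For reference, the "Progress indication" line in output like the above can be pulled out with a small helper. This is a minimal sketch, not something from the guide: the function name format_progress is made up for illustration, and it only parses text piped to it, so it never talks to the drive itself. The output above also suggests the drive keeps formatting on its own even when no sg_format process is left.

```shell
#!/bin/sh
# format_progress (hypothetical helper): read SCSI sense output on stdin and
# print the percentage from the first "Progress indication: NN.NN%" line.
# Prints the bare number (no % sign) and returns 0; prints nothing and
# returns 1 when no progress indication is found.
format_progress() {
    pct=$(sed -n 's/.*Progress indication: *\([0-9][0-9.]*\)%.*/\1/p' | head -n 1)
    [ -n "$pct" ] || return 1
    printf '%s\n' "$pct"
}
```

One way to feed it, assuming sg_format behaves on your system as described elsewhere in this thread: sg_format -v /dev/sdb 2>&1 | format_progress. Note there is no -F in that call, so it only queries the drive and does not request a new format.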
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
Then either the sg_format process was killed when the session disconnected, or it finished.
 

thorzeen

Cadet
Joined
Dec 16, 2017
Messages
7
So I have twelve 2TB HGST SAS (NetApp) drives from 2017. Everything was fine until the upgrade. They have Type 2 protection and were already reformatted to 512-byte sectors with sg_format a year ago. Here is what I did:
I offlined one drive, /dev/sda, and removed the protection.
I re-ran sg_format to 512.
I tried to online the drive back into the pool and got FAILED with 3 errors.
I ran a LONG SMART test: no errors.
I ran zpool clear maintank.
This time I tried to replace the drive with the newly formatted drive and got this error:
Code:
Error: [EFAULT] Disk: 'sda' is incorrectly formatted with Data Integrity Feature (DIF).

This is the output for the drive:
Code:
root@truenas[~]# sg_readcap -l /dev/sda
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Logical block provisioning: lbpme=0, lbprz=0
   Last LBA=3907029167 (0xe8e088af), Number of logical blocks=3907029168
   Logical block length=512 bytes
   Logical blocks per physical block exponent=0
   Lowest aligned LBA=0
Hence:
   Device size: 2000398934016 bytes, 1907729.1 MiB, 2000.40 GB, 2.00 TB

What am I missing or doing wrong?
Thanks for your time.
(edit) I am reformatting to 512 using openSeaChest, since I read in the thread that it works for SAS. /shrug At 14% at the moment.
(edit) If you forget (like me) to use tmux and the page closes during your format, you can run
Code:
sg_format -v /dev/<your drive>
and it will give you the percentage of your ongoing format. (I don't remember reading that in this thread.)
(edit) The last post of this thread https://www.truenas.com/community/threads/drives-with-error-data-integrity-feature.106633/page-2 said a reboot worked for him.
After the openSeaChest format I still could not replace the drive.
I REBOOTED and I am resilvering.
I am guessing I didn't need to reformat a second time using openSeaChest. I will check on the next one.
You may want to add a REBOOT at the end of your branded-drive guide, before replacing.

(edit) I just did another HGST SAS disk. I followed the guide (skipped openSeaChest) and could NOT replace the disk (with force) until I rebooted (it gave the error above). Once rebooted, I replaced the disk (using force) and the resilver started. Hope this helps anyone running into this.

(LAST EDIT) So I guess I need to work on my reading comprehension. As I was formatting my drives I realized that, because my "branded drives" had already been reformatted from 520 to 512 last year, I could JUST remove protection, reboot, and then replace, saving a lot of time. I think my confusion came from not understanding that I needed to reboot when doing the first couple of drives (they would throw errors when I tried to replace them after removing protection), so I thought removing protection required a reformat. That was not the case.
Removing protection on an already 512-formatted disk only required a reboot before replacing (for me). I did all 12 drives and all is well.
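
The sequence that ended up working can be written down as a dry-run checklist. This is only a sketch under assumptions: print_drive_plan is a hypothetical helper that prints the steps and runs nothing, the sg_format flags shown (--format --fmtpinfo=0) are an assumption about what "remove protection" means in the guide, and on TrueNAS the offline and replace steps are normally done from the UI rather than with raw device names.

```shell
#!/bin/sh
# print_drive_plan (hypothetical helper): print, but do not execute, the
# per-drive steps described in this post.
# Usage: print_drive_plan POOL DISK    e.g. print_drive_plan maintank sda
print_drive_plan() {
    pool=${1:-}
    disk=${2:-}
    if [ -z "$pool" ] || [ -z "$disk" ]; then
        echo "usage: print_drive_plan POOL DISK" >&2
        return 2
    fi
    # Assumption: protection is removed with sg_format --fmtpinfo=0, which
    # keeps the current sector size because no --size is given.
    cat <<EOF
1. zpool offline $pool $disk
2. sg_format --format --fmtpinfo=0 /dev/$disk
3. sg_readcap -l /dev/$disk   # expect prot_en=0 once the format completes
4. reboot
5. Replace $disk in pool $pool from the TrueNAS UI (force), then wait for the resilver
EOF
}
```

Calling print_drive_plan maintank sda prints the five steps for that drive so they can be reviewed, one drive at a time, before anything is actually run.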
 

xinpig

Cadet
Joined
Mar 18, 2023
Messages
1
Code:
Read Capacity results:
   Protection: prot_en=1, p_type=0, p_i_exponent=0 [type 1 protection]
   Logical block provisioning: lbpme=0, lbprz=0
   Last LBA=19134414847 (0x4747fffff), Number of logical blocks=19134414848
   Logical block length=512 bytes
   Logical blocks per physical block exponent=3 [so physical block length=4096 bytes]
   Lowest aligned LBA=0


=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              H7210A520SUN010T
Revision:             A3Y1
Compliance:           SPC-4
User Capacity:        9,796,820,402,176 bytes [9.79 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Formatted with type 1 protection
8 bytes of protection information per logical block
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca2681515ac
Serial number:        XXXX
Device type:          disk
Transport protocol:   SAS (SPL-3)


I guess I'm confused about what I'm supposed to do, even after reading the thread multiple times. I've got four of those HGST drives in a RAIDZ1. Is it okay to just keep running as is, or do I have to turn off the protection and reformat to 4K?
 

thorzeen

Cadet
Joined
Dec 16, 2017
Messages
7
You will probably want to fix them one at a time. Can you just leave it? My best guess would be... time will tell?
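
To check each drive before and after fixing it, the Protection line of sg_readcap -l can be decoded mechanically. A minimal sketch, assuming the usual READ CAPACITY(16) meaning of the fields: prot_en=0 is Type 0 (no protection), otherwise the type is p_type + 1, which matches the "prot_en=1, p_type=0 ... [type 1 protection]" output above. protection_type is a hypothetical helper name, not part of sg3_utils.

```shell
#!/bin/sh
# protection_type (hypothetical helper): read `sg_readcap -l` output on stdin
# and print the T10-PI protection type as a single digit (0 = unprotected).
# Returns 1 if no "prot_en=..., p_type=..." line is found.
protection_type() {
    fields=$(sed -n 's/.*prot_en=\([01]\), *p_type=\([0-9]\).*/\1 \2/p' | head -n 1)
    [ -n "$fields" ] || return 1
    prot_en=${fields% *}   # first field: protection enabled flag
    p_type=${fields#* }    # second field: raw p_type value
    if [ "$prot_en" -eq 0 ]; then
        echo 0
    else
        # With protection enabled, p_type 0/1/2 means Type 1/2/3.
        echo $((p_type + 1))
    fi
}
```

Example use: sg_readcap -l /dev/sda | protection_type. A result of 0 is what the guide in this thread is aiming for.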
 

majerus

Contributor
Joined
Dec 21, 2012
Messages
126
So I just upgraded from Angelfish to Bluefin and have this issue on a 12-disk-wide RAIDZ2 pool. All disks except one show this error. A couple of questions that are not specifically addressed in the post @Daisuke has called out a few times:

What is the downside to leaving this unsupported, as my pool seems to be working?

If I leave it as it is, will my data still be OK, with maybe just crappy performance?

For someone who doesn't have the ability to wipe the pool and format all the drives, I guess I need to offline each disk, format it per the "post", then add it back and resilver?

I have been running this setup for a few years, so I am a bit concerned about why this has only now become a problem. (I went from Core to SCALE Angelfish, then upgraded to Bluefin tonight.)
 

thorzeen

Cadet
Joined
Dec 16, 2017
Messages
7
Not sure what your setup is... I just went through the same thing; see my post above. For me, because I only needed to remove protection, it was pretty simple to do (it's all pretty simple, really). Time is the "thing". A 2TB SAS drive took about 4 hours to remove protection, about 4 hours to format (no need to reformat if the drive is already 512 for sure, or 4096??), and about 3 hours to resilver.

"What is downside to leaving this unsupported, as my pool seems to be working?"
Not sure. I changed it because it bothered me, and if I could fix it, I wanted it fixed.

"If I leave it like it is will data still be ok, maybe just crappy performance?"
I've read this whole thread and I don't think anyone really knows... Time will tell?
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
If I leave it like it is will data still be ok, maybe just crappy performance?
It has nothing to do with performance, but rather with how your data is managed. Protection Information bytes can be managed at the host bus adapter (HBA) and HBA driver level, enabling systems that typically don’t support 520-byte sector formats to integrate this higher level of protection. In short, given the nature of TrueNAS data usage and the fact that you use 512-byte sectors, Type 0 is the recommended format; with Type 0 you will not get any warnings.
 

majerus

Contributor
Joined
Dec 21, 2012
Messages
126
This says a lot of words without saying anything. I appreciate the help, but I'm just looking for a clear answer. At the end of the day, what is the risk? If you're not sure, no problem; maybe I'll just open a Jira ticket.
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
Let me explain better. The protection is used to "enhance" your current disks so that they can talk with, or understand, additional commands issued by your hardware storage box software. Since TrueNAS was not designed for these features, it is recommended to remove the protection. Right now you may just get a warning, but in the future you might encounter a real error. It is best to comply with the TrueNAS requirements.
 