New build that's driving me crazy. SSD 870 QVO 8TB

McVit

Dabbler
Joined
Sep 20, 2014
Messages
18
Hello guys!

This is my setup:

  • Supermicro 846 chassis, BPN-SAS2-846EL1 backplane
  • Asus X670E Creator
  • AMD Ryzen 7900
  • 4x 32GB Kingston 5600 MHz CL40
  • Samsung 870 QVO, x8, MZ-77Q8T0BW, RAIDZ1
  • Toshiba 18TB (512MB cache), x12, MG09ACA18TE, RAIDZ2
  • Broadcom 9207-8i SAS2308 6G SATA SAS HBA PCIe x8 3.0 LSI RAID IT Dell 0VGXKD
  • Onboard 10 Gbps

I finally bit the bullet and built a large NAS. Everything had been going great until I had just about finished the data migration from my old storage solution:

1688987568979.png


This is the SSD array. This is the second time I have installed this setup, and I get the same result: lots of read errors (and only read errors).
Last time I thought it might be the RAID controller, so I replaced it with a new one (same model, though).

Is there a known issue with the Samsung 870 QVO and ZFS?
I am at a loss here. With this many read errors, can I trust the data that has been migrated?
What can I do to troubleshoot?
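
One possible starting point is to tally the per-device error counters from `zpool status`, to see whether the read errors cluster on a few drives or spread across the whole array. The following is only a minimal Python sketch: the sample text, pool name, and device names are made up for illustration, and `devices_with_errors` is a hypothetical helper, not part of TrueNAS or ZFS.

```python
# Vdev states that `zpool status` prints in the STATE column.
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) for rows with a non-zero counter."""
    rows = []
    for line in status_text.splitlines():
        parts = line.split()
        # Config rows look like: NAME STATE READ WRITE CKSUM [note...]
        if len(parts) < 5 or parts[1] not in STATES:
            continue
        name, state, read, write, cksum = parts[:5]
        # Counters are kept as strings because zpool may abbreviate large values.
        if (read, write, cksum) != ("0", "0", "0"):
            rows.append((name, state, read, write, cksum))
    return rows


# Made-up sample in the usual layout; names and counts are illustrative only.
SAMPLE = """\
  pool: ssdpool
 state: DEGRADED
config:

    NAME        STATE     READ WRITE CKSUM
    ssdpool     DEGRADED     0     0     0
      raidz1-0  DEGRADED     0     0     0
        sda     ONLINE       0     0     0
        sdb     DEGRADED    37     0     0
        sdc     FAULTED    112     0     4  too many errors

errors: No known data errors
"""

for row in devices_with_errors(SAMPLE):
    print(row)
```

If the errors land on most or all drives rather than one or two, that would point more toward something shared (controller, cabling, backplane) than toward individual disks.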
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Is this controller flashed to IT firmware or is it a RAID controller set to "IT mode"? If the latter that might well be the cause of your problems.

While QVO are not optimal I would not expect a fundamental problem, specifically read errors due to the choice of SSDs. A significant drop of the write performance at some point for long continuous write operations is what I would expect, though.
 

McVit

Dabbler
Joined
Sep 20, 2014
Messages
18
I have to say I don't know. How do I verify? (I just assumed it was flashed to IT firmware, as the listing described it as "IT".)
1689069551857.png


My thinking was that these drives would be perfect for a media library: write once, read many times.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Check for firmware updates to the Samsung 870QVO MZ-77Q8T0BW. I had a minor problem with my old Samsung 1TB SATA SSD which was solved by a firmware update.
 

McVit

Dabbler
Joined
Sep 20, 2014
Messages
18
New development...

1689069655432.png


All data has been transferred from the old NAS, so I started a scrub, and the number of errors is still climbing...

Also, 2 drives are now faulted in the HDD array as well.

There are a lot of errors on 2 different arrays. Could my 2nd HBA card also be faulty? Or could the SAS cables be bad?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
The controller is flashed to IT. So far so good.

Things to check:

* Did you check the SSD firmware as @Arwen suggested?
* Look if there's a newer firmware version for that particular card. I vaguely remember that it needs to be at least 16.something but that depends on the card model.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
What's the date of manufacture on those SSDs? Samsung 870s have been affected by nasty data corruption bugs. Supposedly this mostly affected late 2021/early 2022 production, but I have not seen convincing evidence that the issue was actually resolved.

Example from a 4TB 870 EVO:

Code:
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   089   089   010    -    481
  9 Power_On_Hours          -O--CK   096   096   000    -    16375
 12 Power_Cycle_Count       -O--CK   099   099   000    -    11
177 Wear_Leveling_Count     PO--C-   099   099   000    -    18
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   089   089   010    -    481
181 Program_Fail_Cnt_Total  -O--CK   100   100   010    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   010    -    0
183 Runtime_Bad_Block       PO--C-   089   089   010    -    481
187 Uncorrectable_Error_Cnt -O--CK   099   099   000    -    135
190 Airflow_Temperature_Cel -O--CK   070   062   000    -    30
195 ECC_Error_Rate          -O-RC-   199   199   000    -    135
199 CRC_Error_Count         -OSRCK   100   100   000    -    0
235 POR_Recovery_Count      -O--C-   099   099   000    -    8
241 Total_LBAs_Written      -O--CK   099   099   000    -    142867953178


Also a lot of write amplification that I think is internal to the disks. Still, not close to the endurance rating.
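
To make the table above easier to read at a glance, here is a small Python sketch that parses a few of those rows, flags the media-related counters with non-zero raw values, and converts `Total_LBAs_Written` into host terabytes. The rows are copied from the output above; the 512-byte LBA size and the choice of "media-related" attribute IDs are assumptions, and `parse_smart` is a hypothetical helper rather than anything smartmontools provides.

```python
# A few rows copied from the SMART output above.
SMART_ROWS = """\
  5 Reallocated_Sector_Ct   PO--CK   089   089   010    -    481
183 Runtime_Bad_Block       PO--C-   089   089   010    -    481
187 Uncorrectable_Error_Cnt -O--CK   099   099   000    -    135
199 CRC_Error_Count         -OSRCK   100   100   000    -    0
241 Total_LBAs_Written      -O--CK   099   099   000    -    142867953178
"""


def parse_smart(text):
    """Map attribute ID -> (name, raw value) for rows with an integer raw value."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Columns: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
        if len(parts) >= 8 and parts[0].isdigit() and parts[7].isdigit():
            attrs[int(parts[0])] = (parts[1], int(parts[7]))
    return attrs


attrs = parse_smart(SMART_ROWS)

# Assumption: these IDs reflect problems in the flash itself.
MEDIA_IDS = {5, 183, 187}
flagged = {i: attrs[i][1] for i in sorted(MEDIA_IDS) if attrs[i][1] > 0}

# Assumption: one LBA is 512 bytes; result is in decimal terabytes.
host_tb = attrs[241][1] * 512 / 1e12

print("media counters above zero:", flagged)
print("CRC (link) errors:", attrs[199][1])
print(f"host writes: {host_tb:.1f} TB")
```

In this example the media counters are non-zero while the CRC count is zero, which is the pattern one would expect from a problem inside the drive rather than in the cabling.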
 

McVit

Dabbler
Joined
Sep 20, 2014
Messages
18
Yes, I have checked the SSD firmware. According to Samsung, it is the latest firmware for the SSD.
1689074321883.png

Regarding the firmware of the RAID card, it seems to be the latest firmware (2017) for this card.

I would like to think that there is a single source for all these errors, as they affect both the SSD array and the HDD array.
The common hardware is the HBA card, the 2x SFF-8087 SAS cables, and the backplane of the Supermicro 846 chassis (BPN-SAS2-846EL1),

other than the CPU, RAM, and motherboard, of course.
 

neofusion

Contributor
Joined
Apr 2, 2022
Messages
159
Are the fans working properly? HBA can overheat.
 

systract

Dabbler
Joined
Oct 7, 2022
Messages
32
I have a similar build.

My build is OK with the Samsung 980, but I had multiple problems with the Toshiba drives.
Also, the onboard 10 Gbps interface only works at 1G.
 

McVit

Dabbler
Joined
Sep 20, 2014
Messages
18
I was thinking the same thing about the HBA overheating, so the second time around I placed an additional 120mm 3000rpm fan right next to the heatsink on the HBA card. That can't be the issue.
 

McVit

Dabbler
Joined
Sep 20, 2014
Messages
18
Oh, I see. Did you manage to work it out with the Toshibas?

I have had no issues with the 10Gb interface; I can transfer files at 1 GB/s.
 

McVit

Dabbler
Joined
Sep 20, 2014
Messages
18
Another update: the automatic scrub has started on the HDD array, and its ETA is still increasing.

1689440380020.png



A long S.M.A.R.T. test on the SSD array shows this:
1689440576047.png


I will remove this drive and test it on another system, but could this possibly be the culprit on that array?
 

systract

Dabbler
Joined
Oct 7, 2022
Messages
32
That's great that the 10Gb interface works for you. Maybe a newer version recognizes the interface? I will try it out.

I RMA'd four out of 8 Toshibas; now they are OK.

Check this post:
 

McVit

Dabbler
Joined
Sep 20, 2014
Messages
18
Thanks! I will run some long tests on the Toshibas; I just ran short tests on them and they all passed.

I am running

Version: TrueNAS-SCALE-22.12.3.1

and it worked out of the box with both interfaces, at 2.5Gb and at 10Gb.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
About the extra 120mm fan next to the HBA heatsink: hah. Based on my limited experience, I suggest putting a fan directly on top of the HX. Motherboards built for servers expect not only screaming Delta fans that can handle significant static pressure, but also molded covers installed over the motherboard that direct the air flow. Neither is typically present in SOHO cases. A fan on top of the HX can really help here. I presume it has to do with breaking up the boundary layer over the HX vs. hoping for a 120mm to lick the HX sufficiently.

In my Lian Li A76 case, I have a Noctua industrial 120mm fan blowing directly at the CPU / motherboard through a 3" funnel that stops 1/2" from the motherboard. Now the CPU is only 20°C hotter than ambient; the SATADOMs, HBA, and the SLOG run hotter. Both the CPU and the HBA have dedicated fans. I also changed the HX for the CPU from the stock fan-less Al one to Cu + fan.
 

McVit

Dabbler
Joined
Sep 20, 2014
Messages
18
Sorry, I have been away.

I see. I ran a long test on all 12 Toshiba drives, with every single one reporting no errors.

Still, 2 drives are degraded and one is faulted due to "too many errors".
 

systract

Dabbler
Joined
Oct 7, 2022
Messages
32
It seems that Toshiba drives have problems with TrueNAS, especially if you have a lot of them in one machine.

The good thing is that Toshiba has a really good RMA policy: they just refund you, so you can buy another brand.

Samsung, on the other hand, just repairs the faulty drive and sends it back to you :mad:.
 

neofusion

Contributor
Joined
Apr 2, 2022
Messages
159
If we're bringing up anecdotes: my eight Toshibas have run just fine for a year.
Perhaps some drives mentioned in this thread experienced shipping accidents, or are running into another problem entirely. Bad controller or issues with vibration come to mind.
 

McVit

Dabbler
Joined
Sep 20, 2014
Messages
18
I would like to think there is an issue other than the Toshiba drives themselves, as I get no S.M.A.R.T. indications at all, and the SSD array is still reporting read errors even though I have removed the SSD that had S.M.A.R.T. errors.


My troubleshooting so far:

I have replaced the HBA card with another one (same model and firmware).

I have replaced the SFF-8087 cables that connect the backplane to the HBA.

Is there a chance that the backplane could be bad?

Or is there a potential problem between the HBA (Broadcom 9207-8i) and the drives?
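
The elimination reasoning so far can be written down as a small set calculation. This is only a sketch in Python: the component names are labels taken from this thread, the pool keys are made up, and it assumes that a swapped part is cleared as a defective unit. Swapping in an identical model and firmware does not rule out a model-level incompatibility, which is why the HBA question above stays open.

```python
# Components shared by the data path of both arrays, as listed in this thread.
shared_path = {"HBA", "SFF-8087 cables", "backplane", "CPU", "RAM", "motherboard"}

# Hypothetical pool labels; each array adds its own drives to the shared path.
pools = {
    "ssd_raidz1": shared_path | {"870 QVO SSDs"},
    "hdd_raidz2": shared_path | {"Toshiba HDDs"},
}

# Anything present in every failing array is a candidate single source.
common = set.intersection(*pools.values())

# Parts already swapped for another unit (same model and firmware).
replaced = {"HBA", "SFF-8087 cables"}

# Candidates that have not been swapped yet.
remaining = sorted(common - replaced)
print(remaining)
```

Under these assumptions the untested shared parts are the backplane, CPU, RAM, and motherboard.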
 