Massive ZFS IO Errors

k002

Cadet
Joined
Nov 9, 2023
Messages
3
Hello All,


I'm pretty new to ZFS and TrueNAS in general. My aim was to build a cheap, energy-saving server, so I'm using a Fujitsu Futro S920 motherboard with 8 GB of non-ECC RAM. The mainboard has a PCIe x4 Gen3 slot, and I've plugged in a PCIe-to-SATA card for more SATA ports. I know these cards are not ideal, but I don't need much bandwidth. The cards I have used were based on a Marvell 88SE9215 (PCIe x1 Gen2) and now an ASM1166 (PCIe x4 Gen3) controller. For power, the Futro runs from its factory power supply, while the HDDs get their own power from a separate HDD power supply rated 2.5 A on 12 V and 5 V each. Cables have been changed more than once to rule out bad cabling. The drives are two Seagates (IronWolf Pro and Exos) plus an old Hitachi laptop drive; all are CMR.

As you can see, I've changed everything except the mainboard multiple times and am still getting I/O errors en masse, with a few read and write errors as well. At first only one drive (the Exos) was throwing errors, so I went and bought the IronWolf, but now all drives show error messages and I'm being bombarded with warnings about degraded pools and so on…

Error logs are hopefully coming soon.

My next idea is to try a real PSU for the drives. Maybe the output of the drive power supply is faulty?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
There's a lot to go wrong there. The SATA controllers are dubious and the power situation crazy, to the point where I wouldn't rule out physical damage. Are the various grounds tied together with a proper low-impedance path or at least all referenced to mains earth? What are these PSUs exactly?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Ugh. Please read the following resource.

I wouldn't personally recommend a build like yours.
 

asap2go

Patron
Joined
Jun 11, 2023
Messages
228
For power, the Futro runs from its factory power supply, while the HDDs get their own power from a separate HDD power supply rated 2.5 A on 12 V and 5 V each.
Does "each" mean per drive, or only per rail (the 12 V rail gets 2.5 A total and the 5 V rail gets 2.5 A total)?
The Exos alone needs more than 30 W on startup, and that power needs to be reliable.
The specifications on dubious electronics are often for short bursts only, not for continuous loads.
My next idea is to try a real PSU for the drives. Maybe the output of the drive power supply is faulty?
That may be the case, and it will make any diagnosis really difficult.
I'd recommend getting a decently sized (450 W) power supply with an 80 Plus Gold rating, as that usually indicates at least a base level of quality.
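To put rough numbers on the two readings of "each", here is a minimal sketch of the arithmetic. It uses only the figures mentioned in this thread (2.5 A on the 12 V and 5 V rails, three drives, more than 30 W for the Exos at startup). The even split between drives and the function names are assumptions made for illustration; real drives draw different amounts on each rail, so treat this as a sanity check rather than a measurement.

```python
# Rough power budget for the separate HDD supply described in the thread.
# All names are illustrative; the even split between drives is an assumption.

RAIL_VOLTS = (12.0, 5.0)   # the two rails named in the thread
AMPS_PER_RAIL = 2.5        # rating given in the thread for each rail
DRIVE_COUNT = 3            # Exos, IronWolf Pro, Hitachi laptop drive
EXOS_STARTUP_W = 30.0      # lower bound quoted in the thread for the Exos


def supply_watts(amps_per_rail: float = AMPS_PER_RAIL) -> float:
    """Combined power of both rails: P = V * I, summed over the rails."""
    return sum(volts * amps_per_rail for volts in RAIL_VOLTS)


def watts_per_drive(rating_is_per_drive: bool, drives: int = DRIVE_COUNT) -> float:
    """Power available to one drive under either reading of 'each'.

    If the rating is per drive, every drive gets the full figure.
    If it is the total per rail, assume (simplistically) an even split.
    """
    total = supply_watts()
    return total if rating_is_per_drive else total / drives


if __name__ == "__main__":
    for per_drive in (True, False):
        available = watts_per_drive(per_drive)
        verdict = "enough on paper" if available > EXOS_STARTUP_W else "not enough"
        label = "rating per drive" if per_drive else "rating shared per rail"
        print(f"{label}: {available:.1f} W per drive -> {verdict}")
```

If the rating is the total per rail, the whole supply delivers 42.5 W, which is roughly 14 W per drive when shared three ways and well below the startup figure quoted for the Exos alone.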
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
I'm pretty new to ZFS and TrueNAS in general. My aim was to build a cheap, energy-saving server.
Cheap and energy-saving is not exactly what ZFS is made for. May I ask what made you choose TrueNAS with those requirements?

If it is YouTube, which I kind-of guess, please be aware that the majority of TrueNAS/ZFS videos out there are simply crap. There are exceptions like Tom Lawrence, but they are a rare species ;-)
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
I/O errors usually mean bad cabling or a bad controller. I would suggest investing in an HBA, something like an LSI 9211-8i or a Dell PERC H200/H310. If you don’t want to dabble in cross-flashing to IT mode, just get an LSI 9211-8i. Cabling then requires a SAS forward breakout cable. These can be picked up cheaply on eBay.

If you care about your data, I’d replace those SATA controllers.
 

k002

Cadet
Joined
Nov 9, 2023
Messages
3
There's a lot to go wrong there. The SATA controllers are dubious and the power situation crazy, to the point where I wouldn't rule out physical damage. Are the various grounds tied together with a proper low-impedance path or at least all referenced to mains earth? What are these PSUs exactly?
I've replaced the power supply for the drives with a real 500 W ATX PSU, so the power situation should be better now. The errors have gone down significantly, but I'm still getting too many I/O errors per day. I will post an update with further information.
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
I've replaced the power supply for the drives with a real 500 W ATX PSU, so the power situation should be better now. The errors have gone down significantly, but I'm still getting too many I/O errors per day. I will post an update with further information.

Changing the power supply would not really help. I’d suggest replacing those cheap PCIe SATA cards with an HBA.
 

k002

Cadet
Joined
Nov 9, 2023
Messages
3
First of all, thank you for all the feedback and for pointing out what I can improve.

I have improved the power situation: the drives now get power from a 500 W ATX PSU. It really did help, and the drives seem much more responsive. The I/O errors are still there, though my impression is that there are fewer of them.

I know the PCIe-to-SATA cards I use are bad and can cause a lot of problems, but I have tried two by now and both produce about the same number of I/O errors, which seems a bit strange.

I let the system run for a few days with the ASM1166 card and got about 130 ZFS I/O errors (no fatal errors, across multiple scrubs). Today I swapped it for the Marvell 9215 and got 155 ZFS I/O errors in one day of use (with one scrub and one fatal error). I'm guessing the cards are responsible for a few of these errors, but for all of them?

The strange thing is that my old 2.5" 500 GB Hitachi laptop drive is producing no errors at all. It produced some when the power situation was bad, but now not a single one. That’s why I thought the Exos drive was simply bad and added the IronWolf Pro as a second drive for backup.

I don't understand why one drive works perfectly while the other two are causing problems. The drives are not all in one vdev, and not even in the same pool, if that matters.

Thanks in advance for all the help.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
I'm guessing the cards are responsible for a few of these errors, but for all of them?
First, why not? Second, it doesn't matter. Even if we assume that they are responsible for only a single error, that is one too many.

As others noted: Using two power supplies is an extremely(!) bad idea. Unless you have a degree in electrical engineering, this is dangerous for the hardware and possibly even yourself. Seriously: Don't do that!
 