Coming back to the basic point, I suppose the thread could be reframed as
@jgreco says:
What are the power options if one wants to be very sure of robust power delivery, and the NAS will include a modest but sizeable HDD array (say 12 - 28 HDD scale)?
Clearly it depends on the use-case so "no one answer fits all". I'm asking about my specific use-case so perhaps I should restart, and try to refocus it a bit.
This post is a bit long - if it's too detailed then please ignore the excess and skip to the bottom which poses the immediate question.
1) Background:
HDD current draw:
Manufacturer specs are clear that HDDs can at times draw extreme peak loads on both 12v (motors/spinup) and 5v (logic/DSP circuits) - not only during spinup, but also in actual use (writing). Nobody other than the manufacturer knows the data patterns that trigger these peaks, so we can't reliably reproduce them or say how common they are in a given scenario, but they do happen. 45drives has also published oscilloscope captures for a pool of 45 in-use HDDs during RW sessions. Their defining characteristics are:
- The peaks are transient current draws that can be 2-3x more than even the HDD's normal "maximum in-use" operational current draws. An ordinary 6TB enterprise drive can demand 3.08A/37W @ 12v and 1.45A/7.25W @ 5v at peak (3 sigma).
- Staggered spinup helps - it is the best-known and most predictable case - but it does not eliminate the problem: peaks also occur during operational use (spinup from idle), they affect 5v (not just the 12v motors), and they appear "in use" as transient current peaks during heavy write sessions, not just at spinup.
- They cannot be detected by IPMI, by the hardware itself, or at the wall - only by decent probes/equipment on the HDD side of the PSU, designed to measure current transients within the HDD power cables.
- They pose a particular problem for the often-ignored 5v line, because larger ATX PSUs are tuned for high 12v output: a disk array suddenly needing a spec of 40-50A on 5v (1.5A x 24 HDDs x 120% for headroom x 150% for PSU degradation over time, less perhaps 25% for non-synchronised transients) may have had much less design consideration.
- In a large array there may be some averaging effect, and there might also be reserve power to cope briefly in the PSU capacitors, but there's no way to be sure how firmly to count on it.
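The 5v headroom figure above is just multiplication, but it's easy to sketch so the assumptions are explicit. The per-drive peak, drive count, and derating factors below are the ones quoted in this post, not measurements:

```python
# Rough 5v rail sizing for a 24-drive array, using this thread's own
# figures: ~1.45A peak per drive at 5v (3 sigma), 120% headroom, 150%
# allowance for PSU ageing, and an optional 25% discount for transients
# not all landing at once. All of these factors are assumptions.

def rail_amps(drives, peak_amps_each, headroom=1.2, ageing=1.5, desync=0.75):
    """Return the current (A) a rail should be specced for."""
    return drives * peak_amps_each * headroom * ageing * desync

spec = rail_amps(24, 1.45)
print(f"5v spec for 24 drives: {spec:.1f}A ({spec * 5:.0f}W on the 5v rail)")
```

With the desync discount this lands at about 47A (~235W on 5v alone), i.e. the middle of the 40-50A range quoted above - a figure many large consumer PSUs do not actually commit to on their 5v rail.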
Power provision for HDD arrays:
There seem to be roughly 4 ways that power supply can be done for a drive array:
- a single PSU;
- multiple PSUs paralleled together in some way;
- specialist single-voltage high-current PSUs (for example, instead of a typical computer PSU, a specialist 12v-only PSU plus a specialist 5v-only PSU that can provide high currents and good regulation); and eventually,
- some kind of federated back-end where storage is grouped with a PSU per group.
Paralleled PSUs can probably be sub-categorised in a few useful ways for this purpose:
- redundant, or merely supplementary to each other;
- ad-hoc paralleled, or designed that way; and
- power lines commoned or separated (all rails cross-wired so every PSU feeds every device, or each PSU powering distinct devices with only 0v in common, so no device is powered by 2 PSUs).
I'm ignoring multiple rails here as a complicating factor and assuming single rail for simplicity. If that's not exactly right, it's probably close enough for this discussion.
2) The NAS:
NAS usage patterns:
The NAS build is in the OP and my signature. I'm the main user; my main requirement is that I can pretty much take for granted that my data is safe, and get to a point where managing its safekeeping and storage is mostly "set and forget". If a drive goes, I don't even have to jump in particularly quickly. Just zpool detach, zpool attach - and carry on with whatever I'm doing; the data was never at risk, and I pretty much won't notice. "It just works". That sort of thing, for the next 30 years.
The wall power here is very good and I'm on 3 way mirrors with a good PSU and APC Smart-UPS behind it, so I'm not thinking much about a freak accident wiping out the whole NAS and every drive at one stroke, right now. I'll set up automated replication in a while.
I'm a data and multitask hound. The NAS contains all my ESXi VMs (which can be 500 - 900GB each), all my datasets, all my backups, everything, back to the 1990s. When I didn't have rsync/zfs snaps, I just made a recopy to the old servers each time, and I'm still in that habit, so I have a lot of repeat data. The VMs often have a lot of repeat data too. It also stores family data and it's way easier to teach them to frequently backup everything than to use rsync (or set rsync up for them when I'm not in that habit myself!), so I have about 300 copies and part-copies of my mum's 40GB photo archive for example. With all that, dedup gives 3.9x saving, so the NAS is specced for dedup from the start (128GB 2400 ECC, NVMe L2ARC, fast Xeon v4, etc).
I also take rollback seriously, so right now it's set to 15 minute snaps for a week, daily snaps for 4 months, and bimonthly snaps forever. The pool size is about 12 TB which includes 2 x 3TB zvols for iSCSI, and about 55% usage of the 22TB capacity (so that ZFS can work at its best speed), and is based on 3 way mirrors (same reason + speed + more flexible than RAIDZ).
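For scale, the retention schedule above works out to a fairly modest snapshot count per dataset. This is only a rough sketch using the figures in this post (the exact totals depend on how the periods overlap; ZFS copes with thousands of snapshots either way):

```python
# Back-of-envelope count of snapshots the schedule in this post retains
# per dataset: 15-minute snaps for a week, daily snaps for ~4 months,
# and bimonthly snaps kept forever.

def snapshot_count(years_of_bimonthly):
    quarter_hourly = 7 * 24 * 4          # 15-min snaps, kept one week
    daily = int(365 / 12 * 4)            # daily snaps, kept ~4 months
    bimonthly = 6 * years_of_bimonthly   # 6 per year, kept forever
    return quarter_hourly + daily + bimonthly

print(snapshot_count(10))  # total per dataset after 10 years
```

The steady-state count is dominated by the week of 15-minute snaps; the "forever" bimonthly tier only grows by 6 a year, which is why keeping it indefinitely is cheap.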
My use pattern is moderate much of the time, very heavy at times, but with multi-hour or days-long idle/low-level use (think days not using them, or nights when sleeping). I mainly access data via Samba 3.x, iSCSI, and the CLI. The NAS is on a 10G group with my VM server and workstation, on a 10G switch + Chelsio NICs, for fast data between any of them. I've had it doing Samba at 1GB/sec in the past, lost it somehow, and aim to again. I don't run SQL or anything requiring that kind of access pattern on it. But when busy I might be moving/copying/renaming a few TB of directories (of any size from tiny to huge), snapshotting ESXi, using files via Samba, and resilvering/scrubbing,
and I want them all fast :D The relevance is that at times, the disks will be busy and data-intense multitasked, even if there's just one major user.
Disks are set to park after 60 mins idle, although I'm unsure if it's better for their longevity to keep them spinning the whole time. The energy waste from never parking them would be considerable and that's ended up the decider. But it does mean there will be mass spinups regularly, which are not related to system boot. I don't know if those get staggered with FreeNAS + Supermicro BIOS, but suspect not.
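On the mass-spinup point: if the OS doesn't stagger wakes from park, a crude user-space stagger is possible in principle - touch each drive in turn so the spinup surges don't coincide. This is only a sketch, under the assumption that a first read forces a parked drive to spin up; the device paths are hypothetical and would be /dev/daN or /dev/adaN on FreeBSD:

```python
# Sketch: wake parked drives one at a time rather than all at once.
# Assumes a read is enough to trigger spinup on a parked drive; the
# delay gives the PSU time to absorb each surge before the next.
import time

def staggered_wake(devices, delay=2.0, read_bytes=512):
    for dev in devices:
        with open(dev, "rb") as d:
            d.read(read_bytes)   # first access spins the drive up
        time.sleep(delay)        # space out the spinup current peaks

# Hypothetical usage:
# staggered_wake(["/dev/da0", "/dev/da1", "/dev/da2"], delay=2.0)
```

Whether this is worthwhile depends on whether FreeNAS + the Supermicro BIOS already stagger idle-wake spinups, which I don't know - hence the question.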
NAS enclosure:
I'm at ease with modding/home building, where needed, and I like good quality kit where it might pay off. I looked for ages for a drive store capable of holding 20 HDDs and they're all designed for server room environments, where you cram them tight and use high power fans in a case. The cheaper ones have no real antivibration isolation either and just rely on the HDD build for that. I prefer the other approach - stack them well spaced and vertically (like fins), with good fans below and antivibration isolation on all drives, in a rack that naturally channels smooth upward airflow past them all, and let natural convection take care of dispersal. I get trivial HDD access with no screws or trays, no scope for resonance or vibration, near silence, and even in full use the drives are cool to the touch and rarely outside 27 - 35C. The cost was about £15 ($20) + fans to build, and it just sits next to the NAS. The system enclosure itself can then be an ordinary modest desktop case, as it only has to contain the baseboard, PSU, and PCIe cards, and doesn't need to make real concession to HDD cooling or airflow.
3) Where I'm at:
My criteria / priorities:
I agree even a single 1200W PSU should "generally" cope in practice. I know it, most people know it. 1200W would "probably" be fine all around for this size array, and even 1600W would usually be seen as overkill. But equally, as the specs say and @jgreco points out, there will be situations where it isn't adequate - situations we cannot anticipate or measure crudely with a wall wattmeter or IPMI, but which we know will happen, and which can cause occasional drive brownouts if demand briefly outstrips the PSU's ability to supply from the mains or via its reserve capacitors.
A user is free to decide if they want to meet the "real world probably OK" standard or the "really will be OK" standard. For my NAS, I started this thread because I'm firmly in the second camp - I want a NAS that I have no doubts about, and if that means overkill on the PSU to avoid brownout risk, so be it.
While accepting that is probably real-world overkill 99.9% of the time, I ask it be accepted for this thread as a starting point. If it wasn't, I'd have a 1200W PSU already and be done with it, instead of asking about other options.
My position can be summarised as:
Temporary loss of pool access (or a scrub afterwards) with no lasting damage, or even a minor rollback of a few TXGs, is unimportant. A risk of actual sizeable data loss is critical.
Initial thoughts:
- Redundancy: I don't mind the PSU eventually failing - nothing is mission critical - so long as the pool itself survives. So redundancy as such is a complete non-concern for me. Bluntly, I wouldn't pay a penny more, to get a redundant PSU (compared to a single PSU) if both were otherwise adequate.
- Power quality: I am interested in high quality power provision. A good or excellent build, and almost certainly single rail (for ease of current allocation), to handle the demands and peaks properly and ensure good clean supply to all the drives. EVGA/SuperFlower is my brand of choice on this, among consumer PSUs.
- Efficiency: Because of lengthy idling, I'll probably go for Platinum/Titanium 80+, so that when the HDD array and CPU are idle and power draw drops to 100-200W, it's got a chance of not wasting much power at the wall. Titanium gets me 88-90% efficiency at that load on a good PSU; a loss of only 10-20W is pretty amazing, given the large PSU it comes from.
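The 10-20W figure is easy to sanity-check: at a given DC load, wall loss is just load/efficiency minus load. Pure arithmetic, not data for any particular PSU:

```python
# Wall-side loss for a given DC load and efficiency. The 100-200W loads
# and 90% efficiency are this post's idle-case figures, not PSU specs.

def wall_loss(dc_watts, efficiency):
    wall_watts = dc_watts / efficiency
    return wall_watts - dc_watts

for load in (100, 150, 200):
    print(f"{load}W load @ 90%: {wall_loss(load, 0.90):.1f}W lost at the wall")
```

At 90% efficiency, a 100-200W idle load wastes roughly 11-22W - consistent with the 10-20W claim, and the reason low-load efficiency matters more here than full-load efficiency.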
Implications:
- Separate back-ends: I don't have the setup justifying a separate back-end. The HDDs will be directly attached.
- Specialist bench PSUs: I also have no certainty that a high-power specialist 5v/12v bench supply would be tuned for this use, would comply with ATX specs (such as reserve power to ride through a 16ms loss, or whatever else ATX defines/expects), would manage the typical and peak draws, or would be tested for many years of PC/NAS use. Frankly, I'm a lot more comfortable with PSUs actually designed for a computer. While this might be an optimal solution on paper, my apprehensions rule it out.
- Server enclosure PSUs: I don't have any experience with server enclosures at all, and my impression is that multi-PSU server enclosures aren't a great match for my needs, which are adequate peak power, not redundancy. (If peak power weren't the concern, a 750W-1200W PSU would be great.) I also couldn't afford them new, only 2nd hand - and I'm happy with some gear 2nd hand, but not a PSU. While someone with professional experience might be very happy with them, and I accept this would be the "usual" commercial answer, I'm much less comfortable with this option in my use-case; part of that is admittedly my lack of familiarity with their operating characteristics, which matters given the PSU's crucial role. They will also be much noisier, so I'd have to move the NAS away from my work area, which would be inconvenient; and they're much less capable of real energy efficiency at ~10% load, which is where the NAS spends 70-80% of its time (100-200W idle vs. 1600-2000W peak capability).
I accept these might seem weak reasons to some, who may see this as the "real" solution and the rest as compromises one has to accept. I'm open to reasoned explanation why I don't need to worry, but the cost and practical impact would still be virtually prohibitive issues in my use-case.
- That narrows it down to big PSU or parallel (dual) PSU, and this is roughly the point I reached when I decided to ask for help to get to a final choice. I'm comfortable with both of these, barring a couple of points which are well defined and not complex to discuss.
4) Modes of failure:
@jgreco raised a critical point I hadn't considered. I think that is worth considering first, because it isn't just a PSU issue.
Q: What are the implications for the pool if (for any reason) enough HDDs temporarily become inaccessible to kill redundancy? When they come back up, how certain is it that the pool will recognise them and restore itself to working condition - or is there a significant chance the pool will have died?
This isn't a question that just affects PSUs. Half my HDDs are on an HBA. If the entire HBA dies, I can swap in a spare - but it's the exact same scenario: half the HDDs suddenly vanish. The data on them is intact, but it suddenly cannot be accessed. When I replace the HBA with an identical spare and the disks become visible again, will the ZFS pool necessarily be recoverable, or rollbackable to a recent valid txg?
Someone else already asked this, and the replies they got were that there's a difference between disk failure and disk unavailability, and not to worry about HBA loss - it wouldn't lead to pool loss. If that's correct, then loss of one of 2 parallel PSUs, rendering some of the pool HDDs suddenly unavailable (but undamaged) until the PSU was replaced, would be identical.
Is that correct? I think that has to be the starting point.