New Multi-Actuator Hard Drives from Seagate

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
As soon as a home user can buy a set of these drives we will see complaints about how difficult it is to configure TrueNAS to support these new drives. TrueNAS is broken and a bad product because it does exactly what it is programmed to do and doesn't adequately prevent footgun moments despite people having to load, aim, and pull the trigger.
Edited for added cynicism. ;)

As someone who hates the stealth-SMR trend enough to have called it five years earlier, I'll be bookmarking your post to refer to it in another five years or so when the vendors quietly slip "DM-MA" drives into the consumer channels.
 
Last edited:

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
One question I have is that the data sheet states "Interface Ports=Single".

SATA disks only have 1 port/path. But SAS disks have 2 ports/paths on the same connector. This is to allow disk arrays to have 2 controllers for redundancy, (same as Fibre Channel has). So, does this mean it's the normal SAS "single connector with dual ports/paths"? Or not?

If it is the stock SAS, then these might be good disks for disk arrays. Say you have 10 disks, and 2 controllers. Each controller gets 1 half of each disk. After RAIDing, it exposes the result on the SAN to clients. Each controller gets "full" speed of its half of the disk. Plus, its own dedicated communication channel to each disk.


As for LUNs, SCSI has supported LUNs from the very early days. Some hardware RAID controllers used to export their LUNs as SCSI LUNs. In the original SCSI standard, LUNs were limited to 8 per target, (3 encoding bits):

Wikipedia - SCSI CDB - Command Descriptor Block


This does help with the "badblocks" issue of how to test larger disks without wasting weeks or months. These would test in half the time of a similarly sized disk, (with the same RPM).
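As a rough sketch of how that could work in practice (assuming both actuators show up as separate da devices - the device names below are hypothetical - and that badblocks from e2fsprogs is installed), you could just burn in both halves at the same time:

Code:
# Sketch only: burn in both logical units of one dual-actuator drive in
# parallel. Device names are hypothetical, and -w is destructive, so never
# run this against a disk that holds data.
import subprocess

DEVICES = ["/dev/da1", "/dev/da2"]  # the two LUs of a single physical drive

procs = [subprocess.Popen(["badblocks", "-b", "4096", "-ws", dev])
         for dev in DEVICES]
for proc in procs:
    proc.wait()

print("exit codes:", [p.returncode for p in procs])

Each pass only has to sweep half the platters, so wall-clock time should land at roughly half that of a single-actuator drive of the same capacity.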
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
One question I have is that the data sheet states "Interface Ports=Single".

SATA disks only have 1 port/path. But SAS disks have 2 ports/paths on the same connector. This is to allow disk arrays to have 2 controllers for redundancy, (same as Fibre Channel has). So, does this mean it's the normal SAS "single connector with dual ports/paths"? Or not?
Single path SAS. The secondary SAS port is used internally for communication between the two drive SoCs in this model.

If it is the stock SAS, then these might be good disks for disk arrays. Say you have 10 disks, and 2 controllers. Each controller gets 1 half of each disk. After RAIDing, it exposes the result on the SAN to clients. Each controller gets "full" speed of its half of the disk. Plus, its own dedicated communication channel to each disk.
Unfortunately not for these. But even if they were, you'd have to avoid locating two halves of a mirror on the same unit. The middleware could be written to detect the presence of multiple LUs on a single physical device and work around it, but Seagate would need to provide some sample drives (or simulated ones via firmware) for validation.

Edit: Given the per-unit pricing I'm seeing through channel on these (north of USD$600) I don't anticipate any home users jumping in the pool too soon. Scratch that, looks like it was a placeholder item. Seemed awful expensive over the regular Exos X14 - pretty sure the "2X14" model number shouldn't mean "double the price."
 
Last edited:

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Given the per-unit pricing I'm seeing through channel on these (north of USD$600) I don't anticipate any home users jumping in the pool too soon.
Where did you find them? I was looking and couldn't find them for sale anywhere.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Where did you find them? I was looking and couldn't find them for sale anywhere.
Scratch that, looks like it was a placeholder item. Seemed awful expensive over the regular Exos X14 - pretty sure the "2X14" model number shouldn't mean "double the price."
 
Joined
Jul 2, 2019
Messages
648
you effectively fit two hard drives in the space that one hard drive used and it only takes a little more power than one regular drive.
I think everyone sees that the problem is the shared drive enclosure (e.g., helium leak), shared motor, etc. That has to be balanced against the ability to get more drives - AND more potential throughput - in the same space as "traditional" drives.

I think that time will tell with this configuration. That said, I can't recall the last time that I had a spindle motor fail but I don't have that many drives :smile:
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I think that time will tell with this configuration. That said, I can't recall the last time that I had a spindle motor fail but I don't have that many drives
At work, I have some servers with about 300 drives running, and the last time I had a total failure was in 2017. That was a WD Red (maybe Pro, I don't remember) 4TB that overheated so badly it also caused the two drives adjacent to it in the server to fail. Craziest drive failure I ever saw. I wasn't standing there watching, but I did hunt through the logs and there were temperature alerts on that drive. As I recall, it got over 140°C. Most drive faults I have experienced in the last five years have been bad sectors or some relatively minor fault where replacing the drive was a preventative measure against a potential future problem that I didn't want to worry about.
Similar results in my home network, but on a smaller scale.

For work, I see these drives as a way to get more IO out of the equipment I already have because trying to cram more drives in the limited space I have to work with is a problem. Realistically, I doubt I will be able to get these drives for a year, not at a quantity that I can actually use operationally.
I am still hopeful that I might get a set of sample drives from Seagate.
 
Last edited:

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
So in terms of handling the failure domain story, I'm thinking that to get the most out of these drives, one would need to pair two of them in crossed mirrors, using two different backplane paths as well (one for each drive) so as not to make things worse.

So one disk (presenting da1 & da2 to the OS) attached to backplane 1:

Code:
 MIRROR-A
   da1
 MIRROR-B
   da2


A second disk (presenting da3 and da4 to the OS) attached to backplane 2:

Code:
 MIRROR-A
   da3
 MIRROR-B
   da4


This resulting in an overall pool looking like:

Code:
ZPOOL1
 MIRROR-A
   da1
   da3
 MIRROR-B
   da2
   da4


With this setup, you can have backplane 1 or backplane 2 fail (not both) and still have a working pool of two degraded mirrors.

You can also have one "drive" (both of its actuators) fail and still have a working pool of two degraded mirrors.

I guess if you continue like that, you could also come up with a plan to spread disks across backplanes and make a 2-vdev RAIDZ2 pool with evens in one and odds in the other (likewise, you could keep adding mirrors with the same alternating odds-and-evens pattern).
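To sketch that RAIDZ2 idea (continuing the da numbering from above - four dual-actuator drives, two per backplane, each drive presenting one odd/even pair like da1/da2), the layout would look something like:

Code:
ZPOOL2
 RAIDZ2-A
   da1
   da3
   da5
   da7
 RAIDZ2-B
   da2
   da4
   da6
   da8

Losing one physical drive costs each RAIDZ2 vdev one member; losing a whole backplane costs each vdev two, which a 4-wide RAIDZ2 can still (barely) survive.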

I'm not sure that this would be easy to keep compliant, but at least there should be a way to do it.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Basically, Seagate has released hyper-threaded drives. Now we need to code the software to schedule them.

I'm not sure that this would be easy to keep compliant, but at least there should be a way to do it.

Similarly to how we have physical/logical cores now, the middleware stacks for storage appliances like TrueNAS will need to be aware of these drives and able to handle them correctly. E.g.: da0 and da1 are detected as multiple LUs that resolve to the same target/serial/WWN - mark them as being in a shared failure domain and design pools accordingly if possible (at large scale, you could automatically create a set of two vdevs, each at half-size, for any given set of drives). If a user tries to implement a pool that will compromise redundancy (co-locating mirror vdevs on the same physical device, attempting to create RAIDZn with insufficient physical devices), alert and offer an override via checkbox/command line. OpenZFS itself could certainly write some detection logic in the zpool create command as well, and the middleware would then just need to interpret it.
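A minimal sketch of that detection idea (not actual TrueNAS middleware code - the inventory dict below is a hypothetical stand-in for whatever serial/WWN data the middleware already collects per disk):

Code:
# Sketch: group da devices by the physical drive they belong to, then warn
# when a proposed vdev puts two LUs of the same drive together.
from collections import defaultdict

# Hypothetical inventory: device name -> (drive serial/WWN, LUN)
INVENTORY = {
    "da0": ("ZLW0001", 0),
    "da1": ("ZLW0001", 1),  # second actuator of the same physical drive
    "da2": ("ZLW0002", 0),
    "da3": ("ZLW0002", 1),
}

def shared_failure_domains(inventory):
    """Map each physical drive serial to the set of devices it backs."""
    domains = defaultdict(set)
    for dev, (serial, _lun) in inventory.items():
        domains[serial].add(dev)
    return domains

def colocated(vdev_members, inventory):
    """Return serials that appear more than once in a proposed vdev."""
    counts = defaultdict(int)
    for dev in vdev_members:
        counts[inventory[dev][0]] += 1
    return [serial for serial, n in counts.items() if n > 1]

print("Failure domains:", dict(shared_failure_domains(INVENTORY)))

proposed_mirror = ["da0", "da1"]  # both halves of ZLW0001 - redundancy is a lie
offenders = colocated(proposed_mirror, INVENTORY)
if offenders:
    print("WARNING: mirror members share a physical drive:", offenders)

The same grouping step would also cover the "RAIDZn with insufficient physical devices" case; only the final check changes.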

For anyone building a pool or other software/mdraid style setup with MA drives though, they'll need to be aware and design around it themselves or else face the music later when a single physical drive failure takes out a "mirrored" vdev.

I'm far more concerned about the implications of if/when these drives start making their way onto SATA plugs, where the concept of LUs doesn't exist. The controller could simply do an internal stripe of the LBAs across both sets of platters (odd-numbered LBAs on spindle1, even-numbered on spindle2) and hope for some performance benefit there.

Q: Is a single-volume presented drive in future scope (i.e., the drive itself will load balance/optimize between either the first or second half of the drive)?
A: Maybe; it is possible. There are tail-latency issues with a configuration like this, but it is the simplest way to get plug-and-play performance. There needs to be enough market to justify the firmware development/complexity.
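As a toy model of that hypothetical interleave (purely illustrative - not based on anything Seagate has actually published about the firmware):

Code:
# Toy model of the hypothetical SATA-style interleave: odd-numbered LBAs on
# spindle 1, even-numbered LBAs on spindle 2. A long sequential run then
# alternates spindles on every block, keeping both actuators busy.
def spindle_for_lba(lba: int) -> int:
    return 1 if lba % 2 else 2

print([spindle_for_lba(lba) for lba in range(8)])  # [2, 1, 2, 1, 2, 1, 2, 1]

Presumably the tail-latency concern in the Q&A above comes from random I/O, where any single request can only ever be serviced by whichever actuator owns that LBA.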

I'm reminded of the WD Black2 drive, which mashed 120GB of MLC NAND and a 1TB HDD together into a single contiguous LBA range. The first 120GB worth of LBAs mapped to the NAND; everything after that was on the HDD. Formatting it as a single large partition would certainly result in "unexpected performance characteristics", so it needed manual setup. If you did that, though, you got the same idea - a small SSD for your operating system and a larger HDD for capacity, in a single 2.5" footprint. Good idea for laptops, but the price point made it unappealing. Even now I can only find the drives for about US$70 or so, and that's just silly when I could buy a 500GB 860 EVO for US$55.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Reading the tech paper, there are definitely some other issues at hand with specific SCSI commands crossing LU boundaries due to being device-wide.

SAS LUN behavior leads to ambiguity where some commands affect the individual LUN and others affect the device (both LUNs). It’s important for users of the drive to note these differences to successfully deploy the drive. High priority commands (HPC) such as Read and Write are LUN-based. Low priority commands (LPC) are a mix of LUN and device-based. Some examples of the more impactful device-based commands are noted in the table below:
Command                | LUN/Device | Details
Test Unit Ready (0x00) | Device     | Command will only report ready if both LUNs are ready
Power Modes            | Device     | Idle A, B, C, and Standby modes are all device-based
Format Unit (0x04)     | Device     | Format to either LUN initiates the data loss format of both LUNs
Flush Cache            | Device     | Cache is shared, so this command will affect both LUNs
Start/Stop Unit (0x1B) | Device     | Start/Stop Unit affects the single motor in the drive
Sanitize               | Device     | Sanitize sent to either LUN sanitizes the entire device

You can query the device vs. LUN effect of each command on the drive by issuing the REPORT SUPPORTED OPERATION CODES command and noting the multiple logical units (MLU) field for each command. Seagate is actively working on the inclusion of these nuances into a T10 proposal to standardize the usage of the MLU field on multi-actuator drives. You can access the T10 proposal here: http://www.t10.org/cgi-bin/ac.pl?t=d&f=18-102r1.pdf

I don't expect low-level formats or sanitize ops to be issued frequently, but "flush cache" is kind of important to ZFS and that could make things work Not So Very Well.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
"flush cache" is kind of important to ZFS and that could make things work Not So Very Well.

Maybe it will make things Not So Very Fast, but it does not violate the original guarantee. The guarantee states that when the Flush Cache command is issued, the cache is flushed. It does not say anything about the drive flushing its cache at some other time on its own initiative. The MA drive flushes the LUN-specific cache on command, but also at some other times, which by sheer coincidence match the times the flush command was issued against the other LUN.
 
Joined
Jul 2, 2019
Messages
648
OpenZFS itself could certainly write some detection logic
There is a very good point here - I don't think it would be iXsystems who would be responsible but Oracle?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Maybe it will make things Not So Very Fast, but it does not violate the original guarantee. The guarantee states that when the Flush Cache command is issued, the cache is flushed. It does not say anything about the drive flushing its cache at some other time on its own initiative. The MA drive flushes the LUN-specific cache on command, but also at some other times, which by sheer coincidence match the times the flush command was issued against the other LUN.
It doesn't violate the data assurance guarantee, no, but it could definitely be responsible for all kinds of ugly performance issues. It's true that if "both halves" of a disk are in the same pool (a 2x2 mirror equivalent) they'll likely be receiving requests to commit their cached data to disk at the same time, but the potential for interference does exist. I'd much prefer that the command could be sent at the LU level, but that would likely also require the cache to be logically split in the same manner.

Like I said: there's coding to be done. Although this seems like a much more reasonable goal than "make SMR not hot garbage."
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
I am inclined to believe this kind of drive came into being because at least one of the big hyperscalers asked for it. It seems to be made for a very specific combination of requirements, and at scale it can make sense to have something like this. For the rest of us, given the current price trajectory of SSDs (and "adjacent" devices) and how you can combine them with regular spinning drives for performance, I don't see how these special drives will ever be relevant.
 

robbiek01

Cadet
Joined
Oct 23, 2022
Messages
2
I bought 8x Exos 2X14. I have 6 of them in my TrueNAS server in 2-drive mirrors, and I can confirm that TrueNAS-13.0-U2 only detects one actuator and reports 6.37 TiB of drive space.

I purchased them off a fellow TrueNAS user where I live who works for a hyperscaler. After purchasing 2 HBAs that could not get the drives to spin up, I purchased a Supermicro LSI 9300-8i HBA because that's what was recommended by Seagate; it recognizes the drives, but only half the storage.

I'm gutted, to say the least - the old adage "If it seems too good to be true, it usually is" applies. I spent 1220 euros on 8 drives and an HBA. I should have stuck with SATA drives, and will in the future; my SAS days are behind me.

I have 4 SATA bays left in my server. I will populate them with SATA and decommission the SAS drives for sale over on eBay.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
I bought 8x Exos 2X14. I have 6 of them in my TrueNAS server in 2-drive mirrors, and I can confirm that TrueNAS-13.0-U2 only detects one actuator and reports 6.37 TiB of drive space.

I purchased them off a fellow TrueNAS user where I live who works for a hyperscaler. After purchasing 2 HBAs that could not get the drives to spin up, I purchased a Supermicro LSI 9300-8i HBA because that's what was recommended by Seagate; it recognizes the drives, but only half the storage.

I'm gutted, to say the least - the old adage "If it seems too good to be true, it usually is" applies. I spent 1220 euros on 8 drives and an HBA. I should have stuck with SATA drives, and will in the future; my SAS days are behind me.

I have 4 SATA bays left in my server. I will populate them with SATA and decommission the SAS drives for sale over on eBay.

I wouldn't give up on SAS. The dual actuator drives are the problem here. If you had single actuator drives you'd almost certainly be content.

Consider: SAS has an entire next generation coming out, 24G SAS-4. There are no plans at present to extend SATA any further; it's a dead end. This makes your SAS config somewhat more future-proof. Additionally, all SAS controllers can communicate with SATA devices. You can plug SATA drives into your SAS controller and, as long as you keep to the 1 meter cable restriction, they'll work just fine. SATA controllers do not talk to SAS devices, of course. SAS also allows longer cable runs, expanders permit hundreds of devices to be connected, and there are other useful types of SAS devices, like LTO tape drives, etc.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I bought 8x Exos 2X14. I have 6 of them in my TrueNAS server in 2-drive mirrors, and I can confirm that TrueNAS-13.0-U2 only detects one actuator and reports 6.37 TiB of drive space.
I wouldn't mind seeing smartctl and dmesg output here. It might be a case of the middleware only presenting the first LU.

Is there a potential to try this drive under a different OS (TN SCALE perhaps, or a live Linux distro) in case it's a matter of a driver/kernel needing to speak to two LUs behind one SAS device?

If not, then there might be enterprising/crazy individuals here who'd be interested in picking one up to experiment with themselves.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Without doing any research, I'd guess that if these dual-actuator drives are SAS, then they need a tweak in the SAS controller. Sometimes a SAS controller is configured to scan only LUN 0 of each target, since it generally adds a noticeable amount of time to scan all the LUNs. But the SAS configuration could have an option to scan LUN 0 -> LUN X, with X being configurable. Thus, change X from 0 to 1.

That could expose the second half of the disk.

Just keep in mind HOW you configure the pool. If using Mirrors, obviously don't use both halves in the same Mirror. And in the case of RAID-Z1, either keep the halves in separate vDevs or separate pools, because loss of a whole disk would take out 2 sub-disks.
 