Spindles Still Matter
Over the past few years, hard drive capacity has skyrocketed. Not long ago a 512GB drive was huge; now it’s commonplace to see 2, 4, or even 6TB HDDs in everything from desktops to storage servers. Thanks to the consumer market, MLC SSDs have also come a long way in capacity and price, but the larger enterprise-class SSDs are still very costly. When building a storage system, an all-SSD configuration can be out of reach for many companies.
As good as the larger-capacity HDDs are for the industry as a whole, they have caused some people to fall into a usable-space-vs.-performance trap by purchasing too few large hard disks.
Let’s look at some examples with two different drive sizes: the HGST Ultrastar 7K6000 2TB Enterprise SAS drive and the HGST Ultrastar 7K6000 6TB drive. According to HGST’s specifications, the 2TB and 6TB drives have nearly identical performance characteristics but vastly different capacities. Now let’s take this to the next step. I want to build a 15TB NAS. If I am not careful, I could fall victim to the “I don’t need that many drives” trap. With 6TB drives, even using RAID 5, four drives give me three 6TB data disks, or 18TB usable, comfortably over my 15TB.
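To make that capacity math concrete, here is a minimal sketch (the `raid5_usable` helper is my own; it uses the simple n − 1 data-disk model and ignores filesystem formatting overhead and base-10 vs. base-2 differences):

```python
def raid5_usable(num_drives: int, drive_tb: float) -> float:
    """RAID 5 dedicates one drive's worth of capacity to parity,
    so usable space is (n - 1) data drives times the drive size."""
    return (num_drives - 1) * drive_tb

# Four 6TB drives in RAID 5: three data drives.
print(raid5_usable(4, 6))  # -> 18
```

The space target is easy to hit; as the next paragraphs show, throughput is where this small array falls short.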
So I now have the space I need, but an array with only three data disks. Do I have the performance my application requires? These three data disks cap my throughput: the HGST specification sheet rates each disk at around 225MB/sec, which gives me roughly 675MB/sec total.
Does this meet my space requirements? Yes, it does. But what happens when I place these drives into a production environment and find they run pretty slowly when streaming data over a 10G interface? Now what do I do? This is where spindles matter. In this example, 675MB/sec is only a little over half of a 10Gb Ethernet interface’s 1250MB/sec line rate. With this array, I couldn’t even saturate a single 10G link.
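The link-utilization arithmetic above can be sketched as follows (a back-of-the-envelope model of my own: data disks times per-disk sequential throughput, with no protocol overhead accounted for):

```python
PER_DISK_MBS = 225        # HGST 7K6000 sustained transfer, per spec sheet
TEN_GBE_MBS = 10_000 / 8  # 10Gb/s = 1250 MB/s raw line rate

data_disks = 3            # four-drive RAID 5 leaves three data disks
array_mbs = data_disks * PER_DISK_MBS
print(f"{array_mbs} MB/s is {array_mbs / TEN_GBE_MBS:.0%} of a 10GbE link")
# -> 675 MB/s is 54% of a 10GbE link
```

Real-world numbers will be lower still once Ethernet, IP, and storage-protocol overhead are subtracted from that 1250MB/sec.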
So what are my options when creating my array? I need more performance, but I only need 15TB. This is where I say: configure to your performance needs; the usable space will be there because of the sizes of disks available today. Now let’s use the 2TB versions but keep our 15TB usable-space requirement. Using 2TB drives, nine drives in RAID 5 give me 16TB usable, and my throughput is 8 × 225MB/sec, or 1.8GB/sec. That’s almost 2.7x the previous array.
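“Configure to your performance needs” can be turned into a small sizing helper (my own sketch, again assuming RAID 5 with n − 1 data disks and ignoring protocol overhead; the function name and model are assumptions, not a vendor tool):

```python
import math

def drives_needed(usable_tb: float, throughput_mbs: float,
                  drive_tb: float, per_disk_mbs: float) -> int:
    """Smallest RAID 5 drive count that meets BOTH the usable-space
    and the throughput target (data disks = n - 1, plus one parity)."""
    for_space = math.ceil(usable_tb / drive_tb) + 1
    for_speed = math.ceil(throughput_mbs / per_disk_mbs) + 1
    return max(for_space, for_speed)

# 15TB usable AND a saturated 10GbE link (1250 MB/s) from 2TB drives:
print(drives_needed(15, 1250, 2, 225))  # -> 9
```

Run with 6TB drives instead, `drives_needed(15, 1250, 6, 225)` returns 7: space would need only four drives, but saturating the link still demands seven spindles.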
I could easily saturate a 10G link with this array. I do understand that an argument can be made about the additional drive slots, rack space, and power required. This is why many vendors incorporate fast SSD cache devices in their units. These SSDs sit between the disks and the client, allowing you to run far fewer disks while caching large amounts of data to increase performance. While this is a very good method, it is still bound by traffic patterns, protocols, hot data sets, and so on.
So the answer comes down to fall-back, or “worst case,” performance. In the worst case possible, with no caching, what level of performance will you get out of your disk array? That question can only be answered by your particular use case and your comfort level with relying on cache.
When building your array, determine the lowest performance number you will accept and build your array to meet it. If you are using a storage appliance with caching ability, then, as they say, BONUS! The usable space will be there.