BUILD SuperMicro X10SRL-F + 3 846 Chassis + 72 Disks

Status
Not open for further replies.

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Just drop everything in the current pool and add the other vdevs later on. You'll still get lots of performance out of that, even if the data is not optimally balanced between vdevs.
Ok, so if I just start out with a single vdev of 10 x 2TB drives, that gives me ~16TB of usable space using raidz2. So should I fill it to ~80% (12.8TB) and then add a 2nd vdev to the pool (once I have 10 more drives that have been tested to have no SMART errors)?

Once I add that 2nd vdev and copy more data to the pool, will the system automatically know to add all this new data to the 2nd vdev until both contain approximately the same amount of data?

Newbie zfs question. When I copy a large file to a zpool, say 50GB, does that file get "spread out" across multiple vdevs? What about within each vdev?
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Since the red fail LEDs on my backplane can't be used to flag drives with SMART errors or drives that have dropped, what's the easiest way to "ping" a drive to identify it physically? For example, right now SMART is telling me /dev/da19 has uncorrectable sectors, but unfortunately da19 does not map to physical SAS slot 19 in my chassis, and the mappings often change between reboots.
 

marbus90

Guru
Joined
Aug 2, 2014
Messages
818
dd if=/dev/da19 of=/dev/null and the corresponding activity LED should be lit constantly.
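To double-check you've got the right disk before pulling it, something along these lines works (da19 is just the example device from above; smartctl ships with FreeNAS):

# print the serial number reported by the drive itself
smartctl -i /dev/da19 | grep -i serial
# generate constant reads so the bay's activity LED stays lit (Ctrl-C to stop)
dd if=/dev/da19 of=/dev/null bs=1m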

Yes, ZFS balances between vdevs until each has reached the same fill level. For example, 10x2TB = 16TB = ~13TiB usable: if you fill that to 60% and then add a new 10x1TB = 8TB = ~6.5TiB usable vdev, ZFS will from then on favor the 6.5TiB vdev until it also reaches 60%.

If you start with an empty pool and multiple equal-sized vdevs, data is ideally striped across all of them. If you add more vdevs to an existing pool it gets more complicated, but you'll always get at least single-vdev speeds.
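For reference, adding a second raidz2 vdev and then watching how full each vdev is could look roughly like this (pool name "tank" and device names are placeholders; on FreeNAS you'd normally extend the volume through the GUI):

# attach a second 10-disk raidz2 vdev to the existing pool
zpool add tank raidz2 da10 da11 da12 da13 da14 da15 da16 da17 da18 da19
# show size, allocation and free space per vdev, so you can see where new writes land
zpool list -v tank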
 

DataKeeper

Patron
Joined
Feb 19, 2015
Messages
223
Easy. Don't use the device name (/dev/daXX) but the serial number to locate the drive. I used a label maker, but paper, a fine-tip pen and clear tape work just as well. I simply printed the serial of each drive and affixed it to that drive's caddy. Device names can mess you up, but a serial shouldn't.
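If you want to dump the whole device-to-serial mapping in one go for the labels, something like this should do it (just a sketch; the kern.disks sysctl and smartctl are both present on FreeNAS):

# list every disk the kernel knows about and print its serial number
for d in $(sysctl -n kern.disks); do
  echo -n "$d: "
  smartctl -i /dev/$d | grep -i 'serial number'
done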
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Perfect thanks. Great info!

Since I purchased my 846s new, they came with little stickers that I attached to each drive caddy, as seen here, mapping the physical SAS backplane port IDs to the number on the sticker.

cosmos-12.JPG


So what I'll do is whip up a spreadsheet matching these 0-23 labels on each caddy to the serial numbers of the disks in them, and use the dd trick as a final check just before pulling a physical drive.

marbus, so ZFS doesn't have a "balance" feature to re-stripe data as one adds more vdevs? Probably a non-issue for home use serving up media, but it could make a considerable difference when running VMs, I would think.
 

marbus90

Guru
Joined
Aug 2, 2014
Messages
818
VMs run off striped mirrors, so with the 72-bay example that's 35 vdevs plus 2 spares already. At that count it usually doesn't matter: new writes go to the fresh vdev, while reads are still spread across all 70 data drives (since ZFS can read from both members of a mirror, doubling throughput). Also, most VM filers are way overspecced to avoid the fragmentation issues altogether.

A workaround might be to create a new dataset and zfs send or mv everything into the new dataset. This should force ZFS to spread more evenly, especially if you move it to a temp dataset and then back to the main dataset. It really touches every block and moves it around on disk.
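As a rough sketch of what that shuffle could look like (pool and dataset names are placeholders, and it assumes the pool has room for a second copy of the data while the move runs):

# create a temporary dataset in the same pool
zfs create tank/media_tmp
# rewrite every block by copying it into the new dataset, then back again
mv /mnt/tank/media/* /mnt/tank/media_tmp/
mv /mnt/tank/media_tmp/* /mnt/tank/media/
# clean up the temporary dataset
zfs destroy tank/media_tmp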

You could also look for a small labelmaker to stick the serial#-labels next to the bay-ID stickers. Always note down the full serial number, not just the last 3 or 4 digits.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Hot spares weren't working properly last time someone mentioned them and I haven't heard anything about the problem being identified, much less fixed.
 

marbus90

Guru
Joined
Aug 2, 2014
Messages
818
Even so, resilvering to another drive in a spare bay is preferable to swapping out the slightly degraded disk.
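In ZFS terms that's just a replace against the disk sitting in the spare bay (pool and device names below are placeholders); the degraded disk stays in the pool and keeps contributing redundancy until the resilver completes:

# resilver onto the new disk in the spare bay while the old one is still attached
zpool replace tank da19 da70
# watch the resilver progress
zpool status tank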
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Even so, resilvering to another drive in a spare bay is preferable to swapping out the slightly degraded disk.
Of course. Just a heads-up to avoid nasty surprises.
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
A workaround might be to create a new dataset and zfs send or mv everything into the new dataset. This should force ZFS to spread more evenly, especially if you move it to a temp dataset and then back to the main dataset. It really touches every block and moves it around on disk.
I currently have 44TB of data stored individually on 10 x 2TB drives and 7 x 4TB drives.

On my new FreeNAS box I have the 10 x 2TB drives and 10 x 1TB drives, which are now all running with no SMART errors.

Maybe my best bet would be the following:

1. Create 2 temp zpools, each with a single vdev consisting of 10 like-sized drives. Let's call them zpool1 and zpool2 (see the sketch just below this list).
2. Copy the content of the 7 4TB drives into the two zpools until each reaches 90%, which is about 21.6TB combined. That leaves 6.4TB that I need to find a home for; I can probably scare up misc internal and external drives around the house to store that on.
3. Take the 7 4TB drives that are now empty, add the 2 4TB drives that were used for SnapRAID parity, and purchase 1 new 4TB Red, and I now have 10 4TB drives to set up another single-vdev zpool with. Let's call it zpool3.
4. Copy the data from the 10 x 2TB drives into the newly created pool, along with the 6.4TB I temporarily stored elsewhere on my network.
5. Create zpool4 using the 10 disks freed up in step 4.
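For reference, each of those temporary single-vdev pools in step 1 boils down to a plain 10-disk raidz2 create (device names here are placeholders; on FreeNAS I'd normally do this through the GUI volume manager):

# temporary pool #1: a single raidz2 vdev of ten 2TB disks
zpool create zpool1 raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9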

After the above steps are carried out, I will have 4 zpools as follows, with no data striped across vdevs at all:

zpool1: 10 x 2TB, 90% full
zpool2: 10 x 1TB, 90% full
zpool3: 10 x 4TB, 83% full
zpool4: 10 x 2TB, 0% full

So if I understand what you're saying correctly, I would need to add additional vdevs to zpool4 to hold ALL the data on zpools 1-3 before starting the mv process?

But if I do that and then add the vdevs from zpools 1-3 into zpool4, those new vdevs would not contain any of the data already written to zpool4.

I guess my ultimate goal is to have my 44TB of data evenly striped across a single zpool with 7 vdevs in it, each with 10 disks. My configuration would be as follows:

vdev1: 10 x 2TB
vdev2: 10 x 1TB
vdev3: 10 x 4TB
vdev4: 10 x 2TB
vdev5: 10 x 1TB (in 3rd chassis, not yet in my possession)
vdev6: 10 x 1TB (in 3rd chassis, not yet in my possession)
vdev7: 10 x ?TB (I haven't purchased these yet; they are to replace all my 1TB drives that failed with SMART errors, and my guess is I'll go with 4TB Reds)

Maybe I should just not worry about striping the data across vdevs, stand up a single production zpool now with the 10 x 2TB and 10 x 1TB drives, and keep adding vdevs as I free up the existing drives that still hold data?
 

marbus90

Guru
Joined
Aug 2, 2014
Messages
818
Not pools, but datasets. Datasets can be thought of like Windows partitions: if you move a file between partitions, it has to be moved both on disk and in the file allocation table, and the same goes for datasets.

Yes, don't worry about performance loss. Just slap everything into one giant zpool and add new vdevs as the old raid sets are freed up. If you do run into performance issues (unlikely), you can try the dataset approach to rebalance between vdevs.

The biggest issue with different-sized vdevs is the following: if you have 10x4TB and 10x1TB in a zpool and in theory each vdev can do 400MB/s of throughput, ZFS would write at the full 400MB/s to the bigger vdev but only 100MB/s to the smaller one, because it balances the vdevs' fill levels. If you replaced the 10x1TB with another 10x4TB, both vdevs could be written to at 400MB/s each, meaning 800MB/s total vs. 500MB/s.

At least that's the theory. :)
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Ah ok. I wasn't sure if by datasets you were using another term for pool. :)

I'll just do one large zpool then. I might wait until the end of the week in case I do get the 3rd chassis full of 1TB drives by then. That way I can have at least four 10-member vdevs in the pool before I get started. Besides, with 40TB of available storage, I can almost move everything over in a single operation.

Maybe I'll spring for a pair of those Intel E15729 10Gb NICs before I start copying. They're down to $58 with free shipping on eBay. :) I'll just need to make sure the 15729 is compatible with my X7 motherboard.

So with 3 chassis and 3 1015 controllers, does it matter how the disks for each vdev are distributed among them? I was just going to start with chassis 1 and put the 20 disks for vdevs 1 and 2 in it. Vdev 3 would then take the last 4 bays in chassis 1 and the first 6 in chassis 2, and so on, until I'm down to just the last 2 bays for spares.
 

marbus90

Guru
Joined
Aug 2, 2014
Messages
818
With 10-disk raidz2... not so much. If you were doing 6-disk raidz2, I'd spread every vdev across all 3 chassis -> 2 in ch0, 2 in ch1 and 2 in ch2, so one chassis can fail and you'd still have access to your data. With 10-disk raidz2 you'd need 5 chassis to spread that risk :)

(goes daydreaming about 11x 45-disk chassis with dual expanders in one rack, 11-disk raidz3 spread across all chassis, redundant SAS switches to a TrueNAS or other dual-head servers...)
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Cool thanks. Now I have an excuse for adding 2 more chassis. :) LOL. Wife would kill me. 3 is going to be bad enough.

Somewhat related question: if I power down the server, can I move disks around between chassis and, when I boot back up, will FreeNAS "find" them without any issues?

Also, just realized I have this Alert:

cosmos-13.JPG


Is this a result of switching to 9.3 "nightlies" and does it mean I can now safely flash my 1015 to P20?
 

marbus90

Guru
Joined
Aug 2, 2014
Messages
818
2015-04-26 18:40:21 <jpaetzel> marbus90: the nightly has v20 driver!

So, if you want to stick to the nightly train, flashing to P20 is probably advisable. And at some point the nightly changes will land in the stable train as well...
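If you do flash, it's worth confirming afterwards that the firmware and the driver line up (a sketch; sas2flash is LSI's flash utility and the mps driver prints its version at boot):

# show controller(s), firmware and BIOS versions as seen by sas2flash
sas2flash -listall
# check what firmware/driver the running kernel reports
dmesg | grep -i mps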
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
So I switched to the nightly train, as it resolved the issue of SMART not sending emails. But it looks like it broke Windows CIFS sharing: I can no longer write any content to my zpool. I tried deleting the share and adding it back in. I tried using the wizard too, and while it looks like it creates the share, it's not there when I go look under Shares. Doing it manually does appear to work, but when I try to copy something to the zpool, I get write permission denied.
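For the record, checking ownership and ACLs on the dataset from the shell is one way to narrow down the permission-denied error (the path below is just a placeholder for my share):

# see who owns the share path and what the ACL looks like
ls -ld /mnt/tank/share
getfacl /mnt/tank/share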

Guess I'll roll back to STABLE to see if that fixes it. I'd rather be able to write to the zpool vs. getting SMART emails. :)
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Rolled back to STABLE 9.3 and Windows CIFS sharing is working again, but the plugin I installed under the nightly is gone. Perhaps this is normal behavior?

Figured I might as well try the latest 10 nightly. Maybe *both* SMART email and CIFS sharing works in it. :D
 
Joined
Oct 2, 2014
Messages
925
My original install of 9.3 broke CIFS; after updating 4 times I got EVERYTHING working that I needed. Now that it's in production, I spoke with @cyberjock and he advised updating every 2-3 weeks; this way any potential bugs get squashed within the first update or two.

I plan to update once a month or so.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Rolled back to STABLE 9.3 and Windows CIFS sharing is working again, but the plugin I installed under the nightly is gone. Perhaps this is normal behavior?

Figured I might as well try the latest 10 nightly. Maybe *both* SMART email and CIFS sharing works in it. :D

No, FreeNAS 10 is still unusable.
 