Would I benefit from a dedicated SSD ZIL?

Status
Not open for further replies.

mattlach

Patron
Joined
Oct 14, 2012
Messages
280
Hey all,

The guide suggests checking with zilstat.

I loaded up a worst-case write load of 6 simultaneous HD recordings (MythTV writing to my FreeNAS box), plus whatever other miscellaneous writes might be going on at the same time.

Problem is, I don't fully understand the output or when a dedicated ZIL would be appropriate, and I really don't know whether these writes are synchronous or not (one would think writing large MPEG files would just be sequential, asynchronous writes, but you never know).

I'm trying to solve an issue where playback skips when I record and play back the same show at the same time (like time-shifting with a DVR).

Here are my results:

Code:
~# zilstat -t 60
TIME                    N-Bytes  N-Bytes/s N-Max-Rate    B-Bytes  B-Bytes/s B-Max-Rate    ops  <=4kB 4-32kB >=32kB
2014 Aug  1 13:18:25  623499720   10391662   15626184  665583616   11093060   16515072   5078      0      0   5078
2014 Aug  1 13:19:25  645754864   10762581   17978008  689569792   11492829   19005440   5261      0      0   5261
2014 Aug  1 13:20:25  642361064   10706017   20112744  684982272   11416371   21233664   5226      0      0   5226
2014 Aug  1 13:21:25  686078704   11434645   26390584  730857472   12180957   27656192   5576      0      0   5576
2014 Aug  1 13:22:25  679570784   11326179   22014712  722337792   12038963   23068672   5511      0      0   5511
2014 Aug  1 13:23:25  675923976   11265399   19917552  721420288   12023671   20971520   5504      0      0   5504
2014 Aug  1 13:24:25  670082856   11168047   19491296  714997760   11916629   20578304   5455      0      0   5455
2014 Aug  1 13:25:25  649737784   10828963   18799480  693370880   11556181   19660800   5290      0      0   5290
2014 Aug  1 13:26:25  665050672   11084177   23619464  710279168   11837986   24772608   5419      0      0   5419
2014 Aug  1 13:27:25  624072240   10401204   15810544  668467200   11141120   16777216   5100      0      0   5100
2014 Aug  1 13:28:25  669554104   11159235   19915792  766435328   12773922   21102592   5964      0    108   5856
2014 Aug  1 13:29:25  640427920   10673798   15437024  698683392   11644723   16515072   5347      0     15   5332
2014 Aug  1 13:30:25  955904680   15931744   46549680 1011351552   16855859   48627712   7716      0      0   7716
2014 Aug  1 13:31:25  857059392   14284323   37336736  919687168   15328119   39059456   7075      0     62   7013
2014 Aug  1 13:32:25  876726880   14612114   38173368  931528704   15525478   40239104   7107      0      0   7107
2014 Aug  1 13:33:25  711064344   11851072   29610680  759697408   12661623   31195136   5797      1      0   5796
2014 Aug  1 13:34:25  594192792    9903213   19124552  651833344   10863889   20054016   5111      0    149   4962
2014 Aug  1 13:35:25  661954072   11032567   21622656  708837376   11813956   22806528   5408      0      0   5408
2014 Aug  1 13:36:25  643512888   10725214   18260032  689045504   11484091   19398656   5257      0      0   5257
2014 Aug  1 13:37:25  644810520   10746842   18329992  689442816   11490713   19136512   5261      1      0   5260
2014 Aug  1 13:38:25  672630288   11210504   19905648  716701696   11945028   20971520   5468      0      0   5468


Do you guys think this type of ZIL load would benefit from a dedicated SSD mirror?

If so, what are you all using these days? People used to recommend Intel's small SLC SSDs, but I'm not sure if that's still the case. Based on write endurance, Samsung's new 850 Pros look like they might be up to the task, but the smallest ones are 128GB, which might be overkill?

I'd appreciate any input.

Thanks,
Matt
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Are you using NFS in a way that generates a high volume of sync writes? If not, then a dedicated ZIL won't help you.
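
You can check the dataset's sync setting (and whether it's shared over NFS) with something like this; the pool/dataset name is just an example:

Code:
zfs get sync,sharenfs tank/mythtv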
 

mattlach

Patron
Joined
Oct 14, 2012
Messages
280
Are you using NFS in a way that generates a high volume of sync writes? If not, then a dedicated ZIL won't help you.

Well, that's just it. How do I determine whether my writes are sync or async?

I figured gathering this data would be a good way to find out.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
What I would do, for testing purposes only, is set sync=disabled on the dataset in question. Then see if you have the same issues. If you do, then I would imagine a dedicated ZIL device won't help, as sync=disabled is about as fast as it's going to get.

If the problems go away with sync=disabled, then I would think a ZIL (a SLOG, more specifically) would help.

The way I think about it (and I may be wrong) is that an infinitely fast SLOG with zero latency would give you the performance of sync=disabled. Put another way, adding a SLOG shouldn't ever get you faster than sync=disabled, but it'll maintain POSIX compliance for sync writes, provided a suitable SSD is chosen.
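
The test itself is just a property flip on the dataset; the name here is only an example, and remember to set it back afterwards:

Code:
# TESTING ONLY: this turns off sync write guarantees for the dataset
zfs set sync=disabled tank/mythtv
# ...reproduce the record-and-playback skipping...
zfs set sync=standard tank/mythtv   # restore the default
zfs get sync tank/mythtv            # confirm the setting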
 

DaPlumber

Patron
Joined
May 21, 2014
Messages
246
Bear in mind that a SLOG helps with latency under concurrent access, not with bandwidth.

How many disks do you have in the pool, grouped into how many vdevs, in what layout? Bear in mind that zpool performance is driven by the number of vdevs, not the number of disks. As a rough rule of thumb, a single vdev will perform pretty much like a single disk, so 6 disks will perform better as 3 mirrored pairs than as a single RAID-Z2. However, that has SPOF implications, so it's a trade-off.

Video record and playback streams are bandwidth intensive, but because of buffering they tend not to be latency sensitive. Video EDITING is sensitive to both.

How much RAM does this system have? Incoming writes are buffered in RAM before being flushed to the pool (and without a SLOG, the ZIL itself lives on the pool disks), so for only 6 streams, bumping up the RAM may do more good than a SLOG by giving you a bigger buffer to mitigate your worst case.

All 6 video recording streams are effectively going to go through that RAM buffer, so it could be as simple as adding up the bandwidth for all 6 (or just using the numbers from zilstat) and comparing it to the back-end write bandwidth of the available vdev(s). Video streams tend to be constant and not "bursty", so once you run out of buffer you'll see stalls.
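
As a rough sanity check (assuming something like 16 Mbit/s per HD MPEG-2 stream, which is only a guess):

Code:
# 6 streams x 16 Mbit/s = 96 Mbit/s ~= 12 MB/s aggregate,
# which is in the same ballpark as the ~11-12 MB/s B-Bytes/s
# figures in the zilstat output above
echo $((6 * 16 / 8))   # MB/s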
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
Bear in mind that zpool performance is driven by the number of vdevs, not the number of disks. As a rough rule of thumb, a single vdev will perform pretty much like a single disk, so 6 disks will perform better as 3 mirrored pairs than as a single RAID-Z2.

What you say is true for random access.

A pool with only one vdev will perform random IO at approximately the speed of the slowest drive in that vdev. However, assuming one is not extremely CPU limited, a 6-disk Z2 vdev will perform sequential IO at least as well as, if not better than, a mirrored stripe. For sequential access, a 6-disk Z2 should have approximately the throughput of 4 disks, while 6 disks in a mirrored stripe setup would have approximately the throughput of 3 disks. For random IO, the Z2 will have the IOPS of 1 disk, while the 3-vdev mirrored stripe will have the IOPS of approximately 3 disks, possibly 6 for reads.

For example, my 11-disk single-vdev Z3 pool can read/write sequentially at 800 MB/sec or more. Clearly this is far more than any single disk. For random reads, though, yes, I'll have approximately the IOPS of a single disk. Random writes are generally queued into a txg for flushing to disk (sequentially) later, unless of course they're flagged as sync, which is exactly what the sync=disabled test would show. However, if the "random writes converted into a sequential txg flush" are then read back sequentially, it'll end up being a bunch of random reads from the disks. That's where multiple vdevs, and lots of ARC, come in handy.

Media streaming should generally end up being sequential, I would imagine. I still suggest TESTING with sync=disabled for the dataset in question. If that helps, then look at a dedicated SLOG device. If it doesn't help, then I doubt one will do much.
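
While you're testing, watching per-vdev throughput will also tell you whether the pool itself is keeping up; the pool name here is just an example:

Code:
zpool iostat -v tank 5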
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
As a comparison, here's my zilstat while copying data to my pool over CIFS at 100 MB/sec:

Code:
root@nas ~ # zilstat -t 60
TIME                    N-Bytes  N-Bytes/s N-Max-Rate    B-Bytes  B-Bytes/s B-Max-Rate    ops  <=4kB 4-32kB >=32kB
2014 Aug  7 20:27:40     199016       3316     160688     524288       8738     262144      4      0      0      4
2014 Aug  7 20:28:40     248232       4137     142640     393216       6553     262144      3      0      0      3
2014 Aug  7 20:29:40     190632       3177     145712     393216       6553     262144      3      0      0      3
2014 Aug  7 20:30:40     225576       3759     195824     393216       6553     262144      3      0      0      3


Almost nothing. Over each 60-second period I've got only about 200 KB of sync writes happening. Obviously a SLOG isn't going to help here.
 

mattlach

Patron
Joined
Oct 14, 2012
Messages
280
Thanks for your thoughts, guys.

I currently have a single 8-disk RAIDz2 vdev in the pool.

I am in the process of migrating everything to a new system.

I am considering getting 4 more disks and instead setting up a pool with two 6-disk RAIDz2 vdevs.

If I decide to go with a SLOG for the ZIL, would I need one SLOG per pool, or one per vdev?

Thanks,
Matt
 

mattlach

Patron
Joined
Oct 14, 2012
Messages
280
1 per pool.

Thank you!

Do you guys have any thoughts regarding the best way to do the migration?

Here is my plan:

My current setup has 8 drives, 4x4TB and 4x3TB, in RAIDz2.

I plan on ordering 4 more 4TB drives. I also have a 2TB drive kicking around that I can use during the transition.

1.) Create a new pool with one 6-drive vdev on the new server, using the 4 new 4TB drives, the old 2TB drive, and one of the two redundant 3TB drives pulled from the existing pool, for a total of 6 drives.

2.) Copy all the data over to the new server.

3.) Remove all the drives from the old server and use them to create the second 6-drive vdev. One 3TB drive will be left over.

4.) Swap out the 2TB drive in the first vdev on the new server for the leftover 3TB drive.

5.) Add a mirrored SLOG using my two SSDs (something like the sketch below).
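
For step 5, I assume the command is something like the following; the pool and device names are placeholders for whatever the new system ends up using:

Code:
# attach the two SSDs as a mirrored log device
zpool add tank log mirror ada4 ada5
zpool status tank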

This will leave me with two vdevs in the pool, each containing four 4TB drives and two 3TB drives. The 3TB drives will eventually be swapped out for 4TB drives and the vdevs grown.

My only concern with this approach is that all the migrated data will be on the first of the two vdevs, while the second will be empty. For best performance the data should be distributed between them, correct?

Is there any way I can do this? I'm thinking of simply copying all the existing data to a different folder on the same pool, then removing the old folder. That way all the data would be rewritten across both vdevs. Or will this just bias the data to the second vdev as ZFS tries to balance free space?
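
I'm guessing a local send/receive would force everything to be rewritten, and ZFS would then spread the blocks across both vdevs. The dataset names below are placeholders, and I'd verify the copy before destroying anything:

Code:
zfs snapshot -r tank/media@rebalance
zfs send -R tank/media@rebalance | zfs receive tank/media_new
# verify the copy, then:
zfs destroy -r tank/media
zfs rename tank/media_new tank/media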

Appreciate your thoughts.

--Matt
 

mka

Contributor
Joined
Sep 26, 2013
Messages
107
How should I read the output of zilstat? I let it run yesterday while two people accessed the server over CIFS. This is not the absolute worst case, but it is the usual workload. I was surprised that the sync writes peaked at only 5.75MB:
Code:
# zilstat -t 60
TIME                    N-Bytes  N-Bytes/s N-Max-Rate    B-Bytes  B-Bytes/s B-Max-Rate    ops  <=4kB 4-32kB >=32kB
2014 Aug 26 20:46:49     491928       8198     258608     655360      10922     262144      5      0      0      5
2014 Aug 26 20:47:49     278568       4642     182000     393216       6553     262144      3      0      0      3
2014 Aug 26 20:48:49    5753528      95892    5584984    6291456     104857    5898240     48      0      0     48
2014 Aug 26 20:49:49    1599080      26651    1595032    1724416      28740    1703936     18      5      0     13
2014 Aug 26 20:50:49      69720       1162      30008     405504       6758     184320     10      1      1      8
2014 Aug 26 20:51:49    1240816      20680     995240    1740800      29013    1179648     14      0      0     14
2014 Aug 26 20:52:49      64584       1076      31576     536576       8942     131072      5      0      1      4
2014 Aug 26 20:53:49    1783968      29732    1147280    2359296      39321    1310720     18      0      0     18
2014 Aug 26 20:54:49     178848       2980     107240     524288       8738     131072      4      0      0      4
2014 Aug 26 20:55:49      51576        859      25720     536576       8942     131072      5      0      1      4
2014 Aug 26 20:56:49     381456       6357     204080     921600      15360     262144      8      1      0      7
2014 Aug 26 20:57:49     252456       4207     178544     655360      10922     262144      5      0      0      5
2014 Aug 26 20:58:49      94880       1581      42872    1048576      17476     393216      8      0      0      8
2014 Aug 26 20:59:49     162352       2705     162352     262144       4369     262144      2      0      0      2
2014 Aug 26 21:00:49     380264       6337     380264     393216       6553     393216      3      0      0      3
2014 Aug 26 21:01:49          0          0          0          0          0          0      0      0      0      0
2014 Aug 26 21:02:49     217688       3628     175216    1048576      17476     393216      8      0      0      8
2014 Aug 26 21:03:49      17328        288      12856     262144       4369     131072      2      0      0      2
2014 Aug 26 21:04:49       8944        149       4472     262144       4369     131072      2      0      0      2
2014 Aug 26 21:05:49     132512       2208     128040     262144       4369     131072      2      0      0      2
2014 Aug 26 21:06:49       6472        107       4472     524288       8738     262144      4      0      0      4
2014 Aug 26 21:07:49    3485256      58087    1590168    5111808      85196    1703936     39      0      0     39
2014 Aug 26 21:08:49     154944       2582      68408    1179648      19660     655360      9      0      0      9
2014 Aug 26 21:09:49    1768440      29474    1523104    2228224      37137    1572864     17      0      0     17
2014 Aug 26 21:10:49      48400        806      21800     524288       8738     131072      4      0      0      4
2014 Aug 26 21:11:49    2544032      42400    1751760    4984832      83080    1835008     39      1      0     38
2014 Aug 26 21:12:49    3797064      63284    1619992    4993024      83217    1703936     39      0      1     38
2014 Aug 26 21:13:49    2820232      47003    1624280    5181440      86357    1835008     44      2      2     40
2014 Aug 26 21:14:49    1921376      32022    1802440    2916352      48605    1966080     28      0      0     28
2014 Aug 26 21:15:49    7393816     123230    3298384   25985024     433083    3801088    204      0      0    204
2014 Aug 26 21:16:49    3490992      58183    1754704    5668864      94481    1835008     49      0      0     49
2014 Aug 26 21:17:49    2025680      33761    1696016    3670016      61166    1835008     28      0      0     28
2014 Aug 26 21:18:49    2237952      37299    1733648    3932160      65536    1835008     30      0      0     30
2014 Aug 26 21:19:49    1916840      31947    1628376    3014656      50244    1703936     23      0      0     23
2014 Aug 26 21:20:49    2126336      35438    1540632    2748416      45806    1703936     26      0      0     26
2014 Aug 26 21:21:49     283968       4732     237424    1441792      24029     655360     11      0      0     11
2014 Aug 26 21:22:49    4573744      76229    1590936    6176768     102946    1703936     50      0      0     50
2014 Aug 26 21:23:49     444912       7415     220464    1572864      26214     393216     12      0      0     12
2014 Aug 26 21:24:49     528656       8810     216560    2097152      34952     524288     16      0      0     16
2014 Aug 26 21:25:49    1741008      29016    1578264    3014656      50244    1703936     23      0      0     23
2014 Aug 26 21:26:49      88400       1473      34296     397312       6621     131072      4      1      0      3
2014 Aug 26 21:27:49     145624       2427      55544     528384       8806     131072      5      1      0      4
2014 Aug 26 21:28:49     469432       7823     324424     655360      10922     393216      5      0      0      5
2014 Aug 26 21:29:49    1797616      29960    1489568    3014656      50244    1572864     23      0      0     23
2014 Aug 26 21:30:49    3846608      64110    1667016    8204288     136738    2359296     81     19      0     62
2014 Aug 26 21:31:49    1492720      24878     583896    5017600      83626     667648    178    129      2     47
2014 Aug 26 21:32:49    2038920      33982     742928    8626176     143769    1323008    197    133      1     63
2014 Aug 26 21:33:49    1953568      32559     766544    7475200     124586     798720    201    146      2     53
2014 Aug 26 21:34:49    3217080      53618    2080824    9031680     150528    2236416    201    133      2     66
2014 Aug 26 21:35:49    3487304      58121    2070456    9814016     163566    2228224    175    100      2     73
2014 Aug 26 21:36:49    1713336      28555    1021000    3801088      63351    2097152     29      0      0     29
2014 Aug 26 21:37:49     879864      14664     793096    1310720      21845     917504     10      0      0     10
2014 Aug 26 21:38:49     318520       5308     187312     655360      10922     262144      5      0      0      5
2014 Aug 26 21:39:49     552872       9214     440736    1703936      28398     524288     13      0      0     13
2014 Aug 26 21:40:49    1722992      28716    1590744    2097152      34952    1703936     16      0      0     16
2014 Aug 26 21:41:49     134304       2238      55432    1441792      24029    1048576     11      0      0     11
2014 Aug 26 21:42:49    3419856      56997    1607704    5246976      87449    1703936     41      1      0     40
2014 Aug 26 21:43:49    3436480      57274    1616472    5242880      87381    1703936     40      0      0     40
2014 Aug 26 21:44:49    4502648      75044    1612376    7471104     124518    1703936     57      0      0     57
2014 Aug 26 21:45:49    4675312      77921    1599512    6553600     109226    1703936     50      0      0     50
2014 Aug 26 21:46:49    3390216      56503    1624664    5373952      89565    1703936     41      0      0     41
2014 Aug 26 21:47:49    3337696      55628    1620760    4841472      80691    1703936     42      0      4     38
2014 Aug 26 21:48:49    3340240      55670    1643816    5484544      91409    2097152     44      0      0     44
2014 Aug 26 21:49:49    3456872      57614    1641816    5185536      86425    1703936     41      0      0     41
2014 Aug 26 21:50:49    3520688      58678    1650384    5505024      91750    1835008     42      0      0     42
2014 Aug 26 21:51:49    3331152      55519    1654680    5242880      87381    1703936     40      0      0     40
2014 Aug 26 21:52:49    1815176      30252    1650584    3538944      58982    1703936     27      0      0     27
2014 Aug 26 21:53:49    3623568      60392    1663448    5242880      87381    1703936     40      0      0     40
2014 Aug 26 21:54:49    3641680      60694    1782816    6029312     100488    2228224     46      0      0     46
2014 Aug 26 21:55:49    1937552      32292    1684496    3932160      65536    1835008     30      0      0     30
2014 Aug 26 21:56:49    1926440      32107    1684688    3014656      50244    1835008     23      0      0     23
2014 Aug 26 21:57:49    3745960      62432    1705936    6160384     102673    1835008     47      0      0     47
2014 Aug 26 21:58:49    3462656      57710    1709840    5337088      88951    1835008     46      1      0     45
2014 Aug 26 21:59:49    4006600      66776    1766248    5804032      96733    1966080     45      0      0     45
2014 Aug 26 22:00:49    4271168      71186    1748048    6684672     111411    1835008     51      0      0     51

 