NFS share too slow for security cameras

07stuntar1

Dabbler
Joined
Jan 21, 2020
Messages
30
We are using a Supermicro for our FreeNAS server.
FreeNAS 11.3 U4
Intel Xeon Gold 5222 3.8GHz
768GB DDR4
2 x 10Gbps NIC
(record) 15 vdevs in a pool (14 TB x 4 wide) RAID-Z1, WD UltraStar 7200
(backup) 11 vdevs in a pool (14 TB x 11 wide) RAID-Z2, WD UltraStar 7200

The video recording software is running on Ubuntu 20. The FreeNAS share is mounted via NFS.
Ubuntu 20
16 Cores
16GB Ram
2 x 10Gbps NIC

We are using NFS to mount the share on the Ubuntu server. The server records to the record drives and then at midnight backs up to the backup drives.
For some reason our security camera software complains there isn't enough speed for recording, and we drop some frames or have gaps in the recording.
We have about 200 cameras, which is about 1Gbps of bandwidth. I can't figure out why. When I run a dd test I can fully saturate a 10Gb NIC.

Another problem we are having: when the files are backed up from the record drives to the backup drives, the transfer will not pass 1Gbps and can take multiple days. There are a lot of small files, about 50 MB per file. I tried testing with rsync (I believe the software just uses copy) and the speed still stays at about 1Gbps. Any way to increase this speed?
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
Are you mounting the share sync or async or have you set the pool to async?
Are all 200 cameras writing at the same time? Are they writing while the backup is running?

Your problem might be the concurrent write activity - try running a fio test with appropriate write blocksize (as per mount or datastore value) and concurrent processes to determine the max capability of the record pool.
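A concrete fio invocation for that kind of test might look like the sketch below. The target path, job count, block size and per-stream rate are assumptions modelled on the numbers in this thread (200 cameras, ~1Gbps aggregate, 128K blocks), so adjust them to the actual mount/datastore values:

```shell
# Simulate ~200 cameras writing concurrently to the record pool.
# /mnt/record/fiotest is a placeholder path on the NFS mount.
fio --name=cam-sim \
    --directory=/mnt/record/fiotest \
    --rw=write --bs=128k \
    --numjobs=200 --size=1g \
    --rate=625k \
    --time_based --runtime=60 \
    --group_reporting
```

If the aggregate bandwidth reported here is already near 1Gbps, the pool is IOPS-bound under concurrent load even though a single dd stream saturates the NIC.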

You might benefit from moving to sync and adding an Optane SLOG to convert the many concurrent writes into fewer continuous writes (note this is not something usually recommended, but in your special case it might help).
Or possibly add a special vdev to the pool for rapidly changing data (e.g. metadata) (I have not tried that yet since it's new in 12).

For speeding up record-to-backup you need to stop hammering the record pool with writes, or it won't have time to do reads...
 

07stuntar1

Dabbler
Joined
Jan 21, 2020
Messages
30
Are you mounting the share sync or async or have you set the pool to async? [...]


Hi @Rand, sorry for the late reply. I was testing your method, except for the special vdev for rapidly changing data.
Yes, the cameras are writing 24/7 to the "scratch" pool constantly. Then at midnight the scratch pool backs up to the long-term storage pool. They are two different NFS mounts. Should the pool the cameras are written to use sync, and the backup pool the files are copied to later use async? Or both sync?
We found that when adding an SSD it is slower than the vdev pools, causing the bandwidth to drop and the backup to take longer. Any other suggestions?
 
Last edited:

Fredda

Guru
Joined
Jul 9, 2019
Messages
608
Just to get it right, the scratch and backup pool are both on the same server? Why use NFS for that and not copy from one pool to the other on the server?
 

07stuntar1

Dabbler
Joined
Jan 21, 2020
Messages
30
Just to get it right, the scratch and backup pool are both on the same server? Why use NFS for that and not copy from one pool to the other on the server?
Yes, same server. The recording software we use likes to have a scratch disk and a backup to "cold storage" for longer-term storage. I've actually tried having another Supermicro server just for backup; it didn't make a difference.
Just to test, I tried to rsync the data locally on the FreeNAS; it didn't make much of a difference. With the small files, the server never picks up speed.
With a dd speed test I am able to get 20Gbps even while the cameras are writing.
 

Fredda

Guru
Joined
Jul 9, 2019
Messages
608
Do you back up the complete data or only a part of it? If you back up the complete dataset, you should look into ZFS replication.
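If the whole dataset goes across anyway, a snapshot-based replication sketch could look like this. The pool and dataset names (`record/cams`, `backup/cams`) and snapshot dates are made up for illustration; the real names come from `zpool status` / `zfs list`:

```shell
# Day 1: full send of the record dataset to the backup pool.
zfs snapshot record/cams@2020-12-08
zfs send record/cams@2020-12-08 | zfs recv backup/cams

# Later days: incremental send of only the blocks changed since the
# previous snapshot - far fewer reads than a file-level copy of everything.
zfs snapshot record/cams@2020-12-09
zfs send -i @2020-12-08 record/cams@2020-12-09 | zfs recv -F backup/cams
```

Because `zfs send` streams blocks sequentially rather than walking millions of small files, it sidesteps the per-file overhead that keeps rsync/copy near 1Gbps.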
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
different NFS mounts. Should the drive the cameras are being written to use sync and the back up drives which the files are copied to later use async. or both sync?
You'd need a fast enough SSD (Optane) for the SLOG, and only the scratch pool needs it. The goal would be to bundle 200 individual small write requests into fewer larger ones, to take pressure off the disk drives so they have time to read the backup data. Of course you need to mount with the sync option or force sync=always on the pool (and the mentioned fast enough SLOG, not just any SSD).

Replication might help if you let it run multiple times per day in order to reduce the amount of data that needs to be read from the busy pool (which I still think is the issue).

How much data are we talking about here (from the 200 cameras)? Maybe an all-SSD pool for these, if it's only up to a few TB of scratch space.
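As a rough sketch of the sync + SLOG setup described above (the pool name `record` and device names `nvd0`/`nvd1` are assumptions; check the real device names in the GUI or with `camcontrol devlist` before touching the pool):

```shell
# Attach a mirrored Optane SLOG to the scratch pool (a single device also works).
zpool add record log mirror nvd0 nvd1
# Force every write through the ZIL/SLOG regardless of client mount flags.
zfs set sync=always record
```

With `sync=always`, the many small camera writes land on the Optane first and are flushed to the spinners in larger transaction groups.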
 

Dan Tudora

Patron
Joined
Jul 6, 2017
Messages
276
Hello,
maybe another approach can help with the transfer speed from one pool to the other.
If it is the same server, why not run Ubuntu on bare metal? If the surveillance software only works on Ubuntu, use it in a native way (with ZFS, of course).
That way you eliminate the network/bhyve/network layers and finally the NFS layer and its complications.
Keep It Simple, Stupid is a very good approach - don't stand on your left foot and scratch inside your left ear with your right hand. Maybe there is another solution.
Or wait for TrueNAS SCALE: it is Debian Linux and can do more things with surveillance software if it is "locked" to Linux.
Also read this other post about HW-accelerated encoding/decoding (Shinobi surveillance software):
https://www.truenas.com/community/threads/gpu-transcoding-in-plex-container-vm.88331/#post-612750
Good luck!
 
Last edited:

Dan Tudora

Patron
Joined
Jul 6, 2017
Messages
276
You'd need a fast enough ssd (optane) for the slog
And yes - for the OP, read the suggestions above for speeding up pool writes/reads.
This is about pool/ZFS speed, not a Frankenstein of network/bhyve/network/NFS layers with sync/async combinations; it is just pure ZFS.
Rethink the solution for the real world and do not make it too complex - complexity adds latency.
KISS.
Cheers
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
Not entirely sure what you're getting at.

The main problem the OP seems to have is that he is dumping 200 streams on a bunch of spindles.
That seems to work fine until he starts a parallel read activity for backup purposes, at which point the bandwidth (or, more likely, the drives) cannot keep up with the 200 cams.

There are various options to solve this, but they all depend on the OP's constraints (cash, space, automation capabilities to change the write location, and so on).
 

07stuntar1

Dabbler
Joined
Jan 21, 2020
Messages
30
Do you back up the complete data or only a part of it? If you back up the complete dataset, you should look into ZFS replication.
The recording software backs up every midnight; there is only a once-a-day backup option. It copies everything over automatically and erases it when needed, or after 30 days if set.
 

07stuntar1

Dabbler
Joined
Jan 21, 2020
Messages
30
You'd need a fast enough ssd (optane) for the slog, and only the scratch pool needs it [...]
I was thinking the 768 GB of RAM would have been faster than the Optane drives. Would it have been better to add more RAM? I actually have 2 Optane drives; I can double-check the model, I believe they are 200 GB. When I added one I saw a decrease in speed. I thought the record pool (15 vdevs, 14 TB x 4 wide RAID-Z1, WD UltraStar 7200) does about 450 MB/s x 15, which makes it faster than the Optane. My thinking could be wrong. Is it possible to combine the 2 Optanes and use them as SLOG for faster speeds?

It's about 1Gbps, so about 10.8 TB per day. But right now it's not able to back up the full 10.8 TB per day, so it overlaps with the days before.
 

07stuntar1

Dabbler
Joined
Jan 21, 2020
Messages
30
You'd need a fast enough ssd (optane) for the slog, and only the scratch pool needs it [...]

Hi @Rand,
Sorry for the late response. I have 2 Intel Optane SSDPEL1K200GA 200 GB M.2 drives.
I added both to the LOG vdev; hopefully that was correct. Will that give 400 GB of log, or are they just mirrored?
Also, I used these mount options. I didn't know how to figure out what block size to use. I just ran nfsstat -m, and it seems the default is 131072, so isn't that better?

nfs4 sync,soft,rsize=8192,wsize=8192 0 0

Is this correct?
Thank you
 
Last edited:

Rand

Guru
Joined
Dec 30, 2013
Messages
906
How the two drives are used depends on how you added them to your pool (as a mirror [recommended] or not). You won't need that much (s)log space anyway, as the amount of data going into it is limited (tuneable to a certain degree).

You now seem to run with an 8K size; the default seems to be 128K. Given that your writes are large for the most part, a larger block size is better.
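For reference, a client-side fstab entry using the 128K default rather than 8K might look like this (server address and paths are placeholders; `hard` vs `soft` is a separate judgment call):

```shell
# /etc/fstab on the Ubuntu recorder - placeholder address and paths:
# 192.0.2.10:/mnt/tank/record  /mnt/record  nfs4  rw,hard,rsize=131072,wsize=131072  0  0
```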

So did the synced SLOG help, or is it still as bad as it used to be?
 

07stuntar1

Dabbler
Joined
Jan 21, 2020
Messages
30
How the two drives are used depends on how you added them to your pool [...]

I added the 2 drives at the same time; I'm assuming that's mirrored. Looking at the graphs, it seems to be using them.
We still get the same number of "not enough speed for recording" errors. I have moved to 128K as well.

I still can't pass 1.2Gbps when copying the many small files from the scratch disk to the archive via rsync or the recording software.
I am able to dd write/read a file at over 4Gbps.
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
You can see the pool layout using the zpool status command.
If there is no improvement with the Optane drives you can remove them again. Be careful to remove only the SLOG, not any of the pool drives.
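A sketch of the commands involved, with an assumed pool name of `record` (the log vdev's exact name must be taken from the actual `zpool status` output):

```shell
# Show the pool layout; the "logs" section reveals whether the Optanes are mirrored.
zpool status record
# Remove only the log vdev (name taken from the status output, e.g. "mirror-15").
zpool remove record mirror-15
```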


Replication might help if you let it run multiple times/day in order to reduce the amount of data that needs to be read from the busy pool (which i still think is the issue).
How much data are we talking about here (from the 200 cameras)? Maybe an all SSD pool for these if its only up to a few TB scratch space.
So the easy solution did not help, which means you need to change the scratch pool. Depending on the current pool layout, a (marginal) improvement might be to switch to a mirrored setup instead of RAID-Z (but you would need to re-create the pool for that, so downtime).

If I calculated correctly you have about 2 TB/day of incoming data, so it should not be too big a deal to add a couple (2 or 4, maybe) of SSD or NVMe drives (depending on server connectivity) and set them up as the scratch pool. That should take care of most of the issues caused by the spinners.
 

07stuntar1

Dabbler
Joined
Jan 21, 2020
Messages
30
You can see the pool layout using the zpool command. [...] That should take care of most of the issues caused by the spinners.

I was thinking that with 768 GB of RAM that should be plenty, but it does seem to use the SLOG a bit. I also tried adding a few more vdevs to the scratch pool to see if the speed would increase, but it doesn't seem to make a difference.

Wouldn't 1Gbps be about 10.8 TB a day?
Another roadblock was the MTU size. Right now I have it set to 1500 since the host is 1500. I tried setting both to jumbo frames, but I didn't see an increase so I stepped it back down. I can do more testing if you think that would make a difference. Is there a max throughput for 1500 MTU?
Also, I wanted to make sure: should the storage pool being backed up to be set to sync as well? Playback seemed a bit slow, so I swapped it back since there wasn't an improvement.
Would you recommend NFS or SMB to a Linux server?
Thank you for all your help.
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
The problem is that the RAM is not getting used as much as you'd hope for the kind of data you have; it's ever-changing, so caching only helps a bit.

You're right, if it is a total of 1Gbps for all 200 cameras then it's 10 TB/day, sorry. Still manageable with SSDs, I'd say, especially given the amount of cash you've sunk into the scratch device ;)


Just reread the relevant post.
You have 15 RAID-Z1 (4-drive) vdevs; that should give you 15 times the IOPS of a single disk.
(https://www.ixsystems.com/blog/zfs-pool-performance-2/)

Let's look at this from another point of view.
Each camera writes 625 KB/s, which means with a 128K block size we are looking at about 4 IOPS per camera (+ metadata), so a total of 800 IOPS plus some for metadata - call it about 1k IOPS (actually more, depending on the ZFS recordsize, the fill rate of your pool [fragmentation], ZFS copy-on-write behaviour and other factors).
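The arithmetic above can be checked with plain shell math (numbers taken from this thread: 200 cameras at 625 KB/s each, 128K blocks):

```shell
cams=200
kb_per_cam=625        # KB/s per camera (~1 Gbps aggregate / 200 streams)
block_kb=128          # assumed write block size
per_cam_iops=$(( kb_per_cam / block_kb ))   # integer division -> 4
total_iops=$(( cams * per_cam_iops ))       # 800, plus metadata ~= 1k
echo "$per_cam_iops IOPS/camera, $total_iops total"
```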

According to https://community.spiceworks.com/topic/354040-iops-from-7-2k-sata , a 7200 RPM drive can do about 75 IOPS, so you need at least 11+ disks' worth of IOPS just to handle the incoming writes. Of course the write cache (and SLOG) help streamline this, but I guess your pool is close to max performance with the write activity. You can observe the IOPS for the pool by running zpool iostat on it.
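To watch the live numbers, something like the following (the pool name `record` is an assumption):

```shell
# Per-vdev read/write operations every 5 seconds; compare the write-ops
# column against the ~75 IOPS-per-spindle budget discussed above.
zpool iostat -v record 5
```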

Now at midnight you start adding load to this by trying to read 10 TB worth of data from an already well-used pool. With a 128K (assumed) recordsize, the backup needs to perform about 86,400,000 read operations to get that data.
If the pool were idle, that wouldn't be much of an issue, since you'd be able to do streaming reads at high speed. But with the write load on it, this basically turns into random read IOPS, since the disk head has to move back and forth all the time to cope with all the different write, rewrite and read requests.

Do you see the issue?

Maybe it would also help if you shed some light on how the backup process works - is it triggered by you, or by the Ubuntu VM running the video capture software? If it's by you, then you could move to ZFS replication or run the backup more often to distribute the load. If it's by the software, you probably can't.


Re SMB vs NFS:
It's basically irrelevant for your issue; Linux usually does both well enough. If SMB makes it easier to access the data later, it's OK to switch.

Re MTU - it should not make a difference here.

Re backup @ sync - the sync was only to try to convert random write IOPS (from 200 individual devices) into streaming writes, to ease the write IOPS load on the scratch pool.
 

07stuntar1

Dabbler
Joined
Jan 21, 2020
Messages
30
This is my current iostat. So are the read/write operations the total IOPS I have? How can I tell how much I am using?
[attached screenshot: zpool iostat output]
 

07stuntar1

Dabbler
Joined
Jan 21, 2020
Messages
30
The problem is that the ram is not getting used as much as you'd hope for the kind of data you have [...] Re Backup @ sync - the sync was only to try to convert random write IOPS (by 200 individual devices) into streaming writes.

The backup is done by the camera software and it only allows backing up once a day. So we would need to budget for 10 TB+ of SSD storage, depending on how fast the backup goes.

Thank you for your detailed explanation it really helps.
 