mkarwin
Dabbler
- Joined
- Jun 10, 2021
- Messages
- 40
Because my system, although still on the BETA2 version, was in constant use for several longer-running private projects, I could not upgrade earlier. Now that one of those has completed, I could dedicate a weekend to updating from BETA2 to RC2 and afterwards to RELEASE. Apart from ending up with the "another update to RELEASE available" issue that has already been reported here on multiple occasions, I've been experiencing performance issues since the upgrades.
After a reboot, the pool.import_on_boot job hangs at 80% for many hours. It did complete eventually, after 26h, either because of some timeout set on the job (I can only guess that some job executions have a timer) or because the work actually finished. It took several more hours for some services to become available: the SMB service could not be started for nearly 33h, and the reporting pages have yet to show any charts or data. Furthermore, the GUI/WebUI has some issues, also visible in shell actions on the server (in an SSH session), possibly stemming from:
- the number of docker images kept by default/design: the lack of automated pruning, which I've already raised here, seems to hurt the performance of ongoing docker image operations,
- the number of ix-applications sub-datasets/directories that TrueNAS creates: possibly these are created for and by docker/kubernetes for all those images and related configs, and maybe also by the middleware in its regular operations; regardless, there are a lot of them and, more importantly, they impact #3 below,
- the number of snapshots taken on the filesystems/directories/datasets from #2: in my case I have 2 daily "auto*" snapshot tasks defined for non-empty snapshots with 14 days of snapshot history on my datasets, so roughly 28 snapshots that I have scheduled myself; by TrueNAS design the boot pool also gets snapshots on updates/upgrades, so let's say fewer than 10 more. That is a manageable amount, but when checking the number of snapshots I'm seeing a huge value of 47k, nearly all in the ix-applications dataset. The output attached as zfs-snapshots.txt here was obtained with the following command (it took over 40 minutes to produce):
Code:
zfs list -t snapshot -H -r -o name,creation,used -s name -s creation
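To confirm where those 47k snapshots actually live without reading the whole attachment, the snapshot names could be grouped by dataset prefix. This is only a minimal sketch: the function name snapshots_by_prefix and its depth argument are hypothetical, and it assumes a names-only listing (zfs list -H -t snapshot -o name) is piped in on stdin. Requesting only the name column should also be lighter than the name,creation,used listing above, though that is not measured here.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS or TrueNAS): count snapshots per
# dataset prefix. Reads snapshot names (dataset@snapshot), one per line,
# from stdin. $1 = number of leading path components to group by (default 2).
snapshots_by_prefix() {
    depth="${1:-2}"
    awk -F'@' -v depth="$depth" '
        NF >= 2 {
            # $1 is the dataset part; keep only its first "depth" components
            n = split($1, parts, "/")
            if (n > depth) n = depth
            key = parts[1]
            for (i = 2; i <= n; i++) key = key "/" parts[i]
            count[key]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -k1,1nr -k2,2   # largest counts first, then by name
}

# Example use on the server (pool name taken from the alert in this post):
#   zfs list -H -t snapshot -o name -r moria | snapshots_by_prefix 2
```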
I don't see any option to limit the maximum number of snapshots per (child) dataset in the edit dataset pane for ix-applications, though I believe ZFS has a dataset property for a maximum snapshot count. Exposing such an option in the GUI might have helped a lot by allowing the property to be applied at dataset granularity (an automated mechanism might also be needed to apply the same value to child datasets, as I believe it is not automatically inherited at the ZFS level).
I also don't see any option to stop TrueNAS from creating empty snapshots (outside of the Data Protection page, i.e. one that would cover the ix-applications snapshots shown above), or to have them removed/squashed at regular intervals. Since these might be referenced elsewhere or exposed as some other filesystem, I guess it would be better either to stop creating empty snapshots in the first place or to automatically clean up all preceding empty snapshots.
This might improve the performance and possibly stability of the platform.
So in other words, more thorough housekeeping needs to be implemented for docker images and ix-applications filesystems and snapshots if this is to be a release version.
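As a rough way to see how many of the snapshots hold no data of their own, the used column could be filtered for zero. This is a sketch under assumptions: the helper name zero_used_snapshots is hypothetical, and it expects the tab-separated output of zfs list -H -p -t snapshot -o name,used on stdin (-p prints exact byte counts). A used value of 0 only means the snapshot holds no blocks unique to itself; it may still be needed, for example as the origin of a clone, so the result is a report and not a deletion list.

```shell
#!/bin/sh
# Hypothetical helper: print the names of snapshots whose "used" value is 0.
# Input: lines of "name<TAB>used" as produced by
#   zfs list -H -p -t snapshot -o name,used
zero_used_snapshots() {
    # A non-numeric value such as "-" does not compare equal to 0 and is skipped.
    awk -F'\t' 'NF == 2 && $2 == 0 { print $1 }'
}

# Example use on the server:
#   zfs list -H -p -t snapshot -o name,used -r moria/ix-applications \
#       | zero_used_snapshots | wc -l
```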
Other areas possibly hit by the same root cause, at least I think so, are these:
- the Dashboard -> Network pane reports "Error getting chart data"; I guess it's due to the import_on_boot job hanging or timing out. Perhaps there's some lock or other impact on storage actions that prevents the OS from getting chart data; this might be related to point #5,
- in Storage the whole page hangs/loads for hours and in some cases nothing is reported; I guess that's due to the job/snapshots as well, even though I can see the pools when SSHed into the server, long before the import job "ends", using:
Code:
zpool list
- in Sharing my NFS shares are defined and the service is reported as Running in the "UNIX (NFS) Shares" pane, yet for hours after a reboot my client machines see no shares, and SMB shares are lost for even longer, as I could start the SMB service only 32 hours after the reboot (before that it just failed to start). This is how I check from the clients:
Code:
showmount -e <server>
- in the Apps -> Installed Applications page I cannot get anything: the page simply "loads and loads" until the UI session gets disconnected, and I have to log in again and repeat the wait for the first couple of hours after the reboot. I think this stems from the snapshot count and all those images, ix-applications datasets and their snapshots,
- all the Reporting subpages show nothing: neither CPU usage/load, nor disk temperatures and I/O, nor memory usage/swap, nor network traffic, nor NFS stats, nor system info, nor ZFS ARC info. The charts are simply empty and it's been nearly 48h since the upgrade/reboot, so perhaps the storage holding the sar data that these charts use is still not available even after such a long period of time, maybe due to all those snapshots that might need to be processed.
I'm also getting
Code:
Dataset moria/ix-applications/docker/0cd2c2ed70f2e54d6a2f4211084e4a011e3ece44467a936089de2c98b8cceefa has more snapshots (1357) than recommended (512). Performance or functionality might degrade. 2022-03-19 18:19:51 (Europe/Warsaw)
as an alert, but there's no clear/easy GUI option linked to it for managing the situation, e.g. pruning unneeded/empty/old snapshots in order to fix this issue. Nor is there an option exposed in Applications -> Settings to enforce such snapshot limits, non-empty requirements or cleanups.
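Since the alert only names one dataset at a time, a quick way to list every dataset above the recommended 512 could look like the sketch below. The helper name datasets_over_limit is hypothetical; it assumes a names-only snapshot listing (zfs list -H -t snapshot -o name) on stdin and takes the threshold as an optional argument, defaulting to the 512 quoted in the alert.

```shell
#!/bin/sh
# Hypothetical helper: list datasets with more snapshots than a threshold.
# Input: snapshot names (dataset@snapshot), one per line, on stdin.
# $1 = threshold (default 512, the value quoted in the alert).
datasets_over_limit() {
    limit="${1:-512}"
    awk -F'@' -v limit="$limit" '
        NF >= 2 { count[$1]++ }          # tally snapshots per exact dataset
        END {
            for (d in count)
                if (count[d] > limit) print count[d], d
        }
    ' | sort -k1,1nr -k2,2               # worst offenders first
}

# Example use on the server:
#   zfs list -H -t snapshot -o name -r moria/ix-applications | datasets_over_limit
```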
So here's my question/request: do you happen to know a safe way to deal with the ix-applications snapshots problem? Solving it might fix the perceived performance issues and speed up pool import on boot. I'd rather do it safely, using a solution provided by TrueNAS/TrueCharts experts. A solution in the main TrueNAS WebUI would be best, but if required I can try any shell solution run over SSH on the server.
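For the shell route, the most cautious thing I can think of is to only print dry-run commands for review and execute nothing. The sketch below is not a confirmed TrueNAS/TrueCharts procedure: the helper name print_dry_run_destroys and the age cutoff are hypothetical, and it assumes tab-separated name and creation-epoch input from zfs list -H -p -t snapshot -o name,creation. It relies on zfs destroy -n -v, which reports what would be destroyed without doing it. Whether removing ix-applications snapshots is safe at all (docker layers may depend on them through clones) is exactly what I'd like the experts to confirm.

```shell
#!/bin/sh
# Hypothetical helper: print (never run) dry-run destroy commands for
# snapshots older than a given number of days.
# Input: lines of "name<TAB>creation_epoch" as produced by
#   zfs list -H -p -t snapshot -o name,creation
# $1 = maximum age in days (required)
# $2 = "now" as epoch seconds (optional, defaults to the current time)
print_dry_run_destroys() {
    max_age_days="${1:?usage: print_dry_run_destroys DAYS [NOW_EPOCH]}"
    now="${2:-$(date +%s)}"
    awk -F'\t' -v days="$max_age_days" -v now="$now" '
        NF == 2 && (now - $2) > days * 86400 {
            # \047 is a single quote; -n makes zfs destroy a dry run
            printf "zfs destroy -n -v \047%s\047\n", $1
        }
    '
}

# Example use on the server, writing the commands to a file for review:
#   zfs list -H -p -t snapshot -o name,creation -r moria/ix-applications \
#       | print_dry_run_destroys 14 > candidates.txt
```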