multi_report.sh version for Core and Scale 3.0

joeschmuck · Aug 29, 2023

audinator2 said:
@joeschmuck This testing version is spot on! Worked flawlessly the first time. Still fighting the dump because I think it has too many attachments, or is too long, to pass through the email service. Even with duplicates now removed, it's still not processing. I was able to get it going through Gmail instead. I have forwarded it on; you should have it in that mailbox soon.

Ouch! 366 files is a lot. I did get the email, thanks. I don't think you ran the latest version, which would have cut it down to a little over 100 files. This makes me think I should consider zipping the files when there are too many of them. I took a note for the next version and need to figure out how I want to implement it. I don't think most people would want the files all zipped, so maybe I will add a user-defined threshold for how many files are attached (which grows with the number of drives) and zip only if that number is exceeded. Maybe a default value of 50, and hopefully I will get feedback on whether that is too large or too small a value.
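
The threshold idea described above could look roughly like the sketch below. It is only an illustration, not code from multi_report.sh: the names `Attach_Zip_Threshold` and `bundle_if_needed` are hypothetical, and a gzip-compressed tar archive stands in for a zip file because `tar` is available on both Core and Scale.

```shell
#!/bin/bash
# Hypothetical setting name; 50 is the default value floated in the post above.
Attach_Zip_Threshold=50

# Print the paths to attach to the email, one per line.
# If DIR holds more regular files than THRESHOLD, bundle them into a single
# compressed archive and print only the archive path instead.
# usage: bundle_if_needed DIR THRESHOLD ARCHIVE   (ARCHIVE must be outside DIR)
bundle_if_needed() {
    local dir="$1" threshold="${2:-$Attach_Zip_Threshold}" archive="$3"
    local count
    count=$(find "$dir" -maxdepth 1 -type f | wc -l)
    count=$((count))   # strip any padding that wc adds on some systems
    if [ "$count" -gt "$threshold" ]; then
        tar -czf "$archive" -C "$dir" . || return 1
        printf '%s\n' "$archive"
    else
        find "$dir" -maxdepth 1 -type f | sort
    fi
}
```

Called as `bundle_if_needed /path/to/dump_files 50 /tmp/dump.tgz`, it lists the individual files when there are 50 or fewer and a single archive otherwise.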

Also, if you have any suggestions to improve the data presentation, please let me know. Having so many drives poses a different challenge. At least all the Warning messages are just below the charts.

FrankWard · Aug 30, 2023

joeschmuck said:
Yes and No. You have two ways you can do this, but the best way for your situation is to set up a custom configuration for the one drive and set the Test Age to disable. Also you can restore the ability to display the drive data again.

Custom Configuration (use -config) A -> S and read the instructions carefully. You should be good after that. Let me know if it works or not.

I created a custom config for the SSD drive. The report still shows an Orange warning indicator on the drive and on the test age, even though the test age now shows "---". I assume I did it correctly. Here's the custom drive text.

Custom_Drives_List="S19HNEAD381577E:d:d:d:d:d:d:d:d:d:d:d:0:1:d:d"
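
For readers checking their own entry, here is a small sketch that splits a Custom_Drives_List value on colons and prints only the fields that differ from 'd' (which, per the author's later reply in this thread, means "default"). The function name `show_overrides` is hypothetical, and the sketch deliberately reports field positions only; which setting each position controls is defined by the script's -config instructions, not by this example.

```shell
#!/bin/bash
# Example entry copied from the post above.
Custom_Drives_List="S19HNEAD381577E:d:d:d:d:d:d:d:d:d:d:d:0:1:d:d"

# Print the serial number, then every later field that is not 'd' (default).
# Field numbers are positions within the entry, counting the serial as 0.
show_overrides() {
    local entry="$1" fields i
    IFS=':' read -r -a fields <<< "$entry"
    printf 'serial=%s\n' "${fields[0]}"
    for ((i = 1; i < ${#fields[@]}; i++)); do
        if [ "${fields[i]}" != "d" ]; then
            printf 'field %d=%s\n' "$i" "${fields[i]}"
        fi
    done
}

show_overrides "$Custom_Drives_List"
```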

joeschmuck · Aug 31, 2023

That looks about right, but the "0" value should be a "2" by default.

Let me ask you this: Does the text section "WARNING LOG FILE" still say "Drive: S19HNEAD381577E -- Test Age = 9xx Days" ?

If it's just an Orange indication, I can live with it for now but I agree it should not be an indication if it's overridden. It should be fixed in the next release. I didn't just make a mental note, I wrote it down too in my list of things to fix/change.

Please let me know if it's just the Orange indication. The warning should not be there.

FrankWard · Aug 31, 2023

joeschmuck said:
That looks about right, but the "0" value should be a "2" by default.

Let me ask you this: Does the text section "WARNING LOG FILE" still say "Drive: S19HNEAD381577E -- Test Age = 9xx Days" ?

If it's just an Orange indication, I can live with it for now but I agree it should not be an indication if it's overridden. It should be fixed in the next release. I didn't just make a mental note, I wrote it down too in my list of things to fix/change.

Please let me know if it's just the Orange indication. The warning should not be there.

Thanks Joe. Here's the output.
WARNING LOG FILE
Drive: S19HNEAD381577E - Wear Level = 1%
Drive: S19HNEAD381577E - Test Age = 1 Days

joeschmuck · Aug 31, 2023

FrankWard said:
Thanks Joe. Here's the output.
WARNING LOG FILE
Drive: S19HNEAD381577E - Wear Level = 1%
Drive: S19HNEAD381577E - Test Age = 1 Days

Would you mind sending me a dump of your multi_report data? Run the script with -dump email because that doesn't look correct, as you have said. And the wear level could be wrong as well. I just want to make sure you are running the most current version because I have made some changes to Wear Level in the last release, hopefully for the better, "v2.4.4_2023_08_19".

FrankWard · Sep 1, 2023

joeschmuck said:
Would you mind sending me a dump of your multi_report data? Run the script with -dump email because that doesn't look correct, as you have said. And the wear level could be wrong as well. I just want to make sure you are running the most current version because I have made some changes to Wear Level in the last release, hopefully for the better, "v2.4.4_2023_08_19".

I sent the dump to you Joe. It's in the middle of a scrub, so there's an additional warning about drive temp on one of my mirror drives that's usually not there. Thanks for taking the time to check it out.

joeschmuck · Sep 1, 2023

FrankWard said:
I sent the dump to you Joe.

Got it, and I will look into it in about 2 hours. I need to go out and take care of a few "Honey-Do" items.

joeschmuck · Sep 1, 2023

About time TrueNAS forums came back online.

Here is my quick analysis:
SSD S19HNEAD381577E: WOW! Over 10 years of runtime, impressive, and only 120 power cycles. Wear Level is actually at 1% remaining. You cannot remove the alarm by hiding the chart data, but you can remove it by setting 'WearLevelCrit=0'. Also, the Last Test Age showing Orange is what I mentioned earlier: the custom value of 0 was incorrect, it should have been left at 'd' for default. And you have the UDMA_CRC_Errors, which you have handled. I understand this is your boot-pool drive and I would recommend you replace it. SSDs are cheap.

Both HDDs, ZL2P322R and ZL2BBVL0, have a maximum recorded operating temperature of 50C, but in general they operate between 38 and 40/41C, with a spike to 45/46C at this data point. Everything looks great for these drives.

I will PM you a copy of the updated multi_report_config.txt file. Pop that in and all should work fine again.

Changes I made to your config file were:

HDDtempWarn=47 (Raised 2C for the scrub heat.)
WearLevelCrit=0 (Lowered to eliminate 1% being in alarm, but when it hits 0%, even if the SSD is working, the alarm will occur.)
SSD_Wear_Level="true" (Not hiding the data in the chart now.)
Custom_Drives_List="S19HNEAD381577E:d:d:d:d:d:d:d:d:d:d:d:d:1:d:d" (changed Last Test Age to 1, a '0' value would mean anything greater than zero days is an alarm condition.)
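
The notes on WearLevelCrit and Last Test Age above imply two different comparison directions, which the sketch below spells out. The helper names are hypothetical and the comparisons are inferred from the descriptions in this post, not copied from multi_report.sh, so treat this as a reading aid and not as the script's actual logic.

```shell
#!/bin/bash
# Wear level is "percent remaining": alarm when remaining <= critical value.
# usage: wear_alarm REMAINING_PCT CRIT
wear_alarm() {
    [ "$1" -le "$2" ]
}

# Test age is in days: alarm when the age is greater than the allowed days.
# usage: test_age_alarm AGE_DAYS MAX_DAYS
test_age_alarm() {
    [ "$1" -gt "$2" ]
}

wear_alarm 1 0     || echo "Wear 1% with WearLevelCrit=0: no alarm"
wear_alarm 0 0     && echo "Wear 0% with WearLevelCrit=0: alarm"
test_age_alarm 1 0 && echo "Test age 1 day with limit 0: alarm"
test_age_alarm 1 1 || echo "Test age 1 day with limit 1: no alarm"
```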

Sending you a PM now.

FrankWard · Sep 1, 2023

Thanks Joe. That SSD was in my Windows home server for the last 9 years. I migrated it over to the TrueNAS boot last year. Do they ever die? ;)

joeschmuck · Sep 2, 2023

FrankWard said:
Do they ever die? ;)

You will be the one to find out. My first SSD had a firmware issue where once it reached so many power on hours, it refused to write. A firmware update fixed it thankfully but I did experience what it feels like to have one that wouldn't write again.

joeschmuck · Sep 10, 2023

Just to let everyone running TrueNAS-SCALE-22.12.3.3 know: the 'blkid' command is not working correctly (at least on my installation). As a result, the text portion of the report that cross-references the UUID to a drive ID (example 'd0f8a4fe-bf79-11ed-a0df-000c296fd555 -> ada0') is missing the drive ID text after the '->'. The next version, TrueNAS-23.10-BETA.1, does work again, but it's Beta, and while I tested it out, I reverted to the Stable version in spite of the minor issue with 'blkid'.

GrimmReaperNL · Sep 24, 2023

Hi guys, Hope you can help with this.
I have this script set up to run as a cron task, but most of the time it's supposed to run, it doesn't give me the report.
In the GUI it has the following error:

[EFAULT] CronTask "./mnt/TrueNAS/scripts/multi_report.sh" exited with 1 (non-zero) exit status.

I also have 'hide standard output/error' disabled and the email gives me:

Script is already Running... Exiting

This happens when the cron triggers and, most of the time, when I hit the manual trigger in the GUI.

The cron task has the following command: ./mnt/TrueNAS/scripts/multi_report.sh.
It is set to run as root with a custom schedule of 10 10 * * sun (Sundays at 10:10 AM).

How can I make sure the script runs every time?

joeschmuck · Sep 24, 2023

GrimmReaperNL said:
The cron task has the following command: ./mnt/TrueNAS/scripts/multi_report.sh.

Please remove the period at the beginning of the command. (I tested it both with and without the period: both work in Scale, but Core only works without it.) See what happens now.

Also, exactly what version of TrueNAS are you running? This helps me attempt to recreate your problem. Since I think you are using Scale, I ran up my Scale version 23.10-RC.1 Cobia, and I was unable to recreate your issue. Maybe you are not running as root? Attached is a screenshot of my CRON setup (note that I typically recommend hiding the Standard Output, but when I troubleshoot a problem it comes in handy).

If you are still unable to resolve your issue and there is any information you would rather not share with the group, send me a private message and we can figure out the problem. I was under the impression that you have been using Multi Report for a while now. I'm curious what changed.

One of the things the script does is ensure there are not multiple instances running, because there are a few multi_report files that are written to, such as the statistical database, which do not like being written to by two programs at the same time. So I check whether more than one instance is running and, if so, exit.
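
As a general illustration of guarding against overlapping runs, the sketch below uses a lock directory, relying on the fact that `mkdir` fails if the directory already exists. This is a different technique from the "check how many instances are running" approach described above, and nothing here is taken from multi_report.sh; the function name `run_once` and the lock path in the usage note are hypothetical.

```shell
#!/bin/bash
# Run COMMAND only if no other holder of LOCK_DIR exists.
# mkdir either creates the directory (we own the lock) or fails because it
# already exists (someone else owns it), with no window in between.
# usage: run_once LOCK_DIR COMMAND [ARGS...]
run_once() {
    local lock="$1" status=0
    shift
    if ! mkdir "$lock" 2>/dev/null; then
        echo "Script is already Running... Exiting" >&2
        return 1
    fi
    "$@" || status=$?     # run the protected command, remember its result
    rmdir "$lock"         # release the lock whether or not the command failed
    return "$status"
}
```

For example, `run_once /tmp/multi_report_example.lock ./generate_report.sh` (both names made up) runs the command only when the lock is free. One known weakness of this approach is that a process killed before it reaches `rmdir` leaves a stale lock behind, which then has to be removed by hand.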

NugentS · Sep 24, 2023

I suggest running the script manually - does it work properly?

GrimmReaperNL · Sep 25, 2023

NugentS said:
I suggest running the script manually - does it work properly?

I removed the period and tried manual. Got the error.

I just changed the cron to every 10 min and it's been working fine the last 4 reports.
I'll change it back to once a week and see what it does.

Thanks for the help @joeschmuck @NugentS

joeschmuck · Sep 25, 2023

GrimmReaperNL said:
I'll change it back to once a week and see what it does.

While I know it's obvious, please make sure that you are not running the job at the same time TrueNAS is running its daily jobs, and that you are not running any other Cron Jobs at that time. Mine is scheduled for 2:00 AM every morning and I have never had a problem, not even when a Scrub or SMART test is running.

GrimmReaperNL said:
I removed the period and tried manual. Got the error.

This is not a good outcome. If you can create the issue manually then there is something wrong.

From the Shell, enter ./mnt/TrueNAS/scripts/multi_report.sh (note that you use the period from the command line to tell the OS this is to be executed). If you still get the issue while running from the command line, then I would ask you to run ./mnt/TrueNAS/scripts/multi_report.sh -dump email to generate a dump of the data you have and have it emailed to my account. I can't say if there will be a magic bullet in the data, as this is not your typical problem.

Davvo · Sep 25, 2023

I run my cronjob with the ./ and don't have any issues.

joeschmuck · Sep 25, 2023

Davvo said:
I run my cronjob with the ./ and don't have any issues.

That is interesting because I could not make it work on my system running Core. I'm sure there must be some reason, not sure if we will ever know what that is.

Davvo · Sep 25, 2023

joeschmuck said:
That is interesting because I could not make it work on my system running Core. I'm sure there must be some reason, not sure if we will ever know what that is.

I'm not running it in a pool but in ./scripts instead, a folder I created in the root directory. Maybe this info will help, PM me if you require more.
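
One general shell fact that may bear on the differing results with and without the leading period: a path that starts with `./` is resolved relative to the current working directory, so `./mnt/TrueNAS/scripts/multi_report.sh` only resolves when the working directory is `/`, while `/mnt/TrueNAS/scripts/multi_report.sh` resolves from anywhere. Whether this explains the failures reported in this thread has not been established. The sketch below demonstrates the difference in a throwaway directory tree; the stand-in script and the helper `try_relative` are made up for the demo.

```shell
#!/bin/bash
# Build a throwaway tree that mimics /mnt/TrueNAS/scripts under a temp root.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/mnt/TrueNAS/scripts"
printf '#!/bin/sh\necho "report ran"\n' > "$demo_root/mnt/TrueNAS/scripts/multi_report.sh"

# Try the relative path from a given working directory.
# The script is run through sh so the demo does not depend on the exec bit.
# usage: try_relative WORKING_DIR
try_relative() {
    ( cd "$1" && sh ./mnt/TrueNAS/scripts/multi_report.sh ) 2>/dev/null \
        || echo "not found from $1"
}

try_relative "$demo_root"        # working directory is the tree root: resolves
try_relative "$demo_root/mnt"    # any other working directory: does not resolve
sh "$demo_root/mnt/TrueNAS/scripts/multi_report.sh"   # absolute path: always resolves
```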

GrimmReaperNL · Oct 1, 2023

Unfortunately the cron errored out again. The manual run did work.
This is without the period before the slash, and after making sure no other scripts are triggered around that time.
