BOOT-Pool continuous writing [HELP]

ThEnGI

Contributor
Joined
Oct 14, 2023
Messages
140
If you run the script using -dump email it will email me a copy of your drive SMART data and I can look into it.
Done

As for your original problem, have you looked into your SWAP file? Is any being used? If yes, then where is your SWAP file located?
When I run htop, no swap is used.

And then 1GB/h of log
so let's write 1TB/h (perhaps a typo?)

Regarding the TB/h: I was replying to Whattteva, who simplified things by saying it was enough to use an enterprise-grade SSD with a high TBW.

So, the problem is you have 1GB/h being written to the boot pool, is that correct? That's your sda graph, sda is your boot pool? And it's the wear level report. I got all that but started reading other posts and got mixed up. What drive is your application pool on?

sda is the 128GB boot SSD, with constant writing of 256 kB/s, or about 1GB/h. Apps are on "FAST" (2TB NVMe).

If you have 1GB/h being written, can you not determine which files are being written with some monitoring, as a hint? All logging goes to the /var/log directory, so I would think the file(s) are there. That might help the ticket. And that was before you had Kubernetes active, if I understand correctly from post 28. That's a lot of data and cannot possibly be expected behavior.
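One way to get that hint is to snapshot file sizes twice and diff the snapshots. A rough sketch (demonstrated against a scratch directory so it is self-contained; on the NAS you would point it at /var/log and wait a real interval between the two snapshots):

```shell
# Diff two size snapshots of a directory to see which files are growing.
# A scratch directory stands in for /var/log here.
dir=$(mktemp -d)
printf 'stable\n'  > "$dir/stable.log"
printf 'growing\n' > "$dir/growing.log"
du -ab "$dir"/* | sort > snap1.txt
printf 'more data\n' >> "$dir/growing.log"   # simulate a chatty logger
du -ab "$dir"/* | sort > snap2.txt
# Lines unique to the second snapshot = files whose size changed.
changed=$(comm -13 snap1.txt snap2.txt | awk -F'\t' '{print $2}')
echo "$changed"
rm -rf "$dir" snap1.txt snap2.txt
```

Only the file that grew between snapshots is printed, which narrows the culprit down quickly.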

The problem occurred with or without Kubernetes, if that was the question.

To answer the other question, since I don't see it answered: there isn't really a need per se to mirror the boot pool. You can do it for uptime, which I do, as I don't want downtime. As long as you download the config every so often, you can just reinstall SCALE on a new boot drive, restore the config file, and you are back the same as before.
:frown:
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
And now I see another user has come in and started posting stuff, I assumed it was the OP but actually looking now, it's not.

I don't want to hijack the thread. I could create a separate one, but I guess it's somewhat related to the OP's problem?
Sorry for the confusion!

I thought I'd add on here, since I'm seeing the same writes (250 KiB-ish) on the boot pool as the OP, and I realized the writes on the system dataset are even higher.
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
Your boot pool is more like my apps pool as far as activity. Not good! Very interested to follow your ticket. I am watching it.

I download my config file monthly; reinstalling is really easy. A mirror is even better, but you still want to download the config file every so often just in case. You do want it! I'd hate to have a boot pool failure and no spare drive. But if I had a spare drive, why not mirror it? Unless there are no spare ports.
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
Sorry for the confusion!

I thought I'd add on here, since I'm seeing the same writes (250 KiB-ish) on the boot pool as the OP, and I realized the writes on the system dataset are even higher.
The confusion was on my part. As someone who reads and tries to help where they can, I don't always take the time to pay enough attention.

You have Home Assistant; I don't know how long you've used it, but it is very chatty, and that is normal. I am running it in a VM also, via HASSOS. It is forever logging the states of different devices in the logger. I log mine to MariaDB, a separate app on SCALE.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Sorry for the confusion!

I thought I'd add on here, since I'm seeing the same writes (250 KiB-ish) on the boot pool as the OP, and I realized the writes on the system dataset are even higher.
@ThEnGI's issue is that his system dataset is not on the boot pool, yet he is experiencing such an abnormal write volume.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I found that my Cobia system also writes to /dev/sda continuously:
Bildschirmfoto 2023-11-21 um 22.18.18.png

To rule out a problem with the reporting - ESXi seems to agree (the system drive is a VMDK):
Bildschirmfoto 2023-11-21 um 22.18.54.png

And ... *drumroll* ... I present one possible culprit:
Code:
root@truenas[/var/log/netdata]# tail -f /var/log/netdata/access.log
2023-11-21 22:36:45: 955648: 3199 '[localhost]:57448' 'DATA' (sent/all = 5561/53715 bytes -90%, prep/sent/total = 0.70/0.51/1.21 ms) 200 '/api/v1/allmetrics?format=json'
2023-11-21 22:36:45: 955649: 3174 '[localhost]:57464' 'CONNECTED'
2023-11-21 22:36:45: 955649: 3174 '[localhost]:57464' 'DISCONNECTED'
2023-11-21 22:36:45: 955649: 3174 '[localhost]:57464' 'DATA' (sent/all = 5313/53680 bytes -90%, prep/sent/total = 0.60/0.49/1.10 ms) 200 '/api/v1/allmetrics?format=json'
2023-11-21 22:36:47: 955650: 3200 '[localhost]:57476' 'CONNECTED'
2023-11-21 22:36:47: 955650: 3200 '[localhost]:57476' 'DISCONNECTED'
2023-11-21 22:36:47: 955650: 3200 '[localhost]:57476' 'DATA' (sent/all = 5341/53685 bytes -90%, prep/sent/total = 1.49/1.31/2.80 ms) 200 '/api/v1/allmetrics?format=json'
2023-11-21 22:36:47: 955651: 3199 '[localhost]:57482' 'CONNECTED'
2023-11-21 22:36:47: 955651: 3199 '[localhost]:57482' 'DISCONNECTED'
2023-11-21 22:36:47: 955651: 3199 '[localhost]:57482' 'DATA' (sent/all = 5313/53678 bytes -90%, prep/sent/total = 0.76/0.48/1.24 ms) 200 '/api/v1/allmetrics?format=json'
[...]


What the heck? Why log when the middleware continuously polls netdata? And besides - this does not belong on the boot drive.
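For what it's worth, stock netdata can usually be told to silence that access log in netdata.conf. Whether the iX-bundled build honors this, and whether the middleware relies on the log, I can't say; the section and key names have also changed between netdata versions, so treat this as an assumption to verify against the netdata documentation for your version:

```ini
# /etc/netdata/netdata.conf -- older netdata versions use "access log"
# under [global]; newer ones moved logging options to a [logs] section.
[global]
    access log = none
```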
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Rough calculation, overestimating the data from above: 0.5 MBytes/s amounts to about 43 GB per day, or roughly 15 TB per year. So nothing that will ruin my SSD in the short run, but iX really should re-evaluate

- the amount of logging, especially in a production release
- the location of the log files
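Checking the arithmetic at a flat 0.5 MB/s (a sketch; the real rate fluctuates, so these are ballpark figures):

```shell
# Convert a sustained write rate of 0.5 MB/s into GB/day and TB/year.
gb_per_day=$(awk 'BEGIN { printf "%.1f", 0.5 * 86400 / 1000 }')
tb_per_year=$(awk 'BEGIN { printf "%.1f", 0.5 * 86400 * 365 / 1e6 }')
echo "${gb_per_day} GB/day, ${tb_per_year} TB/year"
```

That lands well under the endurance rating of most SSDs, but it is still wear for no benefit.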
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
File a ticket for the log spam with netdata, certainly. Completely unnecessary logging there.
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
@ThEnGI issue is that his system dataset is not in the boot pool, yet he is experiencing such abnormal writes volume.
It's the same for me: OP reported around 250 KiB written to the boot pool, and that's the same number I have. Additionally, the writes on my VM pool are an order of magnitude higher. I also don't have the system dataset on the boot pool.
The assigned dev on my ticket said he may have an idea. With @Patrick M. Hausen also reporting issues, I'm confident there will be some update in the future. It's not single users with a misconfiguration at this point, I'd say.

You have Home Assistant; I don't know how long you've used it, but it is very chatty, and that is normal. I am running it in a VM also, via HASSOS. It is forever logging the states of different devices in the logger. I log mine to MariaDB, a separate app on SCALE.
HA is what got me into this whole home server mess ;)
I also thought about MariaDB, because the DB as of now does not seem to be persistent forever. But since I probably don't really need all that history, I haven't gotten around to it. So you used a SCALE app directly to log?
It would still be on the same pool though, so for this particular problem I probably won't gain anything.
 

ThEnGI

Contributor
Joined
Oct 14, 2023
Messages
140
I already opened the ticket; it was set as low priority :frown:
It must be said that it is not urgent, and one's own problems always feel high priority.
@joeschmuck is investigating the values reported by the script, as they are (perhaps) not real. 500GiB written should not result in a 20% reduction in the life of a 128GB SSD.

@chuck32 I didn't check whether the disk where the "system dataset" resides has anomalous writes, but since I average 3-5 MiB/s of writes (due to Docker/Kubernetes), 250 KiB/s makes no difference. The SSD is 2TB (15.625 times larger than the boot pool), so it can easily absorb these extra writes.
 
Last edited:

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
@joeschmuck is investigating the values reported by the script, as they are (perhaps) not real. 500GiB
The value is correct; I verified it last night and confirmed 534.67GB was written (had to do the math manually :smile:). However, the wear level on your drive is not correct due to some off-the-wall SMART reporting. I will chat with you over PM about the wear level, not here, but I do have a customization fix for it.
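For anyone who wants to reproduce that check: SATA SSDs usually expose a Total_LBAs_Written SMART attribute, and the conversion is just LBAs × sector size. The LBA count below is a hypothetical example chosen to land near the 534.67GB figure above, not a value read from the OP's drive; the real number comes from `smartctl -A /dev/sdX`:

```shell
# Convert SMART Total_LBAs_Written to bytes written.
# NOTE: example value only -- read the real one with `smartctl -A /dev/sdX`.
# Most SATA SSDs report 512-byte units, but some vendors use other unit
# sizes, which is exactly the off-the-wall reporting that breaks wear math.
lbas=1044277344
result=$(awk -v l="$lbas" 'BEGIN { printf "%.2f", l * 512 / 1e9 }')
echo "${result} GB written"
```

If a vendor reports in some other unit, the same arithmetic with 512 gives nonsense, so always check the drive's documentation first.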
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
HA is what got me into this whole home server mess ;)
I also thought about MariaDB, because the DB as of now does not seem to be persistent forever. But since I probably don't really need all that history, I haven't gotten around to it. So you used a SCALE app directly to log?
It would still be on the same pool though, so for this particular problem I probably won't gain anything.
It really is by default. I am using MariaDB; I don't use iX or TrueCharts apps, but rather docker containers via what are called custom apps on Cobia. I am not on Cobia yet. I don't like SQLite for the most part and find MariaDB better for my purposes, which include several containers on SCALE as well as my own creations, plus the ease with which I can access the data from GUIs like MySQL Workbench or DBeaver. I'd rather have one centralized place to store everything. I have at least a year of HA history logged now; it's actually useful data. It can all be configured in HA.
 

ThEnGI

Contributor
Joined
Oct 14, 2023
Messages
140
OT:
While waiting for a response from iX, I was writing down the next upgrades to do.
Regarding PCIe, I have two x16 slots (16 + 4 lanes) and one x1 slot available.
I still haven't quite figured out what to do with the x1 slot; maybe a 2.5GbE card?
Regarding the x16 connector, I was thinking of mounting an LSI 9300-8i (8-disk) controller in the slot with 4 lanes; is the bandwidth sufficient?

That would leave me an x16 slot in which to install a 4x NVMe adapter, thus obtaining 10 HDD + 6 NVMe, but with 2.5GbE connectivity.
Or I could sacrifice the NVMe disks and install a 10GbE NIC.

What is the best solution?

EDIT: I would like to avoid link aggregation to keep the LAN simple.

/OT
 
Last edited:

ThEnGI

Contributor
Joined
Oct 14, 2023
Messages
140
OT (again)

I'm getting "high" (average below 10%) IO wait, but I didn't understand exactly what it represents...
If I understand correctly, it's the pools that are "slowing down" the system, correct?

And how is the "system load average" value measured in Reporting?
On the dashboard I have a value that varies between 1/5/10%; in the report it's around 1.2 (processes?).
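For reference, the load average is not a percentage: it's the average number of runnable (or uninterruptibly waiting) tasks, so it has to be read against the core count. A toy normalization, assuming a hypothetical 8-core CPU (check yours with `nproc`):

```shell
# Load average is a task count, not a percentage. A load of 1.2 on an
# 8-core machine (hypothetical count) means the run queue averaged 1.2
# tasks, i.e. roughly 15% of total CPU capacity.
pct=$(awk 'BEGIN { printf "%.0f", 1.2 / 8 * 100 }')
echo "${pct}% of capacity"
```

That would line up with a dashboard CPU figure in the 1-10% range: a load of 1.2 is light on a multi-core box.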

I have 6 active Dockers, so I'm not surprised by a bit of load on the CPU

END OT
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
I still haven't quite figured out what to do with the x1 slot; maybe a 2.5GbE card?
Regarding the x16 connector, I was thinking of mounting an LSI 9300-8i (8-disk) controller in the slot with 4 lanes; is the bandwidth sufficient?

That would leave me an x16 slot in which to install a 4x NVMe adapter, thus obtaining 10 HDD + 6 NVMe, but with 2.5GbE connectivity.
Or I could sacrifice the NVMe disks and install a 10GbE NIC.
Maybe post in the appropriate subforum to get proper attention.

Regarding the bandwidth, it's an x8 card in an x16 slot; why wouldn't it be sufficient?
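To put rough numbers on it (a sketch: ~0.985 GB/s usable per PCIe 3.0 lane after 128b/130b encoding, and 250 MB/s per HDD is a generous sequential estimate), even the x4-lane slot has headroom for eight spinning disks:

```shell
# Compare a PCIe 3.0 x4 electrical link against 8 HDDs running flat out.
link=$(awk 'BEGIN { printf "%.2f", 4 * 0.985 }')   # x4 link, GB/s
disks=$(awk 'BEGIN { printf "%.2f", 8 * 0.250 }')  # 8 HDDs x 250 MB/s
echo "x4 link ${link} GB/s vs ${disks} GB/s of disk throughput"
```

The link would only become the bottleneck with SSDs behind the HBA.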

From what I read you should stay away from 2.5 GbE: either go 10 GbE or stay at 1 GbE. The performance/hardware for 2.5 GbE is subpar (further reading).

Even with HDDs (depending on your pool layout) you can easily max out 2.5 GbE speeds, so 10 GbE wouldn't be a waste.

I'd probably sacrifice the NVMe disks for 10 GbE. Aren't there 16-port HBAs anyway? You could use 2.5" SATA SSDs then.
 

ThEnGI

Contributor
Joined
Oct 14, 2023
Messages
140
OK regarding the 2.5GbE.

The idea is:
1x PCIe 3.0 x16 slot (16 lanes): 8-port HBA
1x PCIe 3.0 x16 slot (4 lanes): 10GbE NIC (X520-DA1)
1x PCIe 3.0 x1 slot (1 lane): 1 NVMe (via adapter), at reduced speed

In my case there are only 10 3.5" bays and 2 2.5" bays, and the two 2.5" drives are the boot disks. An 8-port HBA is enough to cover all the disks (the motherboard has 6 SATA ports). Maybe I will use 2.5"-to-3.5" adapters and SATA SSDs, which is not a bad idea.

I'm writing here because it's not urgent and I needed to get the post up; at the moment the priorities are a UPS, a second HDD, and a second NVMe.
 

ThEnGI

Contributor
Joined
Oct 14, 2023
Messages
140
They closed the ticket, reporting that the swap is on the boot disk.
How did it end up there? How can I move it?
I looked in the GUI, but it only lets me change the size.
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
They closed my ticket for the same reason. I have 128 GB of memory (with usually 30 GB free, depending on which VMs are spun up), so I don't really see a reason to use swap at all.

How did it end up there?
Probably said yes upon installation to creating a swap partition. IIRC I ended up with swap on an earlier installation (Bluefin) even when I said no.

I looked in the GUI but it only lets me change the size
Did you try setting it to 0?

I'm thinking about reinstalling when I swap out my PSU / want to install the next Cobia update, if it cannot be removed as-is. This thread, however, suggests I may get away with replacing the drives with themselves, since I mirrored my boot pool.
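Before changing anything, it's worth confirming what swap is actually active. A sketch (/proc/swaps is standard Linux; the swapoff line is commented out on purpose, and the device name there is only an example):

```shell
# List active swap devices; /proc/swaps has a header line, so skip it.
swaps=$(tail -n +2 /proc/swaps | awk '{print $1}')
msg="${swaps:-no swap active}"
echo "$msg"
# To disable a device for the running session only (lasts until reboot):
#   swapoff /dev/sdXN   # use a device shown above; double-check first!
```

Note that `swapoff` does not survive a reboot, and how SCALE recreates swap partitions on upgrade or reinstall is a separate question.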
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Just for context - my boot-pool contains the system dataset and is being written to at an average of 317.32 KiB over whatever time period is involved.
1702330862535.png

It's continuous.
1702330890101.png
 

ThEnGI

Contributor
Joined
Oct 14, 2023
Messages
140
To remove the swap from the boot drive, do I need to reinstall TrueNAS?
 