After a restart today, none of my apps will start. k3s reports the node as Ready, but it still carries a `not-ready` taint, and the logs look like containers are trying to start but cannot be reached. I've browsed these forums for a few hours and couldn't find any relevant posts, but here is what I'm seeing:
Code:
Failed to start kubernetes cluster for Applications: [EFAULT] Kube-router routes not applied as timed out waiting for pods to execute
Code:
root@truenas[~]# k3s kubectl get nodes
NAME         STATUS   ROLES                  AGE    VERSION
ix-truenas   Ready    control-plane,master   406d   v1.26.6+k3s-e18037a7-dirty
Code:
root@truenas[~]# k3s kubectl describe node ix-truenas
Name:               ix-truenas
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ix-truenas
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/master=true
                    openebs.io/nodeid=ix-truenas
                    openebs.io/nodename=ix-truenas
Annotations:        csi.volume.kubernetes.io/nodeid: {"zfs.csi.openebs.io":"ix-truenas"}
                    k3s.io/node-args: ["server","--cluster-cidr","172.16.0.0/16","--cluster-dns","172.17.0.10","--data-dir","/mnt/tank/ix-applications/k3s","--disable","metrics...
                    k3s.io/node-config-hash: 5VMKXMDJNBI2D5KDF52SDV37V4ZY2EIOJXZDRLUULQXDPJX5RA4Q====
                    k3s.io/node-env: {"K3S_DATA_DIR":"/mnt/tank/ix-applications/k3s/data/6c243f7cbf543e01911aa24f7651922820ca56e79179e8fd215a3e4381aceecf"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 29 Nov 2022 22:25:48 -0500
Taints:             node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  ix-truenas
  AcquireTime:     <unset>
  RenewTime:       Wed, 10 Jan 2024 22:08:12 -0500
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 10 Jan 2024 22:08:13 -0500   Mon, 13 Nov 2023 22:16:40 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 10 Jan 2024 22:08:13 -0500   Mon, 13 Nov 2023 22:16:40 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 10 Jan 2024 22:08:13 -0500   Mon, 13 Nov 2023 22:16:40 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 10 Jan 2024 22:08:13 -0500   Wed, 10 Jan 2024 22:08:13 -0500   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.1.30
  Hostname:    ix-truenas
Capacity:
  cpu:                12
  ephemeral-storage:  27854154Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65761356Ki
  nvidia.com/gpu:     0
  pods:               250
Allocatable:
  cpu:                12
  ephemeral-storage:  27746837493708
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65761356Ki
  nvidia.com/gpu:     0
  pods:               250
System Info:
  Machine ID:                 3226bac618d148519c61c31b083dc929
  System UUID:                af59a1a8-6f8d-0000-0000-000000000000
  Boot ID:                    e4291094-7048-4ac4-8d8c-595cf703dcc2
  Kernel Version:             6.1.63-production+truenas
  OS Image:                   Debian GNU/Linux 12 (bookworm)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://Unknown
  Kubelet Version:            v1.26.6+k3s-e18037a7-dirty
  Kube-Proxy Version:         v1.26.6+k3s-e18037a7-dirty
PodCIDR:                      172.16.0.0/16
PodCIDRs:                     172.16.0.0/16
Non-terminated Pods:          (26 in total)
  Namespace            Name                                             CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------            ----                                             ------------  ----------  ---------------  -------------  ---
  kube-system          nvidia-device-plugin-daemonset-7skx6             0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  kube-system          csi-nfs-controller-7b74694749-c2dwh              40m (0%)      0 (0%)      80Mi (0%)        900Mi (1%)     34m
  kube-system          openebs-zfs-node-74wn8                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  cert-manager         cert-manager-8444f6f86b-bxfww                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  ix-cloudflared       cloudflared-5d8bc8d5cd-cnjlg                     10m (0%)      4 (33%)     50Mi (0%)        8Gi (12%)      34m
  ix-requestrr         requestrr-5b94d84495-7s9ql                       10m (0%)      4 (33%)     50Mi (0%)        8Gi (12%)      34m
  metallb-system       speaker-cc9v8                                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  cnpg-system          cnpg-controller-manager-5d74bc79fb-rtq5z         100m (0%)     100m (0%)   100Mi (0%)       200Mi (0%)     34m
  kube-system          openebs-zfs-controller-0                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  prometheus-operator  prometheus-operator-5dcffb7cb8-vvtdw             100m (0%)     200m (1%)   100Mi (0%)       200Mi (0%)     34m
  ix-jackett           jackett-bd7f48b58-zcc2q                          20m (0%)      8 (66%)     100Mi (0%)       16Gi (25%)     34m
  ix-radarr            radarr-74588c7f96-nxd96                          10m (0%)      4 (33%)     50Mi (0%)        8Gi (12%)      34m
  kube-system          coredns-59b4f5bbd5-9t7td                         100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     34m
  cert-manager         cert-manager-webhook-545bd5d7d8-zlcf7            0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  ix-qbittorrent       qbittorrent-b9686749d-mds8f                      20m (0%)      8 (66%)     100Mi (0%)       16Gi (25%)     34m
  kube-system          csi-nfs-node-xr5r8                               30m (0%)      0 (0%)      60Mi (0%)        500Mi (0%)     17m
  kube-system          csi-smb-controller-7fbbb8fb6f-dvwxb              30m (0%)      2 (16%)     60Mi (0%)        600Mi (0%)     34m
  ix-wyoming-piper     wyoming-piper-custom-app-7fbbc78649-qbk45        10m (0%)      4 (33%)     50Mi (0%)        8Gi (12%)      34m
  kube-system          snapshot-controller-546868dfb4-fngtf             10m (0%)      0 (0%)      20Mi (0%)        300Mi (0%)     34m
  ix-plex              plex-897c9965b-8gz4b                             10m (0%)      12 (100%)   50Mi (0%)        8Gi (12%)      34m
  kube-system          svclb-dizquetv-bb5710f6-56xls                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  kube-system          svclb-wyoming-whisper-custom-app-c1cb0c8d-v28lp  0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  cert-manager         cert-manager-cainjector-ffb4747bb-hbgcn          0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  kube-system          svclb-frigate-12-custom-app-3a50a40b-2b8bl       0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  kube-system          svclb-plex-955ab32e-z57fq                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  kube-system          svclb-wyoming-piper-custom-app-6374b442-7xgdf    0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                500m (4%)   46300m (385%)
  memory             940Mi (1%)  76598Mi (119%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
  nvidia.com/gpu     0           0
Events:
  Type     Reason                          Age                   From             Message
  ----     ------                          ----                  ----             -------
  Normal   NodeNotReady                    159m                  kubelet          Node ix-truenas status is now: NodeNotReady
  Normal   Starting                        159m                  kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity             159m                  kubelet          invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced         159m                  kubelet          Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory         159m                  kubelet          Node ix-truenas status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure           159m                  kubelet          Node ix-truenas status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID            159m                  kubelet          Node ix-truenas status is now: NodeHasSufficientPID
  Normal   NodePasswordValidationComplete  159m                  k3s-supervisor   Deferred node password secret validation complete
  Normal   RegisteredNode                  159m                  node-controller  Node ix-truenas event: Registered Node ix-truenas in Controller
  Warning  Rebooted                        44m (x675 over 159m)  kubelet          Node ix-truenas has been rebooted, boot id: e4f4f164-f984-4dbe-9073-dfc8c6f74123
  Normal   NodeHasSufficientPID            34m                   kubelet          Node ix-truenas status is now: NodeHasSufficientPID
  Normal   Starting                        34m                   kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity             34m                   kubelet          invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced         34m                   kubelet          Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory         34m                   kubelet          Node ix-truenas status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure           34m                   kubelet          Node ix-truenas status is now: NodeHasNoDiskPressure
  Normal   NodePasswordValidationComplete  34m                   k3s-supervisor   Deferred node password secret validation complete
  Normal   RegisteredNode                  34m                   node-controller  Node ix-truenas event: Registered Node ix-truenas in Controller
  Warning  Rebooted                        24m (x87 over 34m)    kubelet          Node ix-truenas has been rebooted, boot id: 0b8e8bd5-0216-4fe3-a25a-c9b6987c96ee
  Normal   NodeHasNoDiskPressure           17m                   kubelet          Node ix-truenas status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID            17m                   kubelet          Node ix-truenas status is now: NodeHasSufficientPID
  Normal   Starting                        17m                   kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity             17m                   kubelet          invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced         17m                   kubelet          Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory         17m                   kubelet          Node ix-truenas status is now: NodeHasSufficientMemory
  Normal   NodeNotReady                    17m                   kubelet          Node ix-truenas status is now: NodeNotReady
  Normal   NodePasswordValidationComplete  17m                   k3s-supervisor   Deferred node password secret validation complete
  Normal   RegisteredNode                  17m                   node-controller  Node ix-truenas event: Registered Node ix-truenas in Controller
  Warning  Rebooted                        7m44s (x60 over 17m)  kubelet          Node ix-truenas has been rebooted, boot id: 42e259d9-909a-4479-87c9-d007ab5c42a2
  Normal   Starting                        95s                   kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity             95s                   kubelet          invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced         95s                   kubelet          Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory         95s                   kubelet          Node ix-truenas status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure           95s                   kubelet          Node ix-truenas status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID            95s                   kubelet          Node ix-truenas status is now: NodeHasSufficientPID
  Normal   NodeReady                       94s                   kubelet          Node ix-truenas status is now: NodeReady
  Normal   NodePasswordValidationComplete  91s                   k3s-supervisor   Deferred node password secret validation complete
  Normal   RegisteredNode                  85s                   node-controller  Node ix-truenas event: Registered Node ix-truenas in Controller
  Warning  Rebooted                        85s (x18 over 95s)    kubelet          Node ix-truenas has been rebooted, boot id: e4291094-7048-4ac4-8d8c-595cf703dcc2
I have tried restarting multiple times, restoring from a recent config backup, unsetting and re-setting the app pool, and nothing seems to work. I'm at a loss for what to try next. Sometimes it will show App Services started, but spam logs with "failed to connect" when doing the container health checks. It's not super consistent.