openebs-zfs-controller-0 pod in error state from time to time

impovich

Hi all, the openebs-zfs-controller-0 pod is constantly being restarted because of errors in its containers.

Code:
kube-system        openebs-zfs-controller-0                  1/5     Error     134        19h
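
The per-container logs below were pulled with something along these lines (a sketch; the container names come from the pod spec shown in the describe output further down, and --previous prints the log of the last crashed instance):

Code:
# list the containers in the pod
k3s kubectl -n kube-system get pod openebs-zfs-controller-0 \
  -o jsonpath='{.spec.containers[*].name}'

# log of the last crashed instance of a given sidecar
k3s kubectl -n kube-system logs openebs-zfs-controller-0 -c csi-resizer --previous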


k8s.gcr.io/sig-storage/csi-resizer logs
Code:
goroutine 142 [IO wait, 5 minutes]:
internal/poll.runtime_pollWait(0x7f3868c82e38, 0x72, 0x1a35180)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/runtime/netpoll.go:220 +0x55
internal/poll.(*pollDesc).wait(0xc0000f5998, 0x72, 0xc000366800, 0x702, 0x702)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0000f5980, 0xc000366800, 0x702, 0x702, 0x0, 0x0, 0x0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/internal/poll/fd_unix.go:159 +0x1b1
net.(*netFD).Read(0xc0000f5980, 0xc000366800, 0x702, 0x702, 0x203000, 0x10, 0x1f2)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00041a1a0, 0xc000366800, 0x702, 0x702, 0x0, 0x0, 0x0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/net/net.go:182 +0x8e
crypto/tls.(*atLeastReader).Read(0xc0006dc920, 0xc000366800, 0x702, 0x702, 0x0, 0x2f7, 0xc000405360)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/crypto/tls/conn.go:779 +0x62
bytes.(*Buffer).ReadFrom(0xc0000a4280, 0x1a30f80, 0xc0006dc920, 0x40b445, 0x168f680, 0x17fd920)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/bytes/buffer.go:204 +0xb1
crypto/tls.(*Conn).readFromUntil(0xc0000a4000, 0x1a33a80, 0xc00041a1a0, 0x5, 0xc00041a1a0, 0x0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/crypto/tls/conn.go:801 +0xf3
crypto/tls.(*Conn).readRecordOrCCS(0xc0000a4000, 0x0, 0x0, 0x0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/crypto/tls/conn.go:608 +0x115
crypto/tls.(*Conn).readRecord(...)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/crypto/tls/conn.go:576
crypto/tls.(*Conn).Read(0xc0000a4000, 0xc0002f8000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/crypto/tls/conn.go:1252 +0x15f
net/http.(*persistConn).Read(0xc0001445a0, 0xc0002f8000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/net/http/transport.go:1887 +0x77
bufio.(*Reader).fill(0xc00012da40)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/bufio/bufio.go:101 +0x105
bufio.(*Reader).ReadSlice(0xc00012da40, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/bufio/bufio.go:360 +0x3d
net/http/internal.readChunkLine(0xc00012da40, 0x0, 0x0, 0x0, 0x0, 0x0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/net/http/internal/chunked.go:122 +0x34
net/http/internal.(*chunkedReader).beginChunk(0xc0006ae7b0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/net/http/internal/chunked.go:48 +0x32
net/http/internal.(*chunkedReader).Read(0xc0006ae7b0, 0xc0006c9a00, 0x200, 0x200, 0x0, 0x0, 0x0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/net/http/internal/chunked.go:93 +0x145
net/http.(*body).readLocked(0xc0005495c0, 0xc0006c9a00, 0x200, 0x200, 0x0, 0x0, 0x0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/net/http/transfer.go:833 +0x5f
net/http.(*body).Read(0xc0005495c0, 0xc0006c9a00, 0x200, 0x200, 0x0, 0x0, 0x0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/net/http/transfer.go:825 +0xf9
net/http.(*bodyEOFSignal).Read(0xc000549600, 0xc0006c9a00, 0x200, 0x200, 0x0, 0x0, 0x0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/net/http/transport.go:2716 +0xe2
encoding/json.(*Decoder).refill(0xc00001a420, 0xc0006dc900, 0x7f38689552e0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/encoding/json/stream.go:165 +0xeb
encoding/json.(*Decoder).readValue(0xc00001a420, 0x0, 0x0, 0x1655a40)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/encoding/json/stream.go:140 +0x1ff
encoding/json.(*Decoder).Decode(0xc00001a420, 0x167ca00, 0xc0006dc900, 0x19189b8, 0x0)
    /go/pkg/csiprow.XXXXehmkDC/go-1.15/src/encoding/json/stream.go:63 +0x79
k8s.io/apimachinery/pkg/util/framer.(*jsonFrameReader).Read(0xc0006ae810, 0xc0000d6c00, 0x400, 0x400, 0xc000502dd0, 0x38, 0x38)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/framer/framer.go:152 +0x1a8
k8s.io/apimachinery/pkg/runtime/serializer/streaming.(*decoder).Decode(0xc0000bfa40, 0x0, 0x1a3e4a0, 0xc000549680, 0x4, 0x1a19a10, 0xc000281b00, 0xc000502ec0, 0x405d2e)
    /workspace/vendor/k8s.io/apimachinery/pkg/runtime/serializer/streaming/streaming.go:77 +0x89
k8s.io/client-go/rest/watch.(*Decoder).Decode(0xc0006dc8e0, 0x0, 0xc000083708, 0x0, 0x0, 0x0, 0x0)
    /workspace/vendor/k8s.io/client-go/rest/watch/decoder.go:49 +0x6e
k8s.io/apimachinery/pkg/watch.(*StreamWatcher).receive(0xc000549640)
    /workspace/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:104 +0x14a
created by k8s.io/apimachinery/pkg/watch.NewStreamWatcher
    /workspace/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:71 +0xbe


k8s.gcr.io/sig-storage/snapshot-controller logs
Code:
goroutine 186 [sync.Cond.Wait, 10 minutes]:
runtime.goparkunlock(...)
    /go/pkg/csiprow.XXXXGFMNhe/go-1.15/src/runtime/proc.go:312
sync.runtime_notifyListWait(0xc0004c0690, 0x4)
    /go/pkg/csiprow.XXXXGFMNhe/go-1.15/src/runtime/sema.go:513 +0xf8
sync.(*Cond).Wait(0xc0004c0680)
    /go/pkg/csiprow.XXXXGFMNhe/go-1.15/src/sync/cond.go:56 +0x9d
k8s.io/client-go/util/workqueue.(*Type).Get(0xc0004a9980, 0x0, 0x0, 0x0)
    /workspace/vendor/k8s.io/client-go/util/workqueue/queue.go:145 +0x89
github.com/kubernetes-csi/external-snapshotter/v4/pkg/common-controller.(*csiSnapshotCommonController).contentWorker(0xc0004d00f0)
    /workspace/pkg/common-controller/snapshot_controller_base.go:262 +0x62
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0005cf7b0)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0005cf7b0, 0x1912180, 0xc0002c3530, 0x1, 0xc0000ba540)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xad
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0005cf7b0, 0x0, 0x0, 0x1, 0xc0000ba540)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc0005cf7b0, 0x0, 0xc0000ba540)
    /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/kubernetes-csi/external-snapshotter/v4/pkg/common-controller.(*csiSnapshotCommonController).Run
    /workspace/pkg/common-controller/snapshot_controller_base.go:145 +0x23e


openebs-zfs-controller-0 pod describe
Code:
Name:                 openebs-zfs-controller-0
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 ix-truenas/192.168.10.15
Start Time:           Tue, 22 Jun 2021 16:56:01 +0200
Labels:               app=openebs-zfs-controller
                      controller-revision-hash=openebs-zfs-controller-56b9b7c4c7
                      openebs.io/component-name=openebs-zfs-controller
                      openebs.io/version=ci
                      role=openebs-zfs
                      statefulset.kubernetes.io/pod-name=openebs-zfs-controller-0
Annotations:          k8s.v1.cni.cncf.io/network-status:
                        [{
                            "name": "ix-net",
                            "interface": "eth0",
                            "ips": [
                                "172.16.0.84"
                            ],
                            "mac": "e6:de:a5:8f:10:27",
                            "default": true,
                            "dns": {}
                        }]
                      k8s.v1.cni.cncf.io/networks-status:
                        [{
                            "name": "ix-net",
                            "interface": "eth0",
                            "ips": [
                                "172.16.0.84"
                            ],
                            "mac": "e6:de:a5:8f:10:27",
                            "default": true,
                            "dns": {}
                        }]
Status:               Running
IP:                   172.16.0.84
IPs:
  IP:           172.16.0.84
Controlled By:  StatefulSet/openebs-zfs-controller
Containers:
  csi-resizer:
    Container ID:  docker://4a828a511bd9ab7d976e200db1d391207b422a093c80729989c9d0f255602aaa
    Image:         k8s.gcr.io/sig-storage/csi-resizer:v1.1.0
    Image ID:      docker-pullable://k8s.gcr.io/sig-storage/csi-resizer@sha256:7a5ba58a44e0d749e0767e4e37315bcf6a61f33ce3185c1991848af4db0fb70a
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=$(ADDRESS)
      --leader-election
    State:          Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Wed, 23 Jun 2021 11:58:03 +0200
      Finished:     Wed, 23 Jun 2021 12:12:10 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Wed, 23 Jun 2021 11:23:24 +0200
      Finished:     Wed, 23 Jun 2021 11:57:08 +0200
    Ready:          False
    Restart Count:  23
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-swxlg (ro)
  csi-snapshotter:
    Container ID:  docker://e40848f6de2bed81cc1c373749542c99ee04f0656dbe996956abd40e58c598e2
    Image:         k8s.gcr.io/sig-storage/csi-snapshotter:v4.0.0
    Image ID:      docker-pullable://k8s.gcr.io/sig-storage/csi-snapshotter@sha256:51f2dfde5bccac7854b3704689506aeecfb793328427b91115ba253a93e60782
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=$(ADDRESS)
      --leader-election
    State:          Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Wed, 23 Jun 2021 11:58:52 +0200
      Finished:     Wed, 23 Jun 2021 12:12:08 +0200
    Ready:          False
    Restart Count:  26
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-swxlg (ro)
  snapshot-controller:
    Container ID:  docker://cd6a6c7fec7eebbceffed2fd7ed769c2d5ccbc171e2a5c752be3a755c00bf045
    Image:         k8s.gcr.io/sig-storage/snapshot-controller:v4.0.0
    Image ID:      docker-pullable://k8s.gcr.io/sig-storage/snapshot-controller@sha256:00fcc441ea9f72899c25eed61d602272a2a58c5f0014332bdcb5ac24acef08e4
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --leader-election=true
    State:          Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Wed, 23 Jun 2021 11:59:38 +0200
      Finished:     Wed, 23 Jun 2021 12:12:07 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Wed, 23 Jun 2021 11:25:29 +0200
      Finished:     Wed, 23 Jun 2021 11:57:07 +0200
    Ready:          False
    Restart Count:  22
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-swxlg (ro)
  csi-provisioner:
    Container ID:  docker://0370435c614152d2d8586f672bf7a7a7703c1d2a5a3962a3645268218a737cc0
    Image:         k8s.gcr.io/sig-storage/csi-provisioner:v2.1.0
    Image ID:      docker-pullable://k8s.gcr.io/sig-storage/csi-provisioner@sha256:20c828075d1e36f679d6a91e905b0927141eef5e15be0c9a1ca4a6a0ed9313d2
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=5
      --feature-gates=Topology=true
      --strict-topology
      --leader-election
      --extra-create-metadata=true
      --default-fstype=ext4
    State:          Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Wed, 23 Jun 2021 12:02:34 +0200
      Finished:     Wed, 23 Jun 2021 12:12:09 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Wed, 23 Jun 2021 12:01:30 +0200
      Finished:     Wed, 23 Jun 2021 12:01:31 +0200
    Ready:          False
    Restart Count:  58
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-swxlg (ro)
  openebs-zfs-plugin:
    Container ID:  docker://699c781de273de69ec018593558b362cc6b9b823081859e00f9f9daba01a7677
    Image:         openebs/zfs-driver:ci
    Image ID:      docker-pullable://openebs/zfs-driver@sha256:5ec547d790d226a44e1a3ddf4de5c47ef3e20772f2b58aba5dfdebbf61f29448
    Port:          <none>
    Host Port:     <none>
    Args:
      --endpoint=$(OPENEBS_CSI_ENDPOINT)
      --plugin=$(OPENEBS_CONTROLLER_DRIVER)
    State:          Running
      Started:      Wed, 23 Jun 2021 11:06:28 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Wed, 23 Jun 2021 10:01:59 +0200
      Finished:     Wed, 23 Jun 2021 10:56:23 +0200
    Ready:          True
    Restart Count:  5
    Environment:
      OPENEBS_CONTROLLER_DRIVER:    controller
      OPENEBS_CSI_ENDPOINT:         unix:///var/lib/csi/sockets/pluginproxy/csi.sock
      OPENEBS_NAMESPACE:            openebs
      OPENEBS_IO_INSTALLER_TYPE:    zfs-operator
      OPENEBS_IO_ENABLE_ANALYTICS:  true
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-swxlg (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  socket-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-swxlg:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age                  From     Message
  ----    ------   ----                 ----     -------
  Normal  Pulled   52m                  kubelet  Successfully pulled image "k8s.gcr.io/sig-storage/csi-resizer:v1.1.0" in 1.562421235s
  Normal  Pulled   17m                  kubelet  Successfully pulled image "k8s.gcr.io/sig-storage/csi-resizer:v1.1.0" in 962.159394ms
  Normal  Created  17m (x3 over 72m)    kubelet  Created container csi-resizer
  Normal  Started  17m (x3 over 72m)    kubelet  Started container csi-resizer
  Normal  Pulling  17m (x3 over 72m)    kubelet  Pulling image "k8s.gcr.io/sig-storage/csi-snapshotter:v4.0.0"
  Normal  Pulled   17m                  kubelet  Successfully pulled image "k8s.gcr.io/sig-storage/csi-snapshotter:v4.0.0" in 908.373249ms
  Normal  Created  16m (x3 over 71m)    kubelet  Created container csi-snapshotter
  Normal  Created  12m (x5 over 69m)    kubelet  Created container csi-provisioner
  Normal  Pulling  2m58s (x4 over 72m)  kubelet  Pulling image "k8s.gcr.io/sig-storage/csi-resizer:v1.1.0"
  Normal  Pulled   2m57s                kubelet  Successfully pulled image "k8s.gcr.io/sig-storage/csi-resizer:v1.1.0" in 1.006191553s
 

impovich

Just noticed that it crashed again, with the same errors, while the plex pod was being recreated.
 

impovich

Some pictures showing how the state changes when I recreate the plex pod, simply by hitting the submit button without making any changes. Plex here is just an example.
 

Attachments

  • 1.png (134.6 KB)
  • 2.png (142 KB)
  • 3.png (135 KB)

waqarahmed

iXsystems
@impovich thank you for sharing the debug. In the debug, the pod is running as desired at the time the debug was taken. Can you please confirm whether you are noticing/experiencing degraded functionality, like PVs not being created, or something else the CSI driver is supposed to be doing that is not happening?
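
One quick way to check whether provisioning itself still works is to create a throwaway PVC and see whether it reaches Bound. A minimal sketch, assuming you substitute the ZFS storage class name that "k3s kubectl get sc" reports on your system:

Code:
cat <<EOF | k3s kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: <your-zfs-storage-class>
EOF
k3s kubectl get pvc csi-test-pvc      # should reach Bound if the provisioner is healthy
k3s kubectl delete pvc csi-test-pvc   # clean up afterwards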
 

impovich

waqarahmed said:
@impovich thank you for sharing the debug. In the debug, the pod is running as desired at the time the debug was taken. Can you please confirm whether you are noticing/experiencing degraded functionality, like PVs not being created, or something else the CSI driver is supposed to be doing that is not happening?

For almost all apps I use hostPath, with everything owned 568:568.
I just deployed sonarr using a PVC as storage.

Here is what I see for the sonarr pod:
Code:
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  7m7s  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  7m6s  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
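
For reference, the PVC side can be inspected like this (a sketch; I'm assuming the app landed in the ix-sonarr namespace, per the TrueNAS apps convention, and <pvc-name> is the pending claim):

Code:
k3s kubectl get pvc -A                             # find the Pending claim
k3s kubectl -n ix-sonarr describe pvc <pvc-name>   # Events show provisioner errors, if any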


Here is what I see for the openebs-zfs-controller-0 pod:

Code:
Name:                 openebs-zfs-controller-0
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 ix-truenas/192.168.10.15
Start Time:           Tue, 22 Jun 2021 16:56:01 +0200
Labels:               app=openebs-zfs-controller
                      controller-revision-hash=openebs-zfs-controller-56b9b7c4c7
                      openebs.io/component-name=openebs-zfs-controller
                      openebs.io/version=ci
                      role=openebs-zfs
                      statefulset.kubernetes.io/pod-name=openebs-zfs-controller-0
Annotations:          k8s.v1.cni.cncf.io/network-status:
                        [{
                            "name": "ix-net",
                            "interface": "eth0",
                            "ips": [
                                "172.16.0.110"
                            ],
                            "mac": "5a:51:40:bf:12:e9",
                            "default": true,
                            "dns": {}
                        }]
                      k8s.v1.cni.cncf.io/networks-status:
                        [{
                            "name": "ix-net",
                            "interface": "eth0",
                            "ips": [
                                "172.16.0.110"
                            ],
                            "mac": "5a:51:40:bf:12:e9",
                            "default": true,
                            "dns": {}
                        }]
Status:               Running
IP:                   172.16.0.110
IPs:
  IP:           172.16.0.110
Controlled By:  StatefulSet/openebs-zfs-controller
Containers:
  csi-resizer:
    Container ID:  docker://796ee6dec8251c60bba6a8882dabd452983e208d84f399608b722bd6ad568a88
    Image:         k8s.gcr.io/sig-storage/csi-resizer:v1.1.0
    Image ID:      docker-pullable://k8s.gcr.io/sig-storage/csi-resizer@sha256:7a5ba58a44e0d749e0767e4e37315bcf6a61f33ce3185c1991848af4db0fb70a
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --csi-address=$(ADDRESS)
      --leader-election
    State:          Waiting
      Reason:       CreateContainerError
    Last State:     Terminated
      Exit Code:    0
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          False
    Restart Count:  40
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-swxlg (ro)
  csi-snapshotter:
    Container ID:  docker://8918c4bc59a5dbb4e0a4adfaaad14eff1af70e6b94b749646f41be4f95e784fa
    Image:         k8s.gcr.io/sig-storage/csi-snapshotter:v4.0.0
    Image ID:      docker-pullable://k8s.gcr.io/sig-storage/csi-snapshotter@sha256:51f2dfde5bccac7854b3704689506aeecfb793328427b91115ba253a93e60782
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=$(ADDRESS)
      --leader-election
    State:          Running
      Started:      Wed, 23 Jun 2021 19:44:02 +0200
    Last State:     Terminated
      Reason:       Error
      Message:      Lost connection to CSI driver, exiting
      Exit Code:    255
      Started:      Wed, 23 Jun 2021 18:10:54 +0200
      Finished:     Wed, 23 Jun 2021 19:39:47 +0200
    Ready:          True
    Restart Count:  43
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-swxlg (ro)
  snapshot-controller:
    Container ID:  docker://df66f24a4049f345e092abfe973019492a2974a0ffe74f663562907470d1ba7e
    Image:         k8s.gcr.io/sig-storage/snapshot-controller:v4.0.0
    Image ID:      docker-pullable://k8s.gcr.io/sig-storage/snapshot-controller@sha256:00fcc441ea9f72899c25eed61d602272a2a58c5f0014332bdcb5ac24acef08e4
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=5
      --leader-election=true
    State:          Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Wed, 23 Jun 2021 18:16:32 +0200
      Finished:     Wed, 23 Jun 2021 19:47:52 +0200
    Last State:     Terminated
      Exit Code:    0
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          False
    Restart Count:  39
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-swxlg (ro)
  csi-provisioner:
    Container ID:  docker://cd7846da1eb6d2685b6590f32973f502fd09db757e68ae0ddd6070dce235a827
    Image:         k8s.gcr.io/sig-storage/csi-provisioner:v2.1.0
    Image ID:      docker-pullable://k8s.gcr.io/sig-storage/csi-provisioner@sha256:20c828075d1e36f679d6a91e905b0927141eef5e15be0c9a1ca4a6a0ed9313d2
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=5
      --feature-gates=Topology=true
      --strict-topology
      --leader-election
      --extra-create-metadata=true
      --default-fstype=ext4
    State:          Running
      Started:      Wed, 23 Jun 2021 19:45:33 +0200
    Last State:     Terminated
      Reason:       Error
      Message:      Lost connection to CSI driver, exiting
      Exit Code:    255
      Started:      Wed, 23 Jun 2021 18:14:39 +0200
      Finished:     Wed, 23 Jun 2021 19:39:48 +0200
    Ready:          True
    Restart Count:  86
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-swxlg (ro)
  openebs-zfs-plugin:
    Container ID:  docker://fc35c209505d0010fa10aaae72ffc8a4fac32c6621d1d7a75e2d35b342b6c756
    Image:         openebs/zfs-driver:ci
    Image ID:      docker-pullable://openebs/zfs-driver@sha256:5ec547d790d226a44e1a3ddf4de5c47ef3e20772f2b58aba5dfdebbf61f29448
    Port:          <none>
    Host Port:     <none>
    Args:
      --endpoint=$(OPENEBS_CSI_ENDPOINT)
      --plugin=$(OPENEBS_CONTROLLER_DRIVER)
    State:          Waiting
      Reason:       CreateContainerError
    Last State:     Terminated
      Exit Code:    0
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          False
    Restart Count:  8
    Environment:
      OPENEBS_CONTROLLER_DRIVER:    controller
      OPENEBS_CSI_ENDPOINT:         unix:///var/lib/csi/sockets/pluginproxy/csi.sock
      OPENEBS_NAMESPACE:            openebs
      OPENEBS_IO_INSTALLER_TYPE:    zfs-operator
      OPENEBS_IO_ENABLE_ANALYTICS:  true
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-swxlg (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  socket-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-swxlg:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason      Age                    From     Message
  ----     ------      ----                   ----     -------
  Normal   Pulled      13m                    kubelet  Successfully pulled image "k8s.gcr.io/sig-storage/csi-resizer:v1.1.0" in 2.232429293s
  Warning  Failed      11m                    kubelet  Error: context deadline exceeded
  Normal   Pulling     11m (x11 over 4h11m)   kubelet  Pulling image "k8s.gcr.io/sig-storage/csi-snapshotter:v4.0.0"
  Normal   Pulled      11m                    kubelet  Successfully pulled image "k8s.gcr.io/sig-storage/csi-snapshotter:v4.0.0" in 1.331620968s
  Normal   Created     9m8s (x11 over 4h10m)  kubelet  Created container csi-snapshotter
  Normal   Started     9m6s (x11 over 4h10m)  kubelet  Started container csi-snapshotter
  Normal   Pulling     9m6s (x17 over 4h9m)   kubelet  Pulling image "k8s.gcr.io/sig-storage/csi-provisioner:v2.1.0"
  Normal   Pulled      9m5s                   kubelet  Successfully pulled image "k8s.gcr.io/sig-storage/csi-provisioner:v2.1.0" in 1.32603379s
  Normal   Created     7m37s (x17 over 4h8m)  kubelet  Created container csi-provisioner
  Normal   Started     7m34s (x17 over 4h8m)  kubelet  Started container csi-provisioner
  Normal   Pulling     7m34s (x2 over 4h8m)   kubelet  Pulling image "openebs/zfs-driver:ci"
  Normal   Pulled      7m33s                  kubelet  Successfully pulled image "openebs/zfs-driver:ci" in 1.540520267s
  Warning  Failed      5m33s                  kubelet  Error: context deadline exceeded
  Warning  FailedSync  5m10s (x3 over 5m32s)  kubelet  error determining status: rpc error: code = Unknown desc = Error: No such container: fc35c209505d0010fa10aaae72ffc8a4fac32c6621d1d7a75e2d35b342b6c756
  Normal   Pulling     5m1s (x12 over 4h13m)  kubelet  Pulling image "k8s.gcr.io/sig-storage/csi-resizer:v1.1.0"
  Normal   Pulled      5m                     kubelet  Successfully pulled image "k8s.gcr.io/sig-storage/csi-resizer:v1.1.0" in 1.051133201s
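
The "Lost connection to CSI driver, exiting" messages suggest the sidecars are not failing on their own: their gRPC connection to the plugin over the shared csi.sock (the socket-dir emptyDir mounted at /var/lib/csi/sockets/pluginproxy/) dropped, i.e. the openebs-zfs-plugin container likely died first. Its previous log should show why; something like:

Code:
k3s kubectl -n kube-system logs openebs-zfs-controller-0 -c openebs-zfs-plugin --previous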


EDIT:
I sent you a debug file captured during the error.
 

waqarahmed

iXsystems
@impovich I just tried to reproduce this on 21.06 BETA but was not able to do so. Can you please create a ticket at https://jira.ixsystems.com ? I am not sure what might be happening otherwise. In any case, I would just like to make sure that it's not something on our end; then we can raise this upstream with OpenEBS if that is indeed the case :) Thank you
 

marmoset

Just wanted to add a +1; I only noticed it because my syslog is regularly processed and watched.

Code:
root@truenas:~# k3s kubectl get pods --all-namespaces
NAMESPACE     NAME                       READY   STATUS    RESTARTS   AGE
kube-system   coredns-7448499f4d-vhjjr   1/1     Running   0          24h
kube-system   openebs-zfs-node-7d6h7     2/2     Running   0          24h
kube-system   openebs-zfs-controller-0   5/5     Running   103        24h
root@truenas:~#


Code:
root@truenas:~# grep "back-off 10s restarting failed container=csi-provisioner pod=openebs-zfs-controller-0_kube-system" /var/log/daemon.log
Jun 23 01:46:12 truenas k3s[15440]: E0623 01:46:12.354962   15440 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"csi-provisioner\" with CrashLoopBackOff: \"back-off 10s restarting failed container=csi-provisioner pod=openebs-zfs-controller-0_kube-system(d6b0ce44-0de8-4942-a8f6-e3675c1c0718)\"" pod="kube-system/openebs-zfs-controller-0" podUID=d6b0ce44-0de8-4942-a8f6-e3675c1c0718
Jun 23 02:03:39 truenas k3s[15440]: E0623 02:03:39.926537   15440 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"csi-provisioner\" with CrashLoopBackOff: \"back-off 10s restarting failed container=csi-provisioner pod=openebs-zfs-controller-0_kube-system(d6b0ce44-0de8-4942-a8f6-e3675c1c0718)\"" pod="kube-system/openebs-zfs-controller-0" podUID=d6b0ce44-0de8-4942-a8f6-e3675c1c0718
Jun 23 02:03:40 truenas k3s[15440]: E0623 02:03:40.983152   15440 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"csi-provisioner\" with CrashLoopBackOff: \"back-off 10s restarting failed container=csi-provisioner pod=openebs-zfs-controller-0_kube-system(d6b0ce44-0de8-4942-a8f6-e3675c1c0718)\"" pod="kube-system/openebs-zfs-controller-0" podUID=d6b0ce44-0de8-4942-a8f6-e3675c1c0718
Jun 23 02:03:42 truenas k3s[15440]: E0623 02:03:42.033535   15440 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"csi-provisioner\" with CrashLoopBackOff: \"back-off 10s restarting failed container=csi-provisioner pod=openebs-zfs-controller-0_kube-system(d6b0ce44-0de8-4942-a8f6-e3675c1c0718)\"" pod="kube-system/openebs-zfs-controller-0" podUID=d6b0ce44-0de8-4942-a8f6-e3675c1c0718
Jun 23 02:58:43 truenas k3s[15440]: E0623 02:58:43.255780   15440 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"csi-provisioner\" with CrashLoopBackOff: \"back-off 10s restarting failed container=csi-provisioner pod=openebs-zfs-controller-0_kube-system(d6b0ce44-0de8-4942-a8f6-e3675c1c0718)\"" pod="kube-system/openebs-zfs-controller-0" podUID=d6b0ce44-0de8-4942-a8f6-e3675c1c0718
Jun 23 03:33:27 truenas k3s[15440]: E0623 03:33:27.271018   15440 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"csi-provisioner\" with CrashLoopBackOff: \"back-off 10s restarting failed container=csi-provisioner pod=openebs-zfs-controller-0_kube-system(d6b0ce44-0de8-4942-a8f6-e3675c1c0718)\"" pod="kube-system/openebs-zfs-controller-0" podUID=d6b0ce44-0de8-4942-a8f6-e3675c1c0718
Jun 23 03:33:28 truenas k3s[15440]: E0623 03:33:28.325649   15440 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"csi-provisioner\" with CrashLoopBackOff: \"back-off 10s restarting failed container=csi-provisioner pod=openebs-zfs-controller-0_kube-system(d6b0ce44-0de8-4942-a8f6-e3675c1c0718)\"" pod="kube-system/openebs-zfs-controller-0" podUID=d6b0ce44-0de8-4942-a8f6-e3675c1c0718
Jun 23 08:34:01 truenas k3s[15440]: E0623 08:34:01.431771   15440 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"csi-provisioner\" with CrashLoopBackOff: \"back-off 10s restarting failed container=csi-provisioner pod=openebs-zfs-controller-0_kube-system(d6b0ce44-0de8-4942-a8f6-e3675c1c0718)\"" pod="kube-system/openebs-zfs-controller-0" podUID=d6b0ce44-0de8-4942-a8f6-e3675c1c0718
Jun 23 12:58:27 truenas k3s[15440]: E0623 12:58:27.807616   15440 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"csi-provisioner\" with CrashLoopBackOff: \"back-off 10s restarting failed container=csi-provisioner pod=openebs-zfs-controller-0_kube-system(d6b0ce44-0de8-4942-a8f6-e3675c1c0718)\"" pod="kube-system/openebs-zfs-controller-0" podUID=d6b0ce44-0de8-4942-a8f6-e3675c1c0718
Jun 23 12:58:28 truenas k3s[15440]: E0623 12:58:28.870669   15440 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"csi-provisioner\" with CrashLoopBackOff: \"back-off 10s restarting failed container=csi-provisioner pod=openebs-zfs-controller-0_kube-system(d6b0ce44-0de8-4942-a8f6-e3675c1c0718)\"" pod="kube-system/openebs-zfs-controller-0" podUID=d6b0ce44-0de8-4942-a8f6-e3675c1c0718
root@truenas:~#
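
To see which container accounts for most of the restarts without reading the whole describe output, something like this works (a sketch using kubectl's jsonpath output):

Code:
k3s kubectl -n kube-system get pod openebs-zfs-controller-0 \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'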
 

impovich

I can confirm now that there is no need to deploy anything; it is crashing on a regular basis all on its own. Which is good, I think (I mean fewer dependencies for reproducing it: just install TrueNAS :) )
 

ksimm1

I'm seeing this on my system as well, but I don't know what effect it is having. All of my apps are otherwise functioning normally, from what I can tell. Happy to provide a debug file if needed.

(And yes, I'm using a Supermicro board, but I'm not sure how that's related.)
 

impovich

@marmoset @ksimm1 I attached my logs here as well; could you compare them with yours and see if they are the same?

P.S. My motherboard is a Supermicro X11SCL-F.
 

Attachments

  • coredns.txt (38 KB)
  • csi-provisioner.txt (1.4 KB)
  • csi-resizer.txt (80.8 KB)
  • csi-snapshotter.txt (59.1 KB)
  • snapshot-controller.txt (99.8 KB)

guyp2k

+1: Threadripper 2970WX, ASUS motherboard, running the latest nightly.

Code:
kube-system   openebs-zfs-controller-0               4/5   CrashLoopBackOff   486   35h
kube-system   nvidia-device-plugin-daemonset-d55ks   0/1   CrashLoopBackOff   474   35h

I'm not going to open a Jira ticket given the issue is already open; just an FYI.
 

proligde

Same here with a Ryzen 5 3600 on an ASRock Rack X570D4U-2L2T.

Everything seems to work, but the pod reports 1364 restarts in the last 6 days.
 

sjieke

Apps are working fine for me, since I'm using hostPath, but openebs-zfs-controller-0 has had 4020 restarts in the last 14 days:
Code:
NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
kube-system   coredns-7448499f4d-cx6qc          1/1     Running   0          14d
kube-system   openebs-zfs-node-mz247            2/2     Running   0          14d
ix-traefik    svclb-traefik-4cg4x               3/3     Running   0          14d
ix-pihole     pihole-ix-chart-d4779bb67-m55pv   1/1     Running   0          14d
ix-unifi      svclb-unifi-stun-774fr            1/1     Running   0          4d2h
ix-unifi      svclb-unifi-comm-spd2x            1/1     Running   0          4d2h
ix-traefik    traefik-5f96cfb7c6-mr58s          1/1     Running   0          64m
ix-unifi      unifi-6d89458d74-glmnp            1/1     Running   0          62m
kube-system   openebs-zfs-controller-0          5/5     Running   4020       14d
 

MadMungo

I am seeing the same or similar as @sjieke, though not as many errors, and apps are working fine. Threadripper 1950X on an MSI X399 board.

Code:
MungoScale# k3s kubectl get pods --all-namespaces
NAMESPACE         NAME                                   READY   STATUS    RESTARTS   AGE
kube-system       nvidia-device-plugin-daemonset-rq6fp   1/1     Running   1          3d22h
kube-system       coredns-7448499f4d-2q87d               1/1     Running   5          11d
kube-system       openebs-zfs-node-28lzx                 2/2     Running   10         11d
kube-system       openebs-zfs-controller-0               5/5     Running   841        11d
ix-plex           plex-ix-chart-6785fd4cd8-j6wdn         1/1     Running   0          2d6h
ix-flaresolverr   flaresolverr-df754fc54-tszp8           1/1     Running   0          2d6h
ix-readarr        readarr-569949d44b-rx2q5               1/1     Running   0          2d6h
ix-delugevpn      delugevpn-ix-chart-db4559dfc-m2vdw     1/1     Running   0          2d6h
ix-medusa         medusa-ix-chart-5b9b8c55d5-f27gv       1/1     Running   0          30h
ix-lidarr         lidarr-7bb6d4878-7lcqs                 1/1     Running   0          22h
ix-jackett        jackett-ix-chart-9d578d965-42tz4       1/1     Running   0          22h
 

ornias

It's a known issue and is being worked on.
Nothing to fuss about at the moment.
 
Joined Jul 29, 2022
Same issue; Gigabyte motherboard.
 

Attachments

  • Shell - 192.168.0.11 and 22 more pages - Personal - Microsoft Edge 29_07_2022 18_29_49.png (847.3 KB)