Pinned: Updates API migration requirements

Activity
jingxu97 push jingxu97/enhancements
commit sha: 6a4aadc1a4aa6cbf931fdb91f52f6ff72f436a1d
push time: 4 days ago
jingxu97 issue comment kubernetes/enhancements
Support recovery from volume expansion failure
Enhancement Description
- One-line enhancement description (can be used as a release note): Allow users to recover from volume expansion failure
- Kubernetes Enhancement Proposal: (link to kubernetes/enhancements file, if none yet, link to PR): https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1790-recover-resize-failure/README.md
- Primary contact (assignee): @gnufied
- Responsible SIGs: sig-storage
- Enhancement target (which target equals to which milestone):
- Alpha release target (x.y): 1.23
- Beta release target (x.y): 1.24
- Stable release target (x.y): 1.26
/assign @gnufied
jingxu97 push jingxu97/kubernetes
commit sha: 9f460160c1d6d199f75453e1ae529c230e8a6b1f
push time: 6 days ago
jingxu97 issue comment kubernetes/website
Add non-graceful node shutdown blog article
~Place holder blog PR for https://github.com/kubernetes/enhancements/issues/2268~ Add blog article for https://github.com/kubernetes/enhancements/issues/2268
jingxu97 issue comment kubernetes/website
Add non-graceful node shutdown blog article
Is this ready to be merged?
jingxu97 issue comment kubernetes/kubernetes
gce: KCM detaches all in-tree volumes during update from K8s 1.20 to 1.21
What happened?
KCM wrongly detaches all in-tree volumes during update from K8s 1.20 to 1.21.
What did you expect to happen?
Expected KCM not to detach the in-tree volumes during the update from K8s 1.20 to 1.21.
How can we reproduce it (as minimally and precisely as possible)?
1. Create a K8s 1.20.13 cluster.
2. Create in-tree and out-of-tree StorageClasses:
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: default-intree
parameters:
  type: pd-standard
provisioner: kubernetes.io/gce-pd
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: default
parameters:
  type: pd-standard
provisioner: pd.csi.storage.gke.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
3. Create 3 StatefulSets with 4 replicas each (one StatefulSet uses the out-of-tree StorageClass, the other two use the in-tree one):
apiVersion: v1
kind: Service
metadata:
  name: app1
  labels:
    app: app1
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: app1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app1
spec:
  serviceName: app1
  replicas: 4
  selector:
    matchLabels:
      app: app1
  template:
    metadata:
      labels:
        app: app1
    spec:
      containers:
      - name: app1
        image: centos
        command: ["/bin/sh"]
        args: ["-c", "while true; do echo $HOSTNAME $(date -u) >> /data/out.txt; sleep 5; done"]
        volumeMounts:
        - name: persistent-storage-app1
          mountPath: /data
        livenessProbe:
          exec:
            command:
            - tail
            - -n 1
            - /data/out.txt
  volumeClaimTemplates:
  - metadata:
      name: persistent-storage-app1
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
apiVersion: v1
kind: Service
metadata:
  name: app2
  labels:
    app: app2
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: app2
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app2
spec:
  serviceName: app2
  replicas: 4
  selector:
    matchLabels:
      app: app2
  template:
    metadata:
      labels:
        app: app2
    spec:
      containers:
      - name: app2
        image: centos
        command: ["/bin/sh"]
        args: ["-c", "while true; do echo $HOSTNAME $(date -u) >> /data/out.txt; sleep 5; done"]
        volumeMounts:
        - name: persistent-storage-app2
          mountPath: /data
        livenessProbe:
          exec:
            command:
            - tail
            - -n 1
            - /data/out.txt
  volumeClaimTemplates:
  - metadata:
      name: persistent-storage-app2
    spec:
      storageClassName: default-intree
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 2Gi
apiVersion: v1
kind: Service
metadata:
  name: app3
  labels:
    app: app3
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: app3
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app3
spec:
  serviceName: app3
  replicas: 4
  selector:
    matchLabels:
      app: app3
  template:
    metadata:
      labels:
        app: app3
    spec:
      containers:
      - name: app3
        image: centos
        command: ["/bin/sh"]
        args: ["-c", "while true; do echo $HOSTNAME $(date -u) >> /data/out.txt; sleep 5; done"]
        volumeMounts:
        - name: persistent-storage-app3
          mountPath: /data
        livenessProbe:
          exec:
            command:
            - tail
            - -n 1
            - /data/out.txt
  volumeClaimTemplates:
  - metadata:
      name: persistent-storage-app3
    spec:
      storageClassName: default-intree
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 3Gi
4. Update the cluster to K8s 1.21.10.
5. Confirm that kube-controller-manager detaches all in-tree volumes during the update.
5.1 kube-controller-manager marks all in-tree volumes as uncertain.
2022-04-06 06:33:00 {"log":"Marking volume attachment as uncertain as volume:\"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/europe-west1-b/disks/pv--589d5a8a-cf3a-4428-bd5f-4e03d1615e1e\" (\"cpu-worker-etcd-z1-86c78-7nqlq\") is not attached (Detached)","pid":"1","severity":"INFO","source":"attach_detach_controller.go:769"}
2022-04-06 06:33:00 {"log":"Marking volume attachment as uncertain as volume:\"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/europe-west1-b/disks/pv--7e6663f2-2446-45ee-bca3-a53771b7226b\" (\"cpu-worker-etcd-z1-86c78-7nqlq\") is not attached (Detached)","pid":"1","severity":"INFO","source":"attach_detach_controller.go:769"}
2022-04-06 06:33:00 {"log":"Marking volume attachment as uncertain as volume:\"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/europe-west1-b/disks/pv--6b11de8e-2115-42d6-8d26-c52bf97a1076\" (\"cpu-worker-etcd-z1-86c78-7nqlq\") is not attached (Detached)","pid":"1","severity":"INFO","source":"attach_detach_controller.go:769"}
2022-04-06 06:33:00 {"log":"Marking volume attachment as uncertain as volume:\"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/europe-west1-b/disks/pv--148e95ac-196f-4b59-a547-c01ab7ae3f2d\" (\"cpu-worker-etcd-z1-86c78-7nqlq\") is not attached (Detached)","pid":"1","severity":"INFO","source":"attach_detach_controller.go:769"}
2022-04-06 06:33:00 {"log":"Marking volume attachment as uncertain as volume:\"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/europe-west1-b/disks/pv--4f4d8963-b336-41c8-81ea-d1bbf25b61b9\" (\"cpu-worker-etcd-z1-86c78-7nqlq\") is not attached (Detached)","pid":"1","severity":"INFO","source":"attach_detach_controller.go:769"}
2022-04-06 06:33:01 {"log":"Marking volume attachment as uncertain as volume:\"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/europe-west1-b/disks/pv--ec3c3c19-5a62-4b29-b263-713da5a07d6e\" (\"cpu-worker-etcd-z1-86c78-7nqlq\") is not attached (Detached)","pid":"1","severity":"INFO","source":"attach_detach_controller.go:769"}
2022-04-06 06:33:01 {"log":"Marking volume attachment as uncertain as volume:\"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/europe-west1-b/disks/pv--ad094298-f42a-4f18-b453-9772ce21386b\" (\"cpu-worker-etcd-z1-86c78-7nqlq\") is not attached (Detached)","pid":"1","severity":"INFO","source":"attach_detach_controller.go:769"}
2022-04-06 06:33:01 {"log":"Marking volume attachment as uncertain as volume:\"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/europe-west1-b/disks/pv--53b113af-aaf2-4306-9c6f-817ca0de62eb\" (\"cpu-worker-etcd-z1-86c78-7nqlq\") is not attached (Detached)","pid":"1","severity":"INFO","source":"attach_detach_controller.go:769"}
2022-04-06 06:33:01 {"log":"Marking volume attachment as uncertain as volume:\"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/europe-west1-b/disks/pv--9a237184-7f03-451f-aad5-c85b09d6d580\" (\"cpu-worker-etcd-z1-86c78-7nqlq\") is not attached (Detached)","pid":"1","severity":"INFO","source":"attach_detach_controller.go:769"}
2022-04-06 06:33:01 {"log":"Marking volume attachment as uncertain as volume:\"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/europe-west1-b/disks/pv--26b788f6-03e5-4b39-a5f2-c20ef9a8a884\" (\"cpu-worker-etcd-z1-86c78-7nqlq\") is not attached (Detached)","pid":"1","severity":"INFO","source":"attach_detach_controller.go:769"}
5.2 Six minutes after marking the volume attachments as uncertain, KCM force-detaches the in-tree volumes.
2022-04-06 06:39:08 {"log":"attacherDetacher.DetachVolume started for volume \"nil\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--2bc75fc8-af96-4327-a9a7-cd648c41ec96\") on node \"cpu-worker-etcd-z1-86c78-7nqlq\" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching","pid":"1","severity":"WARN","source":"reconciler.go:224"}
2022-04-06 06:39:08 {"log":"attacherDetacher.DetachVolume started for volume \"nil\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--148e95ac-196f-4b59-a547-c01ab7ae3f2d\") on node \"cpu-worker-etcd-z1-86c78-7nqlq\" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching","pid":"1","severity":"WARN","source":"reconciler.go:224"}
2022-04-06 06:39:08 {"log":"attacherDetacher.DetachVolume started for volume \"nil\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--8ad228d7-187b-4972-ab4c-74e5a46c6ad8\") on node \"cpu-worker-etcd-z1-86c78-7nqlq\" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching","pid":"1","severity":"WARN","source":"reconciler.go:224"}
2022-04-06 06:39:08 {"log":"attacherDetacher.DetachVolume started for volume \"nil\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--26b788f6-03e5-4b39-a5f2-c20ef9a8a884\") on node \"cpu-worker-etcd-z1-86c78-7nqlq\" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching","pid":"1","severity":"WARN","source":"reconciler.go:224"}
2022-04-06 06:39:08 {"log":"attacherDetacher.DetachVolume started for volume \"nil\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--53b113af-aaf2-4306-9c6f-817ca0de62eb\") on node \"cpu-worker-etcd-z1-86c78-7nqlq\" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching","pid":"1","severity":"WARN","source":"reconciler.go:224"}
2022-04-06 06:39:09 {"log":"attacherDetacher.DetachVolume started for volume \"nil\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--4f4d8963-b336-41c8-81ea-d1bbf25b61b9\") on node \"cpu-worker-etcd-z1-86c78-7nqlq\" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching","pid":"1","severity":"WARN","source":"reconciler.go:224"}
2022-04-06 06:39:09 {"log":"attacherDetacher.DetachVolume started for volume \"nil\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--6b11de8e-2115-42d6-8d26-c52bf97a1076\") on node \"cpu-worker-etcd-z1-86c78-7nqlq\" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching","pid":"1","severity":"WARN","source":"reconciler.go:224"}
2022-04-06 06:39:09 {"log":"attacherDetacher.DetachVolume started for volume \"nil\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--7e6663f2-2446-45ee-bca3-a53771b7226b\") on node \"cpu-worker-etcd-z1-86c78-7nqlq\" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching","pid":"1","severity":"WARN","source":"reconciler.go:224"}
2022-04-06 06:39:09 {"log":"attacherDetacher.DetachVolume started for volume \"nil\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--9a237184-7f03-451f-aad5-c85b09d6d580\") on node \"cpu-worker-etcd-z1-86c78-7nqlq\" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching","pid":"1","severity":"WARN","source":"reconciler.go:224"}
2022-04-06 06:39:09 {"log":"attacherDetacher.DetachVolume started for volume \"nil\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--ad094298-f42a-4f18-b453-9772ce21386b\") on node \"cpu-worker-etcd-z1-86c78-7nqlq\" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching","pid":"1","severity":"WARN","source":"reconciler.go:224"}
2022-04-06 06:39:09 {"log":"attacherDetacher.DetachVolume started for volume \"nil\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--ec3c3c19-5a62-4b29-b263-713da5a07d6e\") on node \"cpu-worker-etcd-z1-86c78-7nqlq\" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching","pid":"1","severity":"WARN","source":"reconciler.go:224"}
2022-04-06 06:39:09 {"log":"attacherDetacher.DetachVolume started for volume \"nil\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--589d5a8a-cf3a-4428-bd5f-4e03d1615e1e\") on node \"cpu-worker-etcd-z1-86c78-7nqlq\" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching","pid":"1","severity":"WARN","source":"reconciler.go:224"}
- After these detachments, the Node enters an unhealthy state with reason FilesystemIsReadOnly.
Normal FilesystemIsReadOnly 48m kernel-monitor Node condition ReadonlyFilesystem is now: True, reason: FilesystemIsReadOnly
The corresponding Pods fail with IO errors as their volumes are detached.
Anything else we need to know?
I see that in the ASW (actual state of the world) the volume is reported as attached:
I0407 06:37:53.956930 1 actual_state_of_world.go:507] Report volume "kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--44ee54d8-42d9-4ff6-8092-4a20d932c941" as attached to node "worker-1-z1-7b85b-9r2tb"
Then the volume attachment is marked as uncertain:
I0407 06:37:53.957975 1 attach_detach_controller.go:769] Marking volume attachment as uncertain as volume:"kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/europe-west1-b/disks/pv--44ee54d8-42d9-4ff6-8092-4a20d932c941" ("worker-1-z1-7b85b-9r2tb") is not attached (Detached)
Note the difference in the volume names (the zone component):
-kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--44ee54d8-42d9-4ff6-8092-4a20d932c941
+kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/europe-west1-b/disks/pv--44ee54d8-42d9-4ff6-8092-4a20d932c941
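A minimal, self-contained illustration of why those two names never match (plain Go, not the actual KCM code; it assumes unique volume names are compared as opaque strings):

package main

import "fmt"

func main() {
	// Name reported as attached in the actual state of the world (zone is UNSPECIFIED).
	asw := "kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--44ee54d8-42d9-4ff6-8092-4a20d932c941"
	// Name the controller derives when checking the attachment (real zone).
	va := "kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/europe-west1-b/disks/pv--44ee54d8-42d9-4ff6-8092-4a20d932c941"
	// false: the two strings are treated as two different volumes, so the
	// attachment is marked uncertain and later force-detached.
	fmt.Println(asw == va)
}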
Kubernetes version
Update from K8s 1.20.13 to 1.21.10.
$ kubectl version
# paste output here
Cloud provider
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)
external-provisioner - k8s.gcr.io/sig-storage/[email protected]
external-attacher - k8s.gcr.io/sig-storage/[email protected]
gcp-compute-persistent-disk-csi-driver - gcr.io/gke-release/[email protected]
@ialidzhikov thanks for sharing the detailed information. Regarding the label, I confirmed with @msau42 that it is OK not to reapply it, since it is mainly informational; neither the controller nor the scheduler depends on the zone label to make decisions. Also, as we discussed, in option 5 you can choose not to downgrade the external provisioner as long as you make sure no new PVs are created during the upgrade.
jingxu97 pull request kubernetes/kubernetes
WIP: fix issue #109354
Change-Id: I774a429442327a8700db975c625dd681f8577bc8
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
jingxu97 push jingxu97/kubernetes
commit sha: b53be1d66ef1c7f79410d619864d3788e084dd49
push time: 1 week ago
jingxu97 push jingxu97/kubernetes
commit sha: 564b2049231c971b6e2e51c0822ecbf030da4094
push time: 1 week ago
jingxu97 issue kubernetes/kubernetes
Failing test: Feature gate checking is not enabled
Failure cluster 16ae3f774c3255733716
Error text:
test/e2e/common/node/downwardapi.go:295
May 4 11:36:57.292: Feature gate checking is not enabled, don't use SkipUnlessFeatureGateEnabled(DownwardAPIHugePages). Instead use the Feature tag.
test/e2e/common/node/downwardapi.go:296
Recent failures:
2022/5/7 10:22:24 ci-kubernetes-e2e-gci-gce-serial-kube-dns
2022/5/7 08:45:24 ci-kubernetes-e2e-gci-gce-serial-kube-dns-nodecache
2022/5/7 06:31:10 ci-cri-containerd-e2e-cos-gce-serial
2022/5/7 05:57:09 ci-kubernetes-e2e-gci-gce-serial
2022/5/7 04:22:09 ci-kubernetes-e2e-gci-gce-serial-kube-dns
/kind failing-test
/sig node
jingxu97 issue comment kubernetes/kubernetes
Failing test: Feature gate checking is not enabled
After https://github.com/kubernetes/kubernetes/pull/109852, the tests are still failing. @pacoxu Could you help check it? https://testgrid.k8s.io/sig-storage-kubernetes#gce-serial
jingxu97 issue kubernetes/kubernetes
AddVolumeToReportAsAttached logic issue
What happened?
If a volume is marked as uncertain, it should not be on the volumesAttached list. But if a detach is triggered and fails, the attach_detach_controller will try to add this volume back as attached in the node status (AddVolumeToReportAsAttached), even though the volume might not even be attached (since its state is uncertain). So in this case, if the detach fails, we should keep the volume marked as uncertain.
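A minimal sketch of the behavior argued for above (hypothetical types and function, not the actual attach_detach_controller code):

package main

import "fmt"

// attachState is a hypothetical stand-in for the controller's volume state.
type attachState int

const (
	attached attachState = iota
	uncertain
)

// stateAfterFailedDetach keeps an uncertain volume uncertain instead of
// reporting it as attached in the node status.
func stateAfterFailedDetach(prev attachState) attachState {
	if prev == uncertain {
		return uncertain // do not add to volumesAttached in node status
	}
	return attached // detach failed on a confirmed attachment; report attached
}

func main() {
	fmt.Println(stateAfterFailedDetach(uncertain) == uncertain) // true
}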
What did you expect to happen?
If the volume is marked as uncertain, it should not be listed in volumesAttached in the node status.
How can we reproduce it (as minimally and precisely as possible)?
Use a customized driver that makes the attach operation time out, so the volume is marked as uncertain in the actual state. Then delete the pod that is using this volume and make the detach operation fail.
Anything else we need to know?
No response
Kubernetes version
$ kubectl version
# paste output here
Cloud provider
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)
jingxu97 wants to merge kubernetes/kubernetes
Skip mount point checks when possible during mount cleanup.
What type of PR is this?
/kind bug /kind api-change
What this PR does / why we need it:
This is a continuation of https://github.com/kubernetes/kubernetes/pull/109117. Please see that PR for more background.
Calls to mounter.Unmount are preceded and followed by expensive mount point checks. These checks are not necessary on *nix systems with a umount implementation that performs a similar check itself. This PR adds a mechanism to detect the "safe" behavior and avoid mount point checks when possible.
This change represents a significant optimization of CleanupMountPoint, enabling use cases where pods have many mounts and there is high pod churn. We (EKS) have observed several cases of instability and poor node health in such scenarios, which were resolved with this change.
Which issue(s) this PR fixes:
No issue available.
Special notes for your reviewer:
I chose to add a function to the Mounter interface in order to keep the CleanupMountPoint implementation generic for Unix and Windows. If the "safe" umount behavior is not detected, the existing code paths are unchanged.
Does this PR introduce a user-facing change?
A function (CanSafelySkipMountPointCheck() bool) was added to mount-utils Mounter interface, exposing the mounter's support for skipping mount point checks.
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
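For illustration, a rough sketch of the cleanup flow this enables, assuming the k8s.io/mount-utils interface named in the release note above (simplified; the real CleanupMountPoint handles more cases):

package main

import (
	"os"

	mount "k8s.io/mount-utils"
)

// cleanupMountPoint unmounts mountPath and removes the directory, running the
// expensive pre-unmount mount point check only when the mounter cannot
// guarantee that its umount performs an equivalent check itself.
func cleanupMountPoint(m mount.Interface, mountPath string) error {
	if !m.CanSafelySkipMountPointCheck() {
		notMnt, err := m.IsLikelyNotMountPoint(mountPath)
		if err != nil {
			return err
		}
		if notMnt {
			return os.Remove(mountPath) // nothing mounted; just remove the dir
		}
	}
	if err := m.Unmount(mountPath); err != nil {
		return err
	}
	return os.Remove(mountPath)
}

func main() {
	_ = cleanupMountPoint(mount.New(""), "/tmp/example-mount")
}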
If mountPath is not a mount point, will Unmount return an error at line 104? In that case, it will not call removePath afterwards, right?
jingxu97 merge to kubernetes/kubernetes
Skip mount point checks when possible during mount cleanup.
jingxu97 issue comment kubernetes/kubernetes
Failing test: Feature gate checking is not enabled
I think this is related to the change https://github.com/kubernetes/kubernetes/pull/109649. @pohly
jingxu97 issue comment kubernetes/kubernetes
If kubelet is unavailable, AttachDetachController fails to force detach on pod deletion
/kind bug
What you expected to happen: When a pod using an attached volume is deleted (gracefully) but kubelet in the corresponding node is down, the AttachDetachController should assume the node is unrecoverable after a timeout (currently 6min) and forcefully detach the volume.
What happened: The volume is never detached.
How to reproduce it (as minimally and precisely as possible):
- Create a pod with a volume using any of the attachable plugins (I used GCE PD to test).
- Stop kubelet inside the node where the pod is scheduled.
- Delete the pod.
- Wait for 6min+
- Check to see if the volume is still attached.
Anything else we need to know?: This doesn't happen if the pod is force-deleted.
It's likely due to the last condition checked in this line. Once kubelet is down, the container status is no longer reported correctly. Inside the attach-detach controller, this function is called by the pod update informer handler, which sets whether the volume should be attached in the desired state of the world. On pod force deletion, the pod object is deleted immediately, which triggers the pod delete informer handler instead, and that handler doesn't call this function.
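A hypothetical simplification of that condition (invented helper, not the actual controller code), showing why a down kubelet stalls the desired-state update:

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// podMayStillUseVolumes returns true while any container has not been reported
// terminated. With kubelet down, statuses are never updated, so this stays
// true forever and the volume is never removed from the desired state.
func podMayStillUseVolumes(pod *v1.Pod) bool {
	for _, s := range pod.Status.ContainerStatuses {
		if s.State.Terminated == nil {
			return true
		}
	}
	return false
}

func main() {
	pod := &v1.Pod{Status: v1.PodStatus{ContainerStatuses: []v1.ContainerStatus{{}}}}
	fmt.Println(podMayStillUseVolumes(pod)) // true: no terminated state reported
}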
/sig storage /cc @saad-ali @gnufied @jingxu97 @NickrenREN /assign
We now have a new alpha feature, "Non-graceful node shutdown", which can help in this situation. Please check out the KEP for details: https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown There will be a blog post soon. Please let us know if you have any feedback.
jingxu97 issue comment kubernetes/kubernetes
[Failing test] pull-kubernetes-e2e-gce-iscsi-serial and pull-kubernetes-e2e-gce-iscsi are failing
Which jobs are failing?
pull-kubernetes-e2e-gce-iscsi-serial pull-kubernetes-e2e-gce-iscsi
Error text:
W0106 06:54:08.830] scp: /var/log/cluster-autoscaler.log*: No such file or directory
W0106 06:54:08.830] scp: /var/log/kube-addon-manager.log*: No such file or directory
W0106 06:54:08.908] scp: /var/log/fluentd.log*: No such file or directory
W0106 06:54:08.909] scp: /var/log/kubelet.cov*: No such file or directory
W0106 06:54:08.909] scp: /var/log/startupscript.log*: No such file or directory
W0106 06:54:08.915] ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
I0106 06:54:09.138] Dumping logs from nodes locally to '/workspace/_artifacts'
I0106 06:54:09.139] Detecting nodes in the cluster
I0106 06:55:02.414] Changing logfiles to be world-readable for download
I0106 06:55:02.912] Changing logfiles to be world-readable for download
I0106 06:55:02.913] Changing logfiles to be world-readable for download
[... 9 lines skipped ...]
W0106 06:55:10.794] scp: /var/log/containers/konnectivity-agent-*.log*: No such file or directory
W0106 06:55:10.795] scp: /var/log/fluentd.log*: No such file or directory
W0106 06:55:10.795] scp: /var/log/node-problem-detector.log*: No such file or directory
W0106 06:55:10.795] scp: /var/log/kubelet.cov*: No such file or directory
W0106 06:55:10.795] scp: /var/log/startupscript.log*: No such file or directory
W0106 06:55:10.801] ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
W0106 06:55:11.299] scp: /var/log/containers/konnectivity-agent-*.log*: No such file or directory
W0106 06:55:11.299] scp: /var/log/fluentd.log*: No such file or directory
W0106 06:55:11.300] scp: /var/log/node-problem-detector.log*: No such file or directory
W0106 06:55:11.300] scp: /var/log/kubelet.cov*: No such file or directory
W0106 06:55:11.300] scp: /var/log/startupscript.log*: No such file or directory
W0106 06:55:11.305] ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
W0106 06:55:11.484] scp: /var/log/containers/konnectivity-agent-*.log*: No such file or directory
W0106 06:55:11.485] scp: /var/log/fluentd.log*: No such file or directory
W0106 06:55:11.485] scp: /var/log/node-problem-detector.log*: No such file or directory
W0106 06:55:11.485] scp: /var/log/kubelet.cov*: No such file or directory
W0106 06:55:11.485] scp: /var/log/startupscript.log*: No such file or directory
W0106 06:55:11.488] ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
W0106 06:55:15.713] INSTANCE_GROUPS=e2e-d44c9f5815-8b654-minion-group
W0106 06:55:15.713] NODE_NAMES=e2e-d44c9f5815-8b654-minion-group-dn5x e2e-d44c9f5815-8b654-minion-group-lcfz e2e-d44c9f5815-8b654-minion-group-pl8z
I0106 06:55:17.189] Failures for e2e-d44c9f5815-8b654-minion-group (if any):
W0106 06:55:19.045] 2022/01/06 06:55:19 process.go:155: Step './cluster/log-dump/log-dump.sh /workspace/_artifacts' finished in 1m55.805053979s
W0106 06:55:19.046] 2022/01/06 06:55:19 process.go:153: Running: ./hack/e2e-internal/e2e-down.sh
[... 66 lines skipped ...]
W0106 06:59:26.454] File "/workspace/./test-infra/jenkins/../scenarios/kubernetes_e2e.py", line 111, in check_env
W0106 06:59:26.454] subprocess.check_call(cmd, env=env)
W0106 06:59:26.454] File "/usr/lib/python2.7/subprocess.py", line 190, in check_call
W0106 06:59:26.454] raise CalledProcessError(retcode, cmd)
W0106 06:59:26.455] subprocess.CalledProcessError: Command '('kubetest', '--dump=/workspace/_artifacts', '--gcp-service-account=/etc/service-account/service-account.json', '--build=quick', '--stage=gs://kubernetes-release-pull/ci/pull-kubernetes-e2e-gce-iscsi-serial', '--up', '--down', '--test', '--provider=gce', '--cluster=e2e-d44c9f5815-8b654', '--gcp-network=e2e-d44c9f5815-8b654', '--extract=local', '--gcp-node-image=ubuntu', '--image-family=ubuntu-2004-lts', '--image-project=ubuntu-os-cloud', '--gcp-zone=us-west1-b', '--test_args=--ginkgo.focus=\\[Driver:.iscsi\\].*(\\[Serial\\]|\\[Disruptive\\]) --ginkgo.skip=\\[Flaky\\] --minStartupPods=8', '--timeout=120m')' returned non-zero exit status 1
E0106 06:59:26.455] Command failed
I0106 06:59:26.455] process 686 exited with code 1 after 34.1m
E0106 06:59:26.455] FAIL: pull-kubernetes-e2e-gce-iscsi-serial
I0106 06:59:26.456] Call: gcloud auth activate-service-account --key-file=/etc/service-account/service-account.json
W0106 06:59:27.131] Activated service account credentials for: [[email protected]]
I0106 06:59:27.248] process 83564 exited with code 0 after 0.0m
I0106 06:59:27.249] Call: gcloud config get-value account
I0106 06:59:27.895] process 83577 exited with code 0 after 0.0m
I0106 06:59:27.895] Will upload results to gs://kubernetes-jenkins/pr-logs using [email protected]
I0106 06:59:27.896] Upload result and artifacts...
I0106 06:59:27.896] Gubernator results at https://gubernator.k8s.io/build/kubernetes-jenkins/pr-logs/pull/104732/pull-kubernetes-e2e-gce-iscsi-serial/1478975577011523584
I0106 06:59:27.896] Call: gsutil ls gs://kubernetes-jenkins/pr-logs/pull/104732/pull-kubernetes-e2e-gce-iscsi-serial/1478975577011523584/artifacts
W0106 06:59:28.914] CommandException: One or more URLs matched no objects.
E0106 06:59:29.117] Command failed
Since when has it been failing?
2021 Dec 09 11:19:02
Testgrid link
https://testgrid.k8s.io/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-gce-iscsi-serial
Reason for failure (if possible)
No response
Anything else we need to know?
No response
Relevant SIG(s)
/sig testing
jingxu97 issue comment kubernetes/kubernetes
Failed to delete pod volume because of directory not empty
What happened:
When the init process of a pod is blocked on a request to a remote URL, I delete the pod. The init process is still blocked while the unmounter is unmounting the pod volume. At the same time, the process wakes up and writes something to the volume, leaving files behind in the pod volume path (mine is /var/lib/kubelet/pods/ea23394f-db52-11e8-ad88-6c92bf6f20b2/volumes/hulk~lvm/hulklvm). The unmounter then tries to rmdir the pod volume path, but it fails and prints:
nestedpendingoperations.go:262] Operation for "\"hulk/lvm/hulklvm\" (\"ea23394f-db52-11e8-ad88-6c92bf6f20b2\")" failed. No retries permitted until 2018-10-29 16:16:38.785298614 +0800 CST (durationBeforeRetry 500ms). Error: UnmountVolume.TearDown failed for volume "hulk/lvm/hulklvm" (volume.spec.Name: "hulklvm") pod "ea23394f-db52-11e8-ad88-6c92bf6f20b2" (UID: "ea23394f-db52-11e8-ad88-6c92bf6f20b2") with: remove /var/lib/kubelet/pods/ea23394f-db52-11e8-ad88-6c92bf6f20b2/volumes/hulk~lvm/hulklvm: directory not empty
What you expected to happen:
It should remove the volume path even if the directory is not empty. That means in the unmounter's TearDownAt, we should use os.RemoveAll at the end instead of os.Remove.
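For illustration, the difference between the two calls in plain Go (standalone example, not kubelet code):

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	dir, err := os.MkdirTemp("", "volpath")
	if err != nil {
		panic(err)
	}
	// Simulate the late write that raced with the unmount.
	os.WriteFile(filepath.Join(dir, "out.txt"), []byte("late write"), 0o644)

	fmt.Println(os.Remove(dir))    // fails: "directory not empty", as in the log
	fmt.Println(os.RemoveAll(dir)) // <nil>: removes contents and the directory
}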
How to reproduce it (as minimally and precisely as possible):
Create a pod, and while it is stuck in creation, delete it.
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): 1.6.6
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release): centos 7
- Kernel (e.g. uname -a): 3.10.0-693.mt20180403.47.el7.x86_64
- Install tools:
- Others:
/kind bug
/remove-lifecycle stale
jingxu97 issue comment kubernetes-sigs/vsphere-csi-driver
CNS volumes disappear and all in cluster operations fail
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened: In two separate Kubernetes clusters we have observed failures when pods get moved and, as a result, the PVs have to be attached to the new node. The CNS volume object seems to just disappear; the FCD is still intact on the vSAN. Nothing has triggered a deletion as far as I can see...
Events from the kubernetes side:
kubectl get event -n workload
LAST SEEN TYPE REASON OBJECT MESSAGE
26s Normal Scheduled pod/redis-754fbf4bd-26lbf Successfully assigned workload/redis-754fbf4bd-26lbf to worker3
11s Warning FailedAttachVolume pod/redis-754fbf4bd-26lbf AttachVolume.Attach failed for volume "pvc-202662bf-3ce7-40a0-96af-d22d58198dce" : rpc error: code = Internal desc = failed to attach disk: "eb05ab8b-dcd0-4217-9c34-3ec8bda666a9" with node: "worker3" err ServerFaultCode: Received SOAP response fault from [<cs p:00007fa4dc0a6290, TCP:localhost:443>]: retrieveVStorageObject
Logs from csi-controller:
2021-09-17 12:27:36
I0917 09:27:36.946521 1 reflector.go:381] k8s.io/client-go/informers/factory.go:134: forcing resync
2021-09-17 12:27:36
I0917 09:27:36.340379 1 reflector.go:381] k8s.io/client-go/informers/factory.go:134: forcing resync
2021-09-17 12:27:35
I0917 09:27:35.969819 1 csi_handler.go:226] Error processing "csi-697720ade9eeaa3b9851f3276fb8a3270cda2ff287e7a44690e09c7a9b3bcfba": failed to detach: rpc error: code = Internal desc = volumeID "eb05ab8b-dcd0-4217-9c34-3ec8bda666a9" not found in QueryVolume
2021-09-17 12:27:35
I0917 09:27:35.969770 1 csi_handler.go:612] Saved detach error to "csi-697720ade9eeaa3b9851f3276fb8a3270cda2ff287e7a44690e09c7a9b3bcfba"
2021-09-17 12:27:35
I0917 09:27:35.967871 1 controller.go:158] Ignoring VolumeAttachment "csi-697720ade9eeaa3b9851f3276fb8a3270cda2ff287e7a44690e09c7a9b3bcfba" change
2021-09-17 12:27:35
I0917 09:27:35.952349 1 csi_handler.go:601] Saving detach error to "csi-697720ade9eeaa3b9851f3276fb8a3270cda2ff287e7a44690e09c7a9b3bcfba"
2021-09-17 12:27:35
{"level":"error","time":"2021-09-17T09:27:35.951799411Z","caller":"vanilla/controller.go:883","msg":"volumeID \"eb05ab8b-dcd0-4217-9c34-3ec8bda666a9\" not found in QueryVolume","TraceId":"e3830e8e-5954-433b-a777-e5623668c7b3","stacktrace":"sigs.k8s.io/vsphere-csi-driver/pkg/csi/service/vanilla.(*controller).ControllerUnpublishVolume.func1\n\t/build/pkg/csi/service/vanilla/controller.go:883\nsigs.k8s.io/vsphere-csi-driver/pkg/csi/service/vanilla.(*controller).ControllerUnpublishVolume\n\t/build/pkg/csi/service/vanilla/controller.go:937\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerUnpublishVolume_Handler.func1\n\t/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:5200\ngithub.com/rexray/gocsi/middleware/serialvolume.(*interceptor).controllerUnpublishVolume\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/serialvolume/serial_volume_locker.go:141\ngithub.com/rexray/gocsi/middleware/serialvolume.(*interceptor).handle\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/serialvolume/serial_volume_locker.go:88\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer.func1\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:178\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handle\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:218\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:177\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi.(*StoragePlugin).injectContext\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware.go:231\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:106\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerUnpublishVolume_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:5202\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1024\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1313\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.1\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:722"}
2021-09-17 12:27:35
{"level":"info","time":"2021-09-17T09:27:35.904885335Z","caller":"vanilla/controller.go:857","msg":"ControllerUnpublishVolume: called with args {VolumeId:eb05ab8b-dcd0-4217-9c34-3ec8bda666a9 NodeId:worker3 Secrets:map[] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"e3830e8e-5954-433b-a777-e5623668c7b3"}
2021-09-17 12:27:35
I0917 09:27:35.903737 1 csi_handler.go:715] Found NodeID worker3 in CSINode worker3
And some entries from vsanvcmgmtd:
2021-09-15T22:18:29.157Z verbose vsanvcmgmtd[10186] [[email protected] sub=PyBackedMO opId=sps-Main-13723-334-8a1a] Enter vasa.NotificationManager.getAlarm, Pending: 1 (5269718e-b4b7-0511-b82b-b8f707de46d5)
2021-09-15T22:18:29.157Z verbose vsanvcmgmtd[10186] [[email protected] sub=PyBackedMO opId=sps-Main-13723-334-8a1a] Exit vasa.NotificationManager.getAlarm (0 ms)
2021-09-15T22:18:38.683Z info vsanvcmgmtd[07066] [[email protected] sub=AccessChecker] User CLISTER.LOCAL\[email protected] was authenticated with soap session id. 52db629f-f84a-15b2-6401-b41b25af9ec7 (52a55bd9-9ac9-d4bd-d9f0-d6145bb4f7a5)
2021-09-15T22:18:38.704Z verbose vsanvcmgmtd[07065] [[email protected] sub=PyBackedMO opId=07dc8a1b] Enter vim.cns.VolumeManager.queryAll, Pending: 1 (52db629f-f84a-15b2-6401-b41b25af9ec7)
2021-09-15T22:18:38.706Z verbose vsanvcmgmtd[07065] [[email protected] sub=PyBackedMO opId=07dc8a1b] Exit vim.cns.VolumeManager.queryAll (1 ms)
2021-09-15T22:18:38.731Z verbose vsanvcmgmtd[07104] [[email protected] sub=PyBackedMO opId=07dc8a1c] Enter vim.cns.VolumeManager.query, Pending: 1 (52db629f-f84a-15b2-6401-b41b25af9ec7)
2021-09-15T22:18:38.885Z verbose vsanvcmgmtd[07104] [[email protected] sub=PyBackedMO opId=07dc8a1c] Exit vim.cns.VolumeManager.query (154 ms)
2021-09-15T22:18:38.923Z verbose vsanvcmgmtd[10186] [[email protected] sub=PyBackedMO opId=07dc8a1d] Enter vim.cns.VolumeManager.create, Pending: 1 (52db629f-f84a-15b2-6401-b41b25af9ec7)
2021-09-15T22:18:38.928Z info vsanvcmgmtd[10186] [[email protected] sub=VolumeManager opId=07dc8a1d] CNS: create volume task created: task-314869
2021-09-15T22:18:38.928Z info vsanvcmgmtd[10186] [[email protected] sub=VolumeManager opId=07dc8a1d] CNS: Creating volume task scheduled.
2021-09-15T22:18:38.928Z verbose vsanvcmgmtd[10186] [[email protected] sub=PyBackedMO opId=07dc8a1d] Exit vim.cns.VolumeManager.create (4 ms)
2021-09-15T22:18:38.933Z info vsanvcmgmtd[25351] [[email protected] sub=VolumeManager opId=07dc8a1d] CNS: Creating volume task started
2021-09-15T22:18:39.029Z error vsanvcmgmtd[25351] [[email protected] sub=VolumeManager opId=07dc8a1d] CNS: backingDiskId not found: eb05ab8b-dcd0-4217-9c34-3ec8bda666a9, N3Vim5Fault8NotFound9ExceptionE(Fault cause: vim.fault.NotFound
--> )
--> [context]zKq7AVECAAAAAFnREgEVdnNhbnZjbWdtdGQAAFy7KmxpYnZtYWNvcmUuc28AAAw5GwB+uxgBNG38bGlidmltLXR5cGVzLnNvAIG7mg8BgVrlDwECtWEObGlidm1vbWkuc28AAspdEQJXXxEDHuoCbGliUHlDcHBWbW9taS5zbwACHJISAvGKEgTozgNsaWJ2c2xtLXR5cGVzLnNvAAXF3AdfY25zLnNvAAVN1gUFixQGABYkIwCSJiMA5RIrBtRzAGxpYnB0aHJlYWQuc28uMAAHzY4ObGliYy5zby42AA==[/context]
2021-09-15T22:18:39.031Z info vsanvcmgmtd[25351] [[email protected] sub=VolumeManager opId=07dc8a1d] Create volume completed: task-314869
2021-09-15T22:18:39.054Z verbose vsanvcmgmtd[12908] [[email protected] sub=PyBackedMO opId=07dc8a1e] Enter vim.cns.VolumeManager.create, Pending: 1 (52db629f-f84a-15b2-6401-b41b25af9ec7)
2021-09-15T22:18:39.057Z info vsanvcmgmtd[12908] [[email protected] sub=VolumeManager opId=07dc8a1e] CNS: create volume task created: task-314870
2021-09-15T22:18:39.057Z info vsanvcmgmtd[12908] [[email protected] sub=VolumeManager opId=07dc8a1e] CNS: Creating volume task scheduled.
2021-09-15T22:18:39.057Z verbose vsanvcmgmtd[12908] [[email protected] sub=PyBackedMO opId=07dc8a1e] Exit vim.cns.VolumeManager.create (3 ms)
2021-09-15T22:18:39.062Z info vsanvcmgmtd[43408] [[email protected] sub=VolumeManager opId=07dc8a1e] CNS: Creating volume task started
2021-09-15T22:18:39.158Z verbose vsanvcmgmtd[07049] [[email protected] sub=PyBackedMO opId=sps-Main-13723-334-8a1f] Enter vasa.NotificationManager.getAlarm, Pending: 1 (5269718e-b4b7-0511-b82b-b8f707de46d5)
2021-09-15T22:18:39.158Z verbose vsanvcmgmtd[07049] [[email protected] sub=PyBackedMO opId=sps-Main-13723-334-8a1f]
What you expected to happen:
CNS volume objects should not just disappear?
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- csi-vsphere version:
- vsphere-cloud-controller-manager version: v1.20.0
- Kubernetes version: 1.20.10
- vSphere version: 6.7u3 (6.7.0.48000)
- OS (e.g. from /etc/os-release): ubuntu 20.04
- Kernel (e.g. uname -a): 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
- Others:
vsphere-csi-controller log: https://gist.github.com/jingxu97/842b05c0452d8696d98a96e56793a048 csi-attacher log: https://gist.github.com/jingxu97/0289b2277251b31a01f75b057fd47615
jingxu97 issue comment kubernetes-sigs/vsphere-csi-driver
CNS volumes disappear and all in cluster operations fail
I don't think there is datastore evacuation happening.
This specific test flake is "an existing volume should be accessible on a new node after cluster scale up" from the Anthos qualification test. I think it only happens in this situation:
- pod with a volume created
- delete pod
- create a new node to join the cluster
- create the pod on the new node
There are other tests that detach the volume and attach it to a different node, but only this test fails. The only difference I can see is that it is trying to attach to a new node. In this case, attach failed with "Failed to retrieve datastore for vol". After the timeout, it tried to detach and also failed with the same error.
jingxu97 issue comment kubernetes/website
Update docs to mark in-tree GCP PD plugin as deprecated
This is a Bug Report
Problem: CSI Migration for gcepersistentdisk moved to Beta in the 1.17 release, so the in-tree plugin was already deprecated at that time. The in-tree plugin should be marked as deprecated in the Kubernetes docs.
https://kubernetes.io/docs/concepts/storage/volumes/#gcepersistentdisk
Proposed Solution:
Page to Update: https://kubernetes.io/...
jingxu97 issue comment kubernetes-sigs/vsphere-csi-driver
CNS volumes disappear and all in cluster operations fail
2021-09-15T22:18:39.057Z verbose vsanvcmgmtd[12908] [[email protected] sub=PyBackedMO opId=07dc8a1e] Exit vim.cns.VolumeManager.create (3 ms)
2021-09-15T22:18:39.062Z info vsanvcmgmtd[43408] [[email protected] sub=VolumeManager opId=07dc8a1e] CNS: Creating volume task started
2021-09-15T22:18:39.158Z verbose vsanvcmgmtd[07049] [[email protected] sub=PyBackedMO opId=sps-Main-13723-334-8a1f] Enter vasa.NotificationManager.getAlarm, Pending: 1 (5269718e-b4b7-0511-b82b-b8f707de46d5)
2021-09-15T22:18:39.158Z verbose vsanvcmgmtd[07049] [[email protected] sub=PyBackedMO opId=sps-Main-13723-334-8a1f]
What you expected to happen:
CNS volume objects should not just disappear?
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- csi-vsphere version:
- vsphere-cloud-controller-manager version: v1.20.0
- Kubernetes version: 1.20.10
- vSphere version: 6.7u3 (6.7.0.48000)
- OS (e.g. from /etc/os-release): ubuntu 20.04
- Kernel (e.g. uname -a): 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Install tools:
- Others:
Another issue related to this test: since the attach failed with a "not found" error, the attach_detach_controller marks the volume as uncertain and then tries to detach it when the pod is deleted; however, the detach also fails with a "not found" error.
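For illustration only, here is a minimal Go sketch of that flow with hypothetical stand-in types (this is not the actual attach/detach controller or vSphere CSI driver code; the volume ID and error messages mirror the logs below):

package main

import (
	"errors"
	"fmt"
)

// errNotFound mimics the "not found" failures the driver returns once the
// CNS volume object has disappeared.
var errNotFound = errors.New("volume not found in QueryVolume")

// fakeDriver stands in for the vSphere CSI controller service.
type fakeDriver struct{}

func (fakeDriver) ControllerPublishVolume(volumeID, nodeID string) error {
	// Attach fails because the CNS volume object no longer exists.
	return fmt.Errorf("failed to attach disk %q to node %q: %w", volumeID, nodeID, errNotFound)
}

func (fakeDriver) ControllerUnpublishVolume(volumeID, nodeID string) error {
	// Detach consults QueryVolume first and hits the same missing object.
	return fmt.Errorf("failed to detach %q from node %q: %w", volumeID, nodeID, errNotFound)
}

func main() {
	drv := fakeDriver{}
	state := map[string]string{} // volumeID -> attachment state

	// 1. Attach fails with "not found"; the controller cannot be sure the
	//    volume is detached, so it records the attachment as uncertain.
	if err := drv.ControllerPublishVolume("e5dc0f1b-399b-4839-955e-18a174e4e1ea", "node-1"); err != nil {
		state["e5dc0f1b-399b-4839-955e-18a174e4e1ea"] = "uncertain"
		fmt.Println("attach error:", err)
	}

	// 2. When the pod is deleted, the uncertain volume must be detached,
	//    but detach hits the same "not found" error and keeps failing.
	if state["e5dc0f1b-399b-4839-955e-18a174e4e1ea"] == "uncertain" {
		if err := drv.ControllerUnpublishVolume("e5dc0f1b-399b-4839-955e-18a174e4e1ea", "node-1"); err != nil {
			fmt.Println("detach error:", err)
		}
	}
}

The driver logs below show both failing calls: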
{"level":"info","time":"2022-04-10T17:44:49.039474284Z","caller":"vanilla/controller.go:951","msg":"ControllerPublishVolume: called with args {VolumeId:e5dc0f1b-399b-4839-955e-18a174e4e1ea NodeId:08050d6e39f9-qual-322-0afbac17 VolumeCapability:mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > Readonly:false Secrets:map[] VolumeContext:map[storage.kubernetes.io/csiProvisionerIdentity:1649603959450-8081-csi.vsphere.vmware.com type:vSphere CNS Block Volume] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"6d250629-8008-41b7-baed-2deb2a6baca9"}
{"level":"error","time":"2022-04-10T17:44:49.072198256Z","caller":"volume/manager.go:616","msg":"CNS AttachVolume failed from vCenter \"atl-qual-vc02.anthos\" with err: ServerFaultCode: CNS: Failed to retrieve datastore for vol e5dc0f1b-399b-4839-955e-18a174e4e1ea. (vim.fault.NotFound) {\n faultCause = (vmodl.MethodFault) null, \n faultMessage = <unset>\n msg = \"The vStorageObject (vim.vslm.ID) {\n dynamicType = null,\n dynamicProperty = null,\n id = e5dc0f1b-399b-4839-955e-18a174e4e1ea\n} was not found\"\n}","TraceId":"6d250629-8008-41b7-baed-2deb2a6baca9","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/common/cns-lib/volume.(*defaultManager).AttachVolume.func1\n\t/build/pkg/common/cns-lib/volume/manager.go:616\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/common/cns-lib/volume.(*defaultManager).AttachVolume\n\t/build/pkg/common/cns-lib/volume/manager.go:672\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/common.AttachVolumeUtil\n\t/build/pkg/csi/service/common/vsphereutil.go:548\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).ControllerPublishVolume.func1\n\t/build/pkg/csi/service/vanilla/controller.go:1037\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).ControllerPublishVolume\n\t/build/pkg/csi/service/vanilla/controller.go:1050\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerPublishVolume_Handler.func1\n\t/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:5632\ngithub.com/rexray/gocsi/middleware/serialvolume.(*interceptor).controllerPublishVolume\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/serialvolume/serial_volume_locker.go:120\ngithub.com/rexray/gocsi/middleware/serialvolume.(*interceptor).handle\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/serialvolume/serial_volume_locker.go:86\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer.func1\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:178\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handle\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:218\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:177\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi.(*StoragePlugin).injectContext\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware.go:231\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:106\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerPublishVolume_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:5634\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1024\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/[email 
protected]/server.go:1313\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.1\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:722"}
{"level":"info","time":"2022-04-10T17:44:49.07233599Z","caller":"volume/util.go:343","msg":"Extract vimfault type: +types.NotFound. SoapFault Info: +&{{http://schemas.xmlsoap.org/soap/envelope/ Fault} ServerFaultCode CNS: Failed to retrieve datastore for vol e5dc0f1b-399b-4839-955e-18a174e4e1ea. (vim.fault.NotFound) {\n faultCause = (vmodl.MethodFault) null, \n faultMessage = <unset>\n msg = \"The vStorageObject (vim.vslm.ID) {\n dynamicType = null,\n dynamicProperty = null,\n id = e5dc0f1b-399b-4839-955e-18a174e4e1ea\n} was not found\"\n} {{{{<nil> []}}}}} from err +ServerFaultCode: CNS: Failed to retrieve datastore for vol e5dc0f1b-399b-4839-955e-18a174e4e1ea. (vim.fault.NotFound) {\n faultCause = (vmodl.MethodFault) null, \n faultMessage = <unset>\n msg = \"The vStorageObject (vim.vslm.ID) {\n dynamicType = null,\n dynamicProperty = null,\n id = e5dc0f1b-399b-4839-955e-18a174e4e1ea\n} was not found\"\n}","TraceId":"6d250629-8008-41b7-baed-2deb2a6baca9"}
{"level":"error","time":"2022-04-10T17:44:49.072375655Z","caller":"common/vsphereutil.go:550","msg":"failed to attach disk \"e5dc0f1b-399b-4839-955e-18a174e4e1ea\" with VM: \"VirtualMachine:vm-1013755 [VirtualCenterHost: atl-qual-vc02.anthos, UUID: 4204f258-985f-b7f0-0782-e9a78fe37425, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-3, VirtualCenterHost: atl-qual-vc02.anthos]]\". err: ServerFaultCode: CNS: Failed to retrieve datastore for vol e5dc0f1b-399b-4839-955e-18a174e4e1ea. (vim.fault.NotFound) {\n faultCause = (vmodl.MethodFault) null, \n faultMessage = <unset>\n msg = \"The vStorageObject (vim.vslm.ID) {\n dynamicType = null,\n dynamicProperty = null,\n id = e5dc0f1b-399b-4839-955e-18a174e4e1ea\n} was not found\"\n} faultType \"vim.fault.NotFound\"","TraceId":"6d250629-8008-41b7-baed-2deb2a6baca9","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/common.AttachVolumeUtil\n\t/build/pkg/csi/service/common/vsphereutil.go:550\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).ControllerPublishVolume.func1\n\t/build/pkg/csi/service/vanilla/controller.go:1037\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).ControllerPublishVolume\n\t/build/pkg/csi/service/vanilla/controller.go:1050\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerPublishVolume_Handler.func1\n\t/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:5632\ngithub.com/rexray/gocsi/middleware/serialvolume.(*interceptor).controllerPublishVolume\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/serialvolume/serial_volume_locker.go:120\ngithub.com/rexray/gocsi/middleware/serialvolume.(*interceptor).handle\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/serialvolume/serial_volume_locker.go:86\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer.func1\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:178\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handle\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:218\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:177\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi.(*StoragePlugin).injectContext\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware.go:231\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:106\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerPublishVolume_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:5634\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1024\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/[email 
protected]/server.go:1313\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.1\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:722"}
{"level":"info","time":"2022-04-10T17:47:20.066064719Z","caller":"vanilla/controller.go:1075","msg":"ControllerUnpublishVolume: called with args {VolumeId:e5dc0f1b-399b-4839-955e-18a174e4e1ea NodeId:08050d6e39f9-qual-322-0afbac17 Secrets:map[] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"604cd65a-62ce-49f2-9859-41f6630b7b85"}
{"level":"error","time":"2022-04-10T17:47:20.084717468Z","caller":"vanilla/controller.go:1108","msg":"volumeID \"e5dc0f1b-399b-4839-955e-18a174e4e1ea\" not found in QueryVolume","TraceId":"604cd65a-62ce-49f2-9859-41f6630b7b85","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).ControllerUnpublishVolume.func1\n\t/build/pkg/csi/service/vanilla/controller.go:1108\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).ControllerUnpublishVolume\n\t/build/pkg/csi/service/vanilla/controller.go:1162\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerUnpublishVolume_Handler.func1\n\t/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:5650\ngithub.com/rexray/gocsi/middleware/serialvolume.(*interceptor).controllerUnpublishVolume\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/serialvolume/serial_volume_locker.go:141\ngithub.com/rexray/gocsi/middleware/serialvolume.(*interceptor).handle\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/serialvolume/serial_volume_locker.go:88\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer.func1\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:178\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handle\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:218\ngithub.com/rexray/gocsi/middleware/specvalidator.(*interceptor).handleServer\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware/specvalidator/spec_validator.go:177\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi.(*StoragePlugin).injectContext\n\t/go/pkg/mod/github.com/rexray/[email protected]/middleware.go:231\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2.1.1\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:99\ngithub.com/rexray/gocsi/utils.ChainUnaryServer.func2\n\t/go/pkg/mod/github.com/rexray/[email protected]/utils/utils_middleware.go:106\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerUnpublishVolume_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:5652\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1024\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:1313\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.1\n\t/go/pkg/mod/google.golang.org/[email protected]/server.go:722"}
jingxu97 issue comment kubernetes-csi/docs
Update security considerations for CSI inline ephemeral volumes
This PR:
- Updates the milestones for Generic Ephemeral Inline Volumes (already went GA in 1.23)
- Updates the documentation for CSI inline volumes per the KEP's Security Considerations and Read-only Volumes sections.
KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/596-csi-inline-volumes Enhancement: https://github.com/kubernetes/enhancements/issues/596
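For context, a hedged Go sketch of the kind of pod spec the updated documentation covers, built with the k8s.io/api/core/v1 types (the driver name is a hypothetical placeholder, and the read-only settings follow the KEP's guidance; this is not code from the PR):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	readOnly := true

	// A CSI inline ephemeral volume: declared directly in the pod spec
	// instead of via a PVC, and mounted read-only per the KEP's
	// Read-only Volumes section.
	podSpec := corev1.PodSpec{
		Containers: []corev1.Container{{
			Name:  "app",
			Image: "busybox",
			VolumeMounts: []corev1.VolumeMount{{
				Name:      "inline-vol",
				MountPath: "/data",
				ReadOnly:  true,
			}},
		}},
		Volumes: []corev1.Volume{{
			Name: "inline-vol",
			VolumeSource: corev1.VolumeSource{
				CSI: &corev1.CSIVolumeSource{
					Driver:   "inline.example.com", // hypothetical driver name
					ReadOnly: &readOnly,
				},
			},
		}},
	}
	fmt.Println(podSpec.Volumes[0].Name, "uses driver", podSpec.Volumes[0].CSI.Driver)
}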
jingxu97 issue comment kubernetes/kubernetes
When volume is not marked in-use, do not backoff
We unnecessarily trigger exponential backoff when a volume is not yet marked as in-use. Instead, we can wait for the volume to be marked in-use before triggering the operation_executor. This can reduce the time it takes to mount already-attached volumes.
/sig storage /kind bug
Allow attached volumes to be mounted more quickly by skipping exponential backoff when checking for reported-in-use volumes
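A minimal sketch of the idea, with hypothetical names (the real change lives in kubelet's volume manager reconciler; this is not the actual code): rather than attempting the mount before the volume is reported in-use and paying exponential backoff for each failure, the reconciler skips the attempt until the node reports the volume in-use, then triggers the operation executor.

package main

import (
	"fmt"
	"time"
)

type volume struct {
	name          string
	reportedInUse bool // mirrors node.status.volumesInUse
}

// mountVolume stands in for operationexecutor.MountVolume, whose failures
// are subject to exponential backoff.
func mountVolume(v volume) error {
	fmt.Println("mounting", v.name)
	return nil
}

// reconcile skips the mount entirely while the volume is not yet reported
// in-use, instead of failing and entering exponential backoff.
func reconcile(v volume) {
	if !v.reportedInUse {
		fmt.Println("waiting for", v.name, "to be reported in-use")
		return
	}
	_ = mountVolume(v)
}

func main() {
	v := volume{name: "pvc-123"}
	for i := 0; i < 3; i++ {
		reconcile(v)
		if i == 0 {
			v.reportedInUse = true // the node reports the volume in-use
		}
		time.Sleep(10 * time.Millisecond)
	}
}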
Oh, I missed it. Maybe 1.21 could also be useful.
jingxu97 issue comment kubernetes-sigs/vsphere-csi-driver
CNS volumes disappear and all in cluster operations fail
From our testing, 6.7u3 looks OK, but 7.0 fails very often.
jingxu97 issue comment kubernetes-sigs/vsphere-csi-driver
CNS volumes disappear and all in cluster operations fail
vSphere version: 7.0u3, CSI driver: 2.4.0.
The test (Anthos storage qualification test) is designed to move the pod to a different node and then access the same volume.
jingxu97 issue comment kubernetes-sigs/vsphere-csi-driver
CNS volumes disappear and all in cluster operations fail
Merge pull request #2571 from xiaoxubeii/kep-memory-qos
KEP-2570: Support memory qos with cgroups v2