lavalamp

lavalamp

Kubernetes since pre-OSS | SIG API Machinery co-TL | Sr. Staff SWE @ Google | perpetually too busy, ping me if I seem to not have seen your thing

Member Since 11 years ago

Google, NV

513 followers
0 following
9 stars
23 repos

129 contributions in the last year

Pinned
⚡ Production-Grade Container Scheduling and Management
⚡ Go client for Kubernetes.
⚡ Fuzz testing for go.
⚡ gengo library for code generation.
⚡ Library for writing a Kubernetes-style API server.
Activity
May
19
4 days ago
issue

lavalamp issue comment kubernetes-sigs/structured-merge-diff

lavalamp
lavalamp

Fix potential conversion failure in addBackOwnedItems

Currently addBackOwnedItems adds back managed fields by API version in a random order (a for range over a map). Consider the following scenario:

We have an object with two API versions, v1 and v2, and a conversion webhook that converts between the two versions. When converting v2 to v1, the conversion webhook relies on a required field foo of v2.

Now several users use server-side apply to manage this object in different versions. When a user initiates a server-side apply to the object in v2, addBackOwnedItems has to convert and add back fields to the pruned object in both v1 and v2. At this point, the order matters:

If addBackOwnedItems converts and adds back fields for v2 first and then v1, everything goes smoothly. If the order is reversed, however, a "failed to convert pruned object at version v1" error arises and the server-side apply fails. This is because when addBackOwnedItems converts the pruned v1 object, the required field foo of v2 hasn't been added back yet, which causes the conversion from v2 to v1 to fail.

This PR introduces a retry loop in addBackOwnedItems to ensure that when it converts the pruned object to a specific version, the object already has all the required fields back.
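
A rough sketch of the retry idea described above (illustrative names and signatures, not the actual structured-merge-diff API):

package main

import "fmt"

// addBackOwnedItems sketches the retry loop: keep passing over the versions,
// deferring any version whose conversion fails, until every version succeeds
// or a full pass makes no progress.
func addBackOwnedItems(versions []string, addBackAt func(version string) error) error {
    pending := versions
    for len(pending) > 0 {
        var remaining []string
        for _, v := range pending {
            // A version whose conversion depends on fields owned at another
            // version may fail now and succeed on a later pass.
            if err := addBackAt(v); err != nil {
                remaining = append(remaining, v)
            }
        }
        if len(remaining) == len(pending) {
            // No progress in a full pass: the remaining failures are permanent.
            return fmt.Errorf("could not add back fields for versions %v", remaining)
        }
        pending = remaining
    }
    return nil
}

func main() {
    // v1's conversion only succeeds after v2's fields are back, mirroring the
    // webhook dependency in the scenario above.
    v2Done := false
    err := addBackOwnedItems([]string{"v1", "v2"}, func(version string) error {
        if version == "v2" {
            v2Done = true
            return nil
        }
        if !v2Done {
            return fmt.Errorf("required field foo of v2 not present yet")
        }
        return nil
    })
    fmt.Println("err:", err) // err: <nil>, regardless of the starting order
}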

lavalamp
lavalamp

Many thanks for noticing this and for the test; that's super helpful.

I don't feel like this is quite the root cause yet, though. My leading candidates are:

  • we forgot to tell conversion webhook authors that they need to default first (we could actually have the extension-apiserver test this for them and set a status condition if they don't handle converting a blank object)
  • we forgot to implement defaulting based on the CRD schema
  • it could be user error if the CRD schema lacks defaulting that it should have

I know conversion functions for built-in objects have similar requirements sometimes and we got around that with some clever ordering IIRC.

open pull request

lavalamp wants to merge kubernetes-sigs/structured-merge-diff

lavalamp
lavalamp

Fix potential conversion failure in addBackOwnedItems

Currently addBackOwnedItems adds back managed fields by API version in a random order (a for range over a map). Consider the following scenario:

We have an object with two API versions, v1 and v2, and a conversion webhook that converts between the two versions. When converting v2 to v1, the conversion webhook relies on a required field foo of v2.

Now several users use server-side apply to manage this object in different versions. When a user initiates a server-side apply to the object in v2, addBackOwnedItems has to convert and add back fields to the pruned object in both v1 and v2. At this point, the order matters:

If addBackOwnedItems converts and adds back fields for v2 first and then v1, everything goes smoothly. If the order is reversed, however, a "failed to convert pruned object at version v1" error arises and the server-side apply fails. This is because when addBackOwnedItems converts the pruned v1 object, the required field foo of v2 hasn't been added back yet, which causes the conversion from v2 to v1 to fail.

This PR introduces a retry loop in addBackOwnedItems to ensure that when it converts the pruned object to a specific version, the object already has all the required fields back.

lavalamp
lavalamp

In the very worst case, where there's exactly one conversion sequence that works, this retry loop's runtime complexity appears to be N! for N versions -- I think we can't afford that even though the average case is not that bad.

I do think we can afford to sort by version name so that it's at least deterministic.

I guess we don't hit this problem on built-in types because we know how to default them? This is the part I'm still a bit confused about.

If determinism plus asking conversion webhooks to default first isn't enough, then we need to cap the number of tries this loop can make and, ideally, cache the successful order so that we only have to figure it out once. But I hope we don't have to do that.
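
A minimal sketch of the determinism part of that suggestion (illustrative data, not the library's real types): iterate the per-version map in sorted key order instead of Go's randomized map order.

package main

import (
    "fmt"
    "sort"
)

func main() {
    // Stand-in for the per-version managed-fields map; Go iterates maps in a
    // randomized order, which is what makes the current behavior flaky.
    managedByVersion := map[string][]string{
        "v2": {"foo"},
        "v1": {"bar"},
    }

    // Sort the keys so the add-back order is at least deterministic.
    versions := make([]string, 0, len(managedByVersion))
    for v := range managedByVersion {
        versions = append(versions, v)
    }
    sort.Strings(versions)

    for _, v := range versions {
        fmt.Println("adding back fields for", v, managedByVersion[v])
    }
}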

pull request

lavalamp merge to kubernetes-sigs/structured-merge-diff

lavalamp
lavalamp

Fix potential conversion failure in addBackOwnedItems

Currently addBackOwnedItems adds back managed fields by API version in a random order (a for range over a map). Consider the following scenario:

We have an object with two API versions, v1 and v2, and a conversion webhook that converts between the two versions. When converting v2 to v1, the conversion webhook relies on a required field foo of v2.

Now several users use server-side apply to manage this object in different versions. When a user initiates a server-side apply to the object in v2, addBackOwnedItems has to convert and add back fields to the pruned object in both v1 and v2. At this point, the order matters:

If addBackOwnedItems converts and adds back fields for v2 first and then v1, everything goes smoothly. If the order is reversed, however, a "failed to convert pruned object at version v1" error arises and the server-side apply fails. This is because when addBackOwnedItems converts the pruned v1 object, the required field foo of v2 hasn't been added back yet, which causes the conversion from v2 to v1 to fail.

This PR introduces a retry loop in addBackOwnedItems to ensure that when it converts the pruned object to a specific version, the object already has all the required fields back.

pull request

lavalamp merge to kubernetes-sigs/structured-merge-diff

lavalamp
lavalamp

Fix potential conversion failure in addBackOwnedItems

Currently addBackOwnedItems adds back managed fields by API version in a random order (a for range over a map). Consider the following scenario:

We have an object with two API versions, v1 and v2, and a conversion webhook that converts between the two versions. When converting v2 to v1, the conversion webhook relies on a required field foo of v2.

Now several users use server-side apply to manage this object in different versions. When a user initiates a server-side apply to the object in v2, addBackOwnedItems has to convert and add back fields to the pruned object in both v1 and v2. At this point, the order matters:

If addBackOwnedItems converts and adds back fields for v2 first and then v1, everything goes smoothly. If the order is reversed, however, a "failed to convert pruned object at version v1" error arises and the server-side apply fails. This is because when addBackOwnedItems converts the pruned v1 object, the required field foo of v2 hasn't been added back yet, which causes the conversion from v2 to v1 to fail.

This PR introduces a retry loop in addBackOwnedItems to ensure that when it converts the pruned object to a specific version, the object already has all the required fields back.

open pull request

lavalamp wants to merge kubernetes-sigs/structured-merge-diff

lavalamp
lavalamp

Fix potential conversion failure in addBackOwnedItems

Currently addBackOwnedItems adds back managed fields by API version in a random order (a for range over a map). Consider the following scenario:

We have an object with two API versions, v1 and v2, and a conversion webhook that converts between the two versions. When converting v2 to v1, the conversion webhook relies on a required field foo of v2.

Now several users use server-side apply to manage this object in different versions. When a user initiates a server-side apply to the object in v2, addBackOwnedItems has to convert and add back fields to the pruned object in both v1 and v2. At this point, the order matters:

If addBackOwnedItems converts and adds back fields for v2 first and then v1, everything goes smoothly. If the order is reversed, however, a "failed to convert pruned object at version v1" error arises and the server-side apply fails. This is because when addBackOwnedItems converts the pruned v1 object, the required field foo of v2 hasn't been added back yet, which causes the conversion from v2 to v1 to fail.

This PR introduces a retry loop in addBackOwnedItems to ensure that when it converts the pruned object to a specific version, the object already has all the required fields back.

lavalamp
lavalamp
May
18
5 days ago
open pull request

lavalamp wants to merge kubernetes/kubernetes

lavalamp
lavalamp

clarify a comment on annotation key validation

What type of PR is this?

/kind cleanup /sig api-machinery

What this PR does / why we need it:

Clarify a comment on annotation key validation. Since uppercase is valid in an annotation key but not in a QualifiedName, clarifying this avoids confusion and improves code readability.

Which issue(s) this PR fixes:

Fixes #109459

Special notes for your reviewer:

None

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE
lavalamp
lavalamp
		// The rule is QualifiedName except that case doesn't matter, so convert to lowercase before checking.
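
For context, a small illustration of the pattern that comment describes, using the IsQualifiedName helper from k8s.io/apimachinery (the wrapping function is a hypothetical stand-in, not the actual kubernetes validation code):

package main

import (
    "fmt"
    "strings"

    "k8s.io/apimachinery/pkg/util/validation"
)

// checkAnnotationKey mirrors the comment: annotation keys follow the
// QualifiedName rule except that case doesn't matter, so lowercase the key
// before running the QualifiedName check.
func checkAnnotationKey(key string) []string {
    return validation.IsQualifiedName(strings.ToLower(key))
}

func main() {
    // The uppercase prefix fails the plain QualifiedName check (the prefix
    // must be a lowercase DNS subdomain), but the same key is accepted as an
    // annotation key once lowercased.
    fmt.Println(validation.IsQualifiedName("Example.com/Key")) // non-empty: errors
    fmt.Println(checkAnnotationKey("Example.com/Key"))         // empty: valid
}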
pull request

lavalamp merge to kubernetes/kubernetes

lavalamp
lavalamp

clarify a comment on annotation key validation

What type of PR is this?

/kind cleanup /sig api-machinery

What this PR does / why we need it:

Clarify a comment on annotation key validation. Since uppercase is valid in an annotation key but not in a QualifiedName, clarifying this avoids confusion and improves code readability.

Which issue(s) this PR fixes:

Fixes #109459

Special notes for your reviewer:

None

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE
May
17
6 days ago
issue

lavalamp issue comment kubernetes/kubernetes

lavalamp
lavalamp

apimachinery/clock: Delete the apimachinery/clock package

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Delete the clock package after a soak period of one release cycle (1.24). The package that should be used now is k8s.io/utils/clock.

Which issue(s) this PR fixes:

Fixes #94738

Special notes for your reviewer:

It was suggested that we wait for a release or two to make it easier for folks to migrate their code: https://github.com/kubernetes/kubernetes/issues/94738#issuecomment-931386501

Types were deprecated in 1.23.

Does this PR introduce a user-facing change?

apimachinery/clock: This deletes the apimachinery/clock package. Please use k8s.io/utils/clock instead.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/assign @MikeSpreitzer @liggitt
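
For anyone still importing the deleted package, a minimal sketch of the migration, assuming the usual k8s.io/utils/clock layout with a RealClock and a testing FakeClock:

package main

import (
    "fmt"
    "time"

    // Replacement for the removed k8s.io/apimachinery clock package.
    "k8s.io/utils/clock"
    testingclock "k8s.io/utils/clock/testing"
)

func main() {
    // Real clock for production code paths.
    var c clock.Clock = clock.RealClock{}
    fmt.Println(c.Now())

    // Fake clock for tests, stepped manually.
    fake := testingclock.NewFakeClock(time.Now())
    fake.Step(5 * time.Second)
    fmt.Println(fake.Now())
}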

pull request

lavalamp merge to kubernetes-sigs/structured-merge-diff

lavalamp
lavalamp

Fix potential conversion failure in addBackOwnedItems

Currently addBackOwnedItems adds back managed fields by API version in a random order (a for range over a map). Consider the following scenario:

We have an object with two API versions, v1 and v2, and a conversion webhook that converts between the two versions. When converting v2 to v1, the conversion webhook relies on a required field foo of v2.

Now several users use server-side apply to manage this object in different versions. When a user initiates a server-side apply to the object in v2, addBackOwnedItems has to convert and add back fields to the pruned object in both v1 and v2. At this point, the order matters:

If addBackOwnedItems converts and adds back fields for v2 first and then v1, everything goes smoothly. If the order is reversed, however, a "failed to convert pruned object at version v1" error arises and the server-side apply fails. This is because when addBackOwnedItems converts the pruned v1 object, the required field foo of v2 hasn't been added back yet, which causes the conversion from v2 to v1 to fail.

This PR introduces a retry loop in addBackOwnedItems to ensure that when it converts the pruned object to a specific version, the object already has all the required fields back.

open pull request

lavalamp wants to merge kubernetes-sigs/structured-merge-diff

lavalamp
lavalamp

Fix potential conversion failure in addBackOwnedItems

Currently addBackOwnedItems adds back managed fields by API version in a random order (a for range over a map). Consider the following scenario:

We have an object with two API versions, v1 and v2, and a conversion webhook that converts between the two versions. When converting v2 to v1, the conversion webhook relies on a required field foo of v2.

Now several users use server-side apply to manage this object in different versions. When a user initiates a server-side apply to the object in v2, addBackOwnedItems has to convert and add back fields to the pruned object in both v1 and v2. At this point, the order matters:

If addBackOwnedItems converts and adds back fields for v2 first and then v1, everything goes smoothly. If the order is reversed, however, a "failed to convert pruned object at version v1" error arises and the server-side apply fails. This is because when addBackOwnedItems converts the pruned v1 object, the required field foo of v2 hasn't been added back yet, which causes the conversion from v2 to v1 to fail.

This PR introduces a retry loop in addBackOwnedItems to ensure that when it converts the pruned object to a specific version, the object already has all the required fields back.

lavalamp
lavalamp

Can this be more simply fixed by just not modifying merged and starting from the original form of merged each iteration? There should really not be path-dependencies sneaking in from the random iteration order of a loop like this.
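
A small sketch of that alternative, with a stand-in object type rather than the library's real types: take a fresh copy of merged for each version so no mutation made for one version can leak into the next.

package main

import "fmt"

// object is a purely illustrative stand-in for the merged object.
type object struct {
    fields map[string]string
}

func (o *object) deepCopy() *object {
    out := &object{fields: map[string]string{}}
    for k, v := range o.fields {
        out.fields[k] = v
    }
    return out
}

func main() {
    merged := &object{fields: map[string]string{"foo": "1"}}
    versions := []string{"v1", "v2"}

    // Start each version's add-back from an untouched copy of merged, so the
    // outcome can't depend on iteration order.
    for _, v := range versions {
        working := merged.deepCopy()
        working.fields["addedFor"] = v // stand-in for "add back owned fields"
        fmt.Println(v, working.fields)
    }
    fmt.Println("merged unchanged:", merged.fields)
}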

issue

lavalamp issue comment kubernetes/kubernetes

lavalamp
lavalamp

remove featuregate in 1.25

Signed-off-by: cyclinder [email protected]

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

remove featuregate in 1.25

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


issue

lavalamp issue comment kubernetes/kubernetes

lavalamp
lavalamp

fix: reflector to return wrapped list errors

This fix allows Reflector/Informer callers to detect API errors using standard Go errors.As unwrapping, which the apimachinery helper methods rely on. Combined with a custom WatchErrorHandler, this can be used to stop an informer that encounters specific errors, like resource not found or forbidden.

Without this change, the caller needs to do string matching or even regex on the error string to detect certain API errors that should be terminal.

/kind bug

Fix bug that prevented informer/reflector callers from unwrapping and catching specific API errors by type.
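
A sketch of what that enables for callers, assuming client-go's WatchErrorHandler hook and the apimachinery error helpers (wiring the handler into an informer via SetWatchErrorHandler is omitted):

package main

import (
    "fmt"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/client-go/tools/cache"
)

func main() {
    // With wrapped list errors, the apimachinery helpers (which unwrap via
    // errors.As) can classify the error; no string matching needed.
    var handler cache.WatchErrorHandler = func(r *cache.Reflector, err error) {
        switch {
        case apierrors.IsNotFound(err):
            fmt.Println("resource is gone; stop this informer")
        case apierrors.IsForbidden(err):
            fmt.Println("access was revoked; stop this informer")
        default:
            cache.DefaultWatchErrorHandler(r, err)
        }
    }
    _ = handler // pass to SharedInformer.SetWatchErrorHandler(handler) in real code
}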
May
13
1 week ago
issue

lavalamp issue comment kubernetes/kubernetes

lavalamp
lavalamp

client-side apply does not cause managedFields update

What happened?

I created an object with kubectl apply, and later I updated the same object with another kubectl apply. After that I looked at the object's managedFields, and saw nothing about the update. Here is what I got after the second apply:

apiVersion: v1
data:
  key: the other value
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"key":"the other value"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"test1","namespace":"default"}}
  creationTimestamp: "2022-04-20T20:37:52Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:key: {}
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2022-04-20T20:37:52Z"
  name: test1
  namespace: default
  resourceVersion: "19304095"
  uid: 5fb6f9b3-9d9b-475b-89f8-b0d0fe3f2348

What did you expect to happen?

I expected managedFields to get updated by the second kubectl apply.

How can we reproduce it (as minimally and precisely as possible)?

kubectl apply -f - <<EOF
kind: ConfigMap
apiVersion: v1
metadata:
  name: test1
data:
  key: value
EOF

sleep 5

kubectl apply -f - <<EOF
kind: ConfigMap
apiVersion: v1
metadata:
  name: test1
data:
  key: the other value
EOF

kubectl get --show-managed-fields -o yaml cm test1

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:38:05Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:32:02Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

none

OS version

# On Linux:
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

$ uname -a
Linux mjs-ubu3-kube1.sl.cloud9.ibm.com 5.4.0-104-generic #118-Ubuntu SMP Wed Mar 2 19:02:41 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Install tools

kubeadm

Container runtime (CRI) and version (if applicable)

docker

Related plugins (CNI, CSI, ...) and versions (if applicable)

lavalamp
lavalamp

@apelisse, is that documentation correct? Is this what we intended?

May
12
1 week ago
pull request

lavalamp pull request kubernetes/kubernetes

lavalamp
lavalamp

allow case insensitive matching for field selectors

Signed-off-by: Sanskar Jaiswal [email protected]

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR allows for case insensitive matching for field selectors.

Which issue(s) this PR fixes:

Fixes #107285

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Field selectors can be used in a case insensitive manner.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


issue

lavalamp issue comment kubernetes/kubernetes

lavalamp
lavalamp

allow case insensitive matching for field selectors

Signed-off-by: Sanskar Jaiswal [email protected]

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR allows for case insensitive matching for field selectors.

Which issue(s) this PR fixes:

Fixes #107285

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Field selectors can be used in a case insensitive manner.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


lavalamp
lavalamp

We definitely can't change this, and it looks like we're missing some test cases ensuring that it stays case sensitive. I would merge a PR adding such tests.

(sorry I didn't see this PR until just now)
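
The kind of regression test being asked for might look roughly like this (a hypothetical test, not one from the kubernetes tree):

package fieldselector_test

import (
    "testing"

    "k8s.io/apimachinery/pkg/fields"
)

// Field selector matching must remain case sensitive.
func TestFieldSelectorStaysCaseSensitive(t *testing.T) {
    sel := fields.OneTermEqualSelector("metadata.name", "foo")
    if sel.Matches(fields.Set{"metadata.name": "Foo"}) {
        t.Error("field selector matched a value differing only in case; selectors must stay case sensitive")
    }
    if !sel.Matches(fields.Set{"metadata.name": "foo"}) {
        t.Error("field selector failed to match an exactly equal value")
    }
}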

May
6
2 weeks ago
issue

lavalamp issue comment kubernetes/kubernetes

lavalamp
lavalamp

Finish clustername removal

This completes the work started in #108717. It should not merge until merges reopen for 1.25.

The compatibility test will fail until I run the command to update it, which I can't do until the tag for 1.24.0 exists.

The `metadata.clusterName` field is completely removed. This should not have any user-visible impact.
lavalamp
lavalamp

/test pull-kubernetes-e2e-gce-ubuntu-containerd

issue

lavalamp issue comment kubernetes/kubernetes

lavalamp
lavalamp

Finish clustername removal

This completes the work started in #108717. It should not merge until merges reopen for 1.25.

The compatibility test will fail until I run the command to update it, which I can't do until the tag for 1.24.0 exists.

The `metadata.clusterName` field is completely removed. This should not have any user-visible impact.
lavalamp
lavalamp

OK I think everything should be right now.

issue

lavalamp issue comment kubernetes/kubernetes

lavalamp
lavalamp

Finish clustername removal

This completes the work started in #108717. It should not merge until merges reopen for 1.25.

The compatibility test will fail until I run the command to update it, which I can't do until the tag for 1.24.0 exists.

The `metadata.clusterName` field is completely removed. This should not have any user-visible impact.
lavalamp
lavalamp

Actually, wait, I probably have clobbered the base files and should double check that

issue

lavalamp issue comment kubernetes/kubernetes

lavalamp
lavalamp

Finish clustername removal

This completes the work started in #108717. It should not merge until merges reopen for 1.25.

The compatibility test will fail until I run the command to update it, which I can't do until the tag for 1.24.0 exists.

The `metadata.clusterName` field is completely removed. This should not have any user-visible impact.
lavalamp
lavalamp
push

lavalamp push lavalamp/kubernetes

lavalamp
lavalamp

Promote e2e job lifecycle test to Conformance

lavalamp
lavalamp

Reduce number of repetitions and pods in TestPreemptionRaces

Change-Id: Id2c73be7be2536b02c804978d26d1e977a344399

lavalamp
lavalamp

Skip adding data to avoid "json: unsupported value: NaN" panic when data is NaN

lavalamp
lavalamp

Merge pull request #109534 from ii/promote-job-lifecycle-test

Promote Batchv1JobLifecycleTest +4 Endpoints

lavalamp
lavalamp

Merge pull request #109825 from alculquicondor/reduce-tests

Reduce number of repetitions and pods in TestPreemptionRaces

lavalamp
lavalamp

Merge pull request #109545 from sanposhiho/fix-nun-on-scheduler_perf

Skip adding data to avoid "json: unsupported value: NaN" panic when data is NaN

commit sha: 211ee288ee1f323af5cdd93e673a4612dcc313b1

pushed 2 weeks ago
May
5
2 weeks ago
issue

lavalamp issue comment kubernetes/kubernetes

lavalamp
lavalamp

Finish clustername removal

This completes the work started in #108717. It should not merge until merges reopen for 1.25.

The compatibility test will fail until I run the command to update it, which I can't do until the tag for 1.24.0 exists.

The `metadata.clusterName` field is completely removed. This should not have any user-visible impact.
lavalamp
lavalamp

That's very weird, I'll run it again...

issue

lavalamp issue comment kubernetes/kubernetes

lavalamp
lavalamp

Finish clustername removal

This completes the work started in #108717. It should not merge until merges reopen for 1.25.

The compatibility test will fail until I run the command to update it, which I can't do until the tag for 1.24.0 exists.

The `metadata.clusterName` field is completely removed. This should not have any user-visible impact.
push

lavalamp push lavalamp/kubernetes

lavalamp
lavalamp

Only log requests for configured consumptions in ResourceConsumer

lavalamp
lavalamp

Expand unit tests of pruning of unknown fields in metadata

lavalamp
lavalamp

Expand cmd tests of modifying schema-declaring custom resources

lavalamp
lavalamp

Fix bug treating metadata fields as unknown fields

lavalamp
lavalamp

Winkernel proxier cache HNS data to improve syncProxyRules performance

Resolved issues with proxy rules taking a long time to be synced on Windows, by caching HNS data.

In particular, the following HNS data will be cached for the context of syncProxyRules:

  • HNS endpoints

  • HNS load balancers

lavalamp
lavalamp

Drop unused golang/template package

lavalamp
lavalamp

Drop unused golang/template funcs

lavalamp
lavalamp
lavalamp
lavalamp

fix e2e coverage package for go 1.18

lavalamp
lavalamp

Update Metrics doc as there is a typo in package

Package header typo is very visible looking at docs.

https://pkg.go.dev/k8s.io/metrics/pkg/apis/metrics

lavalamp
lavalamp

Optimize test cases for ipvs

lavalamp
lavalamp

Optimize test cases for iptables

lavalamp
lavalamp

kubeadm: replace *clientset.Clientset with clientset.Interface for join phase

lavalamp
lavalamp
lavalamp
lavalamp

hardens integration serviceaccount tests

the serviceAccountController controller used by the tests must wait for the caches to sync; since the tests don't check /readyz, there is no way for the tests to tell it is safe to call the server and that requests won't be rejected

lavalamp
lavalamp

Minor cleanup to use t.Run() in test/integration

lavalamp
lavalamp

spdyroundtripper: close the connection if tls handshake fails

lavalamp
lavalamp

spdyroundtripper: don't verify tls hostname twice

lavalamp
lavalamp

e2e: node: explicit skip for device plugin tests

The device plugin e2e tests were failing lately, and to unblock the release a skip was added in the prow job configuration: https://github.com/kubernetes/test-infra/blob/71cf119c846b21f8fc37ab7bac00899a80ce9bea/config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml#L401

The problem here is not only the broken tests, which need to be fixed, but also the fact that this is the only skip (for a specific test) we do this way, which is surprising (xref: https://github.com/kubernetes/kubernetes/issues/106635#issuecomment-1105627265)

As a next step towards improvement, we add an explicit skip in the tests proper. This at least makes it more obvious that these tests need more work, and allows us to remove the edge case in the prow configuration.

Signed-off-by: Francesco Romani [email protected]

lavalamp
lavalamp

honor the framework delete timeout for pv

commit sha: 941b37bfa0709f47428e9cd1741f3ddcc5e5c8ec

pushed 2 weeks ago
issue

lavalamp issue comment kubernetes/kubernetes

lavalamp
lavalamp

Change apiserver healthiness check in KCM

What happened?

KCM startup has a dependency on apiserver availability https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/controller-manager/app/helper.go#L37.

When an apiserver uses kms keys, an error in the kms plugin or the kms service, or a customer mistakenly deleting the key, interrupts kube-controller-manager startup because of the dependency on healthz.

What did you expect to happen?

Would it be ok to introduce a flag on the kube-controller-manager options to exclude certain healthz checks? In such a case, kcm would be able to start up even when kms has an issue. Not all the controller loops will be able to start up in that situation, but it might be better to have the kcm able to start and try to reconcile the loops.

How can we reproduce it (as minimally and precisely as possible)?

Create an EKS cluster using kms enabled.

aws eks 
    --role-arn arn:aws:iam::xxx:role/xxx \
    --resources-vpc-config subnetIds=subnet-xxx,subnet-xxx,securityGroupIds=sg-xxx \
    --kubernetes-version 1.19 --encryption-config resources=secrets,provider={keyArn=arn:aws:kms:us-west-2:xxx:key/xxxx}

Disable the kms key. The kube controller manager becomes unhealthy

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
# paste output here

Cloud provider

EKS

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

lavalamp
lavalamp
  • most KCM installs just have it talking to the local apiserver. If that apiserver is unhealthy but others are healthy, it makes sense to give up the lock (you might not even have a choice depending on how unhealthy apiserver is).
  • If for some reason KCM is going through a load balancer to get to apiserver, then it doesn't really make sense to try and lose the lock, since you're probably not talking to a specific apiserver and no other KCM would do better anyway.
  • In this particular case, since KMS health requires user action, what I think you really want is to globally exclude it from apiserver health checks, not just for KCM.
  • In the abstract, I'm not in favor of adding knobs to KCM, due to my first two points implying that different behavior is needed in different setups, which implies yet more flags... Let's not start down this road until it's certain we need it.

I'm not sure how I feel about the use case; I think a better solution might be to somehow lock the key so the user can't delete it while it's in use, only rotate it.

May
4
2 weeks ago
issue

lavalamp issue comment kubernetes/kubernetes

lavalamp
lavalamp

Finish clustername removal

This completes the work started in #108717. It should not merge until merges reopen for 1.25.

The compatibility test will fail until I run the command to update it, which I can't do until the tag for 1.24.0 exists.

The `clusterName` field is completely removed. This should not have any user-visible impact.
lavalamp
lavalamp

/assign @liggitt @deads2k

I think this is ready now

push

lavalamp push lavalamp/kubernetes

lavalamp
lavalamp

Fix misspelling of success.

Signed-off-by: JunYang [email protected]

lavalamp
lavalamp

Added --sum flag to kubectl top pod

lavalamp
lavalamp

Add missing test cases for RunAsGroup and SetRunAsGroup methods

lavalamp
lavalamp
lavalamp
lavalamp

fix comment of e2e test case garbage_collector

Signed-off-by: sayaoailun [email protected]

lavalamp
lavalamp

Replace dbus-send with godbus for fake PrepareForShutdown message

lavalamp
lavalamp

refactor: Change the users of IsQualifiedName to ValidateQualifiedName

lavalamp
lavalamp

Add pod status info log for e2e creating pods

lavalamp
lavalamp

kube-controller-manager: Remove the deprecated --experimental-cluster-signing-duration flag

Signed-off-by: ialidzhikov [email protected]

lavalamp
lavalamp

Improvement: Updated the serviceaccount flag for multiple subjects.

lavalamp
lavalamp

e2e/cleanup: fix package name and dir name mismatches

lavalamp
lavalamp

pkg/volume: fix incorrect klog.Infof usage

klog.Infof expects a format string as its first parameter and then expands format specifiers inside it. What gets passed here is the final string that must be logged as-is; therefore klog.Info has to be used.

Signed-off-by: yuswift [email protected]
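
A tiny illustration of the difference that commit message describes (contrived message, not the actual call site):

package main

import "k8s.io/klog/v2"

func main() {
    msg := "volume resized to 100% of requested capacity"

    // Wrong: Infof treats its first argument as a format string, so the
    // literal "%" in msg is parsed as a formatting verb and the output is
    // mangled (go vet also flags the non-constant format string).
    klog.Infof(msg)

    // Right: the message is already final, so log it as-is.
    klog.Info(msg)
}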

lavalamp
lavalamp

fix: exclude non-ready nodes and deleted nodes from azure load balancers

Make sure that nodes that are not in the ready state and are not newly created (i.e. not having the "node.cloudprovider.kubernetes.io/uninitialized" taint) get removed from load balancers. Also remove nodes that are being deleted from the cluster.

Signed-off-by: Riccardo Ravaioli [email protected]

lavalamp
lavalamp

kubelet: fix panic triggered when playing with a wip CRI

lavalamp
lavalamp

Update rs.extensions to rs.apps

lavalamp
lavalamp

kubelet: more resilient node allocatable ephemeral-storage data getter

lavalamp
lavalamp

Updated the user and group flag.

lavalamp
lavalamp

cleanUp: check existence using basic method of set

lavalamp
lavalamp

cpu manager policy set to none: nothing removes the container id from the container map, leading to a memory leak

commit sha: e8f24a88e1f37c77ee9596cd099752d30f20be76

pushed 2 weeks ago
push

lavalamp push lavalamp/kubernetes

lavalamp
lavalamp

Disable Intree GCE PD tests by default

Signed-off-by: Davanum Srinivas [email protected]

lavalamp
lavalamp

windows GCE: Bumps containerd version to 1.6.2

containerd v1.6.0 introduced HostProcessContainers support [1], which is required for e2e tests that need that feature.

This addresses some of the permafailing tests for Windows GCE E2E test runs.

[1] https://github.com/containerd/containerd/pull/5131

lavalamp
lavalamp

Merge pull request #109541 from dims/disable-intree-gce-pd-tests-by-default

Disable Intree GCE PD tests by default

lavalamp
lavalamp

Merge pull request #109592 from claudiubelu/gce-updates-containerd-version

windows GCE: Bumps containerd version to 1.6.2

lavalamp
lavalamp

Bump cAdvisor to v0.44.1

Bump cAdvisor to v0.44.1 to pick up fix for containerd task timeout which resulted in empty network metrics.

Signed-off-by: David Porter [email protected]

lavalamp
lavalamp

Merge pull request #109658 from bobbypage/cadvisor-044-1

Bump cAdvisor to v0.44.1

lavalamp
lavalamp

CHANGELOG: Update directory for v1.24.0-rc.1 release

lavalamp
lavalamp

fix: NeedResize build failure on Windows

lavalamp
lavalamp

Do not wrap lines if we can't read term size

lavalamp
lavalamp

Merge pull request #109722 from soltysh/fix_templater

Do not wrap lines if we can't read term size

lavalamp
lavalamp

Merge pull request #109721 from andyzhangx/needresize-windows

fix: NeedResize build failure on Windows

lavalamp
lavalamp

CHANGELOG: Update directory for v1.24.0 release

lavalamp
lavalamp

commit sha: 35daf45fda76bc97b57c47c118401f317d765f6d

pushed 2 weeks ago