tklauser


Member Since 11 years ago

@isovalent / @cilium, Switzerland

306 followers
132 following
276 stars
197 repos

2449 contributions in the last year

Pinned
⚡ eBPF-based Networking, Security, and Observability
⚡ The Go programming language
⚡ [mirror] Go packages for low-level interaction with the operating system
⚡ Linux kernel source tree
⚡ A Swiss army knife for your daily Linux network plumbing.
⚡ Link-Local Multicast Resolution (LLMNR) Daemon for Linux
Activity
Jan 26
23 hours ago
created branch

tklauser in tklauser/cilium-dev-vagrant create branch multi-homing

created 5 hours ago
delete

tklauser in tklauser/plugins delete branch x-sys-unix-setns

deleted 5 hours ago
delete

tklauser in tklauser/plugins delete branch x-sys-unix-const

deleted 5 hours ago
pull request

tklauser merge to cilium/cilium

tklauser

vendor: pull in the latest changes from github.com/vishvananda/netlink

Includes a fix for broken compilation on macOS when running ginkgo tests locally: https://github.com/vishvananda/netlink/pull/730

delete

tklauser in cilium/cilium-cli delete branch pr/tklauser/update-digests

deleted 7 hours ago
pull request

tklauser pull request cilium/cilium-cli

tklauser

Add digests for Cilium 1.9.12, 1.10.7 and service mesh beta images

Also update cmd/internal/add-image-digests to use crane to get image digests without having to pull them.

See individual commits for details.

push

tklauser push cilium/cilium-cli

tklauser

cmd/internal/add-image-digests: use crane to get digests without pulling

Run crane [1] in a container to get the image digests without having to pull the image and parse the output of docker pull.

[1] https://github.com/google/go-containerregistry/tree/main/cmd/crane

Signed-off-by: Tobias Klauser [email protected]

tklauser

defaults: add image digests for Cilium 1.9.12 and 1.10.7

Signed-off-by: Tobias Klauser [email protected]

tklauser

defaults: add digests for Cilium service mesh beta images

Generated using

go run ./cmd/internal/add-image-digests cilium-service-mesh v1.11.0-beta.1

Suggested-by: Tom Payne [email protected]
Signed-off-by: Tobias Klauser [email protected]

commit sha: d578df7649b86e8f7f0bc3f88977fe31cf144845

pushed 7 hours ago
pull request

tklauser pull request cilium/cilium-cli

tklauser

Add digests for Cilium 1.9.12, 1.10.7 and service mesh beta images

Also update cmd/internal/add-image-digests to use crane to get image digests without having to pull them.

See individual commits for details.

push

tklauser push cilium/cilium-cli

tklauser

defaults: add digests for Cilium service mesh beta images

Generated using

go run ./cmd/internal/add-image-digests cilium-service-mesh v1.11.0-beta.1

Suggested-by: Tom Payne [email protected]
Signed-off-by: Tobias Klauser [email protected]

commit sha: 7ea942bdae54894f58ac6493ee372a0a26ab6b28

pushed 8 hours ago
created branch

tklauser in cilium/cilium-cli create branch pr/tklauser/update-digests

created 8 hours ago
pull request

tklauser merge to cilium/cilium

tklauser

v1.11 backports 2022-01-26

Once this PR is merged, you can update the PR labels via:

$ for pr in 18483 18538 18546 18499 18484 18479 18592 18564 18606 18582 18554 18553 18469; do contrib/backporting/set-labels.py $pr done 1.11; done

Conflicts

tklauser

Thanks, looks good for my change.

issue

tklauser issue comment cilium/cilium-service-mesh-beta

tklauser

INSTALLATION.md: use correct Hubble image

tklauser

This used to work with cilium-cli v0.10.0 without specifying --relay-version, so this is really a regression in cilium-cli v0.10.1.

Not sure whether we still want to merge this PR as a temporary solution to prevent service mesh beta users from hitting issues until we've fixed cilium-cli and released a new version?

delete

tklauser in tklauser/procfs delete branch xfrm-stats-godoc

deleted 11 hours ago
issue

tklauser issue comment cilium/cilium-service-mesh-beta

tklauser

INSTALLATION.md: use correct Hubble image

tklauser

Thanks @youssefazrak! Now updated the cilium hubble enable --ui command as well.

push

tklauser push cilium/cilium-service-mesh-beta

tklauser

INSTALLATION.md: use correct Hubble image

Signed-off-by: Tobias Klauser [email protected]

commit sha: 619aa5e069a9a453de84afe553c6f19d7ef5ef97

pushed 13 hours ago
created branch

tklauser in tklauser/cilium create branch pr/test-runtime-wait-endpoints-ready

created 13 hours ago
issue

tklauser issue comment cilium/cilium

tklauser

CI: RuntimePolicies Init Policy Default Drop Test tests egress

Test Name

RuntimePolicies Init Policy Default Drop Test tests egress

Failure Output

FAIL: Expected endpoint ID to exist for initContainer

Stacktrace

/home/jenkins/workspace/Cilium-PR-Runtime-net-next/runtime-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:527
Expected endpoint ID to exist for initContainer
Expected
    <bool>: false
to be true
/home/jenkins/workspace/Cilium-PR-Runtime-net-next/runtime-gopath/src/github.com/cilium/cilium/test/runtime/Policies.go:1468

Standard Output

Number of "context deadline exceeded" in logs: 0
Number of "level=error" in logs: 0
Number of "level=warning" in logs: 0
Number of "Cilium API handler panicked" in logs: 0
Number of "Goroutine took lock for more than" in logs: 1
No errors/warnings found in logs


Standard Error

08:55:42 STEP: Running BeforeEach block for EntireTestsuite RuntimePolicies
08:55:42 STEP: Setting PolicyEnforcement=default
08:55:45 STEP: Running BeforeEach block for EntireTestsuite RuntimePolicies Init Policy Default Drop Test
08:55:45 STEP: Setting PolicyEnforcement=always
08:55:47 STEP: Starting hubble observe in background
08:55:47 STEP: Creating an endpoint
FAIL: Expected endpoint ID to exist for initContainer
Expected
    <bool>: false
to be true
=== Test Finished at 2022-01-26T08:55:48Z====
08:55:48 STEP: Running JustAfterEach block for EntireTestsuite RuntimePolicies
===================== TEST FAILED =====================
08:55:48 STEP: Running AfterFailed block for EntireTestsuite RuntimePolicies
cmd: sudo cilium endpoint list
Exitcode: 0 
Stdout:
 	 ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])   IPv6                 IPv4            STATUS   
	            ENFORCEMENT        ENFORCEMENT                                                                                     
	 66         Enabled            Enabled           23443      container:id.httpd2           f00d::a0f:0:0:4fe2   10.15.249.173   ready                   
	                                                            container:id.service1                                                                      
	 251        Enabled            Enabled           13366      container:id.app3             f00d::a0f:0:0:88d6   10.15.23.74     ready                   
	 270        Enabled            Enabled           4          reserved:health               f00d::a0f:0:0:be7a   10.15.19.173    ready                   
	 375        Enabled            Enabled           45787      container:somelabel           f00d::a0f:0:0:dc1e   10.15.120.16    waiting-to-regenerate   
	 428        Enabled            Enabled           27738      container:id.httpd3           f00d::a0f:0:0:9638   10.15.34.217    ready                   
	                                                            container:id.service1                                                                      
	 997        Enabled            Enabled           27302      container:id.httpd1           f00d::a0f:0:0:f6a3   10.15.184.80    ready                   
	                                                            container:id.service1                                                                      
	 3406       Disabled           Disabled          1          reserved:host                                                      ready                   
	 3525       Enabled            Enabled           47013      container:id.app1             f00d::a0f:0:0:dde5   10.15.11.223    ready                   
	 3926       Enabled            Enabled           48313      container:id.app2             f00d::a0f:0:0:57ad   10.15.219.72    ready                   
	 
Stderr:
 	 

===================== Exiting AfterFailed =====================
08:56:03 STEP: Running AfterEach for block EntireTestsuite RuntimePolicies Init Policy Default Drop Test
08:56:03 STEP: Running AfterEach for block EntireTestsuite RuntimePolicies
08:56:03 STEP: Running AfterEach for block EntireTestsuite

[[ATTACHMENT|0d3fbf76_RuntimePolicies_Init_Policy_Default_Drop_Test_tests_egress.zip]]


ZIP Links:


https://jenkins.cilium.io/job/Cilium-PR-Runtime-net-next//1138/artifact/0d3fbf76_RuntimePolicies_Init_Policy_Default_Drop_Test_tests_egress.zip
https://jenkins.cilium.io/job/Cilium-PR-Runtime-net-next//1138/artifact/test_results_Cilium-PR-Runtime-net-next_1138_BDD-Test-PR.zip

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-Runtime-net-next/1138/

If this is a duplicate of an existing flake, comment 'Duplicate of #<issue-number>' and close this issue.

tklauser

Looks like the endpoint in question is still in waiting-to-regenerate state which might indicate the endpoint wasn't created yet at the time of the check:

[2022-01-26T08:55:49.437Z] FAIL: Expected endpoint ID to exist for initContainer
[2022-01-26T08:55:49.437Z] Expected
[2022-01-26T08:55:49.437Z]     <bool>: false
[2022-01-26T08:55:49.437Z] to be true
[2022-01-26T08:55:49.437Z] === Test Finished at 2022-01-26T08:55:48Z====
[2022-01-26T08:55:49.437Z] 08:55:48 STEP: Running JustAfterEach block for EntireTestsuite RuntimePolicies
[2022-01-26T08:55:49.437Z] ===================== TEST FAILED =====================
[2022-01-26T08:55:49.437Z] 08:55:48 STEP: Running AfterFailed block for EntireTestsuite RuntimePolicies
[2022-01-26T08:55:49.437Z] cmd: sudo cilium endpoint list
[2022-01-26T08:55:49.437Z] Exitcode: 0 
[2022-01-26T08:55:49.437Z] Stdout:
[2022-01-26T08:55:49.437Z]  	 ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])   IPv6                 IPv4            STATUS   
[2022-01-26T08:55:49.437Z] 	            ENFORCEMENT        ENFORCEMENT                                                                                     
[2022-01-26T08:55:49.437Z] 	 66         Enabled            Enabled           23443      container:id.httpd2           f00d::a0f:0:0:4fe2   10.15.249.173   ready                   
[2022-01-26T08:55:49.437Z] 	                                                            container:id.service1                                                                      
[2022-01-26T08:55:49.437Z] 	 251        Enabled            Enabled           13366      container:id.app3             f00d::a0f:0:0:88d6   10.15.23.74     ready                   
[2022-01-26T08:55:49.437Z] 	 270        Enabled            Enabled           4          reserved:health               f00d::a0f:0:0:be7a   10.15.19.173    ready                   
[2022-01-26T08:55:49.437Z] 	 375        Enabled            Enabled           45787      container:somelabel           f00d::a0f:0:0:dc1e   10.15.120.16    waiting-to-regenerate   
[2022-01-26T08:55:49.437Z] 	 428        Enabled            Enabled           27738      container:id.httpd3           f00d::a0f:0:0:9638   10.15.34.217    ready                   
[2022-01-26T08:55:49.437Z] 	                                                            container:id.service1                                                                      
[2022-01-26T08:55:49.437Z] 	 997        Enabled            Enabled           27302      container:id.httpd1           f00d::a0f:0:0:f6a3   10.15.184.80    ready                   
[2022-01-26T08:55:49.437Z] 	                                                            container:id.service1                                                                      
[2022-01-26T08:55:49.437Z] 	 3406       Disabled           Disabled          1          reserved:host                                                      ready                   
[2022-01-26T08:55:49.437Z] 	 3525       Enabled            Enabled           47013      container:id.app1             f00d::a0f:0:0:dde5   10.15.11.223    ready                   
[2022-01-26T08:55:49.437Z] 	 3926       Enabled            Enabled           48313      container:id.app2             f00d::a0f:0:0:57ad   10.15.219.72    ready     

We probably need to check vm.WaitEndpointsReady() before attempting to check the endpoint ID here:

https://github.com/cilium/cilium/blob/8789c951ae608610888d8c69c968c80b74ec59cb/test/runtime/Policies.go#L1461-L1464

(and in other places where a similar check is performed)

issue

tklauser issue comment cilium/cilium

tklauser

daemon, option: consistently hard-code host device

The host device name is currently stored in option.Config.HostDevice which is always initialized to defaults.HostDevice and there is no command line option or other method to change it. Remove the option var and always consistently hard-code the host device name using defaults.HostDevice, like already done in several other places in the code base.

issue

tklauser issue comment cilium/cilium

tklauser

daemon, option: consistently hard-code host device

The host device name is currently stored in option.Config.HostDevice which is always initialized to defaults.HostDevice and there is no command line option or other method to change it. Remove the option var and always consistently hard-code the host device name using defaults.HostDevice, like already done in several other places in the code base.

tklauser

/mlh new-flake Cilium-PR-Runtime-net-next

pull request

tklauser pull request cilium/cilium-service-mesh-beta

tklauser

INSTALLATION.md: use correct Hubble image

created branch

tklauser in cilium/cilium-service-mesh-beta create branch hubble-enable-version

created 14 hours ago
issue

tklauser issue comment cilium/cilium

tklauser

daemon, option: consistently hard-code host device

The host device name is currently stored in option.Config.HostDevice which is always initialized to defaults.HostDevice and there is no command line option or other method to change it. Remove the option var and always consistently hard-code the host device name using defaults.HostDevice, like already done in several other places in the code base.

pull request

tklauser merge to cilium/cilium

tklauser

vendor: pull in the latest changes from github.com/vishvananda/netlink

Includes a fix for broken compilation on macOS when running ginkgo tests locally: https://github.com/vishvananda/netlink/pull/730

tklauser

The Go vendoring check failed: https://github.com/cilium/cilium/runs/4948111563?check_suite_focus=true

I think you need to git add vendor/github.com/vishvananda/netlink/proc_event_linux.go.

push

tklauser push cilium/cilium

tklauser

datapath: Do not create CT_INGRESS for NodePort traffic

Previously, the NodePort BPF created CT_INGRESS entries for the client => selected LB backend tuple in the nodeport_lb{4,6}() functions.

The flow below illustrates why it was needed.

-> TCP SYN:

nodeport_lb4()@host
    ct_create4(CT_EGRESS):
        key = tuple=(saddr=pod,daddr=client,flags=TUPLE_F_OUT,
                     dport=80,sport=client_port) <-- ENTRY_EGRESS
        val = entry=(rev_nat_id=svc_id,node_port=1)
    ct_create4(CT_INGRESS):
        key = tuple=(saddr=pod,daddr=client,flags=TUPLE_F_IN,
                     dport=80,sport=client_port) <-- ENTRY_INGRESS
        val = entry=(rev_nat_id=0,node_port=1)

ipv4_policy()@lxc
    ct_lookup4(CT_INGRESS):
        1. tuple=(saddr=client,daddr=pod,flags=TUPLE_F_OUT,
                  dport=client_port,sport=80)
           nothing found
        2. tuple=(saddr=pod,daddr=client,flags=TUPLE_F_IN,
                  dport=80,sport=client_port)
           finds ENTRY_INGRESS

<- SYN-ACK:

handle_ipv4_from_lxc()@lxc
    ct_lookup4(CT_EGRESS):
        1. tuple=(saddr=pod,daddr=client,flags=TUPLE_F_IN,
                  dport=80,sport=client_port)
           finds ENTRY_INGRESS and tail calls due to node_port=1
           to do rev NodePort translation

rev_nodeport_lb4()@lxc (via the tail call)
    ct_lookup4(CT_INGRESS):
        1. tuple=(saddr=pod,daddr=client,flags=TUPLE_F_OUT,
                  dport=80,sport=client_port)
           finds ENTRY_EGRESS and does the rev NodePort xlation

The ENTRY_INGRESS entry was needed to indicate that the packet belongs to the NodePort flow, while ENTRY_EGRESS stored all the information necessary for the reverse NodePort xlation.

Unfortunately, creating ENTRY_INGRESS beforehand in the NodePort BPF had three side effects: bpf_lxc treated the TCP SYN packet as belonging to an already established connection (CT_ESTABLISHED), the monitor aggregator ignored the packet when its aggregation level was set above "none", and the packet stats accounting in the CT was skewed.

To fix this, derive whether the packet belongs to the NodePort flow by querying CT_EGRESS instead of creating CT_INGRESS.

Signed-off-by: Sebastian Wicki [email protected]
Signed-off-by: Martynas Pumputis [email protected]

tklauser

hubble: Add special case for TO_NETWORK is_reply field

The newly introduced NodePort reply path TO_NETWORK trace point does provide connection tracking state to userspace. We can therefore safely determine the value of is_reply if we detect it being non-zero.

Signed-off-by: Sebastian Wicki [email protected]

tklauser

datapath: Emit trace for NodePort reply

Previously, when --monitor-aggregation was set to > "none", no trace events were emitted for the NodePort BPF replies coming from local backends. This made the replies invisible in cilium monitor / hubble output.

Signed-off-by: Sebastian Wicki [email protected]
Signed-off-by: Martynas Pumputis [email protected]

tklauser

Revert "datapath: Remove !CONNTRACK"

This reverts commit 21898cafc01cef5feb51c767e50d13339a7cac75.

Rationale:

  • The change creates a consistent failure in test RuntimeConntrackInVethModeTest Conntrack-related configuration options for endpoints from the runtime pipeline.
  • It is likely the test needs to be modified to remove now obsolete checks following the reverted changes.

[ revert note: manually resolved conflicts in conntrack.h due to e9b6d39e34c929f4e61f88ce9584b5f5e0a27077 being merged in master ]

Signed-off-by: Nicolas Busseneau [email protected]

tklauser

Update stable releases

Signed-off-by: Joe Stringer [email protected]

tklauser

ci: disable failing test on net-next (#18520)

K8sPolicyTest Multi-node policy test validates fromEntities policies Validates fromEntities all policy has been failing consistently on net-next testing pipelines after we upgraded the net-next Vagrant VM in 8bf4e228693c87ab50227245c403d179f418de56.

We disable the test until it is fixed (tracked in #18520).

Signed-off-by: Nicolas Busseneau [email protected]

tklauser

build(deps): bump github/codeql-action from 1.0.27 to 1.0.28

Bumps github/codeql-action from 1.0.27 to 1.0.28.


updated-dependencies:

  • dependency-name: github/codeql-action
    dependency-type: direct:production
    update-type: version-update:semver-patch
    ...

Signed-off-by: dependabot[bot] [email protected]

tklauser

build(deps): bump aws-actions/configure-aws-credentials

Bumps aws-actions/configure-aws-credentials from 1.6.0 to 1.6.1.


updated-dependencies:

  • dependency-name: aws-actions/configure-aws-credentials
    dependency-type: direct:production
    update-type: version-update:semver-patch
    ...

Signed-off-by: dependabot[bot] [email protected]

tklauser

build(deps): bump docker/build-push-action from 2.7.0 to 2.8.0

Bumps docker/build-push-action from 2.7.0 to 2.8.0.


updated-dependencies:

  • dependency-name: docker/build-push-action
    dependency-type: direct:production
    update-type: version-update:semver-minor
    ...

Signed-off-by: dependabot[bot] [email protected]

tklauser

contrib: Fix release script helm value generation

Commit 2bede0336208 ("release: Generate helm values docs") broke the release scripts for branches v1.10 and earlier, as it introduced a dependency on a new Makefile target that has not been backported. Grep for the dependency and skip it silently if it's not available.

Fixes: 2bede0336208 ("release: Generate helm values docs")
Signed-off-by: Joe Stringer [email protected]

tklauser

daemon: deprecate --endpoint-interface-name-prefix option

The option has had no effect since commit 8a039e9e1c39 ("iptables: Don't match device on egress proxy rules"). Mark it as deprecated and schedule it for removal in 1.13.

Signed-off-by: Tobias Klauser [email protected]

tklauser

docs: Split kernel requirements in sections

Signed-off-by: Paul Chaignon [email protected]

tklauser

docs: Kernel requirements for advanced features

This commit adds the kernel requirements for advanced features such as IPsec, iptables-based masquerading, and the bandwidth manager. Those kernel configuration options were determined by making a somewhat-minimal kernel configuration for our CI.

Signed-off-by: Paul Chaignon [email protected]

tklauser

daemon: Fix missing errors in KPR init

Commit db5300dc0f normalized the KPR initialisation routines by making them return an error. Unfortunately, some of the error returns omit the underlying error, which makes debugging KPR init difficult. For instance:

"Detected devices" devices="[enp0s9 eth0]"
<..>
"failed to finalise LB initialization: Cannot retrieve enp0s9 link"

The actual error for the link retrieval is missing.

Fixes: db5300dc0f ("choir: normalize error handling in kube_proxy_replacement.go")
Signed-off-by: Martynas Pumputis [email protected]

tklauser

bpf: Move to-stack trace after host firewall

When the host firewall feature is enabled, a to-stack trace can be generated even if the packets are rejected by host firewall policies. Generate the trace only when the packets are actually delivered to the stack.

Fixes: #12562
Signed-off-by: Yutaro Hayakawa [email protected]

tklauser

monitor: Output non-trace messages to stderr

When using the JSON output of cilium monitor, it is useful to be able to filter it with tools like jq. However, some non-JSON messages are also written to stdout, which leads to JSON parse errors. Make the cilium monitor command write non-JSON messages to stderr instead.

Signed-off-by: Yutaro Hayakawa [email protected]

tklauser

bpf: Fix missing metric updates on from-{stack,host}

Function update_trace_metrics updates metrics (number of bytes and packets) at various observation points. Observation points TRACE_FROM_HOST and TRACE_FROM_STACK were however missing from the list.

This oversight was identified thanks to the new enumeration trace_point introduced in the next commit.

Signed-off-by: Paul Chaignon [email protected]

tklauser

bpf: Define enum trace_point

Define a trace_point enumeration to hold the TRACE_{FROM,TO}_XXX constants. Doing so will allow us to rely on the compiler to expose errors in subsequent commits.

Because of this new enumeration, we also need to define all cases explicitly in the update_trace_metrics switch. Otherwise, Clang fails with the -Wswitch warning.

This commit should include no functional changes.

Signed-off-by: Paul Chaignon [email protected]

tklauser

bpf: Define enum nat_dir

Define a nat_dir enumeration to hold the NAT_DIR_{IN,E}GRESS constants. Doing so will allow us to rely on the compiler to expose errors in subsequent commits.

This commit should include no functional changes.

Signed-off-by: Paul Chaignon [email protected]

tklauser

bpf: Define enum ct_dir and metric_dir

Define ct_dir and metric_dir enumerations to hold the {CT,METRIC}_{INGRESS,EGRESS,SERVICE} constants. Doing so will allow us to rely on the compiler to expose errors, notably if one tries to use a ct_dir enum as a metric_dir enum without conversion.

This commit should include no functional changes.

Signed-off-by: Paul Chaignon [email protected]

commit sha: 38eea2b62a4fdfb304ce08fbbb33172b56632af8

pushed 15 hours ago
delete

tklauser in tklauser/cilium delete branch pr/hostdevice-cleanup

deleted 15 hours ago
created branch

tklauser in tklauser/cilium create branch pr/hostdevice-cleanup

created 15 hours ago
issue

tklauser issue comment cilium/cilium

tklauser

pkg/datapath: Remove transitive dependency on netlink

See commit message.

pull request

tklauser merge to cilium/cilium

tklauser

pkg/datapath: Remove transitive dependency on netlink

See commit message.
