[Cilium] Executing nerdctl run in k8 environment is stuck #3783
Comments
[root@m1 ~]# kubectl version
@wzxmt I am not sure how to reproduce your problem. Against a kind cluster, things are working just fine / as expected. I need more details about your specific deployment.
My K8s cluster is deployed from binaries, and I tried again. Running "nerdctl run --name test --rm -it busybox:1.28 /bin/sh" with Flannel works without hanging, but it hangs with Cilium. Here is how Cilium is deployed:
linux-amd64/helm template cilium cilium/cilium --version 1.15.11
[root@m1 ~]# containerd -v
[root@m1 ~]# nerdctl info
Server:
[root@m1 ~]# nerdctl run --name test --rm -it --debug-full busybox:1.28 /bin/sh
Thanks @wzxmt
What happens with
@AkihiroSuda anyone around familiar with Kube + eBPF/Cilium who could help debug this?
I later tried Calico and it worked fine. Running "nerdctl network ls" with Cilium still hangs, but with the other CNI plugins it runs normally.
flannel:
[root@m2 ~]# nerdctl network ls
calico:
[root@m3 ~]# nerdctl network ls
Cilium (hangs):
[root@m1 ~]# nerdctl network ls
Interesting. Staying stuck is rather unusual. @wzxmt if you feel like it, the most helpful thing you could do is:
# clone nerdctl source code
git clone git@github.com:containerd/nerdctl.git
cd nerdctl
# Edit https://github.com/containerd/nerdctl/blob/main/pkg/netutil/netutil.go#L224
# Line 224, find this:
# err = lockutil.WithDirLock(e.NetconfPath, fn)
# Replace it with:
# fn()
# Compile a new nerdctl binary
make binaries
# The updated binary is under `_output`
# Now, try again
_output/nerdctl network ls
If it still does not help, you could pepper
I wish I could test Cilium but I am short on time right now.
Thanks @wzxmt
Edit https://github.com/containerd/nerdctl/blob/main/pkg/netutil/netutil.go#L224, make binaries
Thanks a lot @wzxmt. I think this confirms what the issue is: Cilium is very likely trying to lock the same directory as nerdctl (likely the CNI configuration directory). The problem here will not be trivial to solve. We need to flock when accessing the CNI conf, since this is the only way to prevent racy/concurrent modifications. What we could do, though, is move the lock to a different location (purely nerdctl's own). cc @AkihiroSuda
Description
Executing nerdctl run in the k8s environment hangs, but k8s can create pods normally
Steps to reproduce the issue
1. [root@m1 ~]# nerdctl ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f0571a9094ce quay.io/cilium/hubble-ui-backend@sha256:0e0eed917653441fded4e7cdb096b7be6a3bddded5a2dd10812a27b1fc6ed95b "/usr/bin/backend" 6 minutes ago Up k8s://kube-cilium/hubble-ui-77555d5dcf-pj77v/backend
046ba04231f7 docker.io/wangyanglinux/myapp:v1 "nginx -g daemon off;" 6 minutes ago Up k8s://default/test-z2gms/test
5c6c52541c37 docker.io/wzxmtlw/metrics-server:v0.6.3 "/metrics-server --c…" 6 minutes ago Up k8s://kube-system/metrics-server-5c7b6df7d8-md58r/metrics-server
fcb24a33d77a quay.io/cilium/hubble-relay@sha256:d352d3860707e8d734a0b185ff69e30b3ffd630a7ec06ba6a4402bed64b4456c "hubble-relay serve" 7 minutes ago Up k8s://kube-cilium/hubble-relay-7bc7544857-95dqm/hubble-relay
....
2. [root@m1 ~]# nerdctl run --name test --rm -it busybox:1.28 /bin/sh
Executing the above command gets stuck
3. Can nerdctl run be executed outside the k8s environment
Describe the results you received and expected
null
What version of nerdctl are you using?
[root@m1 ~]# nerdctl version
Client:
 Version: v2.0.2
 OS/Arch: linux/amd64
 Git commit: 1220ce7
 buildctl:
  Version: v0.17.1
  GitCommit: 8b1b83ef4947c03062cdcdb40c69989d8fe3fd04
Server:
 containerd:
  Version: v2.0.1
  GitCommit: 88aa2f531d6c2922003cc7929e51daf1c14caa0a
 runc:
  Version: 1.2.2
  GitCommit: v1.2.2-0-g7cb36325
Are you using a variant of nerdctl? (e.g., Rancher Desktop)
None
Host information
[root@m1 ~]# nerdctl info
Client:
 Namespace: k8s.io
 Debug Mode: false
Server:
 Server Version: v2.0.1
 Storage Driver: overlayfs
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Log: fluentd journald json-file none syslog
  Storage: native overlayfs
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.14.0-427.13.1.el9_4.x86_64
 Operating System: Rocky Linux 9.4 (Blue Onyx)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 3.793GiB
 Name: m1
 ID: b26f2865-ca8a-49fa-a3a2-ec66adae9813