-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spin Pod stuck at the Terminating state #207
Comments
It might related to containerd/runwasi#418
The best way I am aware of is to examine the containerd logs for the spin shim. If you are using the spin shim in k3s, you can refer to this guide on how to find the containerd logs |
Thank you for the pointers! I'm starting the pod as described at https://github.com/rgl/k3s-vagrant/blob/master/example-spin.yml Although I can submit kubectl delete for the pod, the following command never seems to return: $ kubectl delete pod/example-spin-654dccd659-8dskm
pod "example-spin-654dccd659-8dskm" deleted Here's what the # tail -f /var/lib/rancher/k3s/agent/containerd/containerd.log
time="2024-02-06T09:22:42.068337280Z" level=info msg="StopContainer for \"0c2c8eabdfa48729d488297075ac089ba3e5d88580801cf6a0431316351cb919\" with timeout 30 (s)"
time="2024-02-06T09:22:42.411348488Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:example-spin-654dccd659-8tltx,Uid:d479a9b2-e1b6-4509-b35d-8f89aea4f16e,Namespace:default,Attempt:0,}"
map[string]interface {}{"cniVersion":"1.0.0", "forceAddress":true, "hairpinMode":true, "ipMasq":false, "ipam":map[string]interface {}{"ranges":[][]map[string]interface {}{[]map[string]interface {}{map[string]interface {}{"subnet":"10.12.1.0/24"}}}, "routes":[]types.Route{types.Route{Dst:net.IPNet{IP:net.IP{0xa, 0xc, 0x0, 0x0}, Mask:net.IPMask{0xff, 0xff, 0x0, 0x0}}, GW:net.IP(nil)}}, "type":"host-local"}, "isDefaultGateway":true, "isGateway":true, "mtu":(*uint)(0xc0000a8738), "name":"cbr0", "type":"bridge"}
delegateAdd: netconf sent to delegate plugin:
{"cniVersion":"1.0.0","forceAddress":true,"hairpinMode":true,"ipMasq":false,"ipam":{"ranges":[[{"subnet":"10.12.1.0/24"}]],"routes":[{"dst":"10.12.0.0/16"}],"type":"host-local"},"isDefaultGateway":true,"isGateway":true,"mtu":1500,"name":"cbr0","type":"bridge"}time="2024-02-06T09:22:42.488650624Z" level=info msg="server listen started"
time="2024-02-06T09:22:42.488707028Z" level=info msg="server started"
time="2024-02-06T09:22:42.4887162Z" level=info msg="Shim successfully started, waiting for exit signal..."
time="2024-02-06T09:22:42.494708048Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:example-spin-654dccd659-8tltx,Uid:d479a9b2-e1b6-4509-b35d-8f89aea4f16e,Namespace:default,Attempt:0,} returns sandbox id \"54a9538246e79366cc0e00f2986df53cd91adc6b225a517473ae13bd25e6ece2\""
time="2024-02-06T09:22:42.496768712Z" level=info msg="CreateContainer within sandbox \"54a9538246e79366cc0e00f2986df53cd91adc6b225a517473ae13bd25e6ece2\" for container &ContainerMetadata{Name:example,Attempt:0,}"
time="2024-02-06T09:22:42.507516899Z" level=info msg="CreateContainer within sandbox \"54a9538246e79366cc0e00f2986df53cd91adc6b225a517473ae13bd25e6ece2\" for &ContainerMetadata{Name:example,Attempt:0,} returns container id \"dfd08c5d510679cffdb312f1455d1c9f140675b5cb170e9d7d5f840b4f30f141\""
time="2024-02-06T09:22:42.508273480Z" level=info msg="StartContainer for \"dfd08c5d510679cffdb312f1455d1c9f140675b5cb170e9d7d5f840b4f30f141\""
time="2024-02-06T09:22:42.517526823Z" level=info msg="manifest is not in WASM OCI image format"
time="2024-02-06T09:22:42.518658869Z" level=info msg="cgroup manager V2 will be used"
time="2024-02-06T09:22:42.518766618Z" level=warn msg="Controller rdma is not yet implemented."
time="2024-02-06T09:22:42.518784762Z" level=warn msg="Controller misc is not yet implemented."
time="2024-02-06T09:22:42.539281631Z" level=warn msg="Controller rdma is not yet implemented."
time="2024-02-06T09:22:42.539340343Z" level=warn msg="Controller misc is not yet implemented."
time="2024-02-06T09:22:42.545786429Z" level=info msg="close_range; preserve_fds=0"
time="2024-02-06T09:22:42.972849735Z" level=warn msg="intermediate process already reaped"
time="2024-02-06T09:22:42.976602376Z" level=info msg="starting instance: dfd08c5d510679cffdb312f1455d1c9f140675b5cb170e9d7d5f840b4f30f141"
time="2024-02-06T09:22:42.977461079Z" level=info msg="calling start function"
time="2024-02-06T09:22:42.977593832Z" level=info msg="setting up wasi"
time="2024-02-06T09:22:42.983485892Z" level=info msg="StartContainer for \"dfd08c5d510679cffdb312f1455d1c9f140675b5cb170e9d7d5f840b4f30f141\" returns successfully"
time="2024-02-06T09:22:44.069400677Z" level=error msg="get state for 0c2c8eabdfa48729d488297075ac089ba3e5d88580801cf6a0431316351cb919" error="context deadline exceeded: unknown"
time="2024-02-06T09:22:44.069484079Z" level=warning msg="unknown status" status=0
time="2024-02-06T09:22:44.069543062Z" level=info msg="Stop container \"0c2c8eabdfa48729d488297075ac089ba3e5d88580801cf6a0431316351cb919\" with signal terminated"
time="2024-02-06T09:22:46.200462576Z" level=info msg="stats: StatsRequest { id: "dfd08c5d510679cffdb312f1455d1c9f140675b5cb170e9d7d5f840b4f30f141", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }"
time="2024-02-06T09:22:58.77219967Z" level=info msg="instantiate_pre;"
time="2024-02-06T09:22:58.773132879Z" level=info msg=" >>> running spin trigger"
time="2024-02-06T09:22:58.773738621Z" level=info msg=" >>> notifying main thread we are about to start"
time="2024-02-06T09:22:58.774493624Z" level=info msg="Serving http://0.0.0.0:80"
time="2024-02-06T09:23:40.806373573Z" level=error msg="StopContainer for \"ffd2781bd75f19e5157ac4762314693a4f95dbe6cc43128c28b50f117658fb5f\" failed" error="failed to kill container \"ffd2781bd75f19e5157ac4762314693a4f95dbe6cc43128c28b50f117658fb5f\": context canceled: unknown"
time="2024-02-06T09:23:40.808160649Z" level=info msg="StopPodSandbox for \"e23b2fe57bcd33134ac414a091e42b25c1aeefd0887a3753720ba640939f9fd9\""
time="2024-02-06T09:23:42.809070520Z" level=error msg="get state for ffd2781bd75f19e5157ac4762314693a4f95dbe6cc43128c28b50f117658fb5f" error="context deadline exceeded: unknown"
time="2024-02-06T09:23:42.809150816Z" level=warning msg="unknown status" status=0
time="2024-02-06T09:23:42.809185282Z" level=info msg="Kill container \"ffd2781bd75f19e5157ac4762314693a4f95dbe6cc43128c28b50f117658fb5f\""
time="2024-02-06T09:25:12.068440580Z" level=error msg="StopContainer for \"0c2c8eabdfa48729d488297075ac089ba3e5d88580801cf6a0431316351cb919\" failed" error="failed to stop container \"0c2c8eabdfa48729d488297075ac089ba3e5d88580801cf6a0431316351cb919\": context deadline exceeded: unknown"
time="2024-02-06T09:25:12.069488406Z" level=info msg="StopPodSandbox for \"cc4cd0cb2be413f03b06ae3d52c18dbcfd4992672f6a10af8edf4ca91ff25474\""
time="2024-02-06T09:25:14.070067749Z" level=error msg="get state for 0c2c8eabdfa48729d488297075ac089ba3e5d88580801cf6a0431316351cb919" error="context deadline exceeded: unknown"
time="2024-02-06T09:25:14.070151733Z" level=warning msg="unknown status" status=0
time="2024-02-06T09:25:14.070191783Z" level=info msg="Kill container \"0c2c8eabdfa48729d488297075ac089ba3e5d88580801cf6a0431316351cb919\""
time="2024-02-06T09:25:40.809165923Z" level=error msg="StopPodSandbox for \"e23b2fe57bcd33134ac414a091e42b25c1aeefd0887a3753720ba640939f9fd9\" failed" error="failed to stop container \"ffd2781bd75f19e5157ac4762314693a4f95dbe6cc43128c28b50f117658fb5f\": failed to kill container \"ffd2781bd75f19e5157ac4762314693a4f95dbe6cc43128c28b50f117658fb5f\": context deadline exceeded: unknown"
time="2024-02-06T09:25:40.809863923Z" level=info msg="StopPodSandbox for \"91ce1fc7c9ae5f571bab103e695afea48228ed99ee6188c767de368cdd8bca31\""
time="2024-02-06T09:25:40.810007290Z" level=info msg="Container to stop \"cfeb2a380219c92f3d0fab2d57cad5542812736e185433888a2e5f1ad4adba2b\" must be in running or unknown state, current state \"CONTAINER_EXITED\""
time="2024-02-06T09:25:40.840434562Z" level=info msg="TearDown network for sandbox \"91ce1fc7c9ae5f571bab103e695afea48228ed99ee6188c767de368cdd8bca31\" successfully"
time="2024-02-06T09:25:40.840480615Z" level=info msg="StopPodSandbox for \"91ce1fc7c9ae5f571bab103e695afea48228ed99ee6188c767de368cdd8bca31\" returns successfully"
time="2024-02-06T09:25:41.758165380Z" level=info msg="StopContainer for \"ffd2781bd75f19e5157ac4762314693a4f95dbe6cc43128c28b50f117658fb5f\" with timeout 30 (s)"
time="2024-02-06T09:25:43.759072133Z" level=error msg="get state for ffd2781bd75f19e5157ac4762314693a4f95dbe6cc43128c28b50f117658fb5f" error="context deadline exceeded: unknown"
time="2024-02-06T09:25:43.759133691Z" level=warning msg="unknown status" status=0
time="2024-02-06T09:25:43.759198951Z" level=info msg="Skipping the sending of signal terminated to container \"ffd2781bd75f19e5157ac4762314693a4f95dbe6cc43128c28b50f117658fb5f\" because a prior stop with timeout>0 request already sent the signal"
time="2024-02-06T09:26:13.759386968Z" level=info msg="Kill container \"ffd2781bd75f19e5157ac4762314693a4f95dbe6cc43128c28b50f117658fb5f\"" After that the logs all look the same, it looping trying/failing to kill the container. |
This log looks pretty interesting. What version of |
I'm using the version bundled with k3s from https://github.com/k3s-io/k3s/releases/tag/v1.28.5+k3s1. At that page, there's a Embedded Component Versions section, which points to Containerd v1.7.11-k3s2. |
Hello, |
I think containerd-shim-spin was moved out of this repo into https://github.com/spinkube/containerd-shim-spin. I also think this issue was fixed in spinkube/containerd-shim-spin#23 but I didn't yet have a chance to try it in k3s. I did have a chance to try https://github.com/spinkube/containerd-shim-spin/releases/tag/v0.14.1 in talos at rgl/terraform-libvirt-talos@1c32bb7 and there, the issue reported here no longer happens! :-) |
Ok nice, thank you |
I've just upgraded to https://github.com/spinkube/containerd-shim-spin/releases/tag/v0.14.1 in k3s at rgl/k3s-vagrant@5b85081 and the issue reported here no longer happens! :-) |
I've created my own independent example tinygo spin application at https://github.com/rgl/spin-http-go-example and updated my k3s playground at https://github.com/rgl/k3s-vagrant to deploy the spin-http-go-example example.
Creating and using the Pod works fine, except deleting the Pod. Although the delete request is accepted by kubernetes, the Pod is stuck at the Terminating state.
It seems something is broken, but I'm not sure what.
I've added support for containerd-shim-spin-v2 0.10.0 (spin 2.0.1 webassembly runtime) in rgl/k3s-vagrant@e5df418
Can you please shed some light on how to troubleshoot this?
kubectl describe
on the pod, does not seem to show anything useful.The text was updated successfully, but these errors were encountered: