-
Notifications
You must be signed in to change notification settings - Fork 634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relax error on network annotation read failure #3771
Conversation
Windows failure likely unrelated. |
Hey @apostasie, thank you again for the really quick patch. You definitely found the right place - so I can confirm, that the patch is working and the signal is send to the container.
I get the point of the warning message, but from what I understand, I don't fully agree to it. In the README of this repository, debugging Kubernetes is a described use case of nerdctl. So I wouldn't expect a warning, if I actually do this. Sending a kill signal doesn't really change anything to the container deployment itself. Especially as this was working fine with releases <2.0.0. |
TBH, I do not quite care what we say here :-). I'll leave it to maintainers to decide what is the appropriate wording - if any. Cheers. @AkihiroSuda @djdongjin @fahedouch at your convenience. |
@jandubois moving the nerdctl part of the conversation here.
We can / should definitely come up with a better, less alarming message. What about: |
I am not thrilled about yet another flag.
Interesting question (scope would be much larger than this specific issue).
Definitely the same "issue" - for another command (rm). I'll look into it and augment this PR. |
@jandubois latest push does:
I just tested it against a container from which I removed the I am not 100% sure what would happen in the case of a container fully created outside of nerdctl, so, let me know if you still face (other) issues on remove. |
@AkihiroSuda could you poke the CI? |
@apostasie Thank you for continuing to work on this. I would still argue that any message that includes the string $ nerdctl -n k8s.io stop ca97b953b4ea
INFO[0000] unable to retrieve networking information for that container, which is likely managed by another tool error="unexpected end of JSON input"
ca97b953b4ea I think the "unexpected end of JSON input" is an internal error that has been handled by generating a warning, so should not be included in the warning message at all. As for the GUI displaying an error dialog, I guess it should check for the Deleting the container works, but is still too noisy: $ nerdctl -n k8s.io rm ca97b953b4ea
INFO[0000] unable to retrieve networking information for that container, which is likely managed by another tool: "ca97b953b4ea607ced637d55bc590fbc7daebc8e17dffd6e03313f57b2c9ba43" error="unexpected end of JSON input"
WARN[0000] failed to remove hosts file for container "ca97b953b4ea607ced637d55bc590fbc7daebc8e17dffd6e03313f57b2c9ba43" error="hosts-store error\nnot found\nstat /var/lib/nerdctl/dbb19c5e/etchosts/k8s.io/ca97b953b4ea607ced637d55bc590fbc7daebc8e17dffd6e03313f57b2c9ba43: no such file or directory"
ca97b953b4ea The Anyways, the exit code is still 0, so the GUI can avoid showing another ugly popup, and I guess for the CLI this is ok. I do not want to see the |
Just a thought, and I'm not even sure if I support it myself, but you could add a special rule for the It feels a little bit too magical to have a special case hard-coded in |
Just to clarify: I was not actually asking that Also: all of this is a regression from |
By default everything should be just shown, but there can be a flag to change the behavior Similar discussion: |
I disagree.
Again not familiar with Rancher, but yeah.
We could overall do a better job in many places at providing better, more useful, accurate feedback messages to the user. In that specific case though, I do disagree. This is a command-line tool, for nerds (it is in the name). Getting accurate debugging information of what the error condition is (in that case, a golang library error from the json decoder) is crucial, and I think censoring it because "it could scare some users" would be the wrong decision. |
That was never the intention.
Agreed, and this is what we should fix here.
That is certainly not true: And I don't think this is a reasonable expectation. |
That's fine. To be honest, I don't care too much, as I think it is very much an edge case, and you should not touch containers managed by So getting I'll leave it to the |
I'm sorry, I'm not making this up: $ nerdctl --version
nerdctl version 1.7.7
$ kubectl create deployment nginx-test --image=nginx:stable
deployment.apps/nginx-test created
$ kubectl create service clusterip nginx-test --tcp=5000:80
service/nginx-test created
$ nerdctl -n k8s.io ps -a | grep nginx
3ce45b96147a docker.io/library/nginx:stable "/docker-entrypoint.…" 3 seconds ago Up k8s://default/nginx-test-5c6c4f85c7-6srbc/nginx
867c6b30d5fa docker.io/rancher/mirrored-pause:3.6 "/pause" 24 seconds ago Up k8s://default/nginx-test-5c6c4f85c7-6srbc
$ nerdctl -n k8s.io stop 3ce45b96147a
3ce45b96147a
$ nerdctl -n k8s.io rm 3ce45b96147a
3ce45b96147a So this may not be a reasonable expectation, but it is how it used to work. |
Definitely not on windows |
Either way - like you - I don't care much about that part either tbh (with or without warning). |
That may be the case; I assume this is about managing Windows containers then, which seems even more of an edge case than dealing with Kubernetes containers. I've been exclusively talking about managing Linux containers. Anyways, I just responded because you accused me of lying about this ("That is certainly not true"), and I did not. It worked correctly and cleanly before, and now it doesn't. Maybe it should have always displayed a warning... 🤷 But as I wrote before, while I disagree that the content of the error messages is appropriate, I'll leave this to the maintainers. |
Pointing out a mistake - not accusing anyone of anything. Peace. |
Ok, I promised to shut up now, but I still wanted to explain why I think including It is an internal detail that only a The important part of the message is to let the user know they are manipulating a container that By including irrelevant information in a warning or error you just risk confusing the user about what is wrong, and what they can do to fix (how is the user supposed to fix the JSON input?). Information that is only useful to the developer should be at the It is not about "scaring" users, but "confusing" them. Finally, you simply can't say it is just a warning and include Sorry for the rambling; just wanted to explain my point of view; not going to argue any further. |
Hey @jandubois Thank you. Truly appreciate a different opinion here. This is what I like about open-source. However - to explain a bit better why I am generally not inclined to prioritize UX over debugability in the current state of affairs: nerdctl had/has serious issues with debugability overall that has significantly hurt quality in the past. A lot - most? - of the reports we get from users (or from the CI) are absolutely cryptic, and require black-belt-tea-leaves-reading abilities to even start making sense of (#2676 is one of many examples)
Yes, in a perfect world, with a more mature / robust software, absolutely. Right now, yes, I would actually be keen to include stack traces on stderr :p. /jk Enough general philosophy though, and back to the concrete case here:
Thanks again @jandubois |
@AkihiroSuda this PR addresses a 2.0 regression when trying to stop / kill containers started with other tools (specifically k8s). There is some debate on whether we should print a warning or not in that circumstance (I don't really care either way), but otherwise I believe we should consider merging this soon. lmk your thoughts. |
pkg/cmd/container/kill.go
Outdated
@@ -143,7 +143,8 @@ func cleanupNetwork(ctx context.Context, container containerd.Container, globalO | |||
networksJSON := spec.Annotations[labels.Networks] | |||
var networks []string | |||
if err := json.Unmarshal([]byte(networksJSON), &networks); err != nil { | |||
return err | |||
log.G(ctx).WithError(err).Infof("unable to retrieve networking information for that container, which is likely managed by another tool") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What tool uses the nerdctl/networks
label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None that I know of. So, this is saying that the container does not have the label, and is managed by another cli.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This warning should be printed on networksJSON == ""
, not on json.Unmarshal
error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other place (in remove) does check on NetworkOptionsFromSpec
, which returns an err onjson.Unmarshall
- so, if we change it here, we should also change the logic inside NetworkOptionsFromSpec
.
I would rather keep this PR minimal in terms of logic changes.
The current logic captures both cases when networksJSON == ""
and when networksJSON
is non-empty but does not parse.
lmk your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the label is non-empty, json.Unmarshal
error is critical and not negligible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - but then, these operations here are remove
and kill
(not create
, run
, etc).
Should we really prevent remove / kill from succeeding if the network label is mangled?
Signed-off-by: apostasie <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
CI failure is the usual Docker Hub 429 (too many requests). |
Fix #3765
Killing or stopping a container that does not have a nerdctl network annotation is currently a hard fail.
Suggesting we change that to a soft error, warning the user that they are (very likely) doing something that will lead to trouble (eg: managing containers with different tools).
@mathias-ioki I you can test this patch and confirm it fixes your issue, that would be really helpful (I am reading tea-leaves here).