Do not assume systemd-resolved for resolv.conf #11813

Open
wants to merge 1 commit into
base: master

Conversation

VannTen
Contributor

@VannTen VannTen commented Dec 18, 2024

What type of PR is this?
/kind bug

What this PR does / why we need it:
On some distributions we currently assume that systemd-resolved is in use,
and therefore pass /run/systemd/resolve/resolv.conf to the
kubelet configuration.

This breaks if the distribution is configured differently (uses another
DNS service) and forces us to special-case it.

Instead, dynamically detect whether systemd-resolved is running and set
the kube_resolv_conf default accordingly.

Which issue(s) this PR fixes:
Fixes #11810

Special notes for your reviewer:
I'm just wondering if this could cause breakage when systemd-resolved is running but /etc/resolv.conf does not point to its managed files and has other settings 🤔
I'm not sure what we should do in that case (but I don't think hardcoding is the answer).

Does this PR introduce a user-facing change?:

Use /run/systemd/resolve/resolv.conf for kubelet configuration only when systemd-resolved is active.
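
The selection rule described above can be sketched in Python (not Ansible/Jinja) purely for illustration. The function name `default_kube_resolv_conf` is hypothetical, and `active_dns_services` is assumed to be a collection of the names of the DNS services currently active on the host; this is not the PR's actual implementation.

```python
# Path maintained by systemd-resolved that lists the upstream DNS servers.
SYSTEMD_RESOLVED_CONF = "/run/systemd/resolve/resolv.conf"
# Fallback used when systemd-resolved is not active.
FALLBACK_CONF = "/etc/resolv.conf"


def default_kube_resolv_conf(active_dns_services):
    """Return the resolv.conf path to hand to the kubelet (hypothetical helper)."""
    if "systemd-resolved" in active_dns_services:
        return SYSTEMD_RESOLVED_CONF
    return FALLBACK_CONF


print(default_kube_resolv_conf(["systemd-resolved"]))  # /run/systemd/resolve/resolv.conf
print(default_kube_resolv_conf(["dnsmasq"]))           # /etc/resolv.conf
```

The point of the design is that a single observed fact (is the service active?) drives the default, instead of a per-distribution table.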

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Dec 18, 2024
@VannTen
Contributor Author

VannTen commented Dec 18, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. labels Dec 18, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: VannTen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 18, 2024
@VannTen VannTen force-pushed the fix/dont_assume_systemd_resolved branch from aee5e23 to e4d3a6f Compare December 18, 2024 14:57
@VannTen VannTen mentioned this pull request Dec 19, 2024
@yankay
Member

yankay commented Dec 24, 2024

Thanks @VannTen
Looks good to me.
It would be better to have more eyes helping to review this PR.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 28, 2024
@tico88612
Member

tico88612 commented Dec 28, 2024

I'm just wondering if this could cause breakage when systemd-resolved is running but /etc/resolv.conf does not point to its managed files and has other settings

If systemd-resolved is not enabled, use /etc/resolv.conf (which is fine in this case)
If systemd-resolved is enabled, check if /etc/resolv.conf is a file or a soft link.

If it is a soft link, look at /etc/resolv.conf to see which file it points to. (EDIT: I'm not sure why kube_resolv_conf needs to be set to /run/systemd/resolve/resolv.conf instead of /run/systemd/resolve/stub-resolv.conf?)

I think that should solve your situation.
WDYT?

@VannTen
Contributor Author

VannTen commented Dec 28, 2024

stub-resolv.conf lists the systemd-resolved stub resolver (which listens on 127.0.0.53) as its nameserver, while resolv.conf lists the upstream DNS servers specified in the systemd-resolved configuration:

# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 192.168.1.1
search .

# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad
search .

Pods don't have access to the host's localhost network (127.0.0.53 is only reachable on the node itself), so stub-resolv.conf wouldn't work.
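
To make the difference between the two files concrete, here is a minimal Python sketch that extracts the `nameserver` entries from resolv.conf-style text and flags the ones on a loopback address. The helper names are hypothetical, and the sample strings are shortened versions of the two files quoted above.

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Extract nameserver addresses from resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def is_loopback(address):
    """True if the address parses as an IP on the loopback range."""
    try:
        return ipaddress.ip_address(address).is_loopback
    except ValueError:
        # Not a plain IP literal; treat it as non-loopback.
        return False


def loopback_nameservers(resolv_conf_text):
    """Return the nameservers that a pod could not reach on the node."""
    return [s for s in nameservers(resolv_conf_text) if is_loopback(s)]


STUB = "nameserver 127.0.0.53\noptions edns0 trust-ad\nsearch .\n"
UPLINK = "nameserver 192.168.1.1\nsearch .\n"

print(loopback_nameservers(STUB))    # ['127.0.0.53']
print(loopback_nameservers(UPLINK))  # []
```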

@@ -6,7 +6,7 @@ kubelet_address: "{{ ip | default(fallback_ip) }}{{ (',' + ip6) if enable_dual_s
 kubelet_bind_address: "{{ ip | default('0.0.0.0') }}"

 # resolv.conf to base dns config
-kube_resolv_conf: "/etc/resolv.conf"
+kube_resolv_conf: "{{ '/run/systemd/resolve/resolv.conf' if 'systemd-resolved' in active_dns_services else '/etc/resolv.conf' }}"
Member

I know that some resolv.conf modes (static, if I remember correctly) read /etc/resolv.conf as a regular file.
Using whether systemd-resolved is enabled as the only check doesn't seem accurate.

kube_resolv_conf is by default still /etc/resolv.conf.
If systemd-resolved is enabled and /etc/resolv.conf is a soft link, change kube_resolv_conf to /run/systemd/resolve/resolv.conf.

What do you think?

Member

/etc/resolv.conf is a soft link to /run/systemd/resolve/stub-resolv.conf, not to /run/systemd/resolve/resolv.conf.

And following the advice in https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#known-issues ,
changing the config to /run/systemd/resolve/resolv.conf is OK.

Member

Actually, what I'm trying to say is that systemd-resolved being enabled is not the only thing to check. If systemd-resolved is enabled but /etc/resolv.conf is a regular file (not a soft link), kube_resolv_conf should probably be /etc/resolv.conf instead of /run/systemd/resolve/resolv.conf.
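
The rule proposed in this review thread combines two observations instead of one. A minimal Python sketch, assuming both facts have already been gathered elsewhere (the function and parameter names are hypothetical):

```python
def proposed_kube_resolv_conf(resolved_enabled, etc_resolv_conf_is_symlink):
    """Reviewer's proposal: only switch away from /etc/resolv.conf when
    systemd-resolved is enabled AND /etc/resolv.conf is a soft link."""
    if resolved_enabled and etc_resolv_conf_is_symlink:
        return "/run/systemd/resolve/resolv.conf"
    return "/etc/resolv.conf"


# systemd-resolved enabled, but /etc/resolv.conf is a regular file:
print(proposed_kube_resolv_conf(True, False))  # /etc/resolv.conf
```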

Contributor Author

Actually, if systemd-resolved is enabled/running but /etc/resolv.conf is not a symlink to any of /run/systemd/resolve/stub-resolv.conf, /run/systemd/resolve/resolv.conf or /usr/lib/systemd/resolv.conf, systemd-resolved will use it as its source of DNS configuration. See this excerpt from systemd-resolved(8):

/ETC/RESOLV.CONF
       Four modes of handling /etc/resolv.conf (see resolv.conf(5)) are supported:

       •   systemd-resolved maintains the /run/systemd/resolve/stub-resolv.conf file for compatibility with traditional Linux programs. This file lists the
           127.0.0.53 DNS stub (see above) as the only DNS server. It also contains a list of search domains that are in use by systemd-resolved. The list of
           search domains is always kept up-to-date. Note that /run/systemd/resolve/stub-resolv.conf should not be used directly by applications, but only
           through a symlink from /etc/resolv.conf. This file may be symlinked from /etc/resolv.conf in order to connect all local clients that bypass local
           DNS APIs to systemd-resolved with correct search domains settings. This mode of operation is recommended.

       •   A static file /usr/lib/systemd/resolv.conf is provided that lists the 127.0.0.53 DNS stub (see above) as only DNS server. This file may be symlinked
           from /etc/resolv.conf in order to connect all local clients that bypass local DNS APIs to systemd-resolved. This file does not contain any search
           domains.

       •   systemd-resolved maintains the /run/systemd/resolve/resolv.conf file for compatibility with traditional Linux programs. This file may be symlinked
           from /etc/resolv.conf and is always kept up-to-date, containing information about all known DNS servers. Note the file format's limitations: it does
           not know a concept of per-interface DNS servers and hence only contains system-wide DNS server definitions. Note that
           /run/systemd/resolve/resolv.conf should not be used directly by applications, but only through a symlink from /etc/resolv.conf. If this mode of
           operation is used local clients that bypass any local DNS API will also bypass systemd-resolved and will talk directly to the known DNS servers.

       •   Alternatively, /etc/resolv.conf may be managed by other packages, in which case systemd-resolved will read it for DNS configuration data. In this
           mode of operation systemd-resolved is consumer rather than provider of this configuration file.

       Note that the selected mode of operation for this file is detected fully automatically, depending on whether /etc/resolv.conf is a symlink to
       /run/systemd/resolve/resolv.conf or lists 127.0.0.53 as DNS server.

So I think relying on systemd-resolved alone might actually work 🤔

I was initially going to check for a symlink, but I'm not sure we should do that, because we end up with several booleans instead of one, which results in ambiguity in certain cases:

  • /etc/resolv.conf is a symlink to one of the managed files, but systemd-resolved is not started/enabled. Or the reverse.
  • for that matter, systemd-resolved is started, but not enabled, or the reverse.

Wdyt ?
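
For reference, the four modes from the man page excerpt can be told apart by looking at where /etc/resolv.conf points. The Python sketch below is a simplification under stated assumptions: it inspects only the symlink target (one level, no recursive resolution), it does not look for 127.0.0.53 inside a regular file, and the mode labels "stub", "uplink" and "foreign" are hypothetical names chosen for illustration.

```python
import os

# Targets that route clients through the 127.0.0.53 stub resolver.
STUB_TARGETS = {
    "/run/systemd/resolve/stub-resolv.conf",
    "/usr/lib/systemd/resolv.conf",
}
# Target that lists the upstream DNS servers directly.
UPLINK_TARGET = "/run/systemd/resolve/resolv.conf"


def symlink_target(path):
    """Return the normalized absolute target of a symlink, or None otherwise."""
    if not os.path.islink(path):
        return None
    target = os.readlink(path)
    if not os.path.isabs(target):
        # Relative links are interpreted relative to the link's directory.
        target = os.path.join(os.path.dirname(path), target)
    return os.path.normpath(target)


def resolv_conf_mode(path="/etc/resolv.conf"):
    """Classify how the given resolv.conf relates to systemd-resolved."""
    target = symlink_target(path)
    if target in STUB_TARGETS:
        return "stub"
    if target == UPLINK_TARGET:
        return "uplink"
    # Regular file, missing file, or symlink to something else.
    return "foreign"


if __name__ == "__main__":
    print(resolv_conf_mode())
```

Note that this only answers the symlink half of the question; whether the service is actually started or enabled is a separate fact, which is exactly the source of the ambiguity listed above.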

@VannTen VannTen force-pushed the fix/dont_assume_systemd_resolved branch from e4d3a6f to cfabb32 Compare January 6, 2025 08:59
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 6, 2025
@pacoxu
Member

pacoxu commented Jan 22, 2025

We encountered this issue recently on Ubuntu. Initially the cluster ran smoothly, and coredns forwarded correctly to the local nameservers in /run/systemd/resolve/resolv.conf.

  • IIUC, when the coredns pod first started, /run/systemd/resolve/resolv.conf had not yet been modified.
  • Later, the systemd-resolved service updated /run/systemd/resolve/resolv.conf to point to the cluster DNS (the coredns cluster IP).
  • After that, the coredns pod restarted for some reason (such as an OOM kill or a drain), and the new CoreDNS pod (with the default DNSPolicy) used the new /run/systemd/resolve/resolv.conf, which contains the coredns cluster IP. The new pod then cannot forward properly: it forwards to itself and ultimately responds with REFUSED. The log shows AAAA: concurrent queries exceeded maximum 1000.
  • To address this, we increased max_concurrent from the default 1000 to 10000, but the error log then showed AAAA: concurrent queries exceeded maximum 10000. Finally, we found that /run/systemd/resolve/resolv.conf was not as expected.

This only happens if there is no external DNS in /etc/systemd/resolved.conf and there are nameservers in /etc/resolv.conf. The systemd-resolved.service will then write /run/systemd/resolve/resolv.conf according to /etc/systemd/resolved.conf and the coredns cluster IP.

[ERROR] plugin/errors: 5 console.d.run. AAAA: concurrent queries exceeded maximum 1000
[ERROR] plugin/errors: 5 console.d.run. A: concurrent queries exceeded maximum 1000

This issue is quite subtle and is likely to occur when CoreDNS restarts, posing a significant threat to many users' production environments.
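
The failure described above boils down to the resolv.conf handed to CoreDNS listing the cluster DNS address itself. A minimal Python sketch of such a check follows; the function names are hypothetical and the address 10.233.0.3 is a made-up example value, not taken from this report.

```python
def nameservers(resolv_conf_text):
    """Extract nameserver addresses from resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def forwards_to_cluster_dns(resolv_conf_text, cluster_dns_ip):
    """True if the file would make CoreDNS forward queries back to itself."""
    return cluster_dns_ip in nameservers(resolv_conf_text)


CLUSTER_DNS_IP = "10.233.0.3"  # hypothetical cluster DNS service IP

healthy = "nameserver 192.168.1.1\nsearch .\n"
looping = "nameserver 10.233.0.3\nsearch .\n"

print(forwards_to_cluster_dns(healthy, CLUSTER_DNS_IP))  # False
print(forwards_to_cluster_dns(looping, CLUSTER_DNS_IP))  # True
```

The comparison is a plain string match, so differently formatted but equivalent addresses would not be caught; a real check might normalize addresses first.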

@yankay
Member

yankay commented Jan 22, 2025

Hi @cyclinder,
would you please help review it?

@0ekk
Member

0ekk commented Jan 22, 2025

Regarding #11813 (comment): it looks like either upstream DNS was missing, or /etc/resolv.conf was changed directly while systemd-resolved was running.

FYI, when /etc/resolv.conf is managed by systemd-resolved, changes made to /etc/resolv.conf do not persist.

So maybe we could add a check that at least one fixed nameserver is configured when these conditions are met:

  • the Kubespray option kube_resolv_conf points to /run/systemd/resolve/resolv.conf
  • the systemd-resolved service is present and running

or

  • the Kubespray option kube_resolv_conf is set to /etc/resolv.conf
  • the systemd-resolved service is present and running
  • /etc/resolv.conf is a symlink to /run/systemd/resolve/resolv.conf, /run/systemd/resolve/stub-resolv.conf or /usr/lib/systemd/resolv.conf (i.e. /etc/resolv.conf is being managed by systemd-resolved).

The nameserver may be configured through resolved.conf(5) or through the network interface configuration (netplan on Ubuntu); we could easily use resolvectl dns to get the per-interface DNS configuration.

We previously accepted PR #9502, which does something similar to what I described above, but with a narrower scope. Taking systemd-resolved into account will make Kubespray more robust.
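
The two sets of conditions above can be written as a single predicate. This Python sketch is only an illustration: the function name and parameters are hypothetical, the inputs are assumed to have been gathered beforehand, and the managed paths are the ones named in the systemd-resolved(8) excerpt quoted earlier in the thread.

```python
RESOLVED_UPLINK = "/run/systemd/resolve/resolv.conf"
# Files that systemd-resolved provides as possible /etc/resolv.conf targets.
RESOLVED_MANAGED = {
    RESOLVED_UPLINK,
    "/run/systemd/resolve/stub-resolv.conf",
    "/usr/lib/systemd/resolv.conf",
}


def should_check_nameservers(kube_resolv_conf, resolved_running,
                             etc_resolv_conf_target=None):
    """Decide whether the 'at least one nameserver' check applies.

    etc_resolv_conf_target is the symlink target of /etc/resolv.conf,
    or None when /etc/resolv.conf is a regular file.
    """
    if not resolved_running:
        return False
    # First case: the kubelet reads the systemd-resolved uplink file directly.
    if kube_resolv_conf == RESOLVED_UPLINK:
        return True
    # Second case: the kubelet reads /etc/resolv.conf, which is managed
    # by systemd-resolved through a symlink.
    return (kube_resolv_conf == "/etc/resolv.conf"
            and etc_resolv_conf_target in RESOLVED_MANAGED)


print(should_check_nameservers("/etc/resolv.conf", True,
                               "/run/systemd/resolve/stub-resolv.conf"))  # True
```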

@ant31
Contributor

ant31 commented Jan 24, 2025

In my clusters I'm creating a specific pod-resolv.conf that is independent of the system.

$ cat /etc/kubernetes/kubelet-config.yaml | grep resolvConf
resolvConf: "/etc/kubernetes/pod-resolv.conf"

And pod-resolv.conf is generated by Kubespray (a fork):

# Ansible entries BEGIN
search default.svc.cluster.local svc.cluster.local
nameserver 10.10.0.3
options ndots:2 timeout:2 attempts:2 edns
# Ansible entries END

Pods then always reach out to coredns, and it's up to coredns to do whatever.
What is configured on the OS has less impact (unless coredns uses it, but that would be a choice).
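
As an illustration of the approach described above, a dedicated pod resolv.conf could be rendered from a few parameters. This Python sketch is not the fork's actual template; the function name and its parameters are hypothetical, and the values in the usage example are copied from the file shown above.

```python
def render_pod_resolv_conf(nameserver, search_domains, options):
    """Render a resolv.conf for pods that is independent of the host's DNS setup."""
    lines = [
        "# Ansible entries BEGIN",
        "search " + " ".join(search_domains),
        "nameserver " + nameserver,
        "options " + " ".join(options),
        "# Ansible entries END",
    ]
    return "\n".join(lines) + "\n"


print(render_pod_resolv_conf(
    nameserver="10.10.0.3",
    search_domains=["default.svc.cluster.local", "svc.cluster.local"],
    options=["ndots:2", "timeout:2", "attempts:2", "edns"],
), end="")
```

The kubelet would then be pointed at the generated file through the resolvConf setting, as in the kubelet-config.yaml line quoted above.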

@VannTen
Contributor Author

VannTen commented Jan 24, 2025 via email

@ant31
Contributor

ant31 commented Jan 24, 2025

On our clusters, we're currently evaluating not passing the host resolv.conf to the kubelet at all,

Yes, this is what I'm doing too.
Configuring coredns (nodelocaldns) + generating a resolv.conf for the kubelet; the one used by the host is not included in the cluster at all.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Can not create container when systemd-resolved not running
7 participants