bug: When etcdserver=true, newly added ApisixRoutes don't take effect after restarting the ingress-controller #2341

Open
lkad opened this issue Dec 26, 2024 · 5 comments


lkad commented Dec 26, 2024

Current Behavior

Similar to #2167. I installed the APISIX ingress-controller with etcdserver=true (see https://apisix.apache.org/blog/2023/10/18/ingress-apisix/#design-of-new-architecture). The ingress-controller sometimes restarts, and after a restart, routes that I add or change in an ApisixRoute do not take effect.
Install APISIX:
ADMIN_API_VERSION=v3
helm install apisix . \
  --set service.type=NodePort \
  --set ingress-controller.enabled=true \
  --create-namespace \
  --namespace ingress-apisix \
  --set ingress-controller.config.apisix.serviceNamespace=ingress-apisix \
  --set ingress-controller.config.apisix.adminAPIVersion=$ADMIN_API_VERSION \
  --set ingress-controller.config.kubernetes.enableGatewayAPI=true \
  --set dashboard.enabled=true \
  --set ingress-controller.config.etcdserver.enabled=true
Then add an ApisixRoute, together with the httpbin Deployment and Service it routes to:

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: test-route
  namespace: testroute2
spec:
  http:
    - name: route-1
      match:
        hosts:
          - routetest2cccc.ccc.ccc
        paths:
          - /*
      backends:
        - serviceName: httpbin
          servicePort: 80
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: httpbin
  namespace: testroute2
  labels:
    app: httpbin
  annotations:
    deployment.kubernetes.io/revision: '1'
    kubesphere.io/creator: admin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: httpbin
      annotations:
        kubesphere.io/creator: admin
        kubesphere.io/imagepullsecrets: '{}'
        kubesphere.io/restartedAt: '2024-11-28T00:57:38.581Z'
    spec:
      containers:
        - name: container-cjo9a0
          image: mccutchen/go-httpbin
          ports:
            - name: http-0
              containerPort: 80
              protocol: TCP
          env:
            - name: PORT
              value: '80'
          resources:
            limits:
              cpu: '2'
              memory: 1000Mi
            requests:
              cpu: 200m
              memory: 200Mi
          readinessProbe:
            httpGet:
              path: /
              port: 80
              scheme: HTTP
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
---
kind: Service
apiVersion: v1
metadata:
  name: httpbin
  namespace: testroute2
  labels:
    app: httpbin
  annotations:
    kubesphere.io/creator: admin
spec:
  ports:
    - name: http-1
      protocol: TCP
      port: 80
      targetPort: 80
  selector:
    app: httpbin
  type: ClusterIP
  sessionAffinity: None
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  internalTrafficPolicy: Cluster

Add several more ApisixRoutes as well.
Then curl ip/get -H "host: routetest2cccc.ccc.ccc" returns 200.
Next, find the Kubernetes node where the ingress-controller is running and, in a shell on that node, kill the ingress-controller process:
ps -elf | grep ingress-controller | awk '{print $4}' | xargs kill
Then change the host in the ApisixRoute:

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: test-route
  namespace: testroute2
spec:
  http:
    - name: route-1
      match:
        hosts:
          - routenew.ccc.ccc
        paths:
          - /*
      backends:
        - serviceName: httpbin
          servicePort: 80

Now curl ip/get -H "host: routenew.ccc.ccc" returns 404.

Expected Behavior

curl ip/get -H "host: routenew.ccc.ccc" should return 200.

Error Logs

APISIX container log:
2024/12/26 06:07:13 [error] 59#59: *3033128 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:13 [error] 51#51: *3033129 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:13 [error] 56#56: *3033130 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:13 [error] 54#54: *3033131 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:16 [error] 49#49: *3033267 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:16 [error] 52#52: *3033268 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:16 [error] 55#55: *3033269 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:16 [error] 53#53: *3033270 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:16 [error] 59#59: *3033271 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:16 [error] 50#50: *3033272 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:16 [error] 56#56: *3033273 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:16 [error] 51#51: *3033274 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:16 [error] 54#54: *3033275 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:19 [error] 49#49: *3033413 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:19 [error] 52#52: *3033414 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:19 [error] 53#53: *3033415 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:19 [error] 55#55: *3033416 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:19 [error] 59#59: *3033417 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:19 [error] 50#50: *3033418 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:19 [error] 56#56: *3033419 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:19 [error] 51#51: *3033420 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer
2024/12/26 06:07:19 [error] 54#54: *3033421 [lua] config_etcd.lua:193: watchdir err: has no healthy etcd endpoint available, context: ngx.timer

Steps to Reproduce

Same as described under Current Behavior (see also #2167):
Install APISIX with Helm as shown there, with ingress-controller.config.etcdserver.enabled=true.
Create the test-route ApisixRoute (host routetest2cccc.ccc.ccc) together with the httpbin Deployment and Service in namespace testroute2, and add several more ApisixRoutes.
curl ip/get -H "host: routetest2cccc.ccc.ccc" returns 200.
On the node where the ingress-controller is running, kill it: ps -elf | grep ingress-controller | awk '{print $4}' | xargs kill
Change the host in the ApisixRoute to routenew.ccc.ccc.
curl ip/get -H "host: routenew.ccc.ccc" returns 404.

Environment

  • APISIX Ingress controller version (run apisix-ingress-controller version --long)
    apache/apisix-ingress-controller:1.8.0
  • Kubernetes cluster version (run kubectl version)
    Client Version: v1.29.2
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.28.8
  • OS version if running APISIX Ingress controller in a bare-metal environment (run uname -a)
    apache/apisix:3.8.1-debian

lkad commented Dec 26, 2024

Querying the Admin API shows that the route has been added:
curl "http://10.204.222.7:9180/apisix/admin/plugin_configs/11a33fd6" -H "X-API-KEY:
export ETCDCTL_ENDPOINTS=10.204.222.7:12379
export ETCDCTL_API=3
etcdctl get /apisix --prefix=true
The route has also been written to etcd, but a curl request still returns 404, so it does not take effect.


lkad commented Dec 26, 2024

I suspect that when etcdserver=true, the ingress-controller starts an in-memory etcd, so the etcdserver data is lost when the ingress-controller restarts. After the restart, the ingress-controller re-syncs all routes to APISIX in a batch. However, the index (revision) that APISIX already holds in memory is higher than the revisions written to the fresh etcd, so APISIX does not apply the latest routes.
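The suspicion above can be modelled with a toy watcher. This is only a sketch of the hypothesis, not APISIX code: the `Watcher` class and the post-restart revision `5` are made up for illustration, while the route key, hosts, and the pre-restart revision are taken from this report.

```python
from dataclasses import dataclass, field


@dataclass
class Watcher:
    """Toy stand-in for a data plane that remembers the last revision it applied."""
    prev_index: int = 0
    routes: dict = field(default_factory=dict)

    def on_event(self, revision: int, key: str, host: str) -> bool:
        # Hypothesised gate: only events newer than prev_index are applied.
        if revision > self.prev_index:
            self.routes[key] = host
            self.prev_index = revision
            return True
        return False


watcher = Watcher()
key = "/apisix/routes/48d337a6"

# Before the restart, the store has already reached a high revision.
watcher.on_event(1032, key, "routetest2cccc.ccc.ccc")

# After the restart, a fresh in-memory store would count from a low revision
# again (5 is an illustrative value), so the change is silently dropped.
applied = watcher.on_event(5, key, "routenew.ccc.ccc")

print(applied)              # False
print(watcher.routes[key])  # routetest2cccc.ccc.ccc
```

If the hypothesis holds, the data plane would keep serving the old host until the store's revision climbs past the remembered index again, which would match the 404 for the new host.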


lkad commented Dec 26, 2024

When I change the ApisixRoute, the record is added to etcd and APISIX is notified, but APISIX does not apply it to the route:

2024/12/26 07:06:27 [info] 56#56: *15617 [lua] config_etcd.lua:120: produce_res(): append res: {
  result = {
    events = { {
        kv = {
          create_revision = "1031",
          key = "/apisix/upstreams/5019bcf9",
          mod_revision = "1033",
          value = {
            desc = "Created by apisix-ingress-controller, DO NOT modify it manually",
            id = "5019bcf9",
            labels = {
              ["managed-by"] = "apisix-ingress-controller"
            },
            name = "testroute2_httpbin_80",
            nodes = { {
                host = "192.168.70.252",
                port = 80,
                weight = 100
              },
              <metatable> = <1>{}
            },
            scheme = "http",
            type = "roundrobin"
          }
        },
        prev_kv = {
          create_revision = "1031",
          key = "/apisix/upstreams/5019bcf9",
          mod_revision = "1031",
          value = {
            desc = "Created by apisix-ingress-controller, DO NOT modify it manually",
            id = "5019bcf9",
            labels = {
              ["managed-by"] = "apisix-ingress-controller"
            },
            name = "testroute2_httpbin_80",
            nodes = { {
                host = "192.168.70.252",
                port = 80,
                weight = 100
              },
              <metatable> = <table 1>
            },
            scheme = "http",
            type = "roundrobin"
          }
        }
      }, {
        kv = {
          create_revision = "1032",
          key = "/apisix/routes/48d337a6",
          mod_revision = "1034",
          value = {
            desc = "Created by apisix-ingress-controller, DO NOT modify it manually",
            hosts = { "routetest3.ccc.ccc",
              <metatable> = <table 1>
            },
            id = "48d337a6",
            labels = {
              ["managed-by"] = "apisix-ingress-controller"
            },
            name = "testroute2_test-route_route-1",
            plugin_config_id = "6058ee0d",
            plugins = {
              ["host-rate-limit"] = {
                capacity = 1000,
                interval = 1000,
                key = "http_test",
                key_type = "var",
                lock_enable = true,
                max_wait = 1000,
                quantum = 1000,
                rejected_code = 503
              }
            },
            upstream_id = "5019bcf9",
            uris = { "/*",
              <metatable> = <table 1>
            }
          }
        },
        prev_kv = {
          create_revision = "1032",
          key = "/apisix/routes/48d337a6",
          mod_revision = "1032",
          value = {
            desc = "Created by apisix-ingress-controller, DO NOT modify it manually",
            hosts = { "routetest2cccc.ccc.ccc",
              <metatable> = <table 1>
            },
            id = "48d337a6",
            labels = {
              ["managed-by"] = "apisix-ingress-controller"
            },
            name = "testroute2_test-route_route-1",
            plugin_config_id = "6058ee0d",
            plugins = {
              ["host-rate-limit"] = {
                capacity = 1000,
                interval = 1000,
                key = "http_test",
                key_type = "var",
                lock_enable = true,
                max_wait = 1000,
                quantum = 1000,
                rejected_code = 503
              }
            },
            upstream_id = "5019bcf9",
            uris = { "/*",
              <metatable> = <table 1>
            }
          }
        }
      } },
    header = {
      revision = "1034"
    },
    watch_id = "42"
  }
}, err: nil, context: ngx.timer
2024/12/26 07:06:31 [info] 51#51: *16770 client closed connection while waiting for request, client: 127.0.0.1, server: 0.0.0.0:9180
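To make the log above easier to read, here is a small sketch that compares `kv` with `prev_kv` for the route event. The `event` dictionary is a hand-trimmed copy of the second event in the log (only a few fields are kept), and `changed_fields` is a hypothetical helper, not part of APISIX.

```python
# Trimmed copy of the route event from the log; other fields are omitted.
event = {
    "kv": {
        "key": "/apisix/routes/48d337a6",
        "mod_revision": "1034",
        "value": {"hosts": ["routetest3.ccc.ccc"], "upstream_id": "5019bcf9"},
    },
    "prev_kv": {
        "key": "/apisix/routes/48d337a6",
        "mod_revision": "1032",
        "value": {"hosts": ["routetest2cccc.ccc.ccc"], "upstream_id": "5019bcf9"},
    },
}


def changed_fields(ev: dict) -> dict:
    """Return {field: (old, new)} for every value field that differs."""
    old, new = ev["prev_kv"]["value"], ev["kv"]["value"]
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


diff = changed_fields(event)
# Revisions are strings in the log, so convert them before comparing.
revision_advanced = int(event["kv"]["mod_revision"]) > int(event["prev_kv"]["mod_revision"])

print(diff)               # {'hosts': (['routetest2cccc.ccc.ccc'], ['routetest3.ccc.ccc'])}
print(revision_advanced)  # True
```

Read this way, the event delivered to APISIX does carry the new host and a higher mod_revision than the previous value, so the change reaches the watcher; whether it is then applied is a separate question.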

This log line is produced by the produce_res function in config_etcd.lua, shown below:

-- append res to the queue and notify pending watchers
local function produce_res(res, err)
    if log_level >= NGX_INFO then
        log.info("append res: ", inspect(res), ", err: ", inspect(err))
    end
    insert_tab(watch_ctx.res, {res=res, err=err})
    for _, sema in pairs(watch_ctx.sema) do
        sema:post()
    end
    table.clear(watch_ctx.sema)
end


lkad commented Dec 26, 2024

[image]
I think the cause is this code: when the route's revision in etcd is lower than the revision APISIX holds in memory, the update is not applied. I am not sure, though.


lkad commented Jan 6, 2025

In the APISIX pod, run this command:

sed 's/if tonumber(res.result.header.revision) > self.prev_index/if true or tonumber(res.result.header.revision) > self.prev_index/g' -i  /usr/local/apisix/apisix/core/config_etcd.lua

Then run:

 apisix restart 

After that, route changes take effect even after I kill the ingress-controller.
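As a rough illustration of what this workaround does, the sketch below applies the same literal substitution to a sample line and models the resulting condition. `PATTERN` and `REPLACEMENT` come from the sed expression above; `SAMPLE`, `apply_patch`, `gate`, and the revision `5` are assumptions for illustration, and the real line in config_etcd.lua may look different.

```python
# Both strings are taken from the sed expression above.
PATTERN = "if tonumber(res.result.header.revision) > self.prev_index"
REPLACEMENT = "if true or tonumber(res.result.header.revision) > self.prev_index"

# Hypothetical sample line; the actual source line may differ.
SAMPLE = "if tonumber(res.result.header.revision) > self.prev_index then"


def apply_patch(text: str) -> str:
    """Literal substitution of every occurrence, like sed's s/.../.../g."""
    return text.replace(PATTERN, REPLACEMENT)


def gate(revision: int, prev_index: int, patched: bool) -> bool:
    """Model of the condition: `true or X` is always true once patched."""
    return patched or revision > prev_index


patched_line = apply_patch(SAMPLE)
print(patched_line)  # if true or tonumber(res.result.header.revision) > self.prev_index then

# 1034 is the revision from the log; 5 is an illustrative post-restart revision.
print(gate(5, 1034, patched=False))  # False
print(gate(5, 1034, patched=True))   # True
```

Note that the patched condition no longer compares revisions at all, so this is a diagnostic workaround rather than a fix. The edit is made inside the running container, so it would presumably need to be reapplied whenever the pod is recreated.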
