Istio Ambient support #2676

Open
Tracked by #2763
peterj opened this issue Apr 11, 2024 · 27 comments · May be fixed by #2822

Comments

@peterj

peterj commented Apr 11, 2024

Istio Ambient mode is a different deployment model from the “traditional” (sidecar) mode of Istio. The ambient mode (sidecar-less) doesn’t require injecting sidecars into the deployments.

Here are the high-level differences between the two modes:

  • The concerns handled by the sidecar proxy in the sidecar Istio are split into two components in Istio ambient: 

    • Ztunnel (handles L4 concerns: mTLS and authorization policies that don't use any HTTP attributes)

      • Ztunnel is installed automatically when profile=ambient
    • Waypoint proxy (handles L7 concerns, i.e. traffic splitting, matching, header manipulation, etc., more or less everything that gets defined in the VirtualService)

      • Waypoint is optional (not installed by default) and can be deployed per service account (handles all workloads using the same service account) or per namespace (handles L7 proxying for all workloads in the namespace)
  • Ingress gateway isn’t installed by default anymore when using profile=ambient

    • It might be worth migrating over to Kubernetes Gateway APIs and deploying the Istio ingress gateway like that, as we’ll have to use the Gateway APIs to deploy waypoint proxies anyway
  • Kubernetes Gateway API is used for ingress (and waypoint proxy) deployments

  • Any L7 VirtualServices or AuthorizationPolicies must have a “targetRef” section that specifies which waypoint proxy handles the L7 configuration

Waypoint proxies

Any VirtualService or AuthorizationPolicy that uses HTTP concepts will require a waypoint proxy. Given that there are 3 namespaces (that I identified so far), I’d suggest a per-namespace deployment of a waypoint proxy.

In addition to deploying the waypoint proxies, the following resources will have to be updated to use them:

Component                      Namespace   Notes
dex                            auth
central-dashboard              kubeflow
jupyter-web-app                kubeflow
volumes-web-app                kubeflow
katib-ui                       kubeflow
ml-pipeline-ui                 kubeflow
metadata-envoy-service         kubeflow
kfp-tekton                     kubeflow
kubebench-dashboard            kubeflow
profiles-kfam                  kubeflow
tensorboards-web-app-service   kubeflow
kserve-models-web-app          kserve

Work items

  • Migrate to Kubernetes Gateway API

    • Update the YAML (/common/istio*) for deploying ingress and local ingress to use the Gateway API 
  • Move to the latest Istio (ambient will be beta in the next release, 1.22)

    • We can still continue with the sidecar mode here
  • Move to Ambient mode

    • Identify components that need waypoint proxy

    • Deploy waypoint proxies 

    • Switch the Istio profile from default → ambient (or have an option of doing one or the other) - since Ambient will still be Beta, we shouldn’t make it a default option
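
As a rough sketch of what the last work item means for an individual namespace, assuming Istio is already installed with profile=ambient: a namespace is enrolled in ambient mode through the istio.io/dataplane-mode label rather than through sidecar injection. The namespace name below is a placeholder, not something taken from the Kubeflow manifests.

```yaml
# Hypothetical namespace; only the label is the point of this sketch.
apiVersion: v1
kind: Namespace
metadata:
  name: example-namespace
  labels:
    # Enrolls all pods in the namespace into the ambient data plane (ztunnel).
    istio.io/dataplane-mode: ambient
```

A namespace enrolled this way would normally not carry the sidecar injection label at the same time, which is one reason to keep sidecar and ambient as separate install options.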

@ca-scribner
Contributor

What are the components that need to change for this? Would the profile controller need to create waypoint proxies for each user's namespace?

@peterj
Author

peterj commented Jun 4, 2024

Waypoint proxies are created automatically when the Gateway resource is created (so it is a bit simpler than crafting Deployments/Services). You can configure it in such a way that there's 1 instance per namespace that handles all L7 traffic for that namespace.

@juliusvonkohout
Member

@peterj one waypoint proxy per dynamic on demand namespace would be a critical change.

@juliusvonkohout juliusvonkohout linked a pull request Jul 28, 2024 that will close this issue
@peterj
Author

peterj commented Jul 29, 2024

That's how ambient is designed to handle L7. Instead of running a sidecar next to every workload, you run 1 waypoint (L7) proxy per namespace.

@ca-scribner
Contributor

Yeah I think the profile controller would be instantiating waypoint proxies for each user namespace. That shouldn't be so hard though (the profile controller already creates other per-namespace resources)

This probably gets more complicated with respect to the Gateway API though. Do we need to migrate all VirtualServices and such to the Gateway API too? I'm not clear on what is interoperable

@peterj
Author

peterj commented Jul 30, 2024

Mixing VirtualService and Gateway API resources is not supported in ambient. So, it's either Gateway API + HTTPRoute/TLSRoute/TCPRoute or no Gateway API.

Is there an inventory of VirtualServices (and features within) currently used by kubeflow?

From the Istio docs, the following features are supported in HTTPRoutes (Gateway API):

  • matching on paths, headers
  • mirroring
  • weight-based routing
  • timeouts
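
To make the list above concrete, here is a minimal HTTPRoute sketch that touches each of those features. All names (the route, the parent Gateway, the backend Services, the header) are hypothetical and not taken from the Kubeflow manifests, and the timeouts field may require a recent Gateway API version.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-route            # hypothetical
  namespace: kubeflow
spec:
  parentRefs:
    - name: example-gateway      # hypothetical Gateway this route attaches to
      namespace: istio-system
  rules:
    - matches:
        - path:                  # matching on paths
            type: PathPrefix
            value: /example/
          headers:               # matching on headers
            - name: x-example
              value: canary
      filters:
        - type: RequestMirror    # mirroring
          requestMirror:
            backendRef:
              name: example-shadow
              port: 80
      timeouts:
        request: 30s             # timeouts
      backendRefs:               # weight-based routing
        - name: example-v1
          port: 80
          weight: 90
        - name: example-v2
          port: 80
          weight: 10
```

An inventory of the VirtualService features in use would show which of these fields are actually needed and whether anything has no HTTPRoute equivalent.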

@ca-scribner
Contributor

Someone else will have to speak to it, but from what I can tell the features from VirtualServices used now are covered by HTTPRoute. I'm less sure about AuthorizationPolicies and things like authentication on the gateway - do you know the state of those with ambient istio+gateway apis?

@peterj
Author

peterj commented Jul 30, 2024

The RequestAuthentication and AuthorizationPolicies can be used as-is. The only difference is that you have to use "targetRefs" instead of a selector label to point to the service. Here's an example from the docs:

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: view-only
  namespace: default
spec:
  # THIS IS DIFFERENT: you're targeting a Gateway instead of using labels.
  targetRefs:
  - kind: Gateway
    group: gateway.networking.k8s.io
    name: default
  action: ALLOW
  rules:
  - from:
    - source:
        namespaces: ["default", "istio-system"]
    to:
    - operation:
        methods: ["GET"]

And request auth:

apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
 name: "jwt-example"
 namespace: foo
spec:
 targetRef:
   kind: Gateway
   group: gateway.networking.k8s.io
   name: httpbin-gateway
 jwtRules:
 - issuer: "testing@secure.istio.io"
   jwksUri: "https://raw.githubusercontent.com/istio/istio/release-1.22/security/tools/jwt/samples/jwks.json"

But, of course, it would be great to test these things out beforehand :) Is there a good walkthrough and a collection of scenarios one can run through that touch these features? (I am not so familiar with kubeflow, but if someone points me to the scenarios, I could probably test it out with Gateway API)

@juliusvonkohout
Member

juliusvonkohout commented Jul 31, 2024

In a PR you will just trigger many scenarios automatically, so that is the easiest way to test. Just check out all the GitHub workflows

@jbottum

jbottum commented Jul 31, 2024

this is great work, thanks for moving this forward.

@ca-scribner
Contributor

ty @peterj these are really helpful clarifications!

Do you know how the Gateway API works for setting up authentication at the gateway? I'm thinking of this EnvoyFilter - does it have an equivalent workflow for the new api?

@peterj
Author

peterj commented Jul 31, 2024

I think there might be an easier way to define the ext-authz that doesn't use EnvoyFilter -- https://istio.io/latest/docs/tasks/security/authorization/authz-custom/

I can see the oauth2-proxy is using the configuration above, but I am not sure why the oidc-service is using an EnvoyFilter... It should work with the same configuration I think
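
For readers who have not seen it, the approach in the linked task has two parts: an extension provider declared in the mesh config, and an AuthorizationPolicy with action CUSTOM that refers to the provider by name. A minimal sketch follows; the provider name, service address, header lists, Gateway name and path are all assumptions for illustration, not the values Kubeflow ships.

```yaml
# Part 1: mesh config fragment (e.g. under meshConfig in the install values).
meshConfig:
  extensionProviders:
    - name: example-ext-authz                  # hypothetical provider name
      envoyExtAuthzHttp:
        service: example-authz.example-ns.svc.cluster.local   # hypothetical
        port: 80
        includeRequestHeadersInCheck: ["authorization", "cookie"]
        headersToUpstreamOnAllow: ["authorization"]
---
# Part 2: policy that sends matching requests to the provider for a decision.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: example-ext-authz
  namespace: istio-system
spec:
  action: CUSTOM
  provider:
    name: example-ext-authz
  targetRefs:
    - kind: Gateway
      group: gateway.networking.k8s.io
      name: example-gateway                    # hypothetical Gateway
  rules:
    - to:
        - operation:
            paths: ["/*"]
```

Whether this works unchanged with a Gateway API ingress in ambient mode still needs to be tested.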

@peterj
Author

peterj commented Jul 31, 2024

Also, to answer the original question - based on the docs here the targetRefs could be used to configure the EnvoyFilter (but again, should be tested :))

@juliusvonkohout
Member

oidc-authservice will be eliminated soon :-)

@kimwnasptd
Member

kimwnasptd commented Aug 2, 2024

This is an amazing effort @peterj! I'd also like to help on the changes required.

Another important component here though is going to be KServe and Knative, which create a lot of VirtualServices under the hood. Knative specifically sounds very scary since we have no influence over it from the Kubeflow side.

cc @yuzisun

EDIT: Maybe the only way to start with this would be with RawDeployment mode of KServe, which doesn't require Istio. But we need to try it out. My concern would be with the Ingress (instead of GW) that needs istio to implement the IngressClass https://kserve.github.io/website/latest/admin/kubernetes_deployment/#1-install-istio

@peterj
Author

peterj commented Aug 2, 2024

The Ingress resource still works with Istio (i.e. Istio implements the ingress class - ref), but it would make sense to move that over to Kubernetes Gateway API.

@ca-scribner
Contributor

I haven't tested anything in knative yet, but they have some discussion of ambient working

github-actions bot commented Oct 2, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@juliusvonkohout
Member

/lifecycle frozen

@ca-scribner
Contributor

Something that concerns me with ambient is that I can't figure out how to do deny-by-default. With sidecar, you can create an allow-nothing policy that forces all communication to need an Authorization Policy enabling it. But with ambient, I don't see a good way to do that.

Does anyone have a solution? It feels like a pretty big hole so I assume I'm missing something here
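
For reference, the sidecar-mode pattern mentioned above is usually written as an AuthorizationPolicy with an empty spec, which matches no requests and therefore denies everything in its namespace until explicit ALLOW policies are added. The namespace below is a placeholder. How to get the same effect for L7 traffic handled by waypoints is the open question here, so this is only the baseline being compared against.

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing
  namespace: example-namespace   # hypothetical; the mesh root namespace would make it mesh-wide
spec: {}                         # ALLOW policy with no rules: nothing is allowed
```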

@edwardzjl

I'm trying to deploy kubeflow with istio ambient mode but I find that it might be difficult to migrate the ext-auth: istio/istio#51214

@juliusvonkohout
Member

> I'm trying to deploy kubeflow with istio ambient mode but I find that it might be difficult to migrate the ext-auth: istio/istio#51214

We will try istio-cni first for rootless istio, so please help there as well #2907

@terrytangyuan
Member

KServe community is working on Gateway API migration. See kserve/kserve#3952

@juliusvonkohout
Member

@terrytangyuan we also want to switch our istio to the gateway API.

@juliusvonkohout
Member

Some thoughts to keep in mind: ambient needs one L7 proxy per namespace, which prevents zero-overhead namespaces.
If a namespace is idle, which is probably the case about 76% of the time (assuming it is active 8x5 = 40 of the 7x24 = 168 hours in a week), you have zero costs and zero overhead with istio-cni. So it is a tradeoff: if you have many namespaces with many pods (after removing the visualization server and artifact proxy) running 24/7, ambient mesh could be better. So I think we should offer both istio-cni and ambient for the time being.

@juliusvonkohout
Member

Let's also note that with Kubeflow 1.9.1 a lot has changed regarding Istio and authentication, and I am already running 1.9.1 with istio-cni on a few clusters.

@jbronn

jbronn commented Jan 25, 2025

I've finished installing one of the first instances of Kubeflow 1.9.1 using Istio 1.24 (with CNI) in ambient mode. Here are some of my lessons:

  • In order for external authentication to work with Dex/OAuth2 Proxy the AuthorizationPolicy / RequestAuthentication resources must be associated with a Kubernetes Gateway resource that's in the same namespace as the ingress gateway deployment and allows traffic from other namespaces, e.g.:

    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: istio-ingressgateway
      namespace: istio-system
    spec:
      addresses:
        - type: Hostname
          value: istio-ingressgateway.istio-system.svc.cluster.local
      gatewayClassName: istio
      listeners:
        - allowedRoutes:
            namespaces:
              from: All
          name: http
          port: 80
          protocol: HTTP

    Setting the addresses to point at the ingress gateway deployment is required for manual installation; otherwise Istio will automatically create a Deployment and Service. Then you must replace the selector with a targetRef to the Kubernetes Gateway for the AuthorizationPolicy / RequestAuthentication to actually apply to all ingress traffic:

    ---
    apiVersion: security.istio.io/v1
    kind: RequestAuthentication
    metadata:
      name: dex-jwt
      namespace: istio-system
    spec:
      targetRef:
        kind: Gateway
        group: gateway.networking.k8s.io
        name: istio-ingressgateway
    ---
    apiVersion: security.istio.io/v1
    kind: AuthorizationPolicy
    metadata:
      name: m2m-token-issuer
      namespace: istio-system
    spec:
      action: CUSTOM
      provider:
        name: oauth2-proxy
      targetRef:
        kind: Gateway
        group: gateway.networking.k8s.io
        name: istio-ingressgateway
  • Istio waypoint proxies are needed in the kubeflow namespace in order to process L7 authorization policies:

    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: istio-waypoint
      namespace: kubeflow
    spec:
      gatewayClassName: istio-waypoint
      listeners:
        - name: mesh
          port: 15008
          protocol: HBONE

    Services using L7 authorization policies need to be updated to use the waypoint, e.g. for ml-pipeline:

    apiVersion: v1
    kind: Service
    metadata:
      name: ml-pipeline
      namespace: kubeflow
      labels:
        istio.io/use-waypoint: istio-waypoint
        ...

    And authorization policies need to be updated to remove the selector, add a targetRef to the applicable service, and add the principal for the waypoint:

    apiVersion: security.istio.io/v1
    kind: AuthorizationPolicy
    metadata:
      name: ml-pipeline
      namespace: kubeflow
    spec:
      action: ALLOW
      rules:
        - from:
          - source:
              principals:
                - cluster.local/ns/kubeflow/sa/istio-waypoint
                ...
      targetRef:
        group: core
        kind: Service
        name: ml-pipeline
  • Likewise, for user namespaces it's easier to just add the istio.io/use-waypoint label to the namespace by customizing the namespace-labels.yaml of the profile controller. However, I had to patch the profile controller to not sync ns-owner-access-istio policies and install them manually because I couldn't figure out how to update its modules. In particular, it uses Istio modules that predate ambient APIs like targetRef, which are needed to attach authorization policies to user namespace waypoints. Updating the Istio module leads to conflicts with the old version of controller-runtime in use, and those were beyond my Go abilities to resolve.

  • Istio DestinationRule resources with ISTIO_MUTUAL traffic policies (e.g. the one for volumes-web-app) do not work with ambient mode, since L4 traffic between workloads is already encrypted with mTLS, and they need to be removed.

  • Knative PeerAuthentication cannot be used if Istio has a global PeerAuthentication policy set to STRICT until the fix for istio/istio#53884 ("PeerAuthentication with port specification not overriding global default in ambient mode") is backported to 1.24.
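
To illustrate the user-namespace case from the list above, here is a minimal sketch of what a profile namespace and its waypoint could look like once the labels are in place. The namespace and waypoint names are hypothetical, and this is not what the profile controller currently generates. The listener uses protocol: HBONE as in the Istio waypoint documentation.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: example-user               # hypothetical profile namespace
  labels:
    istio.io/dataplane-mode: ambient
    # Route traffic for services in this namespace through the waypoint below.
    istio.io/use-waypoint: waypoint
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint                   # must match the use-waypoint label value
  namespace: example-user
spec:
  gatewayClassName: istio-waypoint
  listeners:
    - name: mesh
      port: 15008
      protocol: HBONE
```

With this shape, one waypoint Deployment runs per user namespace, which is the per-namespace overhead discussed earlier in the thread.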
