-
Notifications
You must be signed in to change notification settings - Fork 742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Knative #1682
base: main
Are you sure you want to change the base?
Add support for Knative #1682
Conversation
Signed-off-by: Thomas Banks <[email protected]>
(I haven't used flagger / knative in awhile but) I'm very excited to see this taking shape! |
Hi @tombanksme! This looks great, are there any news on this PR? |
I haven't heard anything back from the Flagger maintainers. I would be happy to finish up the PR if it's something that fluxcd would consider merging. |
while i see the value added by this PR, is it not possible to use Gateway API as a bridge to get Flagger and Knative to work together: https://github.com/knative-extensions/net-gateway-api |
Sorry for the delay. It looks like net-gateway-api isn't ready for production yet according to the readme |
I'll help craft a detailed technical response explaining why the Gateway API approach wouldn't work for Flagger-Knative integration: Thank you for bringing this up. As someone familiar with Knative, I'd like to clarify why using Gateway API as a bridge between Flagger and Knative wouldn't be possible, and why the current PR approach is actually the correct solution. The net-gateway-api project has a different purpose - it's not meant to be an control interface for Knative, but rather it allows Knative to use Gateway API as its final networking output. Let me explain how Knative's networking architecture works:
Therefore, while the suggestion to use Gateway API as a bridge is interesting, it wouldn't provide the deep integration needed for proper traffic management. The current PR's approach of integrating directly with KService is actually the only correct way to achieve this integration - there are no alternative approaches in the current Knative roadmap that would provide the same level of proper integration. I've reviewed the implementation in the PR, and it aligns perfectly with Knative's architectural principles and control patterns. Reference: https://knative.dev/docs/serving/architecture/#traffic-flow-and-dns |
i did a quick first pass and the approach looks good. just to confirm, both the user and flagger own the Knative furthermore, can you add some preliminary docs? or share the steps you took to test this change? |
Yes, We share one Knative |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Knative Serving Lead here 👋. Awesome to see a change like this 🎉 .
Just some comments but I'm not a flagger expert so I'll defer to others.
@@ -27,6 +27,7 @@ require ( | |||
k8s.io/client-go v0.30.1 | |||
k8s.io/code-generator v0.30.1 | |||
k8s.io/klog/v2 v2.120.1 | |||
knative.dev/serving v0.41.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note:
v0.43.0 tag is out now (v1.16)
v0.44.0 tag will be out next week (v1.17)
Though the API definitions are stable so you should be fine not using the latest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic! 🥳
I'll get these updated
|
||
switch kind { | ||
case "DaemonSet": | ||
switch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: dunno what coding style this project has but i'd just switch on kind
and maybe have a nested if for Service
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup that's my bad. I was very new to go when I wrote this. I'll get it cleaned up
return true, fmt.Errorf("service %s.%s get query error: %w", cd.Spec.TargetRef.Name, cd.Namespace, err) | ||
} | ||
|
||
revision, err := kc.knativeClient.ServingV1().Revisions(cd.Namespace).Get(context.TODO(), service.Status.LatestCreatedRevisionName, metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(not a flagger expert)
I think you'd want to guard against LatestCreatedRevisionName
being equal to the value of annotations[flagger.app/primary-revision]
- this would ensure you wait for a new revision to be created which becomes your canary.
Otherwise I'm thinking if there was ever a delay in the knative reconcilers it'll look at the wrong revision here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: revisions have a configurationGeneration
label
} | ||
|
||
func (kc *KnativeController) GetMetadata(canary *flaggerv1.Canary) (string, string, map[string]int32, error) { | ||
// TODO: Do we need this for Knative? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious what does flagger use this for?
} | ||
|
||
func (kc *KnativeController) ScaleToZero(canary *flaggerv1.Canary) error { | ||
// Not Implemented: Not needed for Knative deployments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious what this does for other providers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other providers flagger maintains two instances of the application. From memory the process looks like this:
- When a new version is created update the canary & scale it up
- Slowly send traffic from the primary to the canary & monitor
- Once the canary reaches 100% & is performing as expected, update the primary
- Switch 100% traffic over to the primary
- Scale the canary back to zero
return true, fmt.Errorf("service %s.%s get query error: %w", cd.Spec.TargetRef.Name, cd.Namespace, err) | ||
} | ||
|
||
return hasSpecChanged(cd, service.Status.LatestCreatedRevisionName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(not a flagger dev)
What 'applies' the new Service spec.template
that would trigger the rollout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the process that I'd envisioned:
- User creates a Knative service for their application
- User creates a Flagger configuration pointing at the Knative service
- Flagger takes over the traffic management of the Knative service
4.a. It adds the metadata it needs to operate
4.b. Wasn't done in this original pull request but it can also probably set the default traffic routing to zero for all revisions & manually set the latest to 100% - When the user wants to update an application they update the Knative service like they would usually
- Flagger watches for the new revision & then does the traffic management changeover
So the user is still responsible for triggering the rollout like they would with any other Knative service here
- apiGroups: | ||
- serving.knative.dev | ||
resources: | ||
- services |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(just an FYI and this is less common)
You can have a Knative Route and point it to revisions from different knative Configurations. Knative Services are a convenience over it.
I doubt anyone needs that kind of flexibility. Tackling Services rollout will probably cover 99% of use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look at this. If it's easy to add I'll do it in this pull request; if not I'll spin it out into another PR
return | ||
} | ||
|
||
return int(*service.Status.Traffic[primaryRevisionIdx].Percent), int(*service.Status.Traffic[canaryRevisionIdx].Percent), false, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsure when this method is called but in general Knative Services's percent can be nil
Morning folks. Thank you all for taking a look & investing your time into this. I'll allocate some time over the next couple of weeks to get this cleaned up as promised. |
This is a proof-of-concept pull request to add the canary release deployment strategy for Knative (#903). It's far from production ready but I wanted to see if there's appetite for this feature before I invest more time into completing it.
Currently Flagger can target a Knative service & complete the canary release process. I've not tested rollbacks yet although I think it should just work. The pull request needs some extra work to add test coverage; throw errors when using unsupported release processes & add Knative specific Kubernetes events.
Let me know what you think!
Example
The following canary will target a Knative Service. Once the canary has been initialised you can start the canary release process by creating a new revision of the service.