-
Notifications
You must be signed in to change notification settings - Fork 791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update user guide for namespace label in Katib Python SDK example #3976
Update user guide for namespace label in Katib Python SDK example #3976
Conversation
…Started Signed-off-by: Xinmin Du <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this @Doris-xm!
/assign @hbelmiro @varodrig @helenxie-bit @Electronic-Waste
@@ -17,6 +17,31 @@ You need to install the following Katib components to run examples: | |||
|
|||
## Getting Started with Katib Python SDK | |||
|
|||
### Adding Namespace Label for the Metrics Collector Injection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest that we add this section to the Katib Metrics Collector guide for now: https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/
The getting started example, tell users to run Experiment in kubeflow
namespace, which has the required label in default standalone installation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed something strange. When I previously ran the get started example with the kubeflow
namespace, I encountered the "namespace lacks label" error. However, I just tried it again, and now it works fine with the kubeflow
namespace.
On the other hand, when I use the API to run LLM hyperparameter optimization using kubeflow
namespace too, the "namespace lacks label" error still occurs.
@andreyvelich Do you have any idea what might be causing this issue? I’ll continue investigating as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you install Katib and Training Operator when you run LLM hyperparameter optimization using kubeflow
namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the following commands from the installation page to install the latest changes of Katib control plane and Training Operator control plane:
kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=master"
kubectl apply --server-side -k "github.com/kubeflow/training-operator.git/manifests/overlays/standalone?ref=master"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, when you deploy Training Operator after the Katib, is might override the namespace label, since it doesn't have it: https://github.com/kubeflow/training-operator/blob/master/manifests/overlays/standalone/namespace.yaml
You should install Training Operator before Katib ideally.
Also, to use PyTorchJob CRD within Katib Trial, it needs to be exist before Katib Controller starts: https://github.com/kubeflow/katib/blob/master/pkg/controller.v1beta1/trial/trial_controller.go#L121-L132
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see! Problem fixed! Thank you so much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest that we add this section to the Katib Metrics Collector guide for now: https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/
The getting started example, tell users to run Experiment in kubeflow namespace, which has the required label in default standalone installation.
SGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich @Electronic-Waste should we add a pre-req section on https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/ to include this step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich @Electronic-Waste should we add a pre-req section on https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/ to include this step?
SGTM. We could also declare that kubeflow
namespace has already been configured with this label, that's why they can use tune
function in kubeflow
namespace without any prerequisite setup.
Such as "If you want to use pull-based metrics collector, you need to attach the label xxx
to the target namespace (kubeflow
namespace has built-in support for this label)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review. I've updated the doc:
- move the detailed guidance to Metrics Collector
- add a Note in Getting Started to explain
kubeflow
namespace has built-in support. I also include a link to Metrics Collector for detailed instruments.
Please let me know if you have any suggestions.
@@ -17,6 +17,31 @@ You need to install the following Katib components to run examples: | |||
|
|||
## Getting Started with Katib Python SDK | |||
|
|||
### Adding Namespace Label for the Metrics Collector Injection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich @Electronic-Waste should we add a pre-req section on https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/ to include this step?
ensure the namespace label `katib.kubeflow.org/metrics-collector-injection: enabled` | ||
is present. This label enables the sidecar container injection to collect metrics during the experiment. | ||
|
||
You can configure the namespace using the following YAML: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the user needs to update the namespace, no create a new one. I'd update the sentence by
"You can configure the namespace by adding the following label katib.kubeflow.org/metrics-collector-injection: enabled as is shown in the sample code."
…tart Signed-off-by: Xinmin Du <[email protected]>
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM! Just one comment for you @Doris-xm!
content/en/docs/components/katib/user-guides/metrics-collector.md
Outdated
Show resolved
Hide resolved
Co-authored-by: Shao Wang <[email protected]> Signed-off-by: Du Xinmin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for your great contribution @Doris-xm!
/lgtm
/assign @kubeflow/wg-automl-leads @helenxie-bit
Thanks for the contribution! /lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this contribution @Doris-xm!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andreyvelich, Electronic-Waste The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does:
Before creating Katib Experiments, the label for metrics collector injection
katib.kubeflow.org/metrics-collector-injection: enabled
should be presented in the namespace.This PR updates the user guide to inform users of the effect of this label.
Which issue(s) this PR fixes:
Fixes kubeflow/katib#2499