Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update user guide for namespace label in Katib Python SDK example #3976

Conversation

Doris-xm
Copy link
Contributor

@Doris-xm Doris-xm commented Jan 24, 2025

What this PR does:

Before creating Katib Experiments, the label for metrics collector injection katib.kubeflow.org/metrics-collector-injection: enabled should be presented in the namespace.

This PR updates the user guide to inform users of the effect of this label.

Which issue(s) this PR fixes:
Fixes kubeflow/katib#2499

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -17,6 +17,31 @@ You need to install the following Katib components to run examples:

## Getting Started with Katib Python SDK

### Adding Namespace Label for the Metrics Collector Injection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that we add this section to the Katib Metrics Collector guide for now: https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/
The getting started example, tell users to run Experiment in kubeflow namespace, which has the required label in default standalone installation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed something strange. When I previously ran the get started example with the kubeflow namespace, I encountered the "namespace lacks label" error. However, I just tried it again, and now it works fine with the kubeflow namespace.

On the other hand, when I use the API to run LLM hyperparameter optimization using kubeflow namespace too, the "namespace lacks label" error still occurs.

@andreyvelich Do you have any idea what might be causing this issue? I’ll continue investigating as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you install Katib and Training Operator when you run LLM hyperparameter optimization using kubeflow namespace?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the following commands from the installation page to install the latest changes of Katib control plane and Training Operator control plane:
kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=master"

kubectl apply --server-side -k "github.com/kubeflow/training-operator.git/manifests/overlays/standalone?ref=master"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, when you deploy Training Operator after the Katib, is might override the namespace label, since it doesn't have it: https://github.com/kubeflow/training-operator/blob/master/manifests/overlays/standalone/namespace.yaml
You should install Training Operator before Katib ideally.
Also, to use PyTorchJob CRD within Katib Trial, it needs to be exist before Katib Controller starts: https://github.com/kubeflow/katib/blob/master/pkg/controller.v1beta1/trial/trial_controller.go#L121-L132

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see! Problem fixed! Thank you so much

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that we add this section to the Katib Metrics Collector guide for now: https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/
The getting started example, tell users to run Experiment in kubeflow namespace, which has the required label in default standalone installation.

SGTM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreyvelich @Electronic-Waste should we add a pre-req section on https://www.kubeflow.org/docs/components/katib/user-guides/metrics-collector/ to include this step?

SGTM. We could also declare that kubeflow namespace has already been configured with this label, that's why they can use tune function in kubeflow namespace without any prerequisite setup.

Such as "If you want to use pull-based metrics collector, you need to attach the label xxx to the target namespace (kubeflow namespace has built-in support for this label)"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review. I've updated the doc:

  • move the detailed guidance to Metrics Collector
  • add a Note in Getting Started to explain kubeflow namespace has built-in support. I also include a link to Metrics Collector for detailed instruments.

Please let me know if you have any suggestions.

@@ -17,6 +17,31 @@ You need to install the following Katib components to run examples:

## Getting Started with Katib Python SDK

### Adding Namespace Label for the Metrics Collector Injection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensure the namespace label `katib.kubeflow.org/metrics-collector-injection: enabled`
is present. This label enables the sidecar container injection to collect metrics during the experiment.

You can configure the namespace using the following YAML:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the user needs to update the namespace, no create a new one. I'd update the sentence by
"You can configure the namespace by adding the following label katib.kubeflow.org/metrics-collector-injection: enabled as is shown in the sample code."

@varodrig
Copy link
Contributor

/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Jan 25, 2025
Copy link
Member

@Electronic-Waste Electronic-Waste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically LGTM! Just one comment for you @Doris-xm!

@google-oss-prow google-oss-prow bot removed the lgtm label Jan 26, 2025
Copy link
Member

@Electronic-Waste Electronic-Waste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for your great contribution @Doris-xm!

/lgtm
/assign @kubeflow/wg-automl-leads @helenxie-bit

@google-oss-prow google-oss-prow bot added the lgtm label Jan 26, 2025
@helenxie-bit
Copy link
Contributor

Thanks for the contribution!

/lgtm

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution @Doris-xm!
/lgtm
/approve

Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, Electronic-Waste

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 2a5ef22 into kubeflow:master Jan 26, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Doc] Update user guide for namespace label
6 participants