(add) deepspeed_mpi specific container, deepspeed_config for MPI with nodetaints #549
Conversation
```yaml
- -bind-to
- none
- -map-by
- slot
- -x
- NCCL_DEBUG=INFO
- -x
- LD_LIBRARY_PATH
- -x
- PATH
- -mca
- pml
- ob1
- -mca
- btl
- ^openib
```
are all of these necessary?
Not all of them are strictly necessary, but these options are commonly used in both MPI workloads and the MPI Operator examples.
Do you think we should remove these flags?
tbh, I left them as legacy from the very first examples I found for TensorFlow and Horovod, as I didn't know much about them.
But our basic MPI sample has almost no parameters: https://github.com/kubeflow/mpi-operator/blob/master/examples/v2beta1/pi/pi.yaml
If you know enough to leave the bare basics, that would be better.
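For illustration, a launcher command trimmed down in the spirit of the pi example could look like the sketch below; which flags to keep is still an open question in this thread, and the script path and process count are placeholders, not values from this PR.

```yaml
# Minimal mpirun launcher command, modeled loosely on examples/v2beta1/pi/pi.yaml.
# NCCL_DEBUG=INFO is kept from the original diff only as a debugging aid.
command:
  - mpirun
  - -n
  - "2"
  - -x
  - NCCL_DEBUG=INFO
  - python
  - /workspace/train.py
```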
```dockerfile
# Official PyTorch image with CUDA support
FROM pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
```
Have you tried using mpioperator/base instead?
Haven't tried it yet; it could be better to use this mpioperator/base image and just install the CUDA dependencies for DeepSpeed, plus the additional PyTorch / TensorFlow configuration.
Actually, it's probably better to use mpioperator/openmpi. If you can make it work, that'd be great, as proof that the base images can be extended. I couldn't get TensorFlow to work.
Will try to make it work for both and patch the PR 👍
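As a rough starting point, extending the OpenMPI base image could look like the sketch below. The apt packages and the plain PyPI installs of torch and deepspeed are assumptions; a real example would pin versions and use CUDA-enabled wheels.

```dockerfile
# Sketch: build the DeepSpeed training image on top of the mpi-operator OpenMPI
# base image instead of the PyTorch image. Versions are intentionally unpinned.
FROM mpioperator/openmpi

RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# DeepSpeed expects PyTorch to be installed first; both come from PyPI here.
RUN pip3 install --no-cache-dir torch \
    && pip3 install --no-cache-dir deepspeed
```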
```yaml
containers:
  # Container running the DeepSpeed training image built from the provided Dockerfile
  # Change the image name and version here
  - image: <YOUR-DEEPSPEED-CONTAINER-NAME>:<VERSION>
```
Just a note: we can provide a sample image once #541 is completed.
DeepSpeed configures multi-node compute resources with hostfiles that are compatible with OpenMPI. A hostfile is a list of hostnames (or SSH aliases), which are machines accessible via passwordless SSH.

Do we need to support DeepSpeed's own parallel launcher (via pdsh) in mpi-operator? The difference is that the default path for the hostfile in DeepSpeed is /job/hostfile. Therefore, if the operator can generate /job/hostfile (like Horovod's discover_hosts.sh), it can support DeepSpeed's own parallel launcher.

Ref: https://www.deepspeed.ai/getting-started/#resource-configuration-multi-node

@dtunai @alculquicondor @tenzen-y WDYT?
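For reference, the hostfile format in question looks roughly like this; the hostnames and slot counts are purely illustrative:

```
deepspeed-worker-0 slots=8
deepspeed-worker-1 slots=8
```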
Going by the document above, I don't think we need to generate the hostfile in /job/hostfile, since users can set the hostfile path via the deepspeed command, and DeepSpeed uses the same format as OpenMPI for the hostfile. IIRC, we generate discover_hosts.sh for Horovod because Horovod uses a different format than OpenMPI for host discovery.
Let me know if I'm missing anything else.
Your understanding is correct. Currently, DeepSpeed supports the following three forms:
Therefore, if we need to support the second and third forms in mpi-operator, perhaps we can remind users in the document that they must set --hostfile=/etc/mpi/hostfile? I would like to add a new example to do this.
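If that route is taken, the launcher could call DeepSpeed's own launcher directly, roughly as sketched below; the script path and ds_config.json name are placeholders, and the --deepspeed/--deepspeed_config flags assume the training script registers DeepSpeed's argument parser.

```yaml
# Sketch of a launcher container command using the deepspeed launcher with the
# operator-generated hostfile mounted at /etc/mpi/hostfile.
command:
  - deepspeed
  - --hostfile=/etc/mpi/hostfile
  - /workspace/train.py
  - --deepspeed
  - --deepspeed_config=/workspace/ds_config.json
```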
Thank you for clarifying. @alculquicondor wdyt?
Is there a way to specify the hostfile via an environment variable? That's how we do it for mpirun. Are there any changes required to have pdsh work?
I couldn't find an environment variable to specify the hostfile (like OMPI_MCA_orte_default_hostfile) in the DeepSpeed documentation.
The Secret that contains the SSH keys is necessary, and pdsh also accesses workers via passwordless SSH.
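For comparison, this is roughly how a hostfile can be conveyed to mpirun via OpenMPI's MCA environment variable mentioned above; the snippet below is a sketch of that mechanism, not taken from this PR.

```yaml
# OpenMPI reads its default hostfile path from this MCA environment variable,
# so mpirun needs no explicit --hostfile flag. DeepSpeed's launcher has no
# documented equivalent, hence the explicit --hostfile argument sketched earlier.
env:
  - name: OMPI_MCA_orte_default_hostfile
    value: /etc/mpi/hostfile
```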
I guess we can move forward with this PR and provide another example using
@simulark Why did you close this PR?
Hello @tenzen-y. |
Oh, I see. Thank you for letting me know! |
This PR introduces an integration example of DeepSpeed, a distributed training library, with Kubeflow into the main mpi-operator examples. The objective is to improve the efficiency and performance of distributed training jobs by combining the capabilities of DeepSpeed and MPI. Comments in the configuration explain the use of taints and tolerations in the Kubernetes configuration to ensure that DeepSpeed worker pods are scheduled on nodes with specific resources, such as GPUs.
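As an illustration of the scheduling behavior described above, the Worker pod template could carry a toleration and a GPU resource limit along the lines of the sketch below; the nvidia.com/gpu taint key, effect, and resource name follow common GPU-node conventions and are assumptions rather than values taken from this PR.

```yaml
# Sketch of the scheduling-related part of the Worker template in an MPIJob.
Worker:
  template:
    spec:
      tolerations:
        - key: nvidia.com/gpu     # assumed taint key on GPU nodes
          operator: Exists
          effect: NoSchedule
      containers:
        - name: deepspeed-worker
          image: <YOUR-DEEPSPEED-CONTAINER-NAME>:<VERSION>
          resources:
            limits:
              nvidia.com/gpu: 1   # request one GPU per worker
```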