When will large model frameworks be supported? DeepSpeed, for example #1792
Comments
Can you add more info and update the description? We would love to add support for frameworks like DeepSpeed, along with LLM examples. EBay are your thoughts?
Since DeepSpeed was open-sourced, more and more companies use it to train LLMs. However, the DeepSpeed framework differs from PyTorch in some respects.
@PeterChg You might be interested in this: kubeflow/mpi-operator#549.
DeepSpeed supports several parallel launchers, such as pdsh (the default, which requires machines to be reachable over passwordless SSH), OpenMPI, Slurm, and so on. The mpi-operator in the training operator launches workers through kubectl exec, and it is uncertain whether DeepSpeed can work with that. For now, using mpi-operator v2 (via passwordless SSH) would be more appropriate.
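To make the launcher discussion concrete, here is a minimal sketch of what a multi-node DeepSpeed launch over passwordless SSH typically needs: a hostfile and a `deepspeed` command that selects a launcher. The host names, slot counts, hostfile path, and `train.py` are hypothetical placeholders, and the helper functions are illustrative only. The sketch assumes the `deepspeed` CLI's `--hostfile` and `--launcher` options; check the DeepSpeed documentation for the launcher values your version accepts.

```python
import shlex


def render_hostfile(slots_by_host):
    """Render a hostfile with one '<hostname> slots=<gpu count>' line per worker."""
    lines = [f"{host} slots={slots}" for host, slots in slots_by_host.items()]
    return "\n".join(lines) + "\n"


def build_launch_command(hostfile_path, script, launcher="pdsh"):
    """Build the argv for a multi-node launch (assumed CLI flags, see lead-in)."""
    return ["deepspeed", "--hostfile", hostfile_path, "--launcher", launcher, script]


# Hypothetical worker pods, each exposing 8 GPUs and reachable over SSH.
hosts = {"worker-0": 8, "worker-1": 8}
hostfile_text = render_hostfile(hosts)

# Hypothetical hostfile path and training script.
command = build_launch_command("/etc/deepspeed/hostfile", "train.py", launcher="openmpi")

print(hostfile_text, end="")
print(shlex.join(command))
```

The sketch only builds the hostfile text and the command line; it does not start any processes. In a Kubernetes setting, the hostfile would have to be generated from the worker pod names, which is the part an operator would need to automate.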
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
/lifecycle frozen
No description provided.