init-dind-externals container slows down pod initialization #3818

Open
3 of 4 tasks
djahandarie opened this issue Nov 21, 2024 · 6 comments
Labels
bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode), needs triage (Requires review from the maintainers)

Comments

djahandarie commented Nov 21, 2024

Checks

Controller Version

0.9.3

Deployment Method

Other

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

Observe any pod created by ARC, especially on a node with a lot of I/O going on. Run kubectl describe pod for the pod. You will see a considerable delay between "Started container init-dind-externals" and the line after it, which is due to the init-dind-externals container taking time to run.
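
To put a number on that delay instead of reading it off the event ages, the init container's start and finish timestamps can be pulled from the pod status. The following is a minimal sketch, not something from the ARC docs: it assumes GNU date and a pod whose init container has already terminated, and the function names are made up for illustration. The pod name and namespace are supplied by the caller.

```shell
#!/usr/bin/env bash

# Seconds between two RFC 3339 timestamps (assumes GNU date for `-d`).
init_duration_seconds() {
  local started="$1" finished="$2"
  echo $(( $(date -u -d "$finished" +%s) - $(date -u -d "$started" +%s) ))
}

# Print how many seconds init-dind-externals ran for the given pod.
# Hypothetical helper: $1 is the pod name, $2 is the namespace.
init_dind_externals_duration() {
  local pod="$1" namespace="$2"
  # JSONPath prefix selecting the terminated state of the init container.
  local base='{.status.initContainerStatuses[?(@.name=="init-dind-externals")].state.terminated'
  local started finished
  started=$(kubectl get pod "$pod" -n "$namespace" -o jsonpath="${base}.startedAt}")
  finished=$(kubectl get pod "$pod" -n "$namespace" -o jsonpath="${base}.finishedAt}")
  init_duration_seconds "$started" "$finished"
}

# Usage (not run here): init_dind_externals_duration <pod-name> <namespace>
```

Comparing this value across nodes with different I/O load should show whether the copy step is what dominates pod startup.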

Describe the bug

On our cluster, we find that the init-dind-externals container takes anywhere from 5s to multiple minutes to run, depending on the I/O load of the machine it's running on. It seems to copy a very large number of files (node modules etc.). I'm not sure exactly how big this copy is, but it seems rather large.

Describe the expected behavior

I would expect there to be a more efficient way to do this, either by populating the dind image with the necessary files beforehand or at least by reducing the number of files that need to be copied (like node modules...).

Additional Context

N/A

Controller Logs

N/A

Runner Pod Logs

  Normal  Started    3m43s  kubelet                                Started container init-dind-externals
  Normal  Pulled     12s    kubelet                                Container image "us-east4-docker.pkg.dev/kouzoh-github-actions-prod/runner-bases/debian-dind-sidecar:stable" already present on machine

(This is an example of the init-dind-externals container taking 3m31s total to run in order for the pod to initialize)

djahandarie added the bug, gha-runner-scale-set, and needs triage labels Nov 21, 2024
Contributor

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

laserpedro commented Dec 12, 2024

Hello @djahandarie, I am facing the exact same issue: when the instance is under heavy I/O load, the container takes around 3 minutes to start. Out of curiosity, did you try running in kubernetes mode?

@djahandarie
Author

@laserpedro Unfortunately kubernetes mode is a bit hard to use for us:

  • many of our workflows use docker and this is not possible in kubernetes mode as there is no docker daemon.
  • it requires all jobs to be container jobs, so we would need to rewrite all of our thousands of workflows to be container jobs

If it's considered to be the future of ARC and superior for multiple reasons, then it might make sense for us, but the switch is not so easy that we'd make it just for this sort of performance issue, which can likely be solved with some optimizations anyway.

laserpedro commented Dec 12, 2024

Hello @djahandarie, ok, understood. I have the same constraint regarding the usage of dind mode. Have you tried any storage optimization (increasing IOPS, for instance)? I am looking at this currently. How many runners are you running, and how are you hosting them?

moulougeta commented Dec 17, 2024

Hey @djahandarie All the initContainer does is copy some node packages from the runner container to a volume, which in turn is mounted to the /home/runner/externals directory in the dind container. What you can do is build a custom dind image using a multi-stage approach, and you can copy these packages already when you build the image. Then, you can remove the init container altogether.
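
The multi-stage approach described above could look roughly like the sketch below. It is not taken from the ARC documentation: the image names (ghcr.io/actions/actions-runner and docker:dind), the source path /home/runner/externals inside the runner image, and the function name write_dind_dockerfile are assumptions, so substitute the images and tags your runner pods actually use.

```shell
#!/usr/bin/env bash

# Assumed defaults; override with the images and tags your runner pods use.
RUNNER_IMAGE="${RUNNER_IMAGE:-ghcr.io/actions/actions-runner:latest}"
DIND_IMAGE="${DIND_IMAGE:-docker:dind}"

# Print a two-stage Dockerfile that bakes the runner externals into the
# dind image, so nothing has to be copied at pod start.
write_dind_dockerfile() {
  cat <<EOF
# Stage 1: the runner image, used only as a source of files.
FROM ${RUNNER_IMAGE} AS runner

# Stage 2: the dind image with the externals copied in at build time.
# Source path in the runner image is an assumption.
FROM ${DIND_IMAGE}
COPY --from=runner /home/runner/externals /home/runner/externals
EOF
}

# Usage (not run here; the target tag is a placeholder):
#   write_dind_dockerfile | docker build -t <registry>/dind-with-externals:<tag> -
```

If the init container is removed, the volume mounted at /home/runner/externals in the dind container would presumably need to be removed as well, since a volume mounted over that path hides the files baked into the image.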

@JohnYoungers

> Hey @djahandarie All the initContainer does is copy some node packages from the runner container to a volume, which in turn is mounted to the /home/runner/externals directory in the dind container. What you can do is build a custom dind image using a multi-stage approach, and you can copy these packages already when you build the image. Then, you can remove the init container altogether.

We've been using this for a while now without issue (github tag will need to match your primary runner image):
#2944 (reply in thread)
