authors | tags | |||
---|---|---|---|---|
|
|
While the R 4.0 migration has been functionally complete for quite a
while, the recent migration of r-java
and its dependents gives a good
opportunity to write a retrospective on the technical issues with
large-scale migrations in conda-forge
and how we solved them.
The R 4.0 migration rebuilt every package in conda-forge
that had
r-base
as a requirement, including more than 2200 feedstocks. A
migration of this size in conda-forge
faces several hurdles. First,
since every feedstock is a separate GitHub repository, one needs to
merge more 2200 pull requests (PRs). Second, conda-forge
's packages
on anaconda.org
are behind a CDN (content delivery network). This
service reduces web hosting costs for Anaconda Inc. but introduces an
approximately 30 minute delay from when a package is uploaded to
anaconda.org
and when it will appear as available using conda
from
the command line. Thus, even if the dependencies of a package have been
built, we have to wait until they appear on the CDN before we can
successfully issue the next PR and have it build correctly. Finally, the
existing bot and conda
infrastructure limited the throughput of the
migrations, due in part to the speed of the conda
solver.
Given the size of the R 4.0 migration, we took this opportunity to try
out a bunch of new technology to speed up large-scale migrations. The
main enhancements were using GitHub Actions to automerge PRs, using
mamba
to quickly check for solvability of package environments, and
enabling long-running migration jobs for the autotick bot. All told, the
bulk of the feedstocks for R 4.0 were rebuilt in less than a week, with
many PRs being merged in 30 minutes or less from when they were issued.
These enhancements to the autotick bot and conda-forge
infrastructure
can be used to enhance future migrations (e.g., Python 3.9) and reduce
maintenance burdens for feedstocks.
In a typical migration on conda-forge
, we issue a PR to a feedstock
and then ask the feedstock maintainers to make sure it passes and merge
it. In the case of the R 4.0 migration, the maintainers of R packages on
conda-forge
use a maintenance team (i.e., @conda-forge/r
) on the
vast majority of feedstocks. This team is small and so merging over 2000
PRs by hand is a big undertaking. Thus, with their permission, we added
the conda-forge
automerge functionality to all R feedstocks that they
maintain. The automerge bot, which relies on GitHub Actions, is able to
automatically merge any PR from the autotick bot that passes the recipe
linter, the continuous integration services, and has the special
[bot automerge]
slug in the PR title. This feature removed the
bottleneck of waiting for maintainers to merge PRs and reduced the
maintenance burden on the R maintenance team.
While being able to automatically merge PRs removed much of the work of performing the R 4.0 migration, it relied on the PR building correctly the first time it was issued. Due to the CDN delays and the build times of a package's dependencies, the dependencies of a package may not be immediately available after all of their migration PRs are merged. If the bot issued the packages migration PR before the dependents are available, the PR would fail with an unsolvable environment and have to be restarted manually. This failure would negate any of the benefits of using automerge in the first place.
To control for this edge case, we employed the mamba
package to check
for the solvability of a PR's environments before the PR was issued.
mamba
is a fast alternative to conda
that produces solutions for
environments orders of magnitude more quickly. Since, we have to perform
our checks of PR environments many times, an extremely fast solver was
essential for making the code efficient enough to run as part of the
autotick bot. We ended up using mamba to try to install the dependencies
for every variant produced by the feedstock to be migrated. With this
check in place, the autotick bot was able to issue migration PRs that
passed on the first try and were thus automatically merged, many within
30 minutes or less.
Finally, we made several upgrades to the autotick bot infrastructure to increase the uptime of the bot and its efficiency. First, we moved from an hourly cron job to a set of chained CI jobs. This change eliminated downtime between the runs of the bot. Second, we started to refactor the autotick bot from one monolithic piece of code into a distributed set of microservices which perform various independent tasks in parallel. These independent tasks, used for things like checking the statuses of previously issued PRs, are run separately allowing the bot to spend more time issuing PRs. Finally, we optimized the internal prioritization of the PRs to make sure the bot was spending more time on larger migrations where there is more work to do. More work on the autotick bot infrastructure, including work done by Vinicius Cerutti as part of the Google Summer of Code program, will further streamline the bot's operation.
Despite some initial hiccups with the bot infrastructure, the migration
ran quite smoothly for an endeavor of its size. The vast majority of
migration PRs were completed within a week from when we started, which
is a first for a migration of this size on conda-forge
. The largest
issue was solved recently, with the fixing of the openjdk
recipe and
the removal of aarch64
and ppc64le
builds from r-java
, enabling
the last large piece of the R ecosystem to be updated.
Looking forward, the improvements we made for the R 4.0 migration seem
broadly applicable to other migration tasks, including the yearly python
minor version bump. These kinds of large-scale migrations are
particularly suitable, since they usually involve few changes to the
feedstock itself and usually fail on CI when a broken package would be
produced. Faster migrations will help to provide the latest features to
downstream users and keep transition times to a minimum, helping to
foster greater stability of the ecosystem and the seamless experience
users have come to expect from conda-forge
.