Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make --keep-going work with drv that don't have a system #12168

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

RossComputerGuy
Copy link
Member

Motivation

Fixes #8321, this resurrects #8775 and changes things a little differently.

Context

This adds a new build result of NoBuilders which implies that the derivation cannot be build because there are no builders. While a new error failure called BadSystem is added to imply the system configuration is bad or incompatible. These have similar meaning but imply different things which could make sense for future PR's. Currently, only a bad system error occurs when there are no builders which matches a derivation.


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added with-tests Issues related to testing. PRs with tests have some priority store Issues and pull requests concerning the Nix store labels Jan 10, 2025
@RossComputerGuy RossComputerGuy force-pushed the fix/keep-going-not-going branch 2 times, most recently from 42567d4 to d534b27 Compare January 10, 2025 04:03
@Ericson2314
Copy link
Member

The basic idea seems sound here :)

@RossComputerGuy
Copy link
Member Author

Heh yeah, I'm trying to diagnose why the tests fail. It seems like this PR is cursed with the same problem as the original attempt.

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see why the test fails, and I don't know whether my suggestion would fix it, although probably not because I think that's for preventing false successes.

@@ -36,6 +36,7 @@ struct BuildResult
NotDeterministic,
ResolvesToAlreadyValid,
NoSubstituters,
NoBuilders,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NoBuilders,
NoCompatibleBuilder,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth adding special handling to Hydra for this build result (see https://github.com/NixOS/hydra/blob/250668a19fa4d8ff9a6176ee6c44ca3003adedf1/src/hydra-queue-runner/build-remote.cc#L360-L413) to get a nicer build status in the UI than "aborted".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change. Yeah, having Hydra support this in the future would be nice. It could say something like "no compatible build machines".

@@ -61,6 +61,7 @@ MakeError(InvalidPath, Error);
MakeError(Unsupported, Error);
MakeError(SubstituteGone, Error);
MakeError(SubstituterDisabled, Error);
MakeError(BadSystem, Error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MakeError(BadSystem, Error);
MakeError(NoCompatibleBuilder, Error);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change.

Comment on lines +5 to +10
# Hack to get the scheduler to do what we want: The `good` derivation can
# only be built after `delay_good` (which takes a long time to build) while
# the others don't have any dependency.
# This means that if we build this with parallelism (`-j2`), then we can be
# reasonably sure that the failing derivations will be scheduled _before_ the
# `good` one (and so we can check that `--keep-going` works fine)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be synchronized by waiting for a file to appear in a mutable directory in extra-sandbox-paths. The test would have to monitor the CLI's log before creating it.
(doing it in the failing derivation would only decrease the length of the race condition, but not make it reliable)

Something along the lines of

nix build --extra-sandbox-paths "$TEST_ROOT/sync" 2>&1 \
  | while read ln; do
    echo "nix build: $ln"
    if echo "$ln" | grepQuiet ...; then
      touch "$TEST_ROOT/sync/failing-drv-failed"
    fi
  done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like that might work? I plugged it in. It doesn't break the tests on my machine.


clearStore

# XXX: These invocations of nix-build should always return 100 according to the manpage, but often return 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create an issue for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, @iFreilicht is the one who discovered the error. Probably could use more details before creating the issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's all I found out, I don't have more details for you 😬 But it basically happened during testing, so you should be able to reproduce that issue and can then create one with a proper log. This might also be why the tests are failing right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, alright. I did check the blame and it looks like @Ericson2314 did write some of the worker stuff around where I'm touching. Maybe John has some ideas into getting this to pass.

@RossComputerGuy RossComputerGuy force-pushed the fix/keep-going-not-going branch from 9e574a8 to f04f0bf Compare January 10, 2025 15:36
@RossComputerGuy RossComputerGuy force-pushed the fix/keep-going-not-going branch from f04f0bf to 5e27b27 Compare January 10, 2025 15:43
@edolstra
Copy link
Member

@RossComputerGuy Any idea why the test is failing?

@RossComputerGuy
Copy link
Member Author

No, I tried debugging it via gdb but breakpoints don't trigger with debug symbols. It also looks like the testing instructions in the documentation is out of date.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
store Issues and pull requests concerning the Nix store with-tests Issues related to testing. PRs with tests have some priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

--keep-going does not keep going on various build failures
5 participants