Make --keep-going work with drv that don't have a system #12168

RossComputerGuy · 2025-01-10T02:49:10Z

Motivation

Fixes #8321, this resurrects #8775 and changes things a little differently.

Context

This adds a new build result of NoBuilders which implies that the derivation cannot be build because there are no builders. While a new error failure called BadSystem is added to imply the system configuration is bad or incompatible. These have similar meaning but imply different things which could make sense for future PR's. Currently, only a bad system error occurs when there are no builders which matches a derivation.

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

Ericson2314 · 2025-01-10T04:40:12Z

The basic idea seems sound here :)

RossComputerGuy · 2025-01-10T04:42:16Z

Heh yeah, I'm trying to diagnose why the tests fail. It seems like this PR is cursed with the same problem as the original attempt.

roberth

I don't really see why the test fails, and I don't know whether my suggestion would fix it, although probably not because I think that's for preventing false successes.

roberth · 2025-01-10T09:01:14Z

src/libstore/build-result.hh

@@ -36,6 +36,7 @@ struct BuildResult
        NotDeterministic,
        ResolvesToAlreadyValid,
        NoSubstituters,
+        NoBuilders,


Suggested change

NoBuilders,

NoCompatibleBuilder,

It may be worth adding special handling to Hydra for this build result (see https://github.com/NixOS/hydra/blob/250668a19fa4d8ff9a6176ee6c44ca3003adedf1/src/hydra-queue-runner/build-remote.cc#L360-L413) to get a nicer build status in the UI than "aborted".

I made the change. Yeah, having Hydra support this in the future would be nice. It could say something like "no compatible build machines".

roberth · 2025-01-10T09:02:15Z

src/libstore/store-api.hh

@@ -61,6 +61,7 @@ MakeError(InvalidPath, Error);
 MakeError(Unsupported, Error);
 MakeError(SubstituteGone, Error);
 MakeError(SubstituterDisabled, Error);
+MakeError(BadSystem, Error);


Suggested change

MakeError(BadSystem, Error);

MakeError(NoCompatibleBuilder, Error);

I made the change.

roberth · 2025-01-10T09:04:54Z

tests/functional/keep-going.nix

+  # Hack to get the scheduler to do what we want: The `good` derivation can
+  # only be built after `delay_good` (which takes a long time to build) while
+  # the others don't have any dependency.
+  # This means that if we build this with parallelism (`-j2`), then we can be
+  # reasonably sure that the failing derivations will be scheduled _before_ the
+  # `good` one (and so we can check that `--keep-going` works fine)


This could be synchronized by waiting for a file to appear in a mutable directory in extra-sandbox-paths. The test would have to monitor the CLI's log before creating it.
(doing it in the failing derivation would only decrease the length of the race condition, but not make it reliable)

Something along the lines of

nix build --extra-sandbox-paths "$TEST_ROOT/sync" 2>&1 \ | while read ln; do echo "nix build: $ln" if echo "$ln" | grepQuiet ...; then touch "$TEST_ROOT/sync/failing-drv-failed" fi done

Looks like that might work? I plugged it in. It doesn't break the tests on my machine.

edolstra · 2025-01-10T11:57:59Z

tests/functional/keep-going.sh

+
+clearStore
+
+# XXX: These invocations of nix-build should always return 100 according to the manpage, but often return 1


Can you create an issue for this?

Ok, @iFreilicht is the one who discovered the error. Probably could use more details before creating the issue?

Well, that's all I found out, I don't have more details for you 😬 But it basically happened during testing, so you should be able to reproduce that issue and can then create one with a proper log. This might also be why the tests are failing right now.

Oh, alright. I did check the blame and it looks like @Ericson2314 did write some of the worker stuff around where I'm touching. Maybe John has some ideas into getting this to pass.

edolstra · 2025-01-15T21:34:58Z

@RossComputerGuy Any idea why the test is failing?

RossComputerGuy · 2025-01-15T21:45:25Z

No, I tried debugging it via gdb but breakpoints don't trigger with debug symbols. It also looks like the testing instructions in the documentation is out of date.

RossComputerGuy requested review from Ericson2314 and edolstra as code owners January 10, 2025 02:49

github-actions bot added with-tests Issues related to testing. PRs with tests have some priority store Issues and pull requests concerning the Nix store labels Jan 10, 2025

RossComputerGuy mentioned this pull request Jan 10, 2025

--keep-going does not keep going on various build failures #8321

Open

RossComputerGuy force-pushed the fix/keep-going-not-going branch 2 times, most recently from 42567d4 to d534b27 Compare January 10, 2025 04:03

roberth reviewed Jan 10, 2025

View reviewed changes

edolstra reviewed Jan 10, 2025

View reviewed changes

RossComputerGuy force-pushed the fix/keep-going-not-going branch from 9e574a8 to f04f0bf Compare January 10, 2025 15:36

Make --keep-going work with drv that don't have a system

5e27b27

RossComputerGuy force-pushed the fix/keep-going-not-going branch from f04f0bf to 5e27b27 Compare January 10, 2025 15:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make --keep-going work with drv that don't have a system #12168

Make --keep-going work with drv that don't have a system #12168

RossComputerGuy commented Jan 10, 2025

Ericson2314 commented Jan 10, 2025

RossComputerGuy commented Jan 10, 2025

roberth left a comment

roberth Jan 10, 2025

edolstra Jan 10, 2025

RossComputerGuy Jan 10, 2025

roberth Jan 10, 2025

RossComputerGuy Jan 10, 2025

roberth Jan 10, 2025

RossComputerGuy Jan 10, 2025

edolstra Jan 10, 2025

RossComputerGuy Jan 10, 2025

iFreilicht Jan 10, 2025

RossComputerGuy Jan 10, 2025

edolstra commented Jan 15, 2025

RossComputerGuy commented Jan 15, 2025

	MakeError(BadSystem, Error);
	MakeError(NoCompatibleBuilder, Error);


		clearStore

		# XXX: These invocations of nix-build should always return 100 according to the manpage, but often return 1

Make --keep-going work with drv that don't have a system #12168

Are you sure you want to change the base?

Make --keep-going work with drv that don't have a system #12168

Conversation

RossComputerGuy commented Jan 10, 2025

Motivation

Context

Ericson2314 commented Jan 10, 2025

RossComputerGuy commented Jan 10, 2025

roberth left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

edolstra commented Jan 15, 2025

RossComputerGuy commented Jan 15, 2025