Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bundle: avoid closing file descriptor twice #1857

Open
wants to merge 1 commit into
base: maint-2.48
Choose a base branch
from

Conversation

dscho
Copy link
Member

@dscho dscho commented Jan 25, 2025

This is a really old bug. Deviating from Git's practice, I did not apply this on top of the bugged commit. In fact, I did not even apply this on top of an older maint-* branch because it would cause conflicts even cherry-picking it on top of maint-2.47.

Already when introduced in c7a8a16 (Add bundle transport,
2007-09-10), the `bundle` transport had a bug where it would open a file
descriptor to the bundle file and then close it _twice_: First, the file
descriptor (`data->fd`) is passed to `unbundle()`, which would use it as
the `stdin` of the `index-pack` process, which as a consequence would
close it via `start_command()`. However, `data->fd` would still hold the
numerical value of the file descriptor, and `close_bundle()` would see
that and happily close it again.

This seems not to have caused too many problems in almost two decades,
but I encountered a situation today where it _does_ cause problems: In
i686 variants of Git for Windows, it seems that file descriptors are
reused quickly after they have been closed.

In the particular scenario I faced, `git fetch <bundle> <ref>` gets the
same file descriptor value when opening the bundle file and importing
its embedded packfile (which implicitly closes the file descriptor) and
then when opening a pack file in `fetch_and_consume_refs()` while
looking up an object's header.

Later on, after the bundle has been imported (and the `close_bundle()`
function erroneously closes the file descriptor that has _already_ been
closed when using it as `stdin` for `git index-pack`), the same file
descriptor value has now been reused via `use_pack()`. Now, when either
the recursive fetch (which defaults to "on", unfortunately) or a
commit-graph update needs to `mmap()` the packfile, it fails due to a
now-invalid file descriptor that _should_ point to the pack file but
doesn't anymore.

To fix that, let's invalidate `data->fd` after calling `unbundle()`.
That way, `close_bundle()` does not close a file descriptor that may
have been reused for something different. While at it, document that
`unbundle()` closes the file descriptor, and ensure that it also does
that when failing to verify the bundle.

Luckily, this bug does not affect the bundle URI feature, it only
affects the `git fetch <bundle>` code path.

Note that this patch does not _completely_ clarifies who is responsible
to close that file descriptor, as `run_command()` may fail _without_
closing `cmd->in`. Addressing this issue thoroughly, however, would
require a rather thorough re-design of the `start_command()` and
`finish_command()` functionality to make it a lot less murky who is
responsible for what file descriptors.

At least this here patch is relatively easy to reason about, and
addresses a hard failure (`fatal: mmap: could not determine filesize`)
at the expense of leaking a file descriptor under very rare
circumstances in which `git fetch` would error out anyway.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho dscho self-assigned this Jan 25, 2025
dscho added a commit to git-for-windows/git-for-windows-automation that referenced this pull request Jan 25, 2025
As part of the snapshot builds via the `git-artifacts` workflow, we
try to import the tag from a bundle.

This uncovered a bug that is over 17 years old, and which I am trying
to fix via gitgitgadget/git#1857.

The symptom is that just after fetching the refs, when looking through
the fetched revisions whether a recursive fetch is needed, Git fails to
`mmap()` the newly-imported packfile. The symptom looks like this:

  From [...]/git.bundle
   * tag                     <tag> -> FETCH_HEAD
   * [new tag]               <tag> -> <tag>
  fatal: mmap: could not determine filesize

Curiously, this only happens on 32-bit Windows, not on 64-bit Windows,
nor on Linux or macOS. The explanation is most likely to be found in
how quickly file descriptor values are used again after closing them,
which would appear to be _really_ quickly on i686 Windows.

Be that as it may, to work around this issue, we simply avoid any
operation that would need to access the just-imported packfile in
`git fetch <bundle>`.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho
Copy link
Member Author

dscho commented Jan 25, 2025

/submit

Copy link

gitgitgadget bot commented Jan 25, 2025

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1857/dscho/fix-fetch-bundle-double-close-v1

To fetch this version to local tag pr-1857/dscho/fix-fetch-bundle-double-close-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1857/dscho/fix-fetch-bundle-double-close-v1

Copy link

gitgitgadget bot commented Jan 26, 2025

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <[email protected]>
writes:

> This seems not to have caused too many problems in almost two decades,
> but I encountered a situation today where it _does_ cause problems: In
> i686 variants of Git for Windows, it seems that file descriptors are
> reused quickly after they have been closed.

Nice finding.  It pays not to be in too uniform a monoculture.

>     This is a really old bug. Deviating from Git's practice, I did not apply
>     this on top of the bugged commit. In fact, I did not even apply this on
>     top of an older maint-* branch because it would cause conflicts even
>     cherry-picking it on top of maint-2.47.

Just for fun, attached is my backport to maint-2.44 track.  It
merges, with conflict which is rather trivial to resolve, back to
v2.48.1, on which your original patch is based, and gives the
merge result identical to the result of direct application of your
patch to v2.48.1.

I am not sure what to feel about the FD leak mentioned in the log
message, though.

Thanks, will queue.


--- >8 ---
From: Johannes Schindelin <[email protected]>
Date: Sat, 25 Jan 2025 23:57:36 +0000
Subject: [PATCH] bundle: avoid closing file descriptor twice

Already when introduced in c7a8a16239 (Add bundle transport,
2007-09-10), the `bundle` transport had a bug where it would open a file
descriptor to the bundle file and then close it _twice_: First, ...

... At least this here patch is relatively easy to reason about, and
addresses a hard failure (`fatal: mmap: could not determine filesize`)
at the expense of leaking a file descriptor under very rare
circumstances in which `git fetch` would error out anyway.

Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
---
 bundle.c    | 4 +++-
 bundle.h    | 2 ++
 transport.c | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

Obviously this is hard to test ;-)

diff --git a/bundle.c b/bundle.c
index a9744da255..7b3f2da40f 100644
--- a/bundle.c
+++ b/bundle.c
@@ -616,8 +616,10 @@ int unbundle(struct repository *r, struct bundle_header *header,
 {
 	struct child_process ip = CHILD_PROCESS_INIT;
 
-	if (verify_bundle(r, header, flags))
+	if (verify_bundle(r, header, flags)) {
+		close(bundle_fd);
 		return -1;
+	}
 
 	strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
 
diff --git a/bundle.h b/bundle.h
index 021adbdcbb..8b187be7cf 100644
--- a/bundle.h
+++ b/bundle.h
@@ -50,6 +50,8 @@ int verify_bundle(struct repository *r, struct bundle_header *header,
  *
  * Before unbundling, this method will call verify_bundle() with the
  * given 'flags'.
+ *
+ * Note that the `bundle_fd` will be closed as part of the operation.
  */
 int unbundle(struct repository *r, struct bundle_header *header,
 	     int bundle_fd, struct strvec *extra_index_pack_args,
diff --git a/transport.c b/transport.c
index df518ead70..25e2233da0 100644
--- a/transport.c
+++ b/transport.c
@@ -184,6 +184,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
 		get_refs_from_bundle_inner(transport);
 	ret = unbundle(the_repository, &data->header, data->fd,
 		       &extra_index_pack_args, 0);
+	data->fd = -1; /* `unbundle()` closes the file descriptor */
 	transport->hash_algo = data->header.hash_algo;
 	return ret;
 }
-- 
2.48.1-293-gbb90fdfe3c

Copy link

gitgitgadget bot commented Jan 26, 2025

This patch series was integrated into seen via git@d69419e.

@gitgitgadget gitgitgadget bot added the seen label Jan 26, 2025
dscho added a commit to git-for-windows/git-for-windows-automation that referenced this pull request Jan 26, 2025
As part of the snapshot builds via the `git-artifacts` workflow, we
try to import the tag from a bundle.

This uncovered a bug that is over 17 years old, and which I am trying
to fix via gitgitgadget/git#1857.

The symptom is that just after fetching the refs, when looking through
the fetched revisions whether a recursive fetch is needed, Git fails to
`mmap()` the newly-imported packfile. The symptom looks like this:

  From [...]/git.bundle
   * tag                     <tag> -> FETCH_HEAD
   * [new tag]               <tag> -> <tag>
  fatal: mmap: could not determine filesize

Curiously, this only happens on 32-bit Windows, not on 64-bit Windows,
nor on Linux or macOS. The explanation is most likely to be found in
how quickly file descriptor values are used again after closing them,
which would appear to be _really_ quickly on i686 Windows.

Be that as it may, to work around this issue, we simply avoid any
operation that would need to access the just-imported packfile in
`git fetch <bundle>`.

Signed-off-by: Johannes Schindelin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant