Commit may fail when container write many small files (failed to create diff tar stream: failed to copy: archive/tar: write too long) #3791

Open
abel-von opened this issue Dec 26, 2024 · 3 comments
Labels
bug Something isn't working

@abel-von

Description

We occasionally hit the following error in our production environment when committing a container to an image:

failed to create diff: mount callback failed on /var/lib/containerd/tmpmounts/containerd-mount260055902: mount callback failed on /var/lib/containerd/tmpmounts/containerd-mount4247854495: failed to write compressed diff: failed to create diff tar stream: failed to copy: archive/tar: write too long: unknown

It is more likely to occur when the container writes a large number of small files. I think this may be related to buffered I/O: the file content changes while tar is writing it. The problem reproduces only with very low probability. The error is returned by regFileWriter.Write in Go's archive/tar package:

func (fw *regFileWriter) Write(b []byte) (n int, err error) {
	overwrite := int64(len(b)) > fw.nb
	if overwrite {
		b = b[:fw.nb]
	}
	if len(b) > 0 {
		n, err = fw.w.Write(b)
		fw.nb -= int64(n)
	}
	switch {
	case err != nil:
		return n, err
	case overwrite:
		return n, ErrWriteTooLong
	default:
		return n, nil
	}
}

Steps to reproduce the issue

  1. Start a container that writes many small files.
  2. Run nerdctl commit on this container.
  3. Repeat thousands of times.

Describe the results you received and expected

If this is related to buffered I/O, I think the only way to avoid it may be to add a sleep after Pause so that the kernel can flush all data to disk.

What version of nerdctl are you using?

nerdctl version 2.0.1-2-g1f812253

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

No response

@abel-von abel-von added the kind/unconfirmed-bug-claim Unconfirmed bug claim label Dec 26, 2024
@AkihiroSuda AkihiroSuda added bug Something isn't working and removed kind/unconfirmed-bug-claim Unconfirmed bug claim labels Dec 27, 2024
@AkihiroSuda AkihiroSuda changed the title Commit may fail when container write many small files Commit may fail when container write many small files (failed to create diff tar stream: failed to copy: archive/tar: write too long) Dec 27, 2024
@abel-von
Author

Shall we add a sync call before calling CreateDiff? @AkihiroSuda

@fahedouch
Member

Hi @abel-von,

Yes. A more deterministic fix is to explicitly synchronize the filesystem before committing the container, so that all data is flushed to disk. A sleep is not guaranteed to be long enough, whereas a sync system call is reliable and precise.

Are you planning to open a PR?

@abel-von
Author

@fahedouch Thanks for the reply. I am unsure whether we should call syscall.Sync(), or get the container's rootfs path and call unix.Syncfs on that directory. There seems to be no easy way to get a container's rootfs path, given the different kinds of snapshotters.
