Feature request: reduce memory consumption #35

MichaelDietzel · 2024-12-18T23:54:15Z

First I wanted to say thanks for creating this new implementation of borghash in this new dedicated repo!

After reading through some of the code I had a few ideas how the memory consumption could be reduced further. Maybe some of this is already implemented and I missed it, but some of it should be new:

1:
The reallocs in _resize_kv could probably be optimized. That way the peak memory consumption could be reduced which I think would be especially nice for systems facing memory pressure (or maybe even worse address space pressure as for example on 32bit systems). The most compatible idea would probably be the following:

Allocate the memory using mmap. Either by using /dev/zero as described here or by mapping the actual stored file on the disk as described here (probably more complicated/error-prone). Depending on how much address space is available, you could just reserve all the space you could ever need (UINT32_MAX * self.ksize * sizeof(uint8_t)). This would get rid of the need for the resize/realloc completely. In case you do not have that much address space available (for example on 32bit systems), you could do something else to replace realloc: still use mmap, but copy from the old to the new table in blocks. Copy 10MB from the old table to the new table and munmap it in the old table.
I just realize there are some tests. Maybe I could implement this myself! I like having the hash table in a separate repo so this should be a lot easier/less scary than before!

2:
Do I understand correctly that, together with the changes from borgstore, borghash will only need to be used when writing to the repo? I imagine the objects/chunks on disk can be found just by their ID which will be used for anything that is stored on disk. That way when just reading the repo, borghash would not need to be used/loaded at all.
Otherwise there still could be some readonly optimizations:

implement a readonly mode that only loads what is needed. So maybe the hashtable is unneeded or only the values are needed but the keys are not?
only allocate exactly what is needed and do not keep any additional room for growth.

3:
Some more thoughts on borgstore, that are only relevant if the IDs from borghash are used to locate the objects/chunks on disk:
3.1 The borgstore documentation mentions that a random distribution of the keys is assumed, which would not be useful for this use case. This assumption is now outdated, right?
3.2 I like using custom chunker params for smaller chunk sizes on my repos. Benchmarks for my use cases showed a smaller repo size that way (Although later I realized that this increases memory consumption...). For that reason I dislike the change that each chunk gets its own file.
I'd rather prefer if for example 256 chunks/objects would be stored in a single file. This would reduce file system overhead as well as random I/O when writing and especially reading. Even more so if the files would not be stored in the order they were created (which could also be caused by copying them over to a new file system). Combining these writes into one file for multiple chunks should be a lot easier when having consecutive IDs. Although this would obviously make deletion more complicated. If my assumptions that files are stored by their consecutive IDs, I will create an issue with more detailed ideas over at the borgstore repo.

Thanks for reading all of this :-)

ThomasWaldmann · 2024-12-19T08:30:41Z

Maybe let's focus on (1) here and move the other items to separate issues on the best-matching project.

Yeah, guess I see what you mean. I'ld just like to avoid getting into platform compatibility issues or requiring complex platform specific code. borg master branch shall run on linux, bsd, macOS, windows at least. some people have also been using borg on android.

Currently, the disk layout is rather different than the memory layout, so a direct map is not possible as it is. Maybe we don't want to change that, there is still the idea that it should be easy to add/remove value tuple elements.

32bit platforms: I currently would like to keep support for these right now, but there might come the time when we can just drop them.

MichaelDietzel · 2024-12-19T22:26:58Z

Thanks for your reply!

In the meantime I realized I should take a closer look at what realloc really does in the background, as there are smart people implementing this stuff and they probably had a similar idea to optimize this. And in fact they did. I only checked for linux but for large allocations libc already uses mmap. And this is handled in the kernel by just getting an new address range and then remapping the old pages to the new locations. So there is not only the double amount of memory allocated at some point during realloc, even the copy is optimized away. So my implementation would probably have made it worse.

So there would only be one thing that probably could be improved: realloc still needs continuous free addresses to work, which could be difficult on 32bit systems. That could be optimized by allocating multiple smaller buffers and looking up which one to access for each index, but I do not really like the idea. It would only be relevant for 32bit platforms when the repository grows to the limit of that platform. On the other hand for 64bit platforms it would be slower and more error prone due to being more complex. So I think it could be better to not implement this.

In conclusion: Opening this issue for 1 was rather pointless. Maybe I should just add a comment to the realloc to let other people looking at this know that it is already quite optimized and should not cause the memory usage to peak. Maybe the documentation should reflect that as well. If I understand the documentation for the current stable correctly, the calculation of the memory consumption is based on the assumption that growing some internal data structures causes memory consumption peaks due to double allocation + copy.

ThomasWaldmann · 2024-12-20T00:29:58Z

libc realloc: interesting findings! more/better than what is documented, iirc.

the current stable borgbackup docs is for borg 1.x, which does not use borghash. only borg2 / master branch does.

yes, it would be good to have a comment in the borghash src about what's already known as "optimized elsewhere".

MichaelDietzel linked a pull request Jan 2, 2025 that will close this issue

added comment: realloc is already optimized #36

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: reduce memory consumption #35

Feature request: reduce memory consumption #35

MichaelDietzel commented Dec 18, 2024

ThomasWaldmann commented Dec 19, 2024 •

edited

Loading

MichaelDietzel commented Dec 19, 2024

ThomasWaldmann commented Dec 20, 2024 •

edited

Loading

Feature request: reduce memory consumption #35

Feature request: reduce memory consumption #35

Comments

MichaelDietzel commented Dec 18, 2024

ThomasWaldmann commented Dec 19, 2024 • edited Loading

MichaelDietzel commented Dec 19, 2024

ThomasWaldmann commented Dec 20, 2024 • edited Loading

ThomasWaldmann commented Dec 19, 2024 •

edited

Loading

ThomasWaldmann commented Dec 20, 2024 •

edited

Loading