New MultipartParser fails on files above 102400 bytes #1470

Open
SethurBlackcoat opened this issue Dec 13, 2024 · 4 comments

Comments

@SethurBlackcoat

Affected versions: 0.13+ (tested on 0.13.2; the issue is still present in the code for 0.14)
Works in 0.12.25

The new MultipartParser released in version 0.13 raises a MultipartError("Memory limit reached.") when the total file size exceeds 102400 bytes. This is due to MEMFILE_MAX being passed in (possibly mistakenly?) as the memory limit on line 1353.

[Screenshot: the parser call site around line 1353, where MEMFILE_MAX is passed as the memory limit]

The default value for this parameter would be 2 ** 20, i.e. ~1 MB, which is still fairly low but more reasonable than 100 KB. Ideally, though, this should either depend on the actual memory available or be configurable, perhaps via a kwarg to run(). (A possible stopgap is sketched below.)
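For anyone hitting this in the meantime: since the limit appears to be derived from MEMFILE_MAX, raising that attribute should raise the limit too. A minimal sketch (the /upload route is purely illustrative; verify the behavior against your Bottle version):

```python
import bottle

# Default is 102400 bytes (100 KB). Raising it to 1 MB keeps multi-file
# uploads totalling more than 100 KB from tripping the in-memory limit,
# assuming the parser's mem_limit really is derived from MEMFILE_MAX.
bottle.BaseRequest.MEMFILE_MAX = 1024 * 1024

app = bottle.Bottle()

@app.post("/upload")
def upload():
    # request.files maps multipart part names to FileUpload instances
    names = [f.filename for f in bottle.request.files.values()]
    return {"received": names}

app.run(host="localhost", port=8080)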

@SethurBlackcoat
Author

SethurBlackcoat commented Dec 13, 2024

Looking at the code in a bit more detail, I believe (but haven't tested this) that the error would not occur if the upload consisted of a single file exceeding MEMFILE_MAX: that file would become a NamedTemporaryFile and count against the much larger disk_limit instead. The error only occurs when you have multiple smaller files, each individually below memfile_limit, that together exceed mem_limit. (A quick check is sketched below.)
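Something along these lines should reproduce both cases against a stock 0.13.x app (using the requests library; the /upload URL is assumed from whatever test route you run):

```python
import requests

# Twenty 10 KB parts: each stays well below memfile_limit, but the
# 200 KB total exceeds the 102400-byte mem_limit described above.
files = [
    ("files", (f"part{i}.bin", b"x" * 10240, "application/octet-stream"))
    for i in range(20)
]
resp = requests.post("http://localhost:8080/upload", files=files)
print(resp.status_code)  # expect an error response on 0.13.x

# A single 200 KB part should spool to disk and succeed instead:
big = [("file", ("big.bin", b"x" * 204800, "application/octet-stream"))]
print(requests.post("http://localhost:8080/upload", files=big).status_code)
```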

@defnull
Member

defnull commented Dec 13, 2024

The MEMFILE_MAX limit was originally the maximum number of bytes Bottle would read from the body in order to populate in-memory data structures (e.g. json or form fields, minus file uploads). The intention is to protect apps from MemoryError. It was only loosely applied in 0.12 and had no effect on multipart, because cgi.FieldStorage had its own mechanism for that. There were many ways to bypass this safeguard :/

With the new parser, it's now enforced by the parser itself: the idea is that the sum of all in-memory buffers must not exceed this limit. This is not ideal yet, because the limit is used for two different things: as a soft limit on individual memory-buffered files before they are spooled to disk, and as the total hard limit on memory consumption for all fields and files. Multiple small files that each stay below the limit can, in total, trigger the hard limit and cause an error. (A simplified sketch of this accounting follows below.)
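To make the dual use concrete, here is illustrative pseudologic, not Bottle's actual parser code:

```python
import io
import tempfile

MEMFILE_MAX = 102400  # currently serves as both memfile_limit and mem_limit

def buffer_part(data: bytes, mem_used: int):
    """Return (buffer, new_mem_used) for one multipart part."""
    if len(data) > MEMFILE_MAX:
        # Soft per-part limit: big parts spool to disk and count
        # against the separate, much larger disk limit instead.
        spooled = tempfile.NamedTemporaryFile()
        spooled.write(data)
        return spooled, mem_used
    # Small parts stay in memory and share a single hard budget.
    if mem_used + len(data) > MEMFILE_MAX:
        raise ValueError("Memory limit reached.")  # MultipartError in Bottle
    return io.BytesIO(data), mem_used + len(data)
```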

We should probably introduce more fine-grained control here, expose those settings to app developers, and also improve the documentation. Limiting request parsers is important, but it should be possible to change the limits easily if needed.

@SethurBlackcoat
Author

Fully agree with everything above. As a stopgap measure until a longer-term fix is implemented, it would probably help to stick with the default value of 1 MB for mem_limit instead of passing in MEMFILE_MAX for it.

Long term, making mem_limit (and probably memfile_limit) configurable, and documenting the behavior for file uploads, would definitely be desirable. But thinking about it, I realized that even that would only delay the problem to a (now developer-selectable) point in the future. The most elegant solution I can come up with would be to not treat exceeding mem_limit as an instant abort of the parse, but to instead force all subsequent files in the parse to be stored on disk rather than memory-buffered. I think that's a practical fallback that prevents memory exhaustion while still allowing uploads with numerous smaller files. (A rough sketch follows below.)
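Something like this, purely illustrative and not a patch against the actual parser:

```python
import io
import tempfile

def buffer_file_part(data: bytes, mem_used: int,
                     mem_limit: int, memfile_limit: int):
    """Spool to disk instead of aborting once the memory budget is spent."""
    fits = len(data) <= memfile_limit and mem_used + len(data) <= mem_limit
    if fits:
        return io.BytesIO(data), mem_used + len(data)
    # Over either limit: fall back to a disk-backed buffer, which would
    # count against the separate (and much larger) disk_limit.
    spooled = tempfile.NamedTemporaryFile()
    spooled.write(data)
    return spooled, mem_used
```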

@defnull
Member

defnull commented Dec 13, 2024

> but instead just force all subsequent files in the parse to be stored on disk instead of being memory buffered.

Yes, that's an idea, but there is another problem. Bottle treats text fields that exceed MEMFILE_MAX as file uploads: they are not returned as strings (and are not contained in request.fields) but instead become FileUpload instances without a filename, stored in request.files. This is also a safeguard, and MEMFILE_MAX is big enough that 'normal' text fields should never exceed it. But with the proposed solution (treat all fields as file uploads once the limit is reached), it becomes possible for normal-sized text fields to end up in temporary files. That's very unpredictable, as it depends on the order of fields.

Hmm, maybe just scrap the total hard memory limit and rely on MEMFILE_MAX and the part limit? The part limit is 128 at the moment and hard-coded (which also has to change). Or default to a total memory limit of part_limit * memfile_limit? That would be ~12.8 MB at the moment, which sounds reasonable for modern servers and would never be reached, because the part limit hits first.
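For reference, the arithmetic behind that proposed default (values taken from this thread):

```python
part_limit = 128          # current hard-coded maximum number of parts
memfile_limit = 102400    # 100 KB soft limit per memory-buffered part
mem_limit = part_limit * memfile_limit
print(mem_limit)          # 13107200 bytes, the ~12.8 MB ballpark above
```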
