This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
HttpToS3Operator OOM when downloading large file #46008
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
main is affected
Apache Airflow version
2.10
Operating System
Linux
Deployment
Amazon (AWS) MWAA
Deployment details
No response
What happened
The whole file is loaded into memory at once, and the task is OOM-killed:
Task exited with return code -9.
airflow/providers/src/airflow/providers/amazon/aws/transfers/http_to_s3.py
Lines 164 to 175 in bb77ebf
In this code, `response.content` is the in-memory representation of the file content (as bytes), so memory usage scales with the size of the downloaded file.
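The failure mode can be reduced to the following sketch (in-memory `io.BytesIO` objects stand in for the HTTP response body and the S3 upload target; the function names are illustrative, not the operator's actual code):

```python
import io


def upload_eagerly(response_body: io.BufferedIOBase, s3_sink: io.BufferedIOBase) -> int:
    """Mimic the current operator: materialize the entire payload as bytes
    (like requests' response.content), then hand it to S3 in one call.
    Peak memory usage equals the full file size."""
    payload = response_body.read()  # the whole file is held in memory here
    s3_sink.write(payload)
    return len(payload)


# 1 MiB stand-in for a multi-GB download; with a 20 GB file this read()
# alone exceeds the worker's RAM and the process gets SIGKILLed (-9).
body = io.BytesIO(b"x" * (1 << 20))
sink = io.BytesIO()
uploaded = upload_eagerly(body, sink)
```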
What you think should happen instead
Lazily stream the download via `stream=True`, `response.raw`, and `S3Hook.load_file_obj()`.
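A minimal sketch of the streaming alternative (again using standard-library file objects in place of `response.raw` and the S3 target; `CHUNK_SIZE` is an assumed tuning knob, not an operator parameter):

```python
import io

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB: peak memory is bounded by this, not by file size


def upload_streaming(response_raw: io.BufferedIOBase, s3_sink: io.BufferedIOBase) -> int:
    """Copy the response in fixed-size chunks, the way a file-like object
    is consumed when passed to an S3 upload, so memory use no longer
    scales with the size of the downloaded file."""
    total = 0
    while chunk := response_raw.read(CHUNK_SIZE):
        s3_sink.write(chunk)
        total += len(chunk)
    return total
```

In the operator itself, the idea would be to request the download with `stream=True` and pass `response.raw` directly to `S3Hook.load_file_obj()`, which hands the file-like object to boto3 and uploads it without buffering the whole body.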
How to reproduce
Run HttpToS3Operator with a file larger than the available RAM (~20 GB was enough in my setup).
Anything else
No response
Are you willing to submit PR?