Memory leak when using channels redis PubSub layer #276
Comments
@ahmadpoorgholam Are you able to investigate to see where the leak is coming from? |
We do not have a clue, but the one thing we can say for sure is that the use of group_send over time is the main suspect. The strange thing is that we do have clean-up in disconnect (group discard and raising StopConsumer). |
Is there any update on this? I have been having a similar issue, but I am unsure whether they are related. |
It needs investigation. If someone can narrow down a reproducible issue that would help. |
For the next time the leak shows up is there anything I should/could do to get some debug info on what is going on? |
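One option for collecting debug info is the standard library's tracemalloc module. The following is a minimal sketch, not something taken from this project: it compares two snapshots and lists the source lines whose allocations grew the most. The `top_growth` helper and the `leaked` list are hypothetical stand-ins; in a real server the second snapshot would be taken after memory has visibly grown.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the allocation sites that grew the most between two snapshots."""
    # compare_to() sorts by the absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]


tracemalloc.start(10)  # record up to 10 stack frames per allocation
baseline = tracemalloc.take_snapshot()

# Hypothetical stand-in for "let the process run until memory has grown".
leaked = [bytes(1024) for _ in range(2000)]

current = tracemalloc.take_snapshot()
growth = top_growth(baseline, current)
for stat in growth:
    print(stat)  # file, line number, size delta, and count delta
tracemalloc.stop()
```

tracemalloc only sees allocations made after `tracemalloc.start()`, so it would need to be started early in the process (or via the `PYTHONTRACEMALLOC` environment variable) to be useful here.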
Hi all - something is rotten in the current pubsub layer. We maintain an internal fork, but I recently sat down to merge upstream, and we're seeing lots of issues around memory consumption and sentinel connections. Diffing our internal fork against upstream doesn't yield anything obvious, so I will take another look, hopefully in the next few weeks... but something has definitely crept in. |
Hey @qeternity — super thanks. What's the merge base for your fork? (Presumably the issue is after that...) |
I set up a basic channels app to explore this a bit. My generator:

    from channels.layers import get_channel_layer

    async def run_async():
        # Send one million messages to the 'default' group.
        i = 0
        msg = {'type': 'internal.message', 'text': 'sbf'}
        while i < 1000000:
            channel_layer = get_channel_layer()
            await channel_layer.group_send(
                'default',
                msg,
            )
            i += 1  # advance the counter so the loop terminates

I have a client that reads these (from the consumer) as fast as it can. Importantly, I let the client run for a bit, then hard control + C the client to disconnect it. I see the server detects this disconnect too
Now, the generator is still pumping out tons of messages, and I notice the memory of my "python manage.py runserver" process is growing a lot. After all the messages are sent, the memory usage remains high indefinitely (does not go back down). Note: I did not notice this growth when using

Toward the end of sending the messages, I reconnect my client. I have a pdb debugger set up on the consumer to trigger after all of the messages have been sent, and I use guppy3 to print a heap:

    hp = guppy.hpy()

Okay, so 68% of that memory are

    hp.heap()[0].byrcs

All of that data is in collections.deque objects. Hopefully this sheds some light on this issue. |
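The deque finding above is consistent with an unbounded queue that keeps receiving messages after its reader has gone away. The sketch below illustrates only that mechanism; it assumes the per-channel buffer behaves like an unbounded `asyncio.Queue` (which, in CPython, stores its items in a `collections.deque`) and does not use channels_redis itself. The `demo` function and its parameter are hypothetical.

```python
import asyncio


async def demo(n_after_disconnect=10_000):
    # Unbounded buffer standing in for one channel's receive queue (assumption).
    queue = asyncio.Queue()
    msg = {"type": "internal.message", "text": "sbf"}

    # While the client is connected, every message is consumed again.
    for _ in range(100):
        queue.put_nowait(dict(msg))
        await queue.get()
    size_while_connected = queue.qsize()

    # The client is gone but the producer keeps sending: nothing drains the queue.
    for _ in range(n_after_disconnect):
        queue.put_nowait(dict(msg))
    return size_while_connected, queue.qsize()


size_while_connected, backlog = asyncio.run(demo())
print(size_while_connected, backlog)
```

The backlog stays in memory for as long as something still holds a reference to the queue, which is why dropping that reference on disconnect matters.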
another thing to note -- when I control + C my client, I see this traceback logged in the
To get around this traceback I added some try / except in my consumers.py:

    async def internal_message(self, event):
        try:
            await self.send(text_data=event["text"])
        except Exception:
            # Sending failed (e.g. the client is gone), so run the disconnect path.
            await self.websocket_disconnect(None)

    async def websocket_disconnect(self, message):
        await super().websocket_disconnect(message)

Now when my client disconnects, I no longer see memory growth, which is good. However, I expected the memory to go down (i.e. all of those buffered messages should be released), but it does not. I don't believe that queue is being cleaned up. If there are messages in the channel Queue, and the consumer disconnects (I control + C my client), I am not hitting channels_redis/channels_redis/pubsub.py Line 191 in bba9319
But if there are no messages in the channel Queue, and the consumer disconnects, then I do hit that line. I can reproduce this reliably each time. |
@fosterseth given CPython's memory allocator, is this really to be expected? There is a memory leak somewhere; unfortunately, we end up cycling k8s containers often enough that it's not much of an issue, so we haven't investigated further. |
@fosterseth Thanks for investigating this. Your discoveries got me thinking, and now I have some ideas on what is going wrong here.

First, some context:
The giant comment I wrote here is because django/channels calls

So, what's going wrong:
@fosterseth I think you've found an execution path where

How to fix:
I think we should add code (perhaps here) ... something like:

    if self.channel_layer is not None and hasattr(self.channel_layer, 'clean_channel'):
        self.channel_layer.clean_channel(self.channel_name)

That is,
Then of course we'd change the PubSub impl to clean up in

Memory going down...
Memory management at the OS-level is whack enough (i.e. calling

My use case...
Again, like @qeternity's case, in our case we cycle in/out containers often enough (usually because we're deploying new versions of our app) that this leak isn't hurting us in production. That said, it would certainly be great to fix it. |
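The guard proposed above can be tried in isolation. This is a minimal sketch under stated assumptions: `SketchLayer`, `LegacyLayer`, and `on_consumer_exit` are hypothetical names, not code from django/channels or channels_redis. Only the `clean_channel` hook name and the `hasattr` check come from the comment above.

```python
from collections import deque


class SketchLayer:
    """Hypothetical layer that buffers messages per channel."""

    def __init__(self):
        self.channels = {}

    def send_nowait(self, channel, message):
        self.channels.setdefault(channel, deque()).append(message)

    def clean_channel(self, channel):
        # Drop the buffer and anything still queued in it.
        self.channels.pop(channel, None)


class LegacyLayer:
    """Hypothetical layer that does not implement the optional hook."""


def on_consumer_exit(channel_layer, channel_name):
    # Same guard as proposed: call the hook only when the layer provides it.
    if channel_layer is not None and hasattr(channel_layer, "clean_channel"):
        channel_layer.clean_channel(channel_name)
        return True
    return False


layer = SketchLayer()
for _ in range(3):
    layer.send_nowait("specific.chan", {"type": "internal.message", "text": "sbf"})
cleaned = on_consumer_exit(layer, "specific.chan")
```

Making the hook optional through `hasattr` means layers that never define `clean_channel` keep working unchanged, which seems to be the point of the proposed check.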
Can always count on @acu192 for the real insights. Indeed I think fixing memory leaks is the best we can hope for. OS shenanigans aside, CPython rarely returns memory to the OS anyway, especially for small objects. |
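Because process-level memory may stay high even after objects are freed, it can help to measure Python-level allocations directly when deciding whether a leak is fixed. A rough sketch using the standard library's tracemalloc follows; the `buffered` list is a hypothetical stand-in for queued messages.

```python
import gc
import tracemalloc

tracemalloc.start()

# Hypothetical stand-in for a backlog of buffered messages.
buffered = [{"type": "internal.message", "text": "sbf"} for _ in range(50_000)]
held, _ = tracemalloc.get_traced_memory()

# Release the backlog; the interpreter frees the objects even if the
# process's resident memory does not shrink afterwards.
del buffered
gc.collect()
released, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"held={held} released={released} peak={peak}")
```

If the traced figure drops after cleanup while the resident size reported by the OS does not, the objects were released and the remainder is allocator behaviour rather than a leak in the layer.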
@acu192 Would you like to investigate making that change in django/channels? |
Just on the Is it a leak? question. Finding a place to run a |
Yes, I'll put some time on my calendar in the next few weeks to create a PR to django/channels. |
Any progress here, maybe? |
bump |
put up a PR ^ hopefully someone can help test this out
|
hi @acu192 ! any thoughts on those PRs? I don't have permissions to kick off workflow or add reviewers. Thank you |
@fosterseth I'm having a quick look now, just in case the far more capable @acu192 is busy at the moment |
Ha! I just saw that @acu192 is already on it about 20 min ago! |
This article might help some of you: https://www.paradigm.co/blog/anatomy-of-a-python-memory-leak |
Hello! On our production service, we use channels and channels-redis to deliver heavy updates to WebSocket clients, and it has worked fine for us for the last six months.
However, since the day we switched to the new PubSubChannelLayer, we have been facing constant, unbounded growth in memory consumption. The only workaround we have found is restarting the server process at a regular interval to release the memory.
Here is some metadata about the system we are running:
Socket updates are mostly generated by management commands, which run outside the context of the server processes and call group_send to provide consumers with new updates.
channels==3.0.4
channels-redis==3.3.0
The server runs under an Nginx + uvicorn==0.12.1 stack with supervisor as the process manager. However, we have tested gunicorn and daphne as well, and the memory problem stayed the same.
OS: Ubuntu Server 18