Client: are we using curl efficiently? #4967
Replies: 7 comments
-
I believe we fixed this a little bit differently: #4915
-
I'll give it a try - there are plenty of opportunities at the moment! I'll report back later.
-
OK, I've downloaded and am running the artifact. Obviously, I had to shut down and restart the client, so all the IDs reset to zero. By the time I got to a download page with all files backed off (so I could get a clean log for a single file retry), the result was:
I see that in #4915 @davidpanderson wrote
Can anyone guide me to a similar command to see what (or at least how many) file handles it has open?
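On Linux, a process's open file handles are exposed under /proc/&lt;pid&gt;/fd, which is the same data that `lsof -p <pid>` reads. A minimal C++17 sketch that simply counts them - the PID comes from the command line, and the program has to run with permission to inspect the target process:

```cpp
// Count the open file descriptors of a process by enumerating
// /proc/<pid>/fd. Linux-specific; pass the target PID as argv[1],
// or omit it to inspect this process itself via /proc/self.
#include <filesystem>
#include <iostream>
#include <iterator>
#include <string>

int main(int argc, char** argv) {
    namespace fs = std::filesystem;
    std::string pid = (argc > 1) ? argv[1] : "self";
    fs::path fd_dir = fs::path("/proc") / pid / "fd";
    auto n = std::distance(fs::directory_iterator(fd_dir),
                           fs::directory_iterator{});
    std::cout << fd_dir << ": " << n << " open file descriptors\n";
    return 0;
}
```

Run it as the same user as the client (or as root), e.g. `./fdcount $(pidof boinc)` - the binary name there is an assumption about how the client process is named on your system.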
-
@davidpanderson
-
OK, I tried it - I'll need some help interpreting it. At the time I took this log, nine download files were stalled in various stages of backoff. I also got this warning in the terminal:
lsof.log contains 329 lines.
-
I would guess the files mentioned by the warnings can be ignored. The warnings appear because the files are on a FUSE mount that only that user has access to (not even root). A typical output from LHC@home looks like this:
Here, the mount points from the cvmfs client are listed; they map some of the CERN repositories into the local directory tree.
-
Moving this to Discussion since there is currently no issue here.
-
The new managers of World Community Grid are struggling to optimize their network connectivity. For me, the major problems are with downloading task data files following a successful scheduler contact. The logs sometimes throw up questions worth discussing. Here's the full log of one such download connection.
WCG_download_failure.log
This particular log was captured from BOINC v7.20.2, running under Linux Mint 20.3.
Some extracts of interest:
I don't think we make any attempt to capture, interpret and report these useful messages. Should we?
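For what it's worth, libcurl can hand exactly these messages to the application: setting CURLOPT_DEBUGFUNCTION (with CURLOPT_VERBOSE enabled) delivers the same informational text that `curl -v` prints. A hedged sketch - the callback name and the logging destination are illustrative, not the client's existing plumbing:

```cpp
// Sketch: route libcurl's informational messages (the "* ..." lines)
// into the application's own log instead of losing them.
// The callback signature is fixed by libcurl; everything else here
// (names, where the text goes) is illustrative.
#include <curl/curl.h>
#include <cstdio>

static int trace_cb(CURL* handle, curl_infotype type,
                    char* data, size_t size, void* userptr) {
    (void)handle; (void)userptr;
    if (type == CURLINFO_TEXT) {
        // 'data' is not NUL-terminated; print exactly 'size' bytes.
        std::fprintf(stderr, "[curl] %.*s", (int)size, data);
    }
    return 0;  // libcurl requires the callback to return 0
}

void enable_curl_tracing(CURL* easy) {
    curl_easy_setopt(easy, CURLOPT_DEBUGFUNCTION, trace_cb);
    curl_easy_setopt(easy, CURLOPT_VERBOSE, 1L);  // needed for the callback to fire
}
```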
This information message can't be found in the client source code - we're assuming it's generated by curl. The difference between the oldest and newest connection IDs is some 20k - does our connection cache really get that big? Should it?
My attention has been drawn to https://curl.se/libcurl/c/CURLMOPT_MAXCONNECTS.html - which suggests that the size of libcurl's cache of reusable connections can be capped. Should we use that facility?
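For concreteness, a sketch of what using it might look like, assuming the client drives its transfers through a curl multi handle; the cap of 8 is purely illustrative:

```cpp
// Cap libcurl's cache of kept-alive, reusable connections.
// Per the libcurl docs, the multi handle's cache defaults to
// 4 x the number of easy handles added, so it can grow large.
#include <curl/curl.h>

void cap_connection_cache(CURLM* multi) {
    // 8 is an arbitrary example value, not a recommendation.
    curl_multi_setopt(multi, CURLMOPT_MAXCONNECTS, 8L);
}
```

There is also a per-easy-handle CURLOPT_MAXCONNECTS option for code that doesn't use the multi interface.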