-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Disable CheckConnLiveness
by default.
#1451
Comments
Ignoring timeouts, I'm sure that It's not just intentional disconnections by the server that it deals with either. I believe So, my opinion is that disabling I understand how |
Do you mean you counted few thousand "unexpected read from socket" errors in log?
Do you mean many user don't see "unexpected read from socket" although it happened quite a lot?
As far as I know, MySQL Connector/C doesn't check readiness before sending query. Many users builds robust application without it. I don't know why only Go needs it by default. (I understand it helps GitHub backends.)
It is interesting idea, although it introduce dependency to this module. |
Yes, exactly.
I mean that I would guess that many users also see a lot of "unexpected EOF", but they just ignore it because everything still works as long as new connections are created silently and successfully. I agree that it could mask an underlying network problems, but only for idle connections. If a service loses all connections to the server then you'd also expect that active queries will fail, and maybe new connections will fail, both of which are the more important things that won't be hidden.
I think the major difference is that go-sql-driver/mysql must work with database/sql's connection pooling. As a counterexample, mysql-connector-python has a connection pool and does check
GitHub isn't doing anything special here other than using Go to interact with MySQL at a very large scale. That scale means that we hit weird edge cases more than most others, and those weird edge cases impact more users than most others, but that doesn't mean these things are unique to us. I think it would be better to think of our observations as examples rather than exceptions. Just to be clear, it's also a coincidence that the original author of this code happened to be a GitHub employee at the time. #1392, #1393, and #1394 were raised independently and with no knowledge of that. It was actually a surprise to me. So, the reason I'm trying to defend |
Unlealthy network kills not only idle connections, but also connections waiting results/sending query.
#1392, #1393, #1394 are special because shorter lifetime than wait_timeout is recommended to solution for most users. It is not good solution because GitHub is special.
OK, it makes sense to me. Check readability is lighter than pre-ping. |
I think you've misunderstood our situation:
This is why I say we're not special. Other users will not be following the recommended timeouts, because they haven't needed to (thanks to
To expand on this example further, maybe it's worth considering why mysql-connector-python doesn't require similar client-side timeouts. I think the answer is just because it expects server-side disconnections and handles them gracefully. IMO, that's what go-sql-driver/mysql should also do instead of treating connection loss as something that can be avoided. |
It must be very rare. There are no guarantee the disconnection happened during idle state. Noticing such bad configuration/network when it happend in idle connection is better for users. Network error must be verbose and noisy for users. When I said "special", it didn't mean the error is rare. It meant user should notice about the error rather than hide it automatically. |
I tried to implement it. If it works fine, no need to chose one from better error message and CheckConnLiveness. |
Reading #934, can I ask: what command were you using to detect variables escaping to the heap? What is its output today? Also, does the impact show up on a benchmark that covers real query traffic? I know volume of allocations is important for a Go program, but I am curious to know if doing a query has very low allocation count in general. I work on https://github.com/letsencrypt/boulder/, and in my opinion CheckConnLiveness is a good default for us. One example of why: we use ProxySQL, which sometimes terminates connections for reasons other than timeout. Yes, we can experience errors when that happens in the middle of a query rather than an idle connection, so we need to implement application-level retries anyhow; but I think CheckConnLiveness is greatly reducing our error rate for not very much cost. As one more point of reference, the Go HTTP client fires off a goroutine running a readLoop when a connection is first established. That has a similar effect to the |
I don't remember. Google "golang compiler option escape".
I don't know. Performance is important but maintenance cost & additional external dependency is the main reason I don't want to enable it by default.
Why and when ProxySQL closes connection?
I don't want to do it, because additional goroutine makes mentenance hard. I want to pay my time and mental energy to other features, like compression. |
In general, connection closed from server is dangerous and user should avoid it by using SetConnMaxLifetime (and SetConnMaxIdleTime if needed). CheckConnLiveness reduces the risk but doesn't make it zero.
CheckConnLiveness has performance penalty. (conncheck should be optional #996)
CheckConnLiveness makes difficult to read an unexpected packet from server. ("closing bad idle connection: unexpected read from socket" errors on MySQL 8.0.24 #1392)
If Go supports nonblocking readable test, we can fix #996 and #1392.
Until then, how about disable the option by default in v1.8?
I don't talking about removing it so Githubber can keep using the option.
The text was updated successfully, but these errors were encountered: