Fix execution of failpoint should not block deactivation #65
Force-pushed: acc1691 to 361aac9
I made a small demo program to demonstrate the effect after applying this change.
Force-pushed: 7e76f7f to f903576
cc: @ahrtr @serathius
Please read Lines 40 to 45 in 93c579a
Overall, I agree that Lines 284 to 291 in 93c579a
Force-pushed: f903576 to e51e11e
Hey @ahrtr, thanks for reviewing and pointing out this issue. Sorry that it took a while to fix this. I don't know whether there is a cleaner way to implement this. Currently, I introduced a special mutex called /cc @ahrtr
Force-pushed: 1c33e68 to c591bde
/cc @serathius
Overall looks good to me.
Have you manually verified that it can resolve the issue #64?
@ArkaSaha30, please let me know if you have any comments.
Agree about the integration test. It should be enough to test that we can set up a failpoint with a 1 second sleep and deactivate it before 1 second passes.
Force-pushed: 28bde52 to 5f5a127
That's similar to what I have in the demo program! :) I will implement this in the integration test! Thanks @serathius
There are 2 main flows of the gofail library: enable/disable and execution (`Acquire`) of the failpoints. Currently, a single mutex protects both flows, so only one action can make progress at a time.

This PR proposes fine-grained locking: each failpoint is protected by a dedicated `RWMutex`. The existing `failpointsMu` will only protect the main shared data structures, such as the `failpoints` map.

Note that in our current implementation, the execution of the same failpoint is still sequential (there is a lock within `eval` on the term being executed).

Reference:
- etcd-io#64

Signed-off-by: Chun-Hung Tseng <[email protected]>
Signed-off-by: Chun-Hung Tseng <[email protected]>
Ensures that panic failpoints and the serving of HTTP requests cannot execute at the same time.

Signed-off-by: Chun-Hung Tseng <[email protected]>
Signed-off-by: Chun-Hung Tseng <[email protected]>
Co-authored-by: Benjamin Wang <[email protected]> Co-authored-by: Marek Siarkowicz <[email protected]> Signed-off-by: Chun-Hung Tseng <[email protected]>
Force-pushed: 5f5a127 to 499c363
Please let me know if you want to do it in this PR or in a separate PR.
@ahrtr let me try to add the integration test in this PR, so we can have the code and the test merged together. What do you think? :)
I have started another PR to temporarily park my integration test implementation (#69). @ahrtr can you review it and see whether we need a separate discussion regarding the folder/code structure of the integration test, or whether I can just use that commit here and we merge everything in one go? :) Thanks!
I was only expecting you to do a manual test, because it takes a lot of effort to set up the e2e or integration test. But big thanks for offering the e2e/integration test. My proposal:
After all of the above is done, we will release v0.2.0.
Thanks for the action items @ahrtr, I will push out changes ASAP. I would add one more step in between. I am also using etcd-io/etcd#18018 to double-check if the gofail changes are working as intended!
Signed-off-by: Chun-Hung Tseng <[email protected]> Co-authored-by: Marek Siarkowicz <[email protected]>
Force-pushed: 837e826 to 60b65e2
Hi @ahrtr and @serathius, I have renamed the mutex as suggested, and added a comment to the new mutex :) I also created etcd-io/etcd#18018, which
Signed-off-by: Chun-Hung Tseng <[email protected]> Co-authored-by: Marek Siarkowicz <[email protected]>
Force-pushed: 96066ba to 13d483e
There are 2 main flows of the gofail library: namely, enable/disable and execution (`Acquire`) of the failpoints.

Currently, a mutex is protecting both flows, thus, only one action can make progress at a time.

This PR proposes a fine-grained mutex, as each failpoint is protected under a dedicated `RWMutex`. The existing `failpointsMu` will only be protecting the main shared data structures, such as the `failpoints` map.

Notice that in our current implementation, the execution of the same failpoint is still sequential (there is a lock within `eval` on the term being executed).

Reference: