[20.10] Backport fixes for kill/stop handling#42956
Conversation
1. Fixes moby#41587.
2. Removes a potential infinite Wait and goroutine leak at the end of the kill function.

Fixes moby#41587

Signed-off-by: Cam <gh@sparr.email>
(cherry picked from commit e57a365)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This refactors the Stop command to fix a few issues and behaviors that don't seem completely correct:

1. Fixes a situation where stop could hang forever (moby#41579).
2. Fixes a behavior where, if sending the stop signal failed, the code directly sent a -9 signal; if that also failed, it returned without waiting for the process to exit or going through the full docker kill codepath.
3. Fixes a behavior where, if sending the stop signal failed, the code sent a -9 signal; if that succeeded, we still went through the same stop waiting process, and could even go through the docker kill path again, even though we had already sent a -9.
4. Fixes a behavior where the code waited the full 30 seconds after sending a stop signal, even when we already knew the stop signal had failed.

Fixes moby#41579

Signed-off-by: Cam <gh@sparr.email>
(cherry picked from commit 8e362b7)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The log statement should reflect how long it actually waited, not how long it theoretically could have waited based on the 'seconds' integer passed in.

Signed-off-by: Cam <gh@sparr.email>
(cherry picked from commit d15ce13)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
samuelkarp
left a comment
LGTM
FWIW, we're shipping backports of #41586 and #41588 in the Amazon Linux package of Docker already, since those PRs were directly motivated by issues our customers ran into.
Somewhat related: we (Amazon Linux) ship a few more backports as well. I'd be happy to open upstream PRs for these if we (Moby maintainers) are comfortable shipping them to everyone. (If anyone is curious, you can grab the SRPM with yumdownloader --source docker to see what we ship for Amazon Linux.)
tianon
left a comment
LGTM -- IMO any bugfixes that are affecting folks and aren't too intrusive to backport are totally reasonable 😅
So, for 20.10.10, I at least would like to have #42836. If we consider this PR to be "too risky", we could consider doing a 20.10.11 shortly after 20.10.10 (which would give a "fallback" in case 20.10.10 explodes).
In general, I would (personally) say: yes, please do! Slightly more detailed response: we should include important bugfixes in patch releases. What counts as "important" is of course a very grey area, but it would partially be based on reports of real-life issues that users ran into. AWS, Azure, and Docker may all get such reports individually, so if backports are done in this (upstream) repository, at least it saves each of us from having to discover those issues independently (and from maintaining our own set of patches). I don't think there's any benefit in a "but it worked on my machine" situation. Of course, there may be cases where a particular bug hits AWS, or Azure, or Docker differently, and fixing the bug may not outweigh the risk involved, but that's what we have code review for (and raising a 👎 "this is too risky!!" should be fine for those cases).
I'm comfortable with it either way; Amazon Linux has been shipping these backports since mid-August and I'm not aware of any major issues with them so far.
Sounds good, I'll start getting those opened soon. We have a couple of larger backports (including dependency vendoring updates), so we can think about those independently.
Yes, those would likely fall into the "it depends…" category. I would also like to move forward with starting a 21.xx release soon; perhaps that would give us a separate vehicle for shipping such patches.
bed624f - #41588 Fix infinite wait in kill behavior (fixes #41587)
a4bcd4c - #41586 Fix stop hanging forever, including infinite wait (fixes #41579)
3285c27 - #42488 Fix log statement
Opening this up for discussion.
It is kind of a big change that could get us into trouble.
I did have someone from MSFT request that this be fixed, since we ran into it in some cases.
/cc @sparrc