fix: cancellation bug for parallel invocations without execution_order and a larger amount of tasks than the poolSize#6215
Conversation
72f2c7f to
73bcd46
Compare
execution_order and larger amount of tasks than the poolSize [WIP]execution_order and a larger amount of tasks than the poolSize [WIP]
6a98f81 to
f28602d
Compare
|
@Wirone @lukemassa @bschaatsbergen @jamengual I would appreciate a code review when someone has availability. I've tested this change locally with several scenarios (detailed in the PR description) and confirmed it resolves the issue where |
execution_order and a larger amount of tasks than the poolSize [WIP]execution_order and a larger amount of tasks than the poolSize
f28602d to
bda1772
Compare
|
@bschaatsbergen could you take a look at this one? |
Wirone
left a comment
There was a problem hiding this comment.
I am not a Go specialist and my approval does not work here anyway, so only a comment from my side 🙂. First of all, thank you very much for picking this! From perspective of a person who sees this codebase for the first time the changes look good. One thing that could be improved is test coverage - there are no new tests added, so basically the issue was reproduced manually, which is somehow fine, but lacks regression guard. I have no idea whether it's possible to test it, though, so let maintainers decide. I am looking forward to this getting released, as it will help us in our workflows 🙂.
Thanks, and I do agree on this point to be honest. I'm not sure if either I will invest some time into this to see if I can easily add some test cases to at least cover the newly fixed functionality. |
bda1772 to
4e3316f
Compare
There was a problem hiding this comment.
Pull request overview
This PR aims to fix atlantis cancel behavior for parallel plan/apply executions when the number of project commands exceeds the configured parallel pool size (especially when no execution_order is used), by introducing cancellation checks at the pool-throttling level.
Changes:
- Extend the parallel project command executor to accept a
CancellationTracker+PullRequestand attempt cancellation-aware scheduling. - Thread the new executor signature through runners (e.g., version, policy_check).
- Standardize the cancellation error message to explicitly reference the
atlantis cancelcommand.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| server/events/project_command_pool_executor.go | Adds cancellation-aware parallel execution and updates cancellation error messaging. |
| server/events/version_command_runner.go | Updates call to runProjectCmdsParallelGroups to match new signature. |
| server/events/policy_check_command_runner.go | Updates call to runProjectCmdsParallel to match new signature. |
5d04a7a to
daad9bb
Compare
|
Just updated the PR with unit tests, I think it is ready for review. |
…ined and more tasks than the pool size Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
…tine call Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
…al + to prevent regression for poolSize bug Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
…() can be blocking Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
daad9bb to
e4a4b2c
Compare
Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
|
@ramonvermeulen please fix the conflicts. thanks |
Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
Just updated the PR. |
Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
359cddb to
ffe020f
Compare
|
@pseudomorph @lukemassa could one of you take a look at this? |
|
I think the overall change makes sense and seems to have good comment/test coverage. |
jamengual
left a comment
There was a problem hiding this comment.
thanks @ramonvermeulen for the contribution
|
Reporting back: seems the fix is correct, Atlantis now cancels our jobs properly 🥳! However, it would be great to have a config option similar to |
Thanks for doing the sanity check! It's sometimes hard to test these things in the "real-world" because I don't have an atlantis instance with 170 "real-world" projects at my exposal, so I try to test things on a best-effort base by trying to replicate the scenario locally. Might look into the |
what
This PR fixes a bug where
atlantis canceldoes not properly cancel parallel invocations when they exceed the configured parallel pool size, as reported in #5813 (comment).This change implements cancellation checks at the pool size throttling granularity rather than only at the group level. By checking cancellation at a more granular level, tasks waiting for available worker slots can now be properly cancelled, even when they exceed the parallel pool size.
why
Because it improves the atlantis cancel command to also work properly when no execution order is provided while running in parallel.
atlantis cancelworks reliably for all parallel invocations, regardless of whether they exceed the pool size or use execution order groupstests
All tests use the same terraform configuration, just to mimic a 30 seconds delay via the sleep provider.
main.tf
1. With execution order groups and running in parallel (this should still work as supposed)
atlantis.yaml
Behavioral screenshots
2. Without execution order groups and running in parallel while exceeding the pool size
atlantis.yaml
Behavioral screenshots
3. Without running in parallel (this should still work as supposed)
atlantis.yaml (same as 2, just without `parellel_plan` and `parallel_apply`)
Behavioral screenshots
I only added f28602d after running the tests to align the error messages. This can also be seen as extra validation (see screenshots) which code paths have been hit in which test scenario's.
references
#5813 (comment)
#187