fix: cancellation bug for parallel invocations without execution_order and a larger amount of tasks than the poolSize#6215

Merged
jamengual merged 15 commits into runatlantis:main from ramonvermeulen:b/atlantis-cancel-parallelism-fix on Apr 24, 2026

@ramonvermeulen
Contributor

@ramonvermeulen ramonvermeulen commented Feb 18, 2026

what

This PR fixes a bug where atlantis cancel does not properly cancel parallel invocations when the number of projects exceeds the configured parallel pool size, as reported in #5813 (comment).

This change adds cancellation checks at the granularity of the pool-size throttling rather than only at the execution-group level. Because cancellation is now checked at this finer level, tasks that are waiting for a free worker slot can be cancelled properly, even when the number of tasks exceeds the parallel pool size.

why

It makes the atlantis cancel command also work properly when projects run in parallel without an execution order.

  • Ensures atlantis cancel works reliably for all parallel invocations, regardless of whether they exceed the pool size or use execution order groups
  • Provides more responsive cancellation by checking at a finer granularity
  • Maintains compatibility with existing execution order groups
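To see why a group-level check alone did not help in scenario 2 below, a hypothetical Go sketch of grouping projects by `execution_order_group` may be useful. It assumes that projects without `execution_order_group` all fall into group 0 and that a group-level check only runs between consecutive groups; `project`, `groupByOrder` and `cancellationCheckpoints` are illustrative names, not Atlantis APIs.

```go
package main

import (
	"fmt"
	"sort"
)

// project is a hypothetical, trimmed-down view of a project entry in
// atlantis.yaml; ExecutionOrderGroup is assumed to default to 0 when unset.
type project struct {
	Name                string
	ExecutionOrderGroup int
}

// groupByOrder buckets projects by execution_order_group and returns the
// buckets in ascending group order.
func groupByOrder(projects []project) [][]project {
	buckets := map[int][]project{}
	for _, p := range projects {
		buckets[p.ExecutionOrderGroup] = append(buckets[p.ExecutionOrderGroup], p)
	}
	keys := make([]int, 0, len(buckets))
	for k := range buckets {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	groups := make([][]project, 0, len(keys))
	for _, k := range keys {
		groups = append(groups, buckets[k])
	}
	return groups
}

// cancellationCheckpoints counts how often a group-level-only check would run
// after work has started: once between each pair of consecutive groups.
func cancellationCheckpoints(projects []project) int {
	n := len(groupByOrder(projects))
	if n == 0 {
		return 0
	}
	return n - 1
}

func main() {
	var grouped, flat []project
	for i := 1; i <= 20; i++ {
		name := fmt.Sprintf("atlantis-test-%d", i)
		grouped = append(grouped, project{name, (i-1)/5 + 1}) // groups 1..4, as in scenario 1
		flat = append(flat, project{name, 0})                  // no execution_order_group, as in scenario 2
	}
	fmt.Println("scenario 1 checkpoints:", cancellationCheckpoints(grouped)) // 3
	fmt.Println("scenario 2 checkpoints:", cancellationCheckpoints(flat))    // 0
}
```

Under these assumptions every project in scenario 2 lands in a single group, so once that group starts there is no point at which a group-level check can stop the queued work; a check at the pool level covers that case.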

tests

All tests use the same Terraform configuration, which simulates a 30-second delay using the time_sleep resource from the hashicorp/time provider.

main.tf
terraform {
  required_providers {
    time = {
      source  = "hashicorp/time"
      version = "~> 0.13"
    }
    null = {
      source  = "hashicorp/null"
      version = "~> 3.0"
    }
  }
}

provider "time" {}

resource "null_resource" "previous" {}

resource "time_sleep" "wait_30_seconds" {
  depends_on = [null_resource.previous]

  create_duration = "30s"
}

# This resource will create (at least) 30 seconds after null_resource.previous
resource "null_resource" "next" {
  depends_on = [time_sleep.wait_30_seconds]
}

1. With execution order groups and running in parallel (this should still work as expected)

atlantis.yaml
version: 3
automerge: true
parallel_plan: true
parallel_apply: true
projects:
  - name: atlantis-test-1
    workspace: test1
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 1
  - name: atlantis-test-2
    workspace: test2
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 1
  - name: atlantis-test-3
    workspace: test3
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 1
  - name: atlantis-test-4
    workspace: test4
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 1
  - name: atlantis-test-5
    workspace: test5
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 1
  - name: atlantis-test-6
    workspace: test6
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 2
  - name: atlantis-test-7
    workspace: test7
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 2
  - name: atlantis-test-8
    workspace: test8
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 2
  - name: atlantis-test-9
    workspace: test9
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 2
  - name: atlantis-test-10
    workspace: test10
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 2
  - name: atlantis-test-11
    workspace: test11
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 3
  - name: atlantis-test-12
    workspace: test12
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 3
  - name: atlantis-test-13
    workspace: test13
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 3
  - name: atlantis-test-14
    workspace: test14
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 3
  - name: atlantis-test-15
    workspace: test15
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 3
  - name: atlantis-test-16
    workspace: test16
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 4
  - name: atlantis-test-17
    workspace: test17
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 4
  - name: atlantis-test-18
    workspace: test18
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 4
  - name: atlantis-test-19
    workspace: test19
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 4
  - name: atlantis-test-20
    workspace: test20
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    execution_order_group: 4
Behavioral screenshots

2. Without execution order groups and running in parallel while exceeding the pool size

atlantis.yaml
version: 3
automerge: true
parallel_plan: true
parallel_apply: true
projects:
  - name: atlantis-test-1
    workspace: test1
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-2
    workspace: test2
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-3
    workspace: test3
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-4
    workspace: test4
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-5
    workspace: test5
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-6
    workspace: test6
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-7
    workspace: test7
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-8
    workspace: test8
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-9
    workspace: test9
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-10
    workspace: test10
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-11
    workspace: test11
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-12
    workspace: test12
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-13
    workspace: test13
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-14
    workspace: test14
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-15
    workspace: test15
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-16
    workspace: test16
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-17
    workspace: test17
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-18
    workspace: test18
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-19
    workspace: test19
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-20
    workspace: test20
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
Behavioral screenshots

3. Without running in parallel (this should still work as expected)

atlantis.yaml (same as 2, just without `parallel_plan` and `parallel_apply`)
version: 3
automerge: true
projects:
  - name: atlantis-test-1
    workspace: test1
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-2
    workspace: test2
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-3
    workspace: test3
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-4
    workspace: test4
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-5
    workspace: test5
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-6
    workspace: test6
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-7
    workspace: test7
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-8
    workspace: test8
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-9
    workspace: test9
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-10
    workspace: test10
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-11
    workspace: test11
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-12
    workspace: test12
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-13
    workspace: test13
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-14
    workspace: test14
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-15
    workspace: test15
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-16
    workspace: test16
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-17
    workspace: test17
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-18
    workspace: test18
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-19
    workspace: test19
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
  - name: atlantis-test-20
    workspace: test20
    dir: .
    terraform_version: v1.12.2
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
Behavioral screenshots

I added f28602d only after running the tests, to align the error messages. Because the messages still differed per code path when the screenshots were taken, the screenshots also serve as extra validation of which code paths were hit in each test scenario.

references

#5813 (comment)
#187

@ramonvermeulen ramonvermeulen force-pushed the b/atlantis-cancel-parallelism-fix branch 2 times, most recently from 72f2c7f to 73bcd46 February 18, 2026 14:10
@ramonvermeulen ramonvermeulen changed the title fix: cancellation bug for parallel invocations without execution_order and larger amount of tasks than the poolSize [WIP] fix: cancellation bug for parallel invocations without execution_order and a larger amount of tasks than the poolSize [WIP] Feb 18, 2026
@ramonvermeulen ramonvermeulen force-pushed the b/atlantis-cancel-parallelism-fix branch 4 times, most recently from 6a98f81 to f28602d February 24, 2026 16:45
@ramonvermeulen ramonvermeulen marked this pull request as ready for review February 24, 2026 16:48
@dosubot (bot) added the bug and go labels Feb 24, 2026
@ramonvermeulen
Contributor Author

@Wirone @lukemassa @bschaatsbergen @jamengual

I would appreciate a code review when someone has availability.

I've tested this change locally with several scenarios (detailed in the PR description) and confirmed that it resolves the issue where atlantis cancel fails to properly terminate queued operations when parallel_plan and/or parallel_apply are enabled and the number of projects exceeds the parallel pool size.

@ramonvermeulen ramonvermeulen changed the title fix: cancellation bug for parallel invocations without execution_order and a larger amount of tasks than the poolSize [WIP] fix: cancellation bug for parallel invocations without execution_order and a larger amount of tasks than the poolSize Feb 25, 2026
@ramonvermeulen ramonvermeulen force-pushed the b/atlantis-cancel-parallelism-fix branch from f28602d to bda1772 February 25, 2026 08:17
@jamengual
Contributor

@bschaatsbergen could you take a look at this one?


@Wirone Wirone left a comment


I am not a Go specialist and my approval does not count here anyway, so this is only a comment from my side 🙂. First of all, thank you very much for picking this up! From the perspective of someone seeing this codebase for the first time, the changes look good. One thing that could be improved is test coverage: no new tests were added, so the issue was only reproduced manually, which is somewhat fine but lacks a regression guard. I have no idea whether it's possible to test this, though, so I'll let the maintainers decide. I am looking forward to this being released, as it will help us in our workflows 🙂.

@ramonvermeulen
Contributor Author

ramonvermeulen commented Feb 26, 2026

> One thing that could be improved is test coverage - there are no new tests added, so basically the issue was reproduced manually, which is somehow fine, but lacks regression guard. I have no idea whether it's possible to test it, though, so let maintainers decide. I am looking forward to this getting released, as it will help us in our workflows 🙂.

Thanks, and to be honest I do agree on this point. I'm not sure whether runProjectCmdsParallelGroups or runProjectCmds are currently covered by unit tests at all; a quick search does not turn up any test references to them.

I will invest some time into this to see if I can easily add some test cases to at least cover the newly fixed functionality.

@ramonvermeulen ramonvermeulen force-pushed the b/atlantis-cancel-parallelism-fix branch from bda1772 to 4e3316f March 24, 2026 17:45
Copilot AI review requested due to automatic review settings March 24, 2026 17:45

Copilot AI left a comment


Pull request overview

This PR aims to fix atlantis cancel behavior for parallel plan/apply executions when the number of project commands exceeds the configured parallel pool size (especially when no execution_order is used), by introducing cancellation checks at the pool-throttling level.

Changes:

  • Extend the parallel project command executor to accept a CancellationTracker + PullRequest and attempt cancellation-aware scheduling.
  • Thread the new executor signature through runners (e.g., version, policy_check).
  • Standardize the cancellation error message to explicitly reference the atlantis cancel command.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| server/events/project_command_pool_executor.go | Adds cancellation-aware parallel execution and updates cancellation error messaging. |
| server/events/version_command_runner.go | Updates call to runProjectCmdsParallelGroups to match new signature. |
| server/events/policy_check_command_runner.go | Updates call to runProjectCmdsParallel to match new signature. |

@ramonvermeulen ramonvermeulen force-pushed the b/atlantis-cancel-parallelism-fix branch from 5d04a7a to daad9bb March 24, 2026 18:46
@ramonvermeulen
Contributor Author

Just updated the PR with unit tests; I think it is ready for review.

…ined and more tasks than the pool size

Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
…tine call

Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
…al + to prevent regression for poolSize bug

Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
…() can be blocking

Signed-off-by: Ramon Vermeulen <ramonvermeulen98@gmail.com>
@ramonvermeulen ramonvermeulen force-pushed the b/atlantis-cancel-parallelism-fix branch from daad9bb to e4a4b2c March 25, 2026 06:41
@jamengual
Contributor

@ramonvermeulen please fix the conflicts. thanks

@ramonvermeulen
Contributor Author

> @ramonvermeulen please fix the conflicts. thanks

Just updated the PR.

@ramonvermeulen ramonvermeulen force-pushed the b/atlantis-cancel-parallelism-fix branch from 359cddb to ffe020f April 22, 2026 22:11
@jamengual
Contributor

@pseudomorph @lukemassa could one of you take a look at this?

@pseudomorph
Contributor

I think the overall change makes sense and seems to have good comment/test coverage.


@jamengual jamengual left a comment


thanks @ramonvermeulen for the contribution

@dosubot (bot) added the lgtm label Apr 24, 2026
@jamengual jamengual merged commit 4e448f7 into runatlantis:main Apr 24, 2026
39 checks passed
@Wirone

Wirone commented May 6, 2026

Reporting back: it seems the fix is correct, Atlantis now cancels our jobs properly 🥳! However, it would be great to have a config option similar to hide-unchanged-plan-comments, so that cancelled jobs are hidden or at least grouped somehow. When Atlantis plans ~170 projects and you cancel early, the feedback spams the PR with lots of "Plan error: operation cancelled via atlantis cancel command" comments 😅. But it's only a detail; the most important thing is that cancellation works!

@ramonvermeulen
Contributor Author

ramonvermeulen commented May 6, 2026

> Reporting back: seems the fix is correct, Atlantis now cancels our jobs properly 🥳! However, it would be great to have a config option similar to hide-unchanged-plan-comments, so the cancelled jobs are hidden or at least grouped somehow. In case when Atlantis plans ~170 projects, when you cancel early the feedback spams PR much with all the "Plan error: operation cancelled via atlantis cancel command" 😅. But it's only a detail, most important is that cancellation works!

Thanks for doing the sanity check! It's sometimes hard to test these things in the real world because I don't have an Atlantis instance with 170 real-world projects at my disposal, so I test on a best-effort basis by replicating the scenario locally.

I might look into hide-unchanged-plan-comments; maybe it makes sense (if possible) to simply hook into that exact configuration option? In a sense nothing changed in the plan, so it could make sense not to comment the plan errors when that option is enabled.
