fix: use leader election in MergeAgain to unblock parallel plans when PR is behind base #6470
…hind base Signed-off-by: Josep Medialdea <josep@medialdea.io>
…ncurrent test Signed-off-by: Josep Medialdea <josep@medialdea.io>
0cdbbd8 to
eef48e1
Signed-off-by: Josep Medialdea <josep@medialdea.io>
Is this perhaps an issue with …
@pseudomorph
I can rework this so there’s just one in-flight … Before I do that:
I think we can just drop …
Agree, that's cleaner. Will push an update.
Signed-off-by: Josep Medialdea <josep@medialdea.io>
Summary
- … checkout-strategy: merge and the PR is behind the base branch
- … Clone()); this PR fixes the "PR diverged" path in MergeAgain() that #6376 (fix: use read lock for clone reuse check to unblock parallel plans) did not cover
- … chan struct{} instead of queuing on the write lock, then proceed to runSteps in parallel
Problem
PR #6376 introduced a read-lock fast path in Clone() so parallel goroutines skip the write lock when the repo is already at the correct commit. That fixed the common case.
However, when the PR is behind its base branch, all goroutines pass through MergeAgain() and independently try to acquire the write lock after detecting divergence. Because runSteps() holds a shared read lock for the entire plan duration (potentially minutes), every write-lock attempt serializes behind the preceding goroutine's plan:
G0: recheckDiverged [read lock] → write lock → merge → runSteps [READ, 5 min]
G1: recheckDiverged [read lock] → write lock → BLOCKED by G0's read lock
G2: recheckDiverged [read lock] → write lock → BLOCKED waiting for G1…
Every goroutine waits for the preceding goroutine's entire plan to complete before it can even discover whether a merge is still needed.
Fix
Elect exactly one goroutine as the merge leader using sync.Map.LoadOrStore. The leader acquires the write lock and merges as before. All other goroutines receive the result through a chan struct{} that is completely independent of the RWMutex, so there is no write-lock contention at all:
G0: recheckDiverged [read lock] → leader → write lock → merge → runSteps [read]
G1: recheckDiverged [read lock] → follower → <-done (channel) → runSteps [read]
G2: recheckDiverged [read lock] → follower → <-done (channel) → runSteps [read]
All goroutines proceed to runSteps in parallel as soon as the single merge finishes.
Defer ordering: the channel is closed and the map entry deleted while the write lock is still held (LIFO defer order), so no new goroutine can observe and join a completed-but-not-yet-deleted entry after the lock is released.
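The leader/follower pattern and the defer ordering can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the workspace type, its fields, the key "repo/pull/1", and runConcurrent are all hypothetical. The real merge is replaced by flipping a flag, the leader's re-check under the write lock is assumed, and the relative order of close and Delete is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// workspace is a hypothetical stand-in for the real working-dir type.
type workspace struct {
	mu       sync.RWMutex // guards diverged and merges (the "checkout")
	diverged bool         // true while the PR is behind its base branch
	merges   int          // number of merges actually performed
	inflight sync.Map     // key -> chan struct{}, closed when the merge ends
}

// recheckDiverged only reads state, so it runs under the read lock.
func (w *workspace) recheckDiverged() bool {
	w.mu.RLock()
	defer w.mu.RUnlock()
	return w.diverged
}

// mergeAgain elects one leader per key; every other caller is a follower.
func (w *workspace) mergeAgain(key string) {
	if !w.recheckDiverged() {
		return
	}
	done := make(chan struct{})
	if ch, loaded := w.inflight.LoadOrStore(key, done); loaded {
		<-ch.(chan struct{}) // follower: waits on the channel, never on the write lock
		return
	}
	w.mu.Lock()
	// Defers run LIFO: close(done), then Delete, then Unlock, so the channel
	// is closed and the entry removed while the write lock is still held.
	defer w.mu.Unlock()
	defer w.inflight.Delete(key)
	defer close(done)
	if w.diverged { // assumption: the leader re-checks under the write lock
		w.diverged = false // placeholder for the real merge
		w.merges++
	}
}

// runConcurrent launches n goroutines against one diverged workspace and
// returns how many merges were performed.
func runConcurrent(n int) int {
	w := &workspace{diverged: true}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			w.mergeAgain("repo/pull/1")
		}()
	}
	wg.Wait()
	w.mu.RLock()
	defer w.mu.RUnlock()
	return w.merges
}

func main() {
	fmt.Println("merges performed:", runConcurrent(5)) // prints "merges performed: 1"
}
```

Because followers block on a channel rather than on Lock(), they are free to take the read lock for their own plan as soon as the leader releases the write lock.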
Testing
TestMergeAgain_ConcurrentDiverged: creates a merge-checkout workspace, advances the base branch to force divergence, launches 5 concurrent MergeAgain goroutines, and asserts no errors and that the workspace on disk contains the base-branch update
Related
… recheckDiverged safe to run under the read lock, a prerequisite for this fix)
Notes