The Wayback Machine - https://web.archive.org/web/20200919182257/https://github.com/github/resque/pull/22

In worker, retry forever with an exponential back off when Redis interactions time out #22

Open
wants to merge 13 commits into
base: github
from

Conversation

@nathansobo

nathansobo commented Nov 7, 2019

Currently, when we time out talking to Redis, we reconnect and retry the operation. This works for a fail-over scenario in which the Redis server has moved to a new host. But when Redis is still available and merely overwhelmed with load, repeatedly reconnecting and retrying operations can make the situation worse.

In this PR, I introduce Worker#with_exponential_backoff and use it in the Worker instead of with_retries.

  • When retrying, exponentially back off by powers of 2, up to a maximum of 60 seconds, with 5 seconds of random jitter.
  • Continue retrying forever until the worker is explicitly shut down. This prevents a scenario where the worker process dies after N attempts only to be restarted by Resqued; a restarted process would start retrying at a faster rate again. Retrying forever ensures that we keep retrying at a reduced frequency until Redis service health recovers.
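
As a rough illustration of the two points above, here is a minimal sketch of such a helper. It is not the code from this PR: the error class BackoffTimeout, the shutdown and sleeper parameters, the choice to start at 2 seconds, and adding the jitter on top of the 60 second cap are all assumptions made for the example.

```ruby
# Hypothetical sketch; names and details below are assumptions, not the PR's code.
class BackoffTimeout < StandardError; end # stand-in for a Redis timeout error

MAX_DELAY  = 60 # seconds; cap on the exponential part of the delay
MAX_JITTER = 5  # seconds; upper bound on the random jitter

# Delay before retry number `attempt` (1-based): 2, 4, 8, 16, 32, then 60,
# plus up to MAX_JITTER seconds of jitter (assumed to be added after the cap).
def backoff_delay(attempt, jitter: rand * MAX_JITTER)
  exponent = [attempt, 6].min # 2**6 already exceeds the cap; avoids huge integers
  [2**exponent, MAX_DELAY].min + jitter
end

# Runs the block, retrying on timeout until it succeeds or shutdown is requested.
# `shutdown` and `sleeper` are injectable so the loop can be tested without sleeping.
def with_exponential_backoff(shutdown: -> { false }, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    yield
  rescue BackoffTimeout
    raise if shutdown.call # stop retrying once the worker is shutting down
    attempt += 1
    sleeper.call(backoff_delay(attempt))
    retry
  end
end
```

Under these assumptions a worker would wrap each Redis call, for example `with_exponential_backoff(shutdown: -> { shutting_down }) { redis_call }`, where both names inside the call are placeholders. Because the attempt counter lives in the running process, the delay only resets when the call succeeds or the process restarts.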

I limit these changes to the worker because backing off and retrying forever in Unicorn processes when enqueuing jobs could cause request timeouts.

I also change the behavior of with_retries slightly so that each attempt to reconnect also counts as a retry attempt. The existing logic can end up trying to reconnect up to 9 times in certain scenarios.
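
To illustrate the idea of a single shared attempt budget, here is a hedged sketch. It is not the actual with_retries from this repository: the RetryTimeout class, the reconnect callable, and the default of 3 attempts are assumptions chosen for the example.

```ruby
# Hypothetical sketch; not the repository's with_retries implementation.
class RetryTimeout < StandardError; end # stand-in for a Redis timeout error

# Runs the block, reconnecting and retrying on timeout. Operation attempts and
# reconnect attempts draw from the same budget, so the total number of Redis
# interactions is bounded by max_attempts.
def with_retries(reconnect:, max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue RetryTimeout
    raise if attempts >= max_attempts
    begin
      attempts += 1 # the reconnect itself uses up one attempt
      reconnect.call
    rescue RetryTimeout
      raise if attempts >= max_attempts
      retry # try to reconnect again (restarts the inner begin)
    end
    retry # reconnected; run the operation again (restarts the outer begin)
  end
end
```

With one counter shared by both kinds of attempt, a failing reconnect cannot multiply the number of tries the way nested, independently counted retry loops can.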

@dbussink
Member

dbussink commented Nov 7, 2019

Sorry, I missed this PR when opening #23, and once @nronas approved it, I had already merged it before this change.

Feel free to incorporate some of the further changes here, though; #23 was aiming at the most minimal fix I could come up with.

Co-Authored-By: Nathan Witmer <nathan@zerowidth.com>
@nathansobo force-pushed the fix-retries branch from ee6c29a to e6cfc05 Nov 7, 2019
@nathansobo changed the title from "Avoid infinite loop in retry logic when exceptions occur talking to Redis" to "In worker, retry forever with an exponential back off when Redis interactions time out" Nov 7, 2019
@nathansobo marked this pull request as ready for review Nov 7, 2019