close
Skip to content

[grid] Align Router-Node read timeout with session pageLoad capability#17211

Merged
VietND96 merged 3 commits intotrunkfrom
handle-session-read-timeout
Mar 12, 2026
Merged

[grid] Align Router-Node read timeout with session pageLoad capability#17211
VietND96 merged 3 commits intotrunkfrom
handle-session-read-timeout

Conversation

@VietND96
Copy link
Copy Markdown
Member

🔗 Related Issues

💥 What does this PR do?

HandleSession computes an effective read timeout per (nodeUri, timeout) pair:

  1. Read timeouts.pageLoad from the session's WebDriver capabilities (the same value the browser will honour for navigation commands).
  2. Fetch the node's own sessionTimeout from /se/grid/node/status — cached once per node URI to avoid repeated HTTP calls.
  3. Use max(pageLoad, nodeTimeout) + 30 s buffer as the effective read timeout. The buffer gives the node time to detect its own timeout and return a proper error response before the Router closes the connection.
  4. Key the HttpClient cache by (nodeUri, effectiveTimeout) via the new NodeClientKey. Sessions that share both the same node and the same timeout continue to share one connection-pooled client; sessions with
    genuinely different timeouts each get their own.

🔧 Implementation Notes

💡 Additional Considerations

🔄 Types of changes

  • Cleanup (formatting, renaming)
  • Bug fix (backwards compatible)
  • New feature (non-breaking change which adds functionality and tests!)
  • Breaking change (fix or feature that would cause existing functionality to change)

Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>
@selenium-ci selenium-ci added B-grid Everything grid and server related C-java Java Bindings labels Mar 11, 2026
@qodo-code-review
Copy link
Copy Markdown
Contributor

Review Summary by Qodo

✨ Enhancement 🐞 Bug fix

Grey Divider

Walkthroughs

Description
• Align Router read timeout with session pageLoad capability to prevent premature connection closure
• Implement per-session timeout calculation using max(pageLoad, nodeTimeout) + 30s buffer
• Cache node session timeouts to avoid repeated HTTP calls to /se/grid/node/status
• Key HttpClient cache by (nodeUri, effectiveTimeout) for proper connection pooling

Grey Divider

File Changes

1. java/src/org/openqa/selenium/grid/router/HandleSession.java ✨ Enhancement +108/-18

Implement per-session read timeout alignment with pageLoad capability

• Added NodeClientKey class to cache HttpClients by node URI and effective read timeout
• Introduced nodeTimeoutCache to cache node session timeouts from /se/grid/node/status
• Implemented sessionReadTimeout() method to extract pageLoad timeout from session capabilities
• Modified loadSessionId() to compute effective timeout as max(pageLoad, nodeTimeout) + 30s buffer
• Refactored fetchNodeSessionTimeout() to fetchNodeTimeout() for cleaner separation of concerns

java/src/org/openqa/selenium/grid/router/HandleSession.java


2. java/test/org/openqa/selenium/grid/router/HandleSessionReadTimeoutTest.java 🧪 Tests +265/-0

Add unit and integration tests for read timeout logic

• Added comprehensive unit tests for sessionReadTimeout() method covering various capability
 scenarios
• Added integration tests verifying effective timeout calculation when pageLoad exceeds node timeout
• Added integration tests verifying node timeout acts as floor when it exceeds pageLoad
• Added integration tests for missing pageLoad capability with node timeout as floor

java/test/org/openqa/selenium/grid/router/HandleSessionReadTimeoutTest.java


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown
Contributor

qodo-code-review Bot commented Mar 11, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Timeout failure cached forever🐞 Bug ⛯ Reliability
Description
HandleSession permanently caches the first result of fetchNodeTimeout via computeIfAbsent; if that
first fetch fails, the default ClientConfig read timeout is cached for that node URI with no future
retry. This can make effectiveTimeout too small (when pageLoad is absent/smaller), causing the
Router to time out requests earlier than intended for that node.
Code

java/src/org/openqa/selenium/grid/router/HandleSession.java[R279-285]

+          Duration pageLoadTimeout = sessionReadTimeout(session.getCapabilities());
+          Duration nodeTimeout =
+              nodeTimeoutCache.computeIfAbsent(sessionUri, this::fetchNodeTimeout);
+          Duration base =
+              pageLoadTimeout.compareTo(nodeTimeout) >= 0 ? pageLoadTimeout : nodeTimeout;
+          Duration effectiveTimeout = base.plus(READ_TIMEOUT_BUFFER);
+
Evidence
nodeTimeoutCache.computeIfAbsent caches a value for the node URI and that cached value is used to
compute effectiveTimeout. fetchNodeTimeout swallows all exceptions and returns the default read
timeout, so a transient failure during the first lookup becomes a permanent cached floor for that
node URI.

java/src/org/openqa/selenium/grid/router/HandleSession.java[279-285]
java/src/org/openqa/selenium/grid/router/HandleSession.java[348-360]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`nodeTimeoutCache.computeIfAbsent` caches whatever `fetchNodeTimeout` returns. When `fetchNodeTimeout` fails, it returns `ClientConfig.defaultConfig().readTimeout()` and that fallback gets cached forever for that node URI, preventing future retries and potentially yielding an effective read timeout that is too small.
### Issue Context
- `fetchNodeTimeout` catches all exceptions and returns a fallback value.
- `computeIfAbsent` will store that fallback permanently.
- With the new `(nodeUri, effectiveTimeout)` client keying, this cached node timeout influences all future effective timeout computations for that node.
### Fix Focus Areas
- java/src/org/openqa/selenium/grid/router/HandleSession.java[279-285]
- java/src/org/openqa/selenium/grid/router/HandleSession.java[348-360]
### Suggested direction
- Change the cache population logic so that failures do **not** write to `nodeTimeoutCache` (e.g., do a get-then-fetch-then-put on success, or have `fetchNodeTimeout` throw and handle it without caching).
- Consider adding retries for the status call and/or a TTL-based cache so values can be refreshed.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Node timeout cache unbounded 🐞 Bug ➹ Performance
Description
HandleSession adds nodeTimeoutCache but never evicts entries, so a long-lived Router can accumulate
per-node URI entries indefinitely as nodes are removed/replaced over time. While each entry is
small, this is a permanent growth path that is independent of the existing HttpClient cache cleanup.
Code

java/src/org/openqa/selenium/grid/router/HandleSession.java[R141-156]

+  // Caches the Node's own session-timeout (from /se/grid/node/status) so the HTTP
+  // call is made at most once per Node URI rather than once per session.
+  private final ConcurrentMap<URI, Duration> nodeTimeoutCache;
+  // Keyed by (nodeUri, effectiveReadTimeout) so sessions with the same timeout on
+  // the same Node share a pooled HttpClient while sessions with a longer pageLoad
+  // timeout get a client sized to their actual value.
+  private final ConcurrentMap<NodeClientKey, CacheEntry> httpClients;
 private final ScheduledExecutorService cleanUpHttpClientsCacheService;

 HandleSession(Tracer tracer, HttpClient.Factory httpClientFactory, SessionMap sessions) {
   this.tracer = Require.nonNull("Tracer", tracer);
   this.httpClientFactory = Require.nonNull("HTTP client factory", httpClientFactory);
   this.sessions = Require.nonNull("Sessions", sessions);

+    this.nodeTimeoutCache = new ConcurrentHashMap<>();
   this.httpClients = new ConcurrentHashMap<>();
Evidence
nodeTimeoutCache is created and used, but only httpClients has a scheduled cleanup and close() only
closes httpClients. The codebase includes node removal/restart handling elsewhere, indicating node
URIs can change/churn over time, which can grow this cache if the Router stays up.

java/src/org/openqa/selenium/grid/router/HandleSession.java[141-156]
java/src/org/openqa/selenium/grid/router/HandleSession.java[158-184]
java/src/org/openqa/selenium/grid/router/HandleSession.java[363-374]
java/src/org/openqa/selenium/grid/sessionmap/local/LocalSessionMap.java[80-91]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`nodeTimeoutCache` has no eviction/cleanup. In a long-lived Router with node churn, this cache can only grow.
### Issue Context
- `httpClients` has a scheduled cleanup and is cleared on close.
- `nodeTimeoutCache` is neither cleaned up periodically nor cleared on close.
### Fix Focus Areas
- java/src/org/openqa/selenium/grid/router/HandleSession.java[141-156]
- java/src/org/openqa/selenium/grid/router/HandleSession.java[158-184]
- java/src/org/openqa/selenium/grid/router/HandleSession.java[363-374]
### Suggested direction
- Use a bounded/expiring cache (e.g., Caffeine) for node timeouts, or
- Extend the existing cleanup runnable to remove `nodeTimeoutCache` entries for URIs that no longer have any `httpClients` entries, and/or clear `nodeTimeoutCache` in `close()`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

Comment thread java/src/org/openqa/selenium/grid/router/HandleSession.java
Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>
@VietND96 VietND96 merged commit 85bc60c into trunk Mar 12, 2026
52 of 53 checks passed
@VietND96 VietND96 deleted the handle-session-read-timeout branch March 12, 2026 09:10
krishnamohan-kothapalli pushed a commit to krishnamohan-kothapalli/selenium that referenced this pull request Mar 18, 2026
SeleniumHQ#17211)

* [grid] Align Router-Node read timeout with session pageLoad capability
* Fix review comment

---------

Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

B-grid Everything grid and server related C-java Java Bindings

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants