feat: add long-lived Claude CLI process per session (CLAUDE_STREAM_MODE)#685
feat: add long-lived Claude CLI process per session (CLAUDE_STREAM_MODE)#685dione wants to merge 3 commits intositeboon:mainfrom
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new Claude CLI streaming mode with a long-lived per-session subprocess, expands SDK exports, routes server handlers to use stream or SDK based on a feature flag, and refines client-side realtime-vs-server message reconciliation. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Server as ServerIndex
participant Stream as ClaudeStream
participant CLI as ClaudeCLI
Client->>Server: claude-command (sessionId, prompt)
rect rgba(100,150,200,0.5)
Server->>Stream: queryClaudeStream(command, claud eOptions, ws)
activate Stream
alt new session / pending key
Stream->>CLI: spawn claude (--print --input-format stream-json --output-format stream-json [--resume])
CLI-->>Stream: stdout JSONL (init/session_id/events)
Stream->>Stream: parse, rekey session, emit session_created
else existing session
Stream->>CLI: write prompt JSONL to stdin (one-at-a-time)
end
loop stream events
CLI-->>Stream: JSONL event
Stream->>Stream: unwrap & normalize (stream_event, rate_limit, system, result, error)
Stream-->>Server: forward normalized websocket messages (status/stream/complete/error)
Server-->>Client: forward websocket messages
end
CLI-->>Stream: result
Stream->>Stream: emit complete (duration/cost), cleanup temp files, dequeue next prompt
Stream->>Stream: arm idle timeout (kill on idle)
deactivate Stream
end
alt abort
Client->>Server: abort-session
Server->>Stream: abortClaudeStreamSession(sessionId)
Stream->>CLI: kill subprocess / drain queue
end
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
server/claude-stream.js (1)
35-35:CLAUDE_STREAM_IDLE_MS=0cannot disable the idle timer.
parseInt('0', 10) || 30 * 60 * 1000evaluates to the default. If "never time out" isn't intended, this is fine; otherwise gate the coalesce onNumber.isFinite.-const IDLE_TIMEOUT_MS = parseInt(process.env.CLAUDE_STREAM_IDLE_MS, 10) || 30 * 60 * 1000; +const parsed = parseInt(process.env.CLAUDE_STREAM_IDLE_MS, 10); +const IDLE_TIMEOUT_MS = Number.isFinite(parsed) && parsed >= 0 ? parsed : 30 * 60 * 1000;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/claude-stream.js` at line 35, The current IDLE_TIMEOUT_MS initialization uses parseInt(...) || 30*60*1000 so a value of "0" falls back to the default; change the logic in server/claude-stream.js to parse the env var into a number and explicitly check with Number.isFinite (or an explicit null/undefined check) before using it—e.g., compute a numeric parsed value from process.env.CLAUDE_STREAM_IDLE_MS and set IDLE_TIMEOUT_MS = Number.isFinite(parsed) ? parsed : 30*60*1000 to allow 0 to disable the timer.server/index.js (1)
1521-1535: Union of SDK + stream sessions assumes ID format parity.
getActiveClaudeStreamSessions()filters out'pending:'keys (perserver/claude-stream.jsline 518) and otherwise returns whatever keys are in the map — including the real Claude session IDs after rekeying. SincegetActiveClaudeSDKSessions()also returns real session IDs, theSet-based de-duplication here works correctly only because both layers standardize on the CLI-assignedsession_id. If either ever emits provider-internal IDs, the union will report phantom sessions. Leaving a note for future maintainers would help.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/index.js` around lines 1521 - 1535, The union of SDK and stream Claude session lists (in the get-active-sessions branch using CLAUDE_STREAM_MODE, getActiveClaudeSDKSessions and getActiveClaudeStreamSessions) assumes both sources use the same CLI-assigned session_id format and can produce incorrect/phantom sessions if one returns provider-internal IDs; add a clear inline developer comment immediately above this merging logic explaining the ID-parity assumption, the risk (provider-internal IDs like those filtered as 'pending:'), and recommend normalizing session IDs or filtering provider-internal IDs (e.g., via the same rekeying/strip logic used in server/claude-stream.js) if sources diverge in the future so maintainers know what to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/claude-stream.js`:
- Around line 382-418: writePromptNow currently sets session.inFlight and
returns false on synchronous stdin.write failure, but drainQueue ignores that
return and silently drops a prompt; fix by making writePromptNow only set
session.inFlight after a successful write and have drainQueue capture
writePromptNow's boolean result: if false, push the prompt back onto the front
of session.queue and surface the failure (call session.emit('error', err) or
session.onError(...) if present, otherwise console.error) so the UI/session can
react; also ensure submitPrompt behavior remains correct when writePromptNow
fails (i.e., prompt stays queued and inFlight remains false).
- Around line 491-511: In abortClaudeStreamSession, capture the setTimeout
return value (e.g., const timeoutHandle = setTimeout(...)) and attach a one-time
'exit' listener to the child process (session.process.once('exit', () =>
clearTimeout(timeoutHandle))) so the fallback timer is cleared when the process
exits gracefully; also store or reference timeoutHandle on the session if needed
(session.timeoutHandle) and ensure any existing timeout is cleared before
setting a new one, and continue to call killSession(timeoutHandle) only if the
process is still running.
- Around line 423-489: The session currently stores tempImagePaths and tempDir
on session (see session.tempImagePaths, session.tempDir) which are mutated when
a queued prompt with images is submitted (queryClaudeStream) causing deletion of
images belonging to other queued prompts when handleEvent cleans session-level
temps; change ownership so each queued prompt entry (the objects pushed into
session.queue by submitPrompt) carries its own tempImagePaths and tempDir, have
writePromptNow set session.currentPromptTemps = entry (or otherwise mark which
queue entry is in-flight), update handleEvent's result branch to only delete the
temps for the in-flight prompt (session.currentPromptTemps or entry) instead of
session.tempImagePaths/ tempDir, and remove the per-prompt temps when that
prompt completes; update any code that currently reads
session.tempImagePaths/tempDir (e.g., attachStdoutHandler/ drainQueue) to use
the per-entry values or session.currentPromptTemps.
In `@src/stores/useSessionStore.ts`:
- Around line 142-158: The reconciliation currently in
reconcileRealtimeAgainstServer filters out realtimeMessages with local
placeholders solely by comparing timestamps against serverLatestTs, which can
prematurely drop queued prompts; change the logic so that for
isClientPlaceholder entries you only remove them when you can match a canonical
server message instead of relying on timestamp alone — implement a matcher that
compares placeholder message fields (e.g., role, kind, content/text and
optionally a small time window) against slot.serverMessages and only filter when
a corresponding server message is found; keep the existing serverIds check for
exact id matches and only apply the new content/role-based match for ids that
start with 'local_', 'text_' or '__streaming_' (use the existing
isClientPlaceholder helper and slot.serverMessages/slot.realtimeMessages
references).
---
Nitpick comments:
In `@server/claude-stream.js`:
- Line 35: The current IDLE_TIMEOUT_MS initialization uses parseInt(...) ||
30*60*1000 so a value of "0" falls back to the default; change the logic in
server/claude-stream.js to parse the env var into a number and explicitly check
with Number.isFinite (or an explicit null/undefined check) before using it—e.g.,
compute a numeric parsed value from process.env.CLAUDE_STREAM_IDLE_MS and set
IDLE_TIMEOUT_MS = Number.isFinite(parsed) ? parsed : 30*60*1000 to allow 0 to
disable the timer.
In `@server/index.js`:
- Around line 1521-1535: The union of SDK and stream Claude session lists (in
the get-active-sessions branch using CLAUDE_STREAM_MODE,
getActiveClaudeSDKSessions and getActiveClaudeStreamSessions) assumes both
sources use the same CLI-assigned session_id format and can produce
incorrect/phantom sessions if one returns provider-internal IDs; add a clear
inline developer comment immediately above this merging logic explaining the
ID-parity assumption, the risk (provider-internal IDs like those filtered as
'pending:'), and recommend normalizing session IDs or filtering
provider-internal IDs (e.g., via the same rekeying/strip logic used in
server/claude-stream.js) if sources diverge in the future so maintainers know
what to change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 9339c8b8-1700-49e4-bbc2-2367d087fcb0
📒 Files selected for processing (4)
server/claude-sdk.jsserver/claude-stream.jsserver/index.jssrc/stores/useSessionStore.ts
- Per-prompt temp image ownership. Previously `session.tempImagePaths`/
`session.tempDir` were shared across all prompts on a session, so a queued
prompt's images could be deleted by the preceding prompt's `result`
cleanup before the CLI had a chance to read them. Temps now live on the
queue entry and on `session.currentPromptTemps` (the in-flight prompt),
and cleanup only touches the current prompt's files. `cleanupSession`
additionally wipes any remaining queued entries' temps on process exit.
- `drainQueue` now reacts to `writePromptNow` failure: it unshifts the
prompt back onto the queue and surfaces an error normalized message, so
a sync stdin write failure no longer silently drops the user's prompt.
`writePromptNow` no longer sets `inFlight` on failure for the same reason.
- `abortClaudeStreamSession` captures the SIGTERM fallback timer handle and
clears it via `child.once('exit', …)` so a graceful SIGINT exit doesn't
leave a 2s timer keeping the event loop alive.
- `reconcileRealtimeAgainstServer` drops client-generated placeholders by
role+kind+content match against server messages instead of a timestamp
comparison. The timestamp heuristic could hide a queued `local_*` prompt
when an earlier server message advanced `serverLatestTs` past it before
the new prompt's canonical arrived.
- Added a developer comment above the `get-active-sessions` SDK+stream
union explaining the ID-parity assumption so future maintainers know
what to normalize if either source changes shape.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/index.js`:
- Around line 1459-1460: The stream-session membership check treats presence in
activeStreamSessions as liveness; update isClaudeStreamSessionActive to validate
the actual child process health (check process.killed and exitCode) and remove
any stale entries from activeStreamSessions when detected, then use that same
health-checked logic in getActiveClaudeStreamSessions so it returns only live
sessions; ensure abortClaudeStreamSession and callers that currently rely on
isClaudeStreamSessionActive will fall back to SDK when the session is dead and
that stale entries are pruned to prevent incorrect isProcessing/status
reporting.
- Around line 1459-1460: Add per-session ownership checks before allowing stream
session actions: ensure the socket's writer.userId matches the owner stored in
the session metadata (or other session owner field) before executing
isClaudeStreamSessionActive, abortClaudeStreamSession, reconnect/rewire writer,
or list-active-session logic. Concretely, in the CLAUDE_STREAM_MODE branches
around isClaudeStreamSessionActive/abortClaudeStreamSession and the equivalent
reconnect/rebind and list handlers, fetch session metadata for data.sessionId,
verify session.ownerUserId (or equivalent) === writer.userId, and reject the
action (dont call abort/rebind/list) when the check fails; surface an
authorization error back to the socket. Ensure the same ownership guard is
applied to the other affected handlers mentioned (rebind/reconnect and discovery
paths).
- Around line 1417-1421: The handler currently passes data.options that contain
projectPath to queryClaudeStream which reads cwd from options, causing Claude to
use the server process cwd when CLAUDE_STREAM_MODE is true; before calling
queryClaudeStream (and ideally both branches), map the selected projectPath into
the options.cwd field (e.g., set data.options.cwd = data.projectPath) so
queryClaudeStream sees the correct working directory (reference symbols:
queryClaudeStream, queryClaudeSDK, data.options, projectPath,
CLAUDE_STREAM_MODE).
In `@src/stores/useSessionStore.ts`:
- Around line 149-163: The current placeholder reconciliation (hasCanonicalMatch
+ slot.realtimeMessages.filter) can wrongly drop consecutive identical prompts;
change it to a consume-once matcher: create a mutable Set (e.g.,
consumedServerIds) and, inside the slot.realtimeMessages.filter, when checking
matches against slot.serverMessages attempt to find a server message whose id is
not in consumedServerIds and whose kind, role and content match (use same checks
as hasCanonicalMatch), and if found add that server message's id to
consumedServerIds and treat the placeholder as matched/removed; keep the
existing serverIds and isClientPlaceholder checks but replace the pure
any()/some() content check with this first-match-and-consume behavior so each
server message only reconciles one realtime placeholder.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ff04ca8f-03bd-478e-b5d3-8b18fe050a03
📒 Files selected for processing (3)
server/claude-stream.jsserver/index.jssrc/stores/useSessionStore.ts
✅ Files skipped from review due to trivial changes (1)
- server/claude-stream.js
- `queryClaudeStream` reads `cwd` from options but the dispatcher was
passing `data.options` unchanged while clients populate `projectPath`,
so a stream-mode Claude process could start in the server's cwd instead
of the user's project directory — especially risky with
`--dangerously-skip-permissions`. The dispatcher now maps
`projectPath` to `cwd` as a fallback for both the stream and SDK paths.
- `isClaudeStreamSessionActive` and `getActiveClaudeStreamSessions` no
longer treat map membership as process liveness. A new `getLiveSession`
helper verifies `process.killed` / `exitCode`, prunes stale entries via
`cleanupSession`, and is used consistently across active-check,
active-list, abort, and reconnect paths. A stale session entry can no
longer misroute aborts away from SDK fallback or surface as
`isProcessing: true`.
- Added per-session ownership checks to `abortClaudeStreamSession`,
`isClaudeStreamSessionActive`, `getActiveClaudeStreamSessions`, and
`reconnectStreamSessionWriter`. The dispatcher in `index.js` now passes
`writer.userId` to these paths so one authenticated client can't abort
or rebind another user's long-lived Claude process just by knowing its
sessionId. Ownership check is a no-op when no userId is supplied
(internal callers / single-user setups).
- `reconcileRealtimeAgainstServer` switches to a consume-once matcher.
Each server message can absorb at most one client placeholder, so two
consecutive identical prompts ("ok", "ok") no longer both collapse onto
the first prompt's canonical server entry when a refresh lands between
them.
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
server/claude-stream.js (1)
424-429:⚠️ Potential issue | 🟡 MinorHandle immediate stdin write failures too.
drainQueuenow handleswritePromptNow()returningfalse, but the first non-queued prompt still ignores that result and can fail silently with temp files left behind.🐛 Proposed fix
if (session.inFlight) { session.queue.push(entry); return; } - writePromptNow(session, entry); + if (!writePromptNow(session, entry)) { + if (Array.isArray(entry.tempImagePaths) && entry.tempImagePaths.length > 0) { + cleanupTempFiles(entry.tempImagePaths, entry.tempDir).catch(() => {}); + } + session.writer?.send(createNormalizedMessage({ + kind: 'error', + content: 'Failed to write prompt to Claude process; please retry the message', + sessionId: session.sessionId, + provider: 'claude', + })); + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/claude-stream.js` around lines 424 - 429, submitPrompt currently ignores the boolean return of writePromptNow and can silently fail when an immediate stdin write fails; modify submitPrompt to capture the return value of writePromptNow(session, entry) and, if it returns false, push entry onto session.queue (same behavior as when session.inFlight is true) and return, so the entry will be retried by drainQueue and temporary files won't be left behind; reference submitPrompt, writePromptNow, session.queue, and drainQueue when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/claude-stream.js`:
- Around line 220-231: In the event.type === 'error' block, ensure any temporary
image files for the failing prompt are cleaned before starting the next queued
prompt: check session.currentPromptTemps (and
session.currentPromptTemps?.length), unlink/remove each temp path safely
(catch/log errors) and then clear the array/set the property to an empty array
before calling drainQueue(session) or armIdleTimer(session); reference
session.currentPromptTemps, drainQueue, armIdleTimer and the CLI error branch so
the cleanup happens synchronously/awaited (or handled via Promise.all) prior to
marking session.inFlight = false and draining the queue.
- Around line 503-506: Before rebinding session.writer and enqueuing any prompt
in the existing-session branch, verify ownership using sessionBelongsTo(session,
wsClientIdentity) (or the equivalent ownership check used elsewhere); if the
check fails, reject the reuse and do not modify session.writer or enqueue
prompts. Also ensure any ephemeral resources created for the incoming connection
(e.g., image temps referenced by the session/prompt) are cleaned up when
rejecting reuse; retain disarmIdleTimer(session) only after a successful
ownership validation and writer swap. Update the logic around session.writer,
disarmIdleTimer(session), and the prompt enqueue path to enforce this ownership
guard and to run temp cleanup on rejection.
- Around line 529-530: The isSessionProcessAlive check currently uses
session.process.killed and exitCode to determine liveness; replace that logic to
check both exitCode === null and signalCode === null (i.e., return !!session &&
session.process.exitCode === null && session.process.signalCode === null) and
update any places that short-circuit on process.killed (notably the SIGTERM
fallback logic around the timeout handling) to use the same exitCode/signalCode
checks so a process that has been signaled but not yet exited is still
considered alive and the fallback kill timeout will run.
- Around line 295-298: Replace the process 'exit' handler with a 'close' handler
on session.process so stdout/stderr are fully drained before cleanup: change
session.process.on('exit', ...) to session.process.on('close', ...) and keep the
existing logging and cleanupSession(session, { sendComplete: session.inFlight })
call (preserve the use of session.sessionId for logging and session.inFlight for
sendComplete) to ensure final JSONL lines are delivered to the parser before
cleanup.
---
Duplicate comments:
In `@server/claude-stream.js`:
- Around line 424-429: submitPrompt currently ignores the boolean return of
writePromptNow and can silently fail when an immediate stdin write fails; modify
submitPrompt to capture the return value of writePromptNow(session, entry) and,
if it returns false, push entry onto session.queue (same behavior as when
session.inFlight is true) and return, so the entry will be retried by drainQueue
and temporary files won't be left behind; reference submitPrompt,
writePromptNow, session.queue, and drainQueue when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: df6ef324-b1fc-4fab-82c4-8b80229f2df2
📒 Files selected for processing (3)
server/claude-stream.jsserver/index.jssrc/stores/useSessionStore.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/stores/useSessionStore.ts
4894549 to
105525a
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (4)
server/claude-stream.js (4)
295-298:⚠️ Potential issue | 🟠 MajorUse
'close'instead of'exit'to avoid losing the final stdout lines.
'exit'fires before stdio is drained, so the terminalresultJSONL line can sit in the stream queue and never reachhandleEvent. Cleanup then runs withsession.inFlight === trueand the client receives a syntheticcomplete { aborted: true }for what was actually a clean finish. Attach to'close'instead.🐛 Proposed fix
- session.process.on('exit', (code, signal) => { + session.process.on('close', (code, signal) => { console.log(`[claude-stream] process pid=${session.process.pid} exited code=${code} signal=${signal} session=${session.sessionId || 'NEW'}`); cleanupSession(session, { sendComplete: session.inFlight }); });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/claude-stream.js` around lines 295 - 298, Replace the process 'exit' listener with a 'close' listener to ensure stdio is fully drained before cleanup: change the event attached on session.process from 'exit' to 'close' in the handler that logs process termination and calls cleanupSession(session, { sendComplete: session.inFlight }); keep the same log message (including session.process.pid, code, signal, session.sessionId) and the same call to cleanupSession so the final stdout JSONL lines are handled by handleEvent before cleanup runs.
545-547:⚠️ Potential issue | 🟠 MajorLiveness check still relies on
process.killed.
ChildProcess.killedflips totrueas soon askill()is called, before the process actually exits. Consequences:
- Line 546: a just-signaled but still-running process is treated as dead, so
getLiveSessioncallscleanupSessionon a live child and future lookups miss it.- Line 592: the SIGTERM fallback timer in
abortClaudeStreamSessionno-ops becausekilledis alreadytrueafter the SIGINT on line 588, so a hung CLI is never escalated tokillSession.- Line 467: same predicate in
queryClaudeStreamprematurely evicts a resume target.Use
exitCode === null && signalCode === nullinstead.🔧 Proposed fix
function isSessionProcessAlive(session) { - return !!session && !session.process.killed && session.process.exitCode === null; + return !!session && session.process.exitCode === null && session.process.signalCode === null; }- const fallbackTimer = setTimeout(() => { - if (!session.process.killed && session.process.exitCode === null) { + const fallbackTimer = setTimeout(() => { + if (session.process.exitCode === null && session.process.signalCode === null) { killSession(session); } }, 2000);- if (session && (session.process.killed || session.process.exitCode !== null)) { + if (session && !isSessionProcessAlive(session)) { activeStreamSessions.delete(sessionId); session = null; }Node.js ChildProcess killed vs exitCode vs signalCode after kill()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/claude-stream.js` around lines 545 - 547, The liveness check in isSessionProcessAlive (used by getLiveSession, queryClaudeStream, abortClaudeStreamSession, cleanupSession, etc.) incorrectly relies on process.killed; replace that predicate with a check that both session.process.exitCode === null and session.process.signalCode === null to detect a still-running child (i.e., return !!session && session.process.exitCode === null && session.process.signalCode === null), and update any callers/logic that used the old truthiness to ensure SIGINT/SIGTERM races do not prematurely treat a process as dead or skip escalation to killSession.
220-233:⚠️ Potential issue | 🟠 MajorCLI error event still leaks in-flight image temps before draining the queue.
When
event.type === 'error'fires and noresultfollows,session.currentPromptTempsis never cleaned anddrainQueuethen overwrites it with the next prompt's temps — the failed prompt's image files leak. Clean (and null)currentPromptTempsbeforedrainQueue(session)here, mirroring the result branch at lines 243-249.🧹 Proposed fix
session.writer?.send(createNormalizedMessage({ kind: 'error', content: msg, sessionId: sid, provider: 'claude', })); + const current = session.currentPromptTemps; + if (current && Array.isArray(current.tempImagePaths) && current.tempImagePaths.length > 0) { + cleanupTempFiles(current.tempImagePaths, current.tempDir).catch(() => {}); + } + session.currentPromptTemps = null; session.inFlight = false; drainQueue(session); if (!session.inFlight) armIdleTimer(session);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/claude-stream.js` around lines 220 - 233, The error-handling branch for event.type === 'error' fails to cleanup session.currentPromptTemps, leaking temp image files when no result follows; update that branch (the block that logs the error, sets session.inFlight = false, calls drainQueue(session) and armIdleTimer(session)) to clear and null out session.currentPromptTemps before calling drainQueue(session), mirroring the cleanup performed in the successful result branch so temps are removed regardless of success or error.
511-520:⚠️ Potential issue | 🟠 MajorOwnership-rejection path leaks the just-created image temps.
handleImagesat line 459 wrote image files toimageResult.tempDirbefore this branch runs; the earlyreturnat line 519 bypasses every cleanup path (handleEventresult,cleanupSession), so the files sit on disk under the victim session's project.tmp/images/...forever. Clean before returning.🛡️ Proposed fix
if (!sessionBelongsTo(session, ws?.userId ?? null)) { console.warn(`[claude-stream] query rejected: user ${ws?.userId} does not own session ${sessionId}`); + cleanupTempFiles(imageResult.tempImagePaths, imageResult.tempDir).catch(() => {}); ws?.send?.(createNormalizedMessage({ kind: 'error', content: 'Session belongs to another user', sessionId: sessionId || null, provider: 'claude', })); return; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/claude-stream.js` around lines 511 - 520, The ownership-rejection branch returns without cleaning up image temporary files created earlier by handleImages (imageResult.tempDir), so before returning from the sessionBelongsTo check add a cleanup call to remove those temps: ensure you invoke the same cleanup used elsewhere (e.g., call cleanupSession(sessionId) or a small helper that deletes imageResult.tempDir) or run the handleEvent cleanup path so imageResult.tempDir is removed; reference handleImages, imageResult.tempDir, sessionBelongsTo, cleanupSession/handleEvent to locate and hook the cleanup in that branch before the early return.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@server/claude-stream.js`:
- Around line 295-298: Replace the process 'exit' listener with a 'close'
listener to ensure stdio is fully drained before cleanup: change the event
attached on session.process from 'exit' to 'close' in the handler that logs
process termination and calls cleanupSession(session, { sendComplete:
session.inFlight }); keep the same log message (including session.process.pid,
code, signal, session.sessionId) and the same call to cleanupSession so the
final stdout JSONL lines are handled by handleEvent before cleanup runs.
- Around line 545-547: The liveness check in isSessionProcessAlive (used by
getLiveSession, queryClaudeStream, abortClaudeStreamSession, cleanupSession,
etc.) incorrectly relies on process.killed; replace that predicate with a check
that both session.process.exitCode === null and session.process.signalCode ===
null to detect a still-running child (i.e., return !!session &&
session.process.exitCode === null && session.process.signalCode === null), and
update any callers/logic that used the old truthiness to ensure SIGINT/SIGTERM
races do not prematurely treat a process as dead or skip escalation to
killSession.
- Around line 220-233: The error-handling branch for event.type === 'error'
fails to cleanup session.currentPromptTemps, leaking temp image files when no
result follows; update that branch (the block that logs the error, sets
session.inFlight = false, calls drainQueue(session) and armIdleTimer(session))
to clear and null out session.currentPromptTemps before calling
drainQueue(session), mirroring the cleanup performed in the successful result
branch so temps are removed regardless of success or error.
- Around line 511-520: The ownership-rejection branch returns without cleaning
up image temporary files created earlier by handleImages (imageResult.tempDir),
so before returning from the sessionBelongsTo check add a cleanup call to remove
those temps: ensure you invoke the same cleanup used elsewhere (e.g., call
cleanupSession(sessionId) or a small helper that deletes imageResult.tempDir) or
run the handleEvent cleanup path so imageResult.tempDir is removed; reference
handleImages, imageResult.tempDir, sessionBelongsTo, cleanupSession/handleEvent
to locate and hook the cleanup in that branch before the early return.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 50e8344c-d5fb-4dca-84f4-40b524ff40db
📒 Files selected for processing (4)
server/claude-sdk.jsserver/claude-stream.jsserver/index.jssrc/stores/useSessionStore.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- server/claude-sdk.js
- src/stores/useSessionStore.ts
105525a to
5d63d98
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
server/claude-stream.js (1)
482-542:⚠️ Potential issue | 🟠 MajorClean image temps on every early exit after
handleImages().
handleImages()can create files before ownership rejection, MCP loading, or spawn setup. The unauthorized return and catch path currently leave those files behind.🧹 Proposed cleanup guard
async function queryClaudeStream(command, options = {}, ws) { const { sessionId, cwd, model, permissionMode, images, additionalDirs } = options; + let imageResult = null; + let promptSubmitted = false; try { // Save images to disk and augment prompt - const imageResult = await handleImages(command, images, cwd); + imageResult = await handleImages(command, images, cwd); const finalCommand = imageResult.modifiedCommand; @@ if (!sessionBelongsTo(session, ws?.userId ?? null)) { console.warn(`[claude-stream] query rejected: user ${ws?.userId} does not own session ${sessionId}`); + cleanupTempFiles(imageResult.tempImagePaths, imageResult.tempDir).catch(() => {}); ws?.send?.(createNormalizedMessage({ kind: 'error', content: 'Session belongs to another user', @@ submitPrompt(session, { text: finalCommand, tempImagePaths: imageResult.tempImagePaths, tempDir: imageResult.tempDir, }); + promptSubmitted = true; // Intentionally do NOT close stdin — we want to send more prompts later. } catch (err) { + if (!promptSubmitted && imageResult) { + cleanupTempFiles(imageResult.tempImagePaths, imageResult.tempDir).catch(() => {}); + } console.error('[claude-stream] queryClaudeStream error:', err); ws?.send?.(createNormalizedMessage({Also applies to: 554-562
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/claude-stream.js` around lines 482 - 542, After calling handleImages(...), ensure any temp files it produced (e.g., paths on imageResult such as imageResult.tempFiles or imageResult.temps) are always deleted on every early exit or error: delete those temps before returning on the ownership rejection branch (where sessionBelongsTo fails), before returning when spawn/MCP/setup fails, and in the catch/finally around the block (the paths referenced in the comment at 554-562). Update the code paths that return early after handleImages (referencing imageResult, images, handleImages, session, spawnClaudeProcess, ws) to call a single cleanup helper (e.g., cleanupImageTemps(imageResult)) or inline delete logic so no temp files remain on any exit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/stores/useSessionStore.ts`:
- Around line 154-162: takeCanonicalMatch currently compares only .content (and
role/kind) before adding a server message id to consumedServerIds, which
incorrectly matches placeholders whose meaningful data resides in kind-specific
fields; update takeCanonicalMatch to perform kind-aware comparisons: for
example, when m.kind === 'tool_use' compare toolId/toolName/toolInput (or
whichever tool fields exist) against s.toolId/toolName/toolInput, when m.kind
=== 'task_notification' compare summary or the notification-specific fields, and
for other kinds fall back to comparing content; only add s.id to
consumedServerIds when the kind-specific field comparisons match (keep existing
role/kind checks).
---
Duplicate comments:
In `@server/claude-stream.js`:
- Around line 482-542: After calling handleImages(...), ensure any temp files it
produced (e.g., paths on imageResult such as imageResult.tempFiles or
imageResult.temps) are always deleted on every early exit or error: delete those
temps before returning on the ownership rejection branch (where sessionBelongsTo
fails), before returning when spawn/MCP/setup fails, and in the catch/finally
around the block (the paths referenced in the comment at 554-562). Update the
code paths that return early after handleImages (referencing imageResult,
images, handleImages, session, spawnClaudeProcess, ws) to call a single cleanup
helper (e.g., cleanupImageTemps(imageResult)) or inline delete logic so no temp
files remain on any exit.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 19a68f71-4291-452b-9b00-549fa2674a17
📒 Files selected for processing (4)
server/claude-sdk.jsserver/claude-stream.jsserver/index.jssrc/stores/useSessionStore.ts
✅ Files skipped from review due to trivial changes (1)
- server/claude-sdk.js
5d63d98 to
0d0a2c6
Compare
Adds an opt-in alternative path for Claude chat that keeps one `claude` CLI
process alive per session and feeds sequential prompts via
`--input-format stream-json`, instead of spawning a fresh subprocess through
`@anthropic-ai/claude-agent-sdk`'s `query()` for every message.
- First message still pays the full ~22s cold start (plugins/LSPs/MCP init)
- Subsequent messages in the same session skip it — in local benchmarking
per-message latency drops from ~25s to ~12s
- Feature-flagged via `CLAUDE_STREAM_MODE=1`. When unset (default) the
existing SDK path is used unchanged — zero behavioral change for current
users. Idle processes are killed after `CLAUDE_STREAM_IDLE_MS` (default
30 min). Aborts (SIGINT + 2s SIGTERM fallback with cleared timer on
graceful exit) coordinate with the dispatcher so the UI sees a single
`complete` event.
## Temp files and prompt queueing
Temp image files are owned per-prompt: queue entries carry their own
`tempImagePaths` / `tempDir`, and the in-flight prompt's temps live on
`session.currentPromptTemps`. A prompt's `result` event only cleans up its
own files, so a queued prompt's images can't be deleted by the preceding
prompt's cleanup. `cleanupSession` wipes any remaining queue entry temps
on process exit.
A per-session outbound queue prevents two rapid prompts from racing on
stdin (the CLI drops/merges stacked prompts). On sync stdin write failure
`drainQueue` unshifts the entry back onto the queue and surfaces an error
normalized message rather than silently dropping the prompt.
## Session liveness and ownership
`isClaudeStreamSessionActive`, `getActiveClaudeStreamSessions`,
`abortClaudeStreamSession`, and `reconnectStreamSessionWriter` all go
through a `getLiveSession` helper that validates the child process is
still running (`process.killed` / `exitCode`) and prunes stale map
entries. A session that has already exited no longer misroutes aborts
away from SDK fallback or appears as `isProcessing: true`.
Each of those helpers also takes an optional `userId` and rejects
operations when the session was created on a WebSocket with a different
owner, so one authenticated client can't abort or rebind another user's
long-lived Claude process by guessing its sessionId. The ownership check
is a no-op when no userId is supplied (internal callers, single-user
setups).
## CLI event handling
The shared Claude provider adapter normalizes SDK events; CLI stream-json
emits some additional shapes, so the new handler pre-processes them
before forwarding:
- `rate_limit_event` → surfaced as a status message so the user sees
when they approach their quota
- `system/hook_response` with a `systemMessage` payload → forwarded as
status (e.g. `SessionStart` hooks echoing "session saved to task X")
- `type: 'error'` → emitted as error normalized message (the adapter
would otherwise drop it silently, leaving the UI waiting for a
`result` that never comes)
- `result` → synthesized `complete` with `durationMs`, `apiDurationMs`,
`totalCostUsd`, `usage`
- Other `system/*` lifecycle events → dropped as UI noise
`--include-partial-messages` is deliberately NOT set. Emitting
`content_block_delta` events gives token-by-token streaming, but the UI
store finalizes the streamed message with a locally-generated id
(`text_<ts>_<rand>`) that can never match the canonical JSONL id — after
the server catches up on refresh, the streamed copy persists as a
duplicate next to the server's canonical assistant message. Skipping
partials trades streaming for a clean single-message render, symmetric
with the SDK path.
## Dispatcher wiring (server/index.js)
Both Claude code paths now receive options with
`cwd: data.options?.cwd ?? data.options?.projectPath`. Previously
`queryClaudeStream` could start in the server process cwd when clients
sent only `projectPath`, which is especially risky with
skip-permissions. The fallback also hardens the SDK path.
## Realtime/server reconciliation (src/stores/useSessionStore.ts)
The new stream path surfaced a pre-existing race between the UI store's
server fetches and in-flight realtime messages. Both `fetchFromServer`
(called on session load/navigate) and `refreshFromServer` (called when
the file watcher broadcasts `projects_updated`) set `slot.serverMessages`;
only the latter was dropping realtime entries. After a session load the
former left optimistic user messages (`local_*` ids from
`chatMessageToNormalized`) next to their server canonical — visible as a
duplicate user message that sometimes appeared after the assistant reply.
Extracted `reconcileRealtimeAgainstServer()` and call it from both paths.
Drops realtime entries whose id is in `serverMessages`, and drops client
placeholders (`local_*`, `text_*`, `__streaming_*`) using a consume-once
content match: each server message can absorb at most one placeholder, so
consecutive identical prompts ("ok", "ok") aren't collapsed onto the
first canonical. Everything else stays so an in-flight response isn't
wiped when the file watcher fires before the JSONL is fully flushed.
## MVP caveats
- Uses `--dangerously-skip-permissions`. UI approval prompts are not
routed through the new path yet. Hooks are still handled by the CLI
itself via `settings.json`. A follow-up can add an MCP-based permission
prompt tool for parity.
- `canUseTool` / `pendingApprovals` from the SDK path are not supported
in the stream path.
0d0a2c6 to
d15278d
Compare
…ol MCP bridge Replaces --dangerously-skip-permissions (now opt-in via toolsSettings.skipPermissions) with an in-process HTTP MCP server that the Claude CLI calls for every approval decision. The tool bridges into the shared pendingToolApprovals flow from claude-sdk.js, so the existing UI permission_request / claude-permission-response WebSocket round-trip works unchanged across SDK and stream modes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@coderabbitai resume Force-pushed squashed commit after two internal review rounds that fixed mid-flush reordering, writer-clobber on session reuse, and a post-flush readyState gap. Please review the current diff end-to-end. |
|
Only users with a collaborator, contributor, member, or owner role can interact with CodeRabbit. |
|
@coderabbitai review |
|
Only users with a collaborator, contributor, member, or owner role can interact with CodeRabbit. |
34a1a9c to
baed19a
Compare
a478c4f to
f320627
Compare
… parity with other providers
Fixes the "send button locked after page reload" regression reported for
CLAUDE_STREAM_MODE chats, adds a WebSocket replay buffer so mid-turn
disconnects no longer lose events, lands unit + integration tests
covering the new code paths, and closes several parity gaps against
the SDK / Codex / Cursor / Gemini providers (web-push notifications,
install-check error messaging, cross-platform spawn, token-budget
status, session status machine, hook lifecycle forwarding, and
compact-boundary surfacing).
Symptoms the change addresses:
* After WS reconnect the UI re-locked isLoading because
check-session-status reported isProcessing: true whenever the
long-lived CLI process was alive, regardless of in-flight state.
* Events emitted during the disconnect window were silently dropped
by ws.send on a non-OPEN socket, so a missing `complete` kept the
UI waiting forever even after the prompt finished.
* Each `claude-command` on an existing session replaced
session.writer wholesale, discarding the pending replay buffer on
the old writer — re-introducing the dropped-event bug on every
new prompt after reconnect.
* _flushPending continued past a send failure, so a retried frame
landed behind later frames on the wire (e.g. tool_use →
tool_result could invert).
* send() did not re-check readyState after draining pending, so a
socket that flipped closed mid-flush silently dropped the current
message while its predecessors survived in `pending`.
* Stream-mode runs never fired web-push notifyRunStopped /
notifyRunFailed, so users who navigated away never got pinged.
* Spawn failures (claude CLI missing) produced a cryptic ENOENT
error instead of the friendly install pointer the SDK path emits.
* Token-budget status was never emitted in stream mode, so the UI's
context-window bar stayed blank; hook lifecycle and compaction
events were silently dropped, hiding CLI activity from the user.
Implementation:
* server/claude-stream.js
- new isClaudeStreamSessionProcessing (session.inFlight), wired
into check-session-status
- reuse branch now calls session.writer.updateWebSocket(ws.ws)
instead of replacing the writer
- __test__ export surfaces internals (activeStreamSessions,
getLiveSession, sessionBelongsTo, isSessionProcessAlive,
accumulateTokenBudget, handleEvent, submitPrompt, drainQueue,
writePromptNow) so unit tests can exercise state machines
without spawning the CLI
- notifyRunStopped on every result event, notifyRunFailed on CLI
error events / process 'error' / query-time spawn failure
- providerAuthService.isProviderInstalled('claude') gated behind
the failure path so it adds no happy-path latency
- sessionSummary carried on the session record so long-lived
prompts keep the latest chat name for notifications
- cross-spawn used on win32 (matches cursor-cli / gemini-cli)
- accumulateTokenBudget folds per-prompt usage into cumulative
session totals and emits a `token_budget` status before the
`complete` event
- explicit state machine `session.status: 'running' | 'idle' |
'aborted' | 'completed'` with guarded transitions that preserve
an earlier 'aborted' state across concurrent events;
getClaudeStreamSessionStatus exported for observability
- compact_boundary events surfaced as `status` with
compactMetadata so the UI can show when context was trimmed
- system/hook_started and system/hook_completed forwarded as
`status` events (text = subtype, source = hook_name) so the UI
can render hook progress
* server/index.js — dispatcher reports isProcessing via the new
helper; WebSocketWriter moved out to a dedicated module.
* server/ws-writer.js (new) — WebSocketWriter with a bounded
replay buffer (cap=500, oldest-drop, one-shot warn);
stop-on-first-failure flush preserves order; post-flush
readyState guard prevents the current message from being dropped
when the socket flipped closed while survivors were re-queued.
Tests (all new):
* scripts/test-ws-replay.mjs — 14 cases / 30+ assertions for
WebSocketWriter; 100% stmt / func / line, 96.55% branch on
ws-writer.js.
* scripts/test-claude-stream.mjs — expanded to cover rate_limit,
hook_started, hook_completed, compact_boundary, error-event
semantics, submitPrompt queueing, drainQueue behaviour,
getClaudeStreamSessionStatus, accumulateTokenBudget, and the
ownership helpers.
* scripts/test-claude-permission-mcp.mjs — integration test that
spawns the real claude CLI and covers allow / deny / abort
permission-bridge round-trips; target is a freshly-minted
tempfile so the CLI's per-cwd "allow once" cache cannot short-
circuit the prompt.
f320627 to
bd3a3fb
Compare
Summary
Adds
CLAUDE_STREAM_MODE=1feature flag that runs a long-livedclaude --print --input-format stream-json --output-format stream-jsonsubprocess per chat session instead of re-invoking the SDKquery()for every user message. First message still pays the full CLI cold start (~22s with plugins/LSPs); subsequent messages in the same session complete in ~12s per-prompt overhead, roughly halving the previously observed ~25s chat ping latency.Changes
server/claude-stream.js— new module. Spawns the CLI once per session, maintains a FIFO prompt queue, parses JSONL events from stdout, translates them through the existing provider adapter, handles abort (SIGINT → SIGTERM fallback), idle timeout (kills abandoned sessions after 30 min), and session rekey on CLI-assigned IDs.server/claude-permission-mcp.js— in-process HTTP MCP server. Each stream session registers a unique/claude-permission-mcp/<token>URL; CLI is spawned with--permission-prompt-tool mcp__cloudcli_approval__permission_promptand a--mcp-configentry pointing at it. Bridges into the sharedpendingToolApprovals/resolveToolApprovalflow, so the existing UI (permission_requestWS message +claude-permission-response) works unchanged across SDK and stream modes. Honors--dangerously-skip-permissionswhentoolsSettings.skipPermissionsis set; plan mode bypasses the bridge.server/ws-writer.js— new module.WebSocketWriterextracted fromserver/index.jsand given a bounded replay buffer (cap=500, oldest-drop with one-shot warn). Messages sent during a WS disconnect window (mid-turn reload, network drop) queue up and drain onupdateWebSocket()swap after reconnect. Stop-on-first-failure flush preserves wire order; post-flushreadyStatere-check prevents the current message from being dropped when the socket flipped closed while survivors were re-queued.server/claude-sdk.js— exportshandleImages,cleanupTempFiles,loadMcpConfig, andpendingToolApprovalsfor reuse. No logic changes.server/index.js— feature-flag dispatcher at 4 sites (claude-command,abort-session,check-session-status,get-active-sessions). Thecheck-session-statuspath reportsisProcessingfrom a newisClaudeStreamSessionProcessinghelper (=session.inFlight), not liveness — the previous liveness-based check locked the UI's send button after every reconnect because the long-lived CLI process stays alive between prompts. Reuse of an existing session now swaps the underlying raw socket viasession.writer.updateWebSocket(ws.ws)instead of replacing the writer wholesale, so the replay buffer survives cross-prompt reconnects.src/stores/useSessionStore.ts— client-side realtime-vs-server message reconciliation refined.Tests
All new:
scripts/test-ws-replay.mjs— pure Node (no tsx). 14 cases / 30+ assertions onWebSocketWriter. Coverage: 100% stmt / func / line, 96.55% branch.scripts/test-claude-stream.mjs— tsx. 7 cases / 22 assertions on public helpers via a__test__export.scripts/test-claude-permission-mcp.mjs— integration. Spawns the realclaudeCLI and covers allow / deny / abort. Target is a freshly-minted tempfile so the CLI's per-cwd "allow once" cache cannot short-circuit the prompt.Test plan
CLAUDE_STREAM_MODE=1and start a new chat; first message pays cold start, subsequent messages return in ~12s.completeevents lost.pendingToolApprovalsmap is empty afterwards.Status
Draft — requesting a fresh CodeRabbit review after the squash and regression fixes. Two prior internal review rounds flagged issues that are all now addressed:
🤖 Generated with Claude Code