Problem
The floating-bar chat attaches a fresh ~500 KB WebP screenshot to every user message. Pi-mono stores the whole conversation as a message list and re-POSTs every prior message (including images) to /v2/chat/completions on every turn. After enough turns, the accumulated images bloat the request body.
Symptom (before raising the axum limit in #6965):
[app] ScreenCaptureManager: Screenshot captured 5120x2880, WebP 502 KB
[agent] Reusing pi-mono session: pi-session-1 (key=floating)
[agent] Pi-mono: including screenshot image in prompt (image/webp)
[pi-mono] turn_end ERROR: 413 Failed to buffer the request body: length limit exceeded
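Math per turn (N images in the history):
- per-image: 502 KB × 4/3 ≈ 669 KB base64
- body at turn N: ≈ N × 669 KB of image data, plus message text and JSON framing
- body at turn 3: ~2.06 MB of image data plus text → exceeds axum's 2 MB default → 413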
Interim mitigation (shipped)
PR #6965 raises the backend request body limit on /v2/chat/completions from 2 MB to 16 MB. That buys roughly 20 accumulated screenshots of headroom — enough for any realistic floating-bar session — but doesn't fix unbounded growth.
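The arithmetic above can be checked with a small sketch. It assumes that "2 MB" and "16 MB" mean binary megabytes (2 × 1024 × 1024 and 16 × 1024 × 1024 bytes) and that "502 KB" means 502 × 1024 bytes; `base64Length` and `maxImages` are hypothetical helper names used for illustration only, not part of pi-mono or the backend.

```typescript
const KIB = 1024;
const MIB = 1024 * KIB;

// Base64 emits 4 output characters for every 3 input bytes,
// padding the final group, hence the ceil.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// Hypothetical helper: how many whole screenshots fit under a body limit
// after reserving `overheadBytes` for message text and JSON framing.
function maxImages(
  limitBytes: number,
  rawBytesPerImage: number,
  overheadBytes = 0,
): number {
  const available = limitBytes - overheadBytes;
  return Math.floor(available / base64Length(rawBytesPerImage));
}

const rawImage = 502 * KIB; // screenshot size taken from the log line
const perImage = base64Length(rawImage);

console.log(`per image: ${(perImage / KIB).toFixed(1)} KiB base64`);
console.log(`3 images:  ${((3 * perImage) / 1e6).toFixed(2)} MB`);
console.log(`fit under 2 MiB:  ${maxImages(2 * MIB, rawImage)}`);
console.log(`fit under 16 MiB: ${maxImages(16 * MIB, rawImage)}`);
```

Under these assumptions, three images alone sit roughly 40 KB below a 2 MiB limit, so the message text and JSON framing are what push the turn-3 body over it. The `overheadBytes` parameter lets that margin be modelled explicitly.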
Proposed fix
In desktop/agent/src/adapters/pi-mono.ts, before serializing messages to the provider, walk the conversation and replace older image content blocks with a short text placeholder. Keep only the most recent image (that's the current screen state — the only one the model actually needs).
Pseudocode:
function stripOldImages(messages) {
  // Find the index of the LAST message that contains an image block
  const lastImageIdx = messages.findLastIndex(msg =>
    Array.isArray(msg.content) && msg.content.some(b => b.type === "image")
  );
  return messages.map((msg, i) => {
    if (i === lastImageIdx) return msg; // keep the most recent
    if (!Array.isArray(msg.content)) return msg;
    return {
      ...msg,
      content: msg.content.map(b =>
        b.type === "image"
          ? { type: "text", text: "[earlier screenshot omitted]" }
          : b
      ),
    };
  });
}
This:
- Drops redundant visual data that the model doesn't need (the model is looking at the current screen)
- Preserves full conversational text context
- Keeps steady-state body size roughly constant regardless of turn count
- Doesn't touch Swift — image quality stays full
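To illustrate the steady-state claim, here is a typed sketch that runs the same stripping logic over a simulated session and compares serialized sizes. The `Block` and `Message` shapes, the `mimeType`/`data` field names, and the `simulate` and `bodySize` helpers are assumptions made for this example; the real pi-mono types may differ. The backwards scan is written as a plain loop so the sketch does not depend on `findLastIndex` being available.

```typescript
// Assumed minimal message shapes; the real adapter types may differ.
type Block =
  | { type: "text"; text: string }
  | { type: "image"; mimeType: string; data: string };
type Message = { role: "user" | "assistant"; content: string | Block[] };

const PLACEHOLDER = "[earlier screenshot omitted]";

function hasImage(msg: Message): boolean {
  return Array.isArray(msg.content) && msg.content.some(b => b.type === "image");
}

function stripOldImages(messages: Message[]): Message[] {
  // Scan backwards for the last message that carries an image.
  let lastImageIdx = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (hasImage(messages[i])) {
      lastImageIdx = i;
      break;
    }
  }
  // Return new objects; the stored conversation is left unmodified.
  return messages.map((msg, i) => {
    if (i === lastImageIdx || !Array.isArray(msg.content)) return msg;
    return {
      ...msg,
      content: msg.content.map((b): Block =>
        b.type === "image" ? { type: "text", text: PLACEHOLDER } : b
      ),
    };
  });
}

// Hypothetical session: each turn adds a user message with a fake
// base64 payload of about 669 KB, followed by a short assistant reply.
function simulate(turns: number): Message[] {
  const fakeImage = "A".repeat(685400);
  const history: Message[] = [];
  for (let t = 1; t <= turns; t++) {
    history.push({
      role: "user",
      content: [
        { type: "text", text: `question ${t}` },
        { type: "image", mimeType: "image/webp", data: fakeImage },
      ],
    });
    history.push({ role: "assistant", content: `answer ${t}` });
  }
  return history;
}

const bodySize = (msgs: Message[]): number => JSON.stringify(msgs).length;

for (const turns of [1, 3, 10]) {
  const history = simulate(turns);
  console.log(turns, bodySize(history), bodySize(stripOldImages(history)));
}
```

In this simulation the unstripped size grows linearly with the turn count, while the stripped size stays close to one image plus the accumulated text. Because the function returns a new list, it can be applied at serialization time without altering the stored session.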
Acceptance