Summary
Running python -m scripts.run_loop (or run_eval directly) on native Windows Python 3.14 hits three compatibility issues, all rooted in Unix-first assumptions in scripts/run_eval.py and adjacent files. The third is the blocker: until it is fixed, the optimizer cannot evaluate any query on Windows.
Environment
- OS: Windows 11 Pro (10.0.26200)
- Python: 3.14.3 (native Windows, not WSL)
- Claude CLI: installed via npm at ~/AppData/Roaming/npm/claude.cmd
- skill-creator: latest from anthropics/skills (commit shipped in current main as of 2026-04-28)
Issue 1 — subprocess.Popen(["claude", ...]) raises [WinError 2]
File: scripts/run_eval.py line 71, scripts/improve_description.py line 26.
Cause: With shell=False, subprocess.Popen on Windows does not consult PATHEXT, so a bare "claude" is never matched against .cmd/.bat/.ps1 files. The Anthropic CLI installs on Windows as claude.cmd, so the lookup fails with [WinError 2] (file not found).
Fix (one-liner, platform-conditional):
cmd = [
    "claude.cmd" if os.name == "nt" else "claude",
    "-p", query,
    ...
]
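An alternative worth considering is to resolve the executable with shutil.which(), which does honor PATHEXT on Windows, so the extension would not need to be hard-coded. The sketch below is only an illustration: resolve_cli is a hypothetical helper name, not something that exists in the repo, and falling back to the bare name is an assumption about the desired behavior when the CLI is not on PATH.

```python
import shutil


def resolve_cli(name: str = "claude") -> str:
    """Return the full path to `name`, or the bare name if it is not on PATH.

    shutil.which() applies PATHEXT on Windows, so "claude" can match
    claude.cmd without naming the extension explicitly.
    (Hypothetical helper, not part of the existing scripts.)
    """
    found = shutil.which(name)
    return found if found is not None else name


# Possible call site, mirroring the list built in the fix above:
# cmd = [resolve_cli("claude"), "-p", query]
```

The one-liner above is the smaller change; this variant would also cover installs where the launcher is not a .cmd file, at the cost of a PATH lookup per call.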
Issue 2 — Path.write_text(...) raises UnicodeEncodeError
Files: scripts/run_loop.py (lines 151, 278, 313, 317, 321), scripts/run_eval.py (line 68), scripts/improve_description.py (line 189), scripts/generate_report.py (line 319), and two open(..., "w") calls in scripts/aggregate_benchmark.py.
Cause: Python on Windows defaults Path.write_text() to the locale codec (cp1252 on most installs), which can't encode characters like ✗ that the eval reports use for failed assertions.
Reproduction (excerpt from optimizer log):
  File "C:\...\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '✗' in position 10183
Fix: Add encoding="utf-8" to all .write_text() and open(..., "w") calls in scripts that produce HTML/Markdown reports. ~10 sites.
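The failure and the fix can be reproduced on any platform by naming the codecs explicitly. This is a minimal sketch: the report string and the report.md file name are made up for illustration, and cp1252 is requested directly to stand in for the Windows locale default.

```python
import tempfile
from pathlib import Path

# Sample content containing the character from the traceback (U+2717).
report = "✗ assertion failed\n"

# The failure mode: cp1252 has no mapping for U+2717, so encoding raises.
try:
    report.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# The fix: pass encoding="utf-8" so the locale default is never consulted.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "report.md"
    out.write_text(report, encoding="utf-8")
    round_trip = out.read_text(encoding="utf-8")
```

Here cp1252_ok ends up False and round_trip matches the original string, which is the behavior the encoding="utf-8" change is meant to guarantee regardless of locale.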
Issue 3 (BLOCKING) — select.select([process.stdout], [], [], 1.0) raises [WinError 10038]
File: scripts/run_eval.py line 108.
Cause: Python's select.select() on Windows only operates on sockets, not on subprocess pipe file descriptors. The streaming-detection logic that polls claude -p's stdout for early-exit on skill triggering is fundamentally Unix-only as written.
Reproduction (every query in the optimizer fails immediately with):
Warning: query failed: [WinError 10038] An operation was attempted on something that is not a socket
This blocks all of run_eval, run_loop, and improve_description (which depends on eval data).
Proposed fix — thread-and-queue stdout draining. Works identically on Windows and Unix; preserves the early-exit semantics. Replace the select.select-based loop with:
import queue
import threading

def _drain_stdout_to_queue(stream, q):
    """Read subprocess stdout in chunks and push to a queue.

    Used instead of select.select() because select() on Windows only works on
    sockets, not subprocess pipe FDs. A thread-and-queue pattern works on both
    Windows and Unix. Pushes None as a sentinel on EOF.
    """
    try:
        while True:
            chunk = stream.read1(8192) if hasattr(stream, "read1") else stream.read(8192)
            if not chunk:
                break
            q.put(chunk)
    except Exception:
        pass
    finally:
        q.put(None)

# In run_single_query, after subprocess.Popen(...):
chunk_queue: queue.Queue = queue.Queue()
reader_thread = threading.Thread(
    target=_drain_stdout_to_queue,
    args=(process.stdout, chunk_queue),
    daemon=True,
)
reader_thread.start()

# Main loop:
while time.time() - start_time < timeout:
    try:
        chunk = chunk_queue.get(timeout=1.0)
    except queue.Empty:
        if process.poll() is not None:
            break
        continue
    if chunk is None:  # EOF sentinel
        break
    buffer += chunk.decode("utf-8", errors="replace")
    # ... existing line-buffered JSON parsing logic unchanged ...
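For anyone who wants to try the pattern outside the repo, here is a standalone sketch that runs a throwaway child process and collects its stdout through a reader thread and a queue. The names drain and run_and_collect are hypothetical and exist only in this example. One deliberate difference from the patch above: it uses an incremental UTF-8 decoder, on the assumption that a multi-byte character could be split across two chunks, which per-chunk decode() would turn into replacement characters.

```python
import codecs
import queue
import subprocess
import sys
import threading
import time


def drain(stream, q):
    """Push chunks from a binary pipe onto q; None marks EOF."""
    try:
        while True:
            chunk = stream.read1(8192)
            if not chunk:
                break
            q.put(chunk)
    finally:
        q.put(None)


def run_and_collect(argv, timeout=10.0):
    """Run argv and return its decoded stdout without calling select()."""
    process = subprocess.Popen(argv, stdout=subprocess.PIPE)
    q = queue.Queue()
    reader = threading.Thread(target=drain, args=(process.stdout, q), daemon=True)
    reader.start()

    # Incremental decoder keeps partial multi-byte sequences between chunks.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    buffer = ""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            chunk = q.get(timeout=1.0)
        except queue.Empty:
            continue  # nothing yet; re-check the deadline
        if chunk is None:  # EOF sentinel from the reader thread
            break
        buffer += decoder.decode(chunk)
    buffer += decoder.decode(b"", final=True)

    if process.poll() is None:
        process.kill()  # timed out: stop the child so the reader sees EOF
    process.wait()
    reader.join(timeout=1.0)
    process.stdout.close()
    return buffer


out = run_and_collect([sys.executable, "-c", "print('hello'); print('world')"])
```

The child's line endings differ by platform, so comparing out.split() is more portable than comparing the raw string.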
Verified locally: with this port (plus issues 1 and 2 fixed), python -m scripts.run_loop runs successfully on native Windows Python 3.14, completing the full optimization loop with iterating descriptions and the train/test split.
Willing to submit a PR
I have all three fixes applied locally and the optimizer running end-to-end on Windows. Happy to open a PR with the changes if that's useful — the changes are small and platform-conditional where appropriate (issues 1 and 3 don't change Unix behavior).