Summary
7 ZSTD API calls that can run for nontrivial time (milliseconds to seconds on large inputs) are executed with the GIL held. In five of these — all on the compression side, at the ZSTD_e_end finalization step — the non-EOF path in the same function correctly wraps ZSTD_compressStream2 in Py_BEGIN/END_ALLOW_THREADS; the EOF finalization step does not. Two more sites (dictionary creation and dict-chain content-size lookup) are additional minor gaps.
The impact is that other Python threads block during these calls, which reduces the benefit of running zstandard in a multi-threaded program.
Impact
- Severity: Performance — other threads blocked. No crash, no correctness issue.
- Reachability: Multi-threaded programs using zstandard alongside other work; most visible on large compression finalization calls.
- Version: 0.25.0 (commit 7a77a75).
Sites
EOF finalization — 5 sites
These finalize a compression stream with ZSTD_e_end. For large pending buffers this can take substantial time; the non-EOF loop in the same function already correctly releases the GIL, so the EOF step is an inconsistency.
| File | Line | Context |
|---|---|---|
| c-ext/compressoriterator.c | 129 | ZstdCompressorIterator EOF flush |
| c-ext/compressionreader.c | 312 | read() EOF |
| c-ext/compressionreader.c | 444 | readinto() EOF |
| c-ext/compressionreader.c | 548 | readall() EOF |
| c-ext/compressionreader.c | 610 | read1() EOF |
Dictionary creation — 1 site
ZSTD_createCDict_advanced in c-ext/compressiondict.c. Can be slow for large dictionaries (megabytes-plus).
Dict-chain content-size — 1 site
ZSTD_getFrameContentSize in decompress_content_dict_chain. Fast per-call but inconsistent with the surrounding code that does release the GIL around the main decompression steps.
Fix
Wrap each call:
Py_BEGIN_ALLOW_THREADS
zresult = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_end);
Py_END_ALLOW_THREADS
For the five EOF sites, the non-EOF path in the same function already uses this wrapping, so matching it is the cleanest fix and makes it less likely that a GIL-holding EOF variant is reintroduced.
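One detail of the wrapping is worth spelling out: the CPython macros open and close a brace block, so the result variable has to be declared outside them, and any error reporting that touches Python state has to wait until the GIL is held again. The following is a toy model of that shape, not code from the extension. So that it compiles without Python.h or zstd.h, `DEMO_BEGIN_ALLOW_THREADS`, `DEMO_END_ALLOW_THREADS`, `demo_finalize` and `demo_finish` are hypothetical stand-ins: a flag plays the role of the GIL and a stub plays the role of the `ZSTD_compressStream2(..., ZSTD_e_end)` call.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the GIL: 1 while "held", 0 while "released".
 * Hypothetical; the extension would rely on the interpreter's own state. */
static int demo_gil_held = 1;

/* Stand-ins for Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS.
 * Like the real macros, they open and close a brace block. */
#define DEMO_BEGIN_ALLOW_THREADS { demo_gil_held = 0;
#define DEMO_END_ALLOW_THREADS   demo_gil_held = 1; }

/* Records whether the stub ran with the flag set (-1 = never called). */
static int demo_call_saw_gil = -1;

/* Hypothetical stub for a long-running finalization call. It returns the
 * number of units still left to flush, mimicking a "0 means done" result. */
size_t demo_finalize(size_t pending)
{
    demo_call_saw_gil = demo_gil_held;
    return pending > 0 ? pending - 1 : 0;
}

/* Wrapped EOF step: release, call, reacquire, then inspect the result.
 * Returns 1 when the stream is fully flushed, 0 when more output remains. */
int demo_finish(size_t pending)
{
    size_t remaining; /* declared outside the macro block so it stays in scope */

    DEMO_BEGIN_ALLOW_THREADS
    remaining = demo_finalize(pending);
    DEMO_END_ALLOW_THREADS

    /* Only from here on would Python API calls (for example raising an
     * exception on a zstd error) be allowed again. */
    assert(demo_gil_held);
    return remaining == 0;
}
```

In the real sites the stub would be the existing zstd call and the macros would be the CPython ones; nothing between them may call into the Python API.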
Suggested PR shape
One PR covering all 7 sites. No behavioral change beyond "other threads can run during these calls". No API surface change.
Methodology
Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). The GIL-discipline scanner identifies ZSTD calls that (a) take a context/cctx pointer and are known to run for nontrivial time and (b) do not sit between Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS macros. The EOF sites were flagged both by the scanner and by the "same function, two paths, only one releases GIL" consistency check in the informed pass. No live reproducer: this is a latent performance issue, not a correctness bug. Happy to open a PR.
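To make condition (b) concrete, here is a deliberately naive approximation of the check. It is not the toolkit's implementation, which is described as Tree-sitter-based; this sketch uses plain substring matching, and the function name `count_unwrapped_calls` is hypothetical. It walks a source string, tracks whether it is currently between the two macros, and counts occurrences of a call name that fall outside them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count occurrences of `call` in `src` that are not enclosed by
 * Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS.
 * Assumption: plain text matching only, with no awareness of comments,
 * string literals, preprocessor branches, or control flow. */
int count_unwrapped_calls(const char *src, const char *call)
{
    static const char begin_tok[] = "Py_BEGIN_ALLOW_THREADS";
    static const char end_tok[] = "Py_END_ALLOW_THREADS";
    int released = 0;  /* 1 while between the two macros */
    int unwrapped = 0; /* calls seen while the GIL would still be held */
    size_t call_len;

    assert(src != NULL && call != NULL);
    call_len = strlen(call);
    assert(call_len > 0);

    for (const char *p = src; *p != '\0';) {
        if (strncmp(p, begin_tok, sizeof(begin_tok) - 1) == 0) {
            released = 1;
            p += sizeof(begin_tok) - 1;
        } else if (strncmp(p, end_tok, sizeof(end_tok) - 1) == 0) {
            released = 0;
            p += sizeof(end_tok) - 1;
        } else if (strncmp(p, call, call_len) == 0) {
            if (!released) {
                unwrapped++;
            }
            p += call_len;
        } else {
            p++;
        }
    }
    return unwrapped;
}
```

Run over a function body that wraps the non-EOF `ZSTD_compressStream2` call but not the `ZSTD_e_end` one, this would report one unwrapped call, which is the pattern described for the five EOF sites. A syntax-tree-based scanner avoids the false positives this text-only version would produce.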
Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.
Full report
Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605