
GIL held during long-running ZSTD operations: 7 sites (5 EOF compression finalizations + dictionary creation + dict-chain content-size) #300

@devdanzin


Summary

Seven ZSTD API calls that can run for nontrivial time (milliseconds to seconds on large inputs) are executed with the GIL held. Five of them are on the compression side, at the ZSTD_e_end finalization step: the non-EOF path in the same function correctly wraps ZSTD_compressStream2 in Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS, but the EOF finalization step does not. The other two sites (dictionary creation and dict-chain content-size lookup) are minor gaps.

As a result, other Python threads are blocked for the duration of these calls, which reduces the benefit of using zstandard in a multi-threaded program.

Impact

  • Severity: Performance — other threads blocked. No crash, no correctness issue.
  • Reachability: Multi-threaded programs using zstandard alongside other work; most visible on large compression finalization calls.
  • Version: 0.25.0 (commit 7a77a75).
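
One way to observe the "other threads blocked" effect is to count how often a helper thread gets to run while a call executes on the main thread. The following is a minimal sketch of such a harness, not a reproducer for the sites in this issue: the function name ticks_during and the stand-in workloads are hypothetical, and the contrast it shows assumes a standard GIL-enabled CPython build.

```python
import threading
import time


def ticks_during(call, interval=0.001):
    """Run call() on this thread while a helper thread counts its wake-ups.

    Returns (ticks, result). If call() holds the GIL for its whole duration,
    the helper cannot run and ticks stays near zero. If call() releases the
    GIL, ticks grows roughly with elapsed time divided by interval.
    """
    ticks = 0
    started = threading.Event()
    stop = threading.Event()

    def counter():
        nonlocal ticks
        started.set()
        while not stop.is_set():
            ticks += 1
            time.sleep(interval)

    helper = threading.Thread(target=counter, daemon=True)
    helper.start()
    started.wait()  # make sure the helper is running before the call begins
    try:
        result = call()
    finally:
        stop.set()
        helper.join()
    return ticks, result


if __name__ == "__main__":
    # Stand-in workloads, not zstandard calls (assumption for illustration):
    # time.sleep releases the GIL, while sum() over a large range is one long
    # C-level call that is expected to keep it on a GIL-enabled build.
    print("releases GIL:", ticks_during(lambda: time.sleep(0.2))[0])
    print("holds GIL:   ", ticks_during(lambda: sum(range(20_000_000)))[0])
```

To apply this to the sites listed below, call would need to be a callable that reaches one of them, for instance by reading a compression reader to the end of its input. How large the input must be before the finalization step becomes measurable has not been established here.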

Sites

EOF finalization — 5 sites

These finalize a compression stream with ZSTD_e_end. For large pending buffers this can take substantial time; the non-EOF loop in the same function already correctly releases the GIL, so the EOF step is an inconsistency.

File                          Line   Context
c-ext/compressoriterator.c    129    ZstdCompressorIterator EOF flush
c-ext/compressionreader.c     312    read() EOF
c-ext/compressionreader.c     444    readinto() EOF
c-ext/compressionreader.c     548    readall() EOF
c-ext/compressionreader.c     610    read1() EOF

Dictionary creation — 1 site

ZSTD_createCDict_advanced in c-ext/compressiondict.c is called with the GIL held. It can be slow for large dictionaries (megabytes and up).

Dict-chain content-size — 1 site

ZSTD_getFrameContentSize in decompress_content_dict_chain is called with the GIL held. Each call is fast, but this is inconsistent with the surrounding code, which releases the GIL around the main decompression steps.

Fix

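Wrap each call in Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS. For the EOF sites:

/* No Python C-API calls or Python object access between the two macros. */
Py_BEGIN_ALLOW_THREADS
zresult = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_end);
Py_END_ALLOW_THREADS

The dictionary-creation and content-size sites take the same wrapper around their own calls.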

For the five EOF sites, the non-EOF path in the same function already uses this wrapping. Making the EOF step consistent with it is the cleanest fix, and it makes it less likely that a GIL-holding EOF variant is reintroduced later. Holding the GIL here is not unsafe; it only costs concurrency.

Suggested PR shape

One PR covering all 7 sites. No behavioral change beyond "other threads can run during these calls". No API surface change.

Methodology

Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). The GIL-discipline scanner identifies ZSTD calls that (a) take a context/cctx pointer and are known to run for nontrivial time and (b) do not sit between the Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS macros. The EOF sites were flagged both by the scanner and by the "same function, two paths, only one releases GIL" consistency check in the informed pass. There is no live reproducer: this is a latent performance issue, not a correctness bug. Happy to open a PR.
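
The idea behind condition (b) can be illustrated with a much simpler check than the toolkit's. The sketch below is not cext-review-toolkit's implementation and does not use Tree-sitter: it is a naive line-based scan, with a hypothetical function name (find_gil_held_calls), a watch list taken from the calls named in this issue, and a made-up C fragment as input. It ignores comments, string literals, preprocessor branches, and control flow.

```python
import re

BEGIN = "Py_BEGIN_ALLOW_THREADS"
END = "Py_END_ALLOW_THREADS"

# Illustrative watch list: the three ZSTD calls named in this issue.
SLOW_CALLS = (
    "ZSTD_compressStream2",
    "ZSTD_createCDict_advanced",
    "ZSTD_getFrameContentSize",
)
CALL_RE = re.compile(r"\b(" + "|".join(SLOW_CALLS) + r")\s*\(")


def find_gil_held_calls(source):
    """Return (line_number, call_name) for watched calls outside an
    ALLOW_THREADS region, tracking the region line by line."""
    findings = []
    released = False
    for lineno, line in enumerate(source.splitlines(), start=1):
        if BEGIN in line:
            released = True
        for match in CALL_RE.finditer(line):
            if not released:
                findings.append((lineno, match.group(1)))
        if END in line:
            released = False
    return findings


# Hypothetical C fragment mirroring the pattern described above: the loop
# releases the GIL, the finalization call after it does not.
SAMPLE = """\
while (input.pos < input.size) {
    Py_BEGIN_ALLOW_THREADS
    zresult = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_continue);
    Py_END_ALLOW_THREADS
}
zresult = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_end);
"""

print(find_gil_held_calls(SAMPLE))  # [(6, 'ZSTD_compressStream2')]
```

A syntax-tree-based scanner avoids the false positives and negatives that this kind of text matching produces, for example calls mentioned inside comments or a macro pair split across preprocessor branches.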

Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.

Full report

Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605
