Summary
Three bugs in compressionreader.c / compressor.c mirror fixes that were previously applied to the decompression side but never propagated to the compression side. Each is a memory leak or a mis-routed error path on the compression reader.
Impact
- Severity: Memory leaks on error paths (Gaps 1, 2); missing iterator cleanup on an error branch (Gap 3). Not crashes.
- Reachability: Error paths —
read() raising, GC during streaming, compression-iterator constructor failure.
- Version: 0.25.0 (commit
7a77a75).
Gap 1: Missing Py_XDECREF(result) in compressionreader_read — ~1.1 KB per read error
The sibling function decompressionreader_read correctly does Py_XDECREF(result) at the same error point (compare c-ext/decompressionreader.c:290-291). The compression reader has the same result variable but is missing the cleanup — when read_compressor_input fails inside the read loop, the allocated result PyBytes is not freed.
Site: c-ext/compressionreader.c:294-295.
Reproducer:
import zstandard, tracemalloc, gc
class FailingReader:
def __init__(self, data):
self.data, self.pos, self.calls = data, 0, 0
def read(self, size):
self.calls += 1
if self.calls > 1:
raise IOError("read failed")
chunk = self.data[self.pos:self.pos + size]
self.pos += len(chunk)
return chunk
tracemalloc.start(); gc.collect()
s1 = tracemalloc.take_snapshot()
for _ in range(2000):
comp = zstandard.ZstdCompressor()
r = comp.stream_reader(FailingReader(b'x' * 100_000))
r.__enter__()
try:
r.read(1024); r.read(1024)
except IOError:
pass
try:
r.__exit__(None, None, None)
except:
pass
gc.collect()
s2 = tracemalloc.take_snapshot()
diff = sum(s.size_diff for s in s2.compare_to(s1, 'lineno') if s.size_diff > 0)
print(f"{diff/2000:.1f} bytes per error") # ~1121 (vs ~64 for decompressionreader equivalent)
Fix: Add Py_XDECREF(result); on the error branch, mirroring decompressionreader_read.
Gap 2: Missing Py_CLEAR(readResult) in compressionreader_dealloc
Historical analog: an identical bug was fixed in decompressionreader_dealloc in commit 35af547. The compression reader has the same readResult field, but the fix was never propagated. Leak manifests when the reader is destroyed while holding unprocessed data (exception during compression, GC during active streaming).
Site: c-ext/compressionreader.c:13-22.
Fix (mirror the sibling):
static void compressionreader_dealloc(ZstdCompressionReader *self) {
PyTypeObject *tp = Py_TYPE(self);
Py_CLEAR(self->compressor);
Py_CLEAR(self->source);
Py_CLEAR(self->readResult); /* NEW — missing */
tp->tp_free(self);
Py_DECREF(tp); /* to land together with the heap-type dealloc fix */
}
Gap 3: return NULL bypasses cleanup label in read_to_iter
At c-ext/compressor.c:665, a return NULL; short-circuits past the except: label that would clean up the partially-constructed ZstdCompressorIterator and its held buffer/reader refs. Every call site that hits this branch leaks an iterator object.
Site: c-ext/compressor.c:665.
Fix: Change return NULL; to goto except;.
Suggested PR shape
Three small patches; could be a single PR (they're all in the compression-reader/compressor family) or three tiny ones. The Gap 1 fix is the only one with a measurable runtime reproducer; Gaps 2 and 3 are static-confirmed from sibling diff and error-path review.
Methodology
Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). The toolkit's git-history analyzer flagged commit 35af547 (the decompressor fix) and searched the codebase for structurally similar unpatched code — Gap 2 came out of that search directly. Gap 1 was verified live on CPython 3.14.3 debug: 1,121 B per error vs. ~64 B on the sibling path. Gap 3 was confirmed via static review of the error-path structure. Happy to open a PR.
Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.
Full report
Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605
Summary
Three bugs in
compressionreader.c/compressor.cmirror fixes that were previously applied to the decompression side but never propagated to the compression side. Each is a memory leak or a mis-routed error path on the compression reader.Impact
read()raising, GC during streaming, compression-iterator constructor failure.7a77a75).Gap 1: Missing
Py_XDECREF(result)incompressionreader_read— ~1.1 KB per read errorThe sibling function
decompressionreader_readcorrectly doesPy_XDECREF(result)at the same error point (comparec-ext/decompressionreader.c:290-291). The compression reader has the sameresultvariable but is missing the cleanup — whenread_compressor_inputfails inside the read loop, the allocatedresultPyBytesis not freed.Site:
c-ext/compressionreader.c:294-295.Reproducer:
Fix: Add
Py_XDECREF(result);on the error branch, mirroringdecompressionreader_read.Gap 2: Missing
Py_CLEAR(readResult)incompressionreader_deallocHistorical analog: an identical bug was fixed in
decompressionreader_deallocin commit35af547. The compression reader has the samereadResultfield, but the fix was never propagated. Leak manifests when the reader is destroyed while holding unprocessed data (exception during compression, GC during active streaming).Site:
c-ext/compressionreader.c:13-22.Fix (mirror the sibling):
Gap 3:
return NULLbypasses cleanup label inread_to_iterAt
c-ext/compressor.c:665, areturn NULL;short-circuits past theexcept:label that would clean up the partially-constructedZstdCompressorIteratorand its held buffer/reader refs. Every call site that hits this branch leaks an iterator object.Site:
c-ext/compressor.c:665.Fix: Change
return NULL;togoto except;.Suggested PR shape
Three small patches; could be a single PR (they're all in the compression-reader/compressor family) or three tiny ones. The Gap 1 fix is the only one with a measurable runtime reproducer; Gaps 2 and 3 are static-confirmed from sibling diff and error-path review.
Methodology
Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). The toolkit's git-history analyzer flagged commit
35af547(the decompressor fix) and searched the codebase for structurally similar unpatched code — Gap 2 came out of that search directly. Gap 1 was verified live on CPython 3.14.3 debug: 1,121 B per error vs. ~64 B on the sibling path. Gap 3 was confirmed via static review of the error-path structure. Happy to open a PR.Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.
Full report
Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605