close
Skip to content

Fix-propagation gaps: compressionreader missing 3 mirrors of fixes previously applied to decompressionreader #297

@devdanzin

Description

@devdanzin

Summary

Three bugs in compressionreader.c / compressor.c mirror fixes that were previously applied to the decompression side but never propagated to the compression side. Each is a memory leak or a mis-routed error path on the compression reader.

Impact

  • Severity: Memory leaks on error paths (Gaps 1, 2); missing iterator cleanup on an error branch (Gap 3). Not crashes.
  • Reachability: Error paths — read() raising, GC during streaming, compression-iterator constructor failure.
  • Version: 0.25.0 (commit 7a77a75).

Gap 1: Missing Py_XDECREF(result) in compressionreader_read — ~1.1 KB per read error

The sibling function decompressionreader_read correctly does Py_XDECREF(result) at the same error point (compare c-ext/decompressionreader.c:290-291). The compression reader has the same result variable but is missing the cleanup — when read_compressor_input fails inside the read loop, the allocated result PyBytes is not freed.

Site: c-ext/compressionreader.c:294-295.

Reproducer:

import zstandard, tracemalloc, gc

class FailingReader:
    def __init__(self, data):
        self.data, self.pos, self.calls = data, 0, 0
    def read(self, size):
        self.calls += 1
        if self.calls > 1:
            raise IOError("read failed")
        chunk = self.data[self.pos:self.pos + size]
        self.pos += len(chunk)
        return chunk

tracemalloc.start(); gc.collect()
s1 = tracemalloc.take_snapshot()
for _ in range(2000):
    comp = zstandard.ZstdCompressor()
    r = comp.stream_reader(FailingReader(b'x' * 100_000))
    r.__enter__()
    try:
        r.read(1024); r.read(1024)
    except IOError:
        pass
    try:
        r.__exit__(None, None, None)
    except:
        pass

gc.collect()
s2 = tracemalloc.take_snapshot()
diff = sum(s.size_diff for s in s2.compare_to(s1, 'lineno') if s.size_diff > 0)
print(f"{diff/2000:.1f} bytes per error")   # ~1121 (vs ~64 for decompressionreader equivalent)

Fix: Add Py_XDECREF(result); on the error branch, mirroring decompressionreader_read.

Gap 2: Missing Py_CLEAR(readResult) in compressionreader_dealloc

Historical analog: an identical bug was fixed in decompressionreader_dealloc in commit 35af547. The compression reader has the same readResult field, but the fix was never propagated. Leak manifests when the reader is destroyed while holding unprocessed data (exception during compression, GC during active streaming).

Site: c-ext/compressionreader.c:13-22.

Fix (mirror the sibling):

static void compressionreader_dealloc(ZstdCompressionReader *self) {
    PyTypeObject *tp = Py_TYPE(self);
    Py_CLEAR(self->compressor);
    Py_CLEAR(self->source);
    Py_CLEAR(self->readResult);    /* NEW — missing */
    tp->tp_free(self);
    Py_DECREF(tp);                 /* to land together with the heap-type dealloc fix */
}

Gap 3: return NULL bypasses cleanup label in read_to_iter

At c-ext/compressor.c:665, a return NULL; short-circuits past the except: label that would clean up the partially-constructed ZstdCompressorIterator and its held buffer/reader refs. Every call site that hits this branch leaks an iterator object.

Site: c-ext/compressor.c:665.

Fix: Change return NULL; to goto except;.

Suggested PR shape

Three small patches; could be a single PR (they're all in the compression-reader/compressor family) or three tiny ones. The Gap 1 fix is the only one with a measurable runtime reproducer; Gaps 2 and 3 are static-confirmed from sibling diff and error-path review.

Methodology

Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). The toolkit's git-history analyzer flagged commit 35af547 (the decompressor fix) and searched the codebase for structurally similar unpatched code — Gap 2 came out of that search directly. Gap 1 was verified live on CPython 3.14.3 debug: 1,121 B per error vs. ~64 B on the sibling path. Gap 3 was confirmed via static review of the error-path structure. Happy to open a PR.

Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.

Full report

Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions