Bug #22013
closedArray#| deduplication via eql? breaks when total element count exceeds ~16
Description
Problem description¶
The documentation for Array#| states:
Returns the union of self and other_array; duplicates are removed; order is preserved; items are compared using
eql?
However, when the total number of elements across both arrays exceeds ~16, deduplication via eql? stops working and duplicates are included in the result.
Reproducible script¶
class Item
attr_reader :id, :pos
def initialize(id, pos:)
@id = id
@pos = pos
end
def inspect
"item-#{id}-#{pos}"
end
def eql?(other)
id == other.id
end
end
items1 = [Item.new(1, pos: 1)]
items2 = [Item.new(1, pos: 2)] # same id — should be treated as duplicate by eql?
puts (items1 | items2 ).inspect # 2 total elements
puts (items1 * 8 | items2 * 8 ).inspect # 16 total elements
puts (items1 * 15 | items2 * 1 ).inspect # 16 total elements
puts (items1 * 9 | items2 * 8 ).inspect # 17 total elements
Actual result¶
[item-1-1]
[item-1-1]
[item-1-1]
[item-1-1, item-1-2]
Expected result¶
All four lines should return [item-1-1], since eql? returns true for both items and the documentation makes no mention of any size-dependent behavior.
The last line incorrectly returns two elements. The only change is the total element count crossing ~16 — the objects and eql? implementation are identical.
Updated by byroot (Jean Boussier) 5 days ago
You are missing the corresponding hash method:
def hash
[self.class, id].hash
end
It is somewhat implied by the mention of #eql? (hash based equality) but the documentation could be more explicit of course.
There is also perhaps the question of whether the fast path that doesn't use a Hash for smaller arrays should call #hash too as to have a more consistent behavior.
Updated by nobu (Nobuyoshi Nakada) 5 days ago
- Tags set to doc
Is this a document issue?
Updated by andreyruby (Andrey Glushkov) 5 days ago
byroot (Jean Boussier) wrote in #note-1:
You are missing the corresponding
hashmethod
It would help to add a note similar to Object#eql? docs (https://docs.ruby-lang.org/en/master/Object.html#method-i-eql-3F):
For any pair of objects where
eql?returns true, thehashvalue of both objects must be equal. So any subclass that overrideseql?should also overridehashappropriately.
Someone implementing eql? for a domain object for Array#| (and maybe Array#uniq) will get to those method docs and may never think to check Object#eql?.
Updated by byroot (Jean Boussier) 4 days ago
It would help to add a note similar to Object#eql? docs
Yes that is what I said.
Updated by byroot (Jean Boussier) 4 days ago
- Status changed from Open to Closed
Applied in changeset git|97c070c244ebd76cbf4d9a5b81dca08f4bda6f05.
[DOC] Clarify array methods using eql?
[Bug #22013]
Up to a certain size, only eql? is used, but for larger arrays
a Hash is used an #hash becomes necessary.
We could consider always checking #hash for consistency, but
that would decrease performance.
At the very least we can clarify the documentation.