close

Project

General

Profile

Actions

Bug #22013

closed
Image

Array#| deduplication via eql? breaks when total element count exceeds ~16

Bug #22013: Array#| deduplication via eql? breaks when total element count exceeds ~16

Added by andreyruby (Andrey Glushkov) 6 days ago. Updated 4 days ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 3.4.4 (2025-05-14 revision a38531fd3f) +PRISM [arm64-darwin24]
[ruby-core:125342]
Tags:

Description

Problem description

The documentation for Array#| states:

Returns the union of self and other_array; duplicates are removed; order is preserved; items are compared using eql?

However, when the total number of elements across both arrays exceeds ~16, deduplication via eql? stops working and duplicates are included in the result.

Reproducible script

class Item
  attr_reader :id, :pos

  def initialize(id, pos:)
    @id = id
    @pos = pos
  end

  def inspect
    "item-#{id}-#{pos}"
  end

  def eql?(other)
    id == other.id
  end
end

items1 = [Item.new(1, pos: 1)]
items2 = [Item.new(1, pos: 2)]  # same id — should be treated as duplicate by eql?

puts (items1      | items2     ).inspect  # 2 total elements
puts (items1 * 8  | items2 * 8 ).inspect  # 16 total elements
puts (items1 * 15 | items2 * 1 ).inspect  # 16 total elements
puts (items1 * 9  | items2 * 8 ).inspect  # 17 total elements

Actual result

[item-1-1]
[item-1-1]
[item-1-1]
[item-1-1, item-1-2]

Expected result

All four lines should return [item-1-1], since eql? returns true for both items and the documentation makes no mention of any size-dependent behavior.

The last line incorrectly returns two elements. The only change is the total element count crossing ~16 — the objects and eql? implementation are identical.

Image Updated by byroot (Jean Boussier) 5 days ago Actions #1 [ruby-core:125344]

You are missing the corresponding hash method:

  def hash
    [self.class, id].hash
  end

It is somewhat implied by the mention of #eql? (hash based equality) but the documentation could be more explicit of course.

There is also perhaps the question of whether the fast path that doesn't use a Hash for smaller arrays should call #hash too as to have a more consistent behavior.

Image Updated by nobu (Nobuyoshi Nakada) 5 days ago Actions #2 [ruby-core:125347]

  • Tags set to doc

Is this a document issue?

Image Updated by andreyruby (Andrey Glushkov) 5 days ago Actions #3 [ruby-core:125349]

byroot (Jean Boussier) wrote in #note-1:

You are missing the corresponding hash method

It would help to add a note similar to Object#eql? docs (https://docs.ruby-lang.org/en/master/Object.html#method-i-eql-3F):

For any pair of objects where eql? returns true, the hash value of both objects must be equal. So any subclass that overrides eql? should also override hash appropriately.

Someone implementing eql? for a domain object for Array#| (and maybe Array#uniq) will get to those method docs and may never think to check Object#eql?.

Image Updated by byroot (Jean Boussier) 4 days ago Actions #4 [ruby-core:125356]

It would help to add a note similar to Object#eql? docs

Yes that is what I said.

https://github.com/ruby/ruby/pull/16786

Image Updated by byroot (Jean Boussier) 4 days ago 1Actions #5

  • Status changed from Open to Closed

Applied in changeset git|97c070c244ebd76cbf4d9a5b81dca08f4bda6f05.


[DOC] Clarify array methods using eql?

[Bug #22013]

Up to a certain size, only eql? is used, but for larger arrays
a Hash is used an #hash becomes necessary.

We could consider always checking #hash for consistency, but
that would decrease performance.

At the very least we can clarify the documentation.

Actions

Also available in: PDF Atom