Feature #22011
Open hash tables with Swiss-table probing
Description
This change adds a Swiss-table-inspired probing layer to Ruby's core st_table, and shrinks st_table_entry from 24 B to 16 B by moving the stored hash into a parallel array. It is built and enabled by default; --disable-swiss-st reverts to the original st.c. The public ABI of struct st_table and the iteration-order guarantee are preserved.
Motivation
Hashes are everywhere in Ruby — instance-variable tables, ivar shapes, constant tables, JSON/HTTP/AR rows, every params, every to_h. Profiles of Rails-shaped workloads spend a meaningful fraction of CPU inside st_lookup and st_insert. Two pain points stood out in the upstream implementation:
- Probe loops are branch-heavy. Every step of the perturb chain loads a bin, fetches the entry it points to, compares the full st_hash_t (8 B), and only then calls eql?. On a miss that is several dependent loads per probe, with no way to fast-reject groups of slots in parallel.
- st_table_entry is 24 B. The (hash, key, record) triple gets one cache line per ~2.5 entries. Iteration and equality scans burn through L1 quickly, and Ruby programs typically hold a lot of small-to-mid-sized hashes (so per-table overhead matters).
The Swiss-table family of designs (Abseil flat_hash_map, Rust hashbrown) addresses (1) with a 1-byte-per-slot control array that lets a single SIMD/SWAR comparison reject or short-list 8 slots at once. We borrow that idea but keep Ruby's two-array layout (so we don't break ABI or insertion order) and add a third, parallel ctrl[] byte array. We then attack (2) by also extracting the hash field out of st_table_entry into its own parallel uint32_t hashes[] array.
Changes
- Adds ST_USE_SWISS_BINS, enabled under RUBY_EXPORT and disabled otherwise, so parser_st.c keeps the old-compatible layout.
- Removes the stored hash field from st_table_entry in the Swiss path and stores hashes in a side array after entries, using 32-bit stored hashes.
- Introduces packed Swiss bin groups: 8 control bytes plus 8 bin indexes per group.
- Adds Swiss probing via control-byte matching, using 7-bit h2 fingerprints for candidate filtering.
- Packs bins after entries when possible, reducing separate allocation pressure.
- Updates allocation, copy, free, memsize, rebuild, rehash, insert, delete, shift, lookup, foreach, keys, and values paths to go through hash/bin abstraction macros.
- Adjusts rebuild thresholds for Swiss load factor, using roughly 7/8 bin occupancy.
- Adds tombstone-triggered rebuild behavior when many deleted slots accumulate.
Analysis
Earlier preliminary analysis showed large improvements on microbenchmarks after adopting ideas from Swiss tables, but those gains do not fully carry over to the real-world benches.
Swiss-table-style probing can look huge in isolated lookup/insert microbenchmarks because the hot loop is tight, the table stays cache-hot, and control-byte filtering avoids many full entry/key comparisons.
In real Ruby workloads, hash-table operations are mixed with object allocation, method dispatch, GC barriers, string/hash callbacks, comparisons, and branchy VM work. That weakens the "everything is in L1" assumption.
If the old table is kept at < 0.5 load factor, collision chains/probe lengths are already short, so a simpler open-addressing scheme can be very competitive because it has smaller code and fewer moving parts.
The larger practical win of the Swiss-style design is probably memory density: it sustains a much higher load factor, around 7/8, without probe behavior collapsing, and at competitive speed. The memory saved by the higher load factor outweighs the one byte per slot added by ctrl[].
Files
Updated by dsh0416 (Delton Ding) 11 days ago
- Description updated (diff)
Updated by dsh0416 (Delton Ding) 11 days ago
- Description updated (diff)
Updated by dsh0416 (Delton Ding) 11 days ago
- Description updated (diff)
Updated by dsh0416 (Delton Ding) 11 days ago
- File swiss-hash.patch added
- File deleted (patch.diff)
fix some regression bugs
Updated by dsh0416 (Delton Ding) 11 days ago
- File output_001.txt added
- File output_002.txt added
add ruby bench results
Updated by dsh0416 (Delton Ding) 11 days ago
- File swiss-hash.patch added
- File deleted (swiss-hash.patch)
Updated by dsh0416 (Delton Ding) 10 days ago
I found some crash cases that I need to fix first. Also, I am running the full ruby-bench against the master branch on Linux and Windows to see if I've missed anything.
Updated by byroot (Jean Boussier) 10 days ago
Just a side note: enlarging struct st_table by 16 B has some consequences for various object sizes (e.g. Hash goes from size pool 80 to pool 96, and a bunch of other structs embed struct st_table).
I had a vague plan to collocate the bins and entries buffers like I did with set_table (https://github.com/ruby/ruby/commit/85c52079aa35a1d2e063a5b40eebe91701c8cb9e). If we end up with two more buffers, it becomes more important.
Updated by dsh0416 (Delton Ding) 10 days ago
- File deleted (swiss-hash.patch)
Updated by dsh0416 (Delton Ding) 10 days ago
- File changes.patch added
Updated by dsh0416 (Delton Ding) 10 days ago
· Edited
I see that the benchmarks might be affected: the st_numhash/symbol hash path looks like it returns a constant 128 in bits 25..31, which increases hash collisions. Under investigation.
Updated by dsh0416 (Delton Ding) 9 days ago
· Edited
- File memory_top_rss_changes.png added
- File time_by_test_violin_interp.png added
- File time_by_test_violin_yjit.png added
- Description updated (diff)
Here are the ruby-bench results.

| Area | interp | yjit |
|---|---|---|
| Time geomean | +1.77% slower | -0.17% faster |
| Time median benchmark delta | +1.55% slower | +0.02% flat |
| Benchmarks >1% slower | 42 | 15 |
| Benchmarks >1% faster | 9 | 15 |
| RSS geomean | -3.66% lower | -3.01% lower |
| RSS median benchmark delta | +0.09% flat | +0.09% flat |
Overall: Swiss is a small interpreter-mode slowdown, essentially neutral or faster under YJIT, and generally lower RSS. The RSS median is basically unchanged, so the memory improvement comes from a subset of large reductions rather than broad small wins.
Largest time slowdowns:
- yjit/loops-times: +12.52%
- interp/structaset: +9.38%
- interp/fib: +8.04%
- interp/keyword_args: +6.76%
- interp/send_rubyfunc_block: +6.15%
Largest time speedups:
- yjit/activerecord: -13.38%
- yjit/fluentd: -11.73%
- yjit/send_cfunc_block: -8.96%
- yjit/nqueens: -6.82%
- yjit/splay: -4.47%

Memory is mixed. A few benches show RSS increases, including interp/graphql-native +20.15%, yjit/fluentd +19.32%, and yjit/graphql-native +17.42%. The biggest RSS reductions are yjit/str_concat -42.45%, interp/psych-load -39.31%, and yjit/psych-load -38.07%, and reductions are more common than increases.
Updated by dsh0416 (Delton Ding) 9 days ago
- Description updated (diff)
Updated by dsh0416 (Delton Ding) 9 days ago
- File deleted (changes.patch)
Updated by dsh0416 (Delton Ding) 9 days ago
- File changes.patch added
Updated by dsh0416 (Delton Ding) 9 days ago
From the bench results we can see some very positive signals in some benches, but the change also introduces some regressions. I will try to narrow those down further.
Updated by dsh0416 (Delton Ding) 9 days ago
· Edited
- File memory_top_rss_changes.png added
- File time_by_test_violin_interp.png added
- File time_by_test_violin_yjit.png added
- File changes.patch added
The regressions are better controlled with the new patch.



Updated by dsh0416 (Delton Ding) 9 days ago

A Student's t-test shows that the performance is almost identical, while the memory consumption is confidently lowered by about 3%.