Add TopN plan node for O(limit) ORDER BY + LIMIT by philcunliffe · Pull Request #24 · hyparam/squirreling

philcunliffe · 2026-04-12T20:51:37Z

Summary

- Fuses Sort + Limit into a TopN node using a bounded binary max-heap
- ORDER BY x LIMIT N now buffers only N rows instead of the entire dataset
- Planner detects Limit(Sort(...)) and Limit(Project(Sort(...))) patterns
- Depends on #23 (Eagerly materialize row cells during sort buffering) for `materializeRow`
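A bounded max-heap is what keeps ORDER BY ... LIMIT N at O(N) memory: the root always holds the worst row kept so far, so each incoming row is either discarded or swapped in for the root in O(log N). The JavaScript below is a minimal sketch of that technique, not the PR's code. `topN` and its `(rows, limit, compare)` signature are hypothetical, and the arrival-order tiebreak is our assumption for making the output match a stable Sort followed by Limit.

```javascript
// Hypothetical sketch: return the first `limit` rows of `rows` in `compare`
// order while holding at most `limit` rows in memory. `topN`, `siftUp` and
// `siftDown` are illustrative names, not the PR's actual signatures.
function topN(rows, limit, compare) {
  if (limit <= 0) return []

  // Break ties by arrival order so the result matches a stable sort + slice.
  const cmp = (a, b) => compare(a.row, b.row) || a.seq - b.seq

  // Max-heap under `cmp`: heap[0] is the worst entry kept so far.
  const heap = []

  function siftUp(i) {
    while (i > 0) {
      const parent = (i - 1) >> 1
      if (cmp(heap[i], heap[parent]) <= 0) break
      ;[heap[i], heap[parent]] = [heap[parent], heap[i]]
      i = parent
    }
  }

  function siftDown(i) {
    for (;;) {
      const left = 2 * i + 1
      const right = left + 1
      let largest = i
      if (left < heap.length && cmp(heap[left], heap[largest]) > 0) largest = left
      if (right < heap.length && cmp(heap[right], heap[largest]) > 0) largest = right
      if (largest === i) return
      ;[heap[i], heap[largest]] = [heap[largest], heap[i]]
      i = largest
    }
  }

  let seq = 0
  for (const row of rows) {
    const entry = { row, seq: seq++ }
    if (heap.length < limit) {
      heap.push(entry)
      siftUp(heap.length - 1)
    } else if (cmp(entry, heap[0]) < 0) {
      // The new row sorts before the current worst: replace the root.
      heap[0] = entry
      siftDown(0)
    }
  }

  // At most `limit` entries remain, so this final sort is O(limit log limit).
  return heap.sort(cmp).map((entry) => entry.row)
}

console.log(topN([5, 1, 9, 3, 7, 2], 3, (a, b) => a - b)) // [ 1, 2, 3 ]
```

Total work is O(n log N) for n input rows, compared with O(n log n) time and O(n) buffered rows for a full sort.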

Test plan

- All 1322 existing tests pass (plan expectations updated for new TopN shape)
- TopN results identical to Sort+Limit for all sort directions and types

Add multi-level caching and reduce per-row overhead:
- parseSql: LRU cache (64 entries) avoids re-tokenizing and re-parsing identical SQL strings
- planSql: WeakMap cache on parsed ASTs avoids re-planning identical queries
- asyncRow: attach _data field for zero-copy collection
- collect: sync fast-path skips Promise.all when all rows have pre-materialized _data
- executeProject: pre-compute static column names, fast-path for simple identifier projections with direct cell passthrough and _data propagation
- executeSql: skip table normalization when no array tables are present
- compareForTerm: use module-level Set instead of per-call array allocation
- memorySource: hoist column computation outside scan loop, use Set for validation
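The parseSql and planSql caches were removed again later in this PR, but the pattern is easy to sketch. This hypothetical JavaScript uses a `Map` (which iterates in insertion order) as a 64-entry LRU keyed by SQL text, and a `WeakMap` keyed by AST object for plans. `lruCache`, `weakCache`, and the toy parse/plan functions are illustrative stand-ins, not squirreling's API.

```javascript
// Bounded LRU memo for string keys (hypothetical helper).
function lruCache(fn, capacity = 64) {
  const cache = new Map()
  return (key) => {
    if (cache.has(key)) {
      // Map keeps insertion order: re-insert to mark as most recently used.
      const value = cache.get(key)
      cache.delete(key)
      cache.set(key, value)
      return value
    }
    const value = fn(key)
    cache.set(key, value)
    if (cache.size > capacity) {
      // The first key in iteration order is the least recently used.
      cache.delete(cache.keys().next().value)
    }
    return value
  }
}

// Memo keyed by object identity; entries vanish when the AST is collected.
function weakCache(fn) {
  const cache = new WeakMap()
  return (ast) => {
    if (!cache.has(ast)) cache.set(ast, fn(ast))
    return cache.get(ast)
  }
}

// Toy stand-ins so the sketch runs on its own.
let parseCalls = 0
let planCalls = 0
const parseCached = lruCache((sql) => { parseCalls++; return { sql } })
const planCached = weakCache((ast) => { planCalls++; return { type: 'Scan', ast } })

const query = 'SELECT * FROM t'
planCached(parseCached(query))
planCached(parseCached(query))
console.log(parseCalls, planCalls) // 1 1
```

The plan cache only hits because the LRU returns the same AST object for repeated SQL text; the trade-off is that callers must never mutate a cached AST.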

- Add _data to AsyncRow type definition
- Cast to DerivedColumn/IdentifierNode where type narrowing is needed
- Type _data as Record<string, SqlPrimitive>
- Fix JSDoc placement for compareForTerm

Adapt optimizations to the new QueryResults return type:
- executeSql: keep table normalization skip, use new inline plan+execute
- executeProject: move pre-computation outside rows(), keep identifier fast-path and static column names inside the rows() generator
- Add _data to AsyncRow type definition
- Fix JSDoc placement and type casts for tsc

Drop the parseSql/planSql memoization caches added in 881a031. Also rename the pre-materialized row payload from `_data` to `resolved` for clarity, and delete stale scratch files (query-parquet.mjs, repro-525.mjs).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolves all cell values when rows are buffered for ORDER BY, replacing AsyncRow closures (which capture decompressed parquet row group data) with plain value-returning functions. The original closures become GC-eligible immediately. For tables with large text columns (~10KB/row), this reduces per-row buffer cost from ~10KB (closure over parquet data) to ~100B (plain value).
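Eager materialization can be sketched as follows, assuming (hypothetically) that a row is `{ columns, cells }` with each cell a zero-argument function returning a Promise. `materializeRow` awaits every cell once and rebuilds the cells as closures over plain values, keeping a `resolved` object alongside (the payload name this PR settled on). `collect` then skips the per-row Promise.all when every row already carries `resolved`. Both functions are illustrative, not the PR's code.

```javascript
// Assumed row shape (hypothetical): { columns: string[], cells: { [name]: () => Promise<any> } }

// Resolve every cell once. The returned cells close over plain values only,
// so closures over decoded source data become unreachable.
async function materializeRow(row) {
  const values = await Promise.all(row.columns.map((name) => row.cells[name]()))
  const resolved = {}
  const cells = {}
  row.columns.forEach((name, i) => {
    const value = values[i]
    resolved[name] = value
    cells[name] = () => Promise.resolve(value)
  })
  return { columns: row.columns, cells, resolved }
}

// Collect rows into plain objects. Fast path: if every row already has a
// `resolved` payload, skip Promise.all and per-cell awaits entirely.
async function collect(rows) {
  if (rows.every((row) => row.resolved)) {
    return rows.map((row) => row.resolved)
  }
  return Promise.all(rows.map(async (row) => row.resolved ?? (await materializeRow(row)).resolved))
}

const lazyRow = { columns: ['id', 'body'], cells: { id: async () => 1, body: async () => 'text' } }
materializeRow(lazyRow)
  .then((row) => collect([row]))
  .then((objects) => console.log(objects))
```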

Fuses Sort + Limit into a TopN node that uses a bounded binary max-heap. ORDER BY x LIMIT N now buffers only N rows instead of the entire dataset. The planner detects two patterns:
- Limit(Sort(child)) → TopN(child)
- Limit(Project(Sort(child))) → Project(TopN(child))
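The planner rewrite is a local pattern match on the plan tree. A minimal JavaScript sketch, assuming hypothetical node shapes (`type`, `limit`, `orderBy`, `child`) rather than squirreling's real plan types; skipping fusion when an OFFSET is present is also our simplification:

```javascript
// Hypothetical plan-node shapes, assumed for illustration:
//   { type: 'Limit', limit, offset?, child }
//   { type: 'Sort', orderBy, child }
//   { type: 'Project', columns, child }
//   { type: 'TopN', orderBy, limit, child }
function fuseTopN(node) {
  // This sketch only fuses a plain LIMIT; OFFSET handling is left out.
  if (node.type !== 'Limit' || node.offset) return node
  const child = node.child

  // Limit(Sort(child)) -> TopN(child)
  if (child.type === 'Sort') {
    return { type: 'TopN', orderBy: child.orderBy, limit: node.limit, child: child.child }
  }

  // Limit(Project(Sort(child))) -> Project(TopN(child))
  if (child.type === 'Project' && child.child.type === 'Sort') {
    const sort = child.child
    return {
      ...child,
      child: { type: 'TopN', orderBy: sort.orderBy, limit: node.limit, child: sort.child },
    }
  }

  return node
}

const scan = { type: 'Scan', table: 't' }
const plan = {
  type: 'Limit',
  limit: 10,
  child: { type: 'Project', columns: ['a'], child: { type: 'Sort', orderBy: ['a'], child: scan } },
}
console.log(JSON.stringify(fuseTopN(plan)))
```

Moving the limit below a Project is valid because a projection maps rows one-to-one and preserves their order, so limiting before or after it selects the same rows.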

# Conflicts:
# src/execute/execute.js
# src/execute/sort.js

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

philcunliffe and others added 6 commits April 9, 2026 16:45

Fix typecheck errors

ac13746


philcunliffe mentioned this pull request Apr 12, 2026

Add TopN heap and eager sort materialization for memory reduction #22 (closed, 3 tasks)

philcunliffe and others added 2 commits April 13, 2026 12:12

Merge remote-tracking branch 'origin/master' into perf/topn-heap

cc954d0


Add missing JSDoc @param types for siftDown/siftUp

de91774


philcunliffe marked this pull request as ready for review April 13, 2026 19:39

platypii added 3 commits April 13, 2026 13:37

Revert unnecessary table normalization optimization

2660f38

Fix incorrect numRows on LIMIT when source numRows is unknown

d9712e8
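The commit title states only the symptom. One rule consistent with it: when the child's row count is unknown, a LIMIT node's count must be unknown too, because the source may hold fewer rows than the limit. A hypothetical JavaScript helper, not the PR's code (`limitNumRows` and its OFFSET handling are assumptions):

```javascript
// Hypothetical helper: row count reported by a LIMIT/OFFSET node.
// `sourceNumRows` is undefined when the child cannot report its size.
function limitNumRows(sourceNumRows, limit, offset = 0) {
  // Unknown input size means unknown output size: reporting `limit` here
  // would overstate the count whenever the source is shorter than the limit.
  if (sourceNumRows === undefined) return undefined
  const available = Math.max(sourceNumRows - offset, 0)
  return limit === undefined ? available : Math.min(available, limit)
}
```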

Restore late materialization on sorting and topN

8fcefd6

platypii mentioned this pull request Apr 14, 2026

Add TopN plan node for O(limit) ORDER BY + LIMIT (Async Version) #26 (open)

philcunliffe wants to merge 11 commits into master from perf/topn-heap

philcunliffe commented Apr 12, 2026


2 participants
