feat: add unenforced_clustering_key to format spec #6552
beinan wants to merge 2 commits into lance-format:main
### Clustering Key Metadata

Clustering key configuration is handled by two protobuf fields in the Field message:
- **unenforced_clustering_key** (bool): Whether this field is part of the clustering key
For the unenforced primary key, we initially introduced the boolean and later moved to position, because position makes the key fields ordered. I think for clustering we can just go with position directly, without the boolean.
Thanks Jack! Good call - updated to drop the boolean and use only unenforced_clustering_key_position as the single source of truth.
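For readers following the thread, here is a minimal sketch of what "position as the single source of truth" could look like. The `Field` struct below is a hypothetical, simplified stand-in and not the actual lance-core type. The use of `Option<u32>` and the helper name `unenforced_clustering_key` are assumptions made for illustration.

```rust
/// Hypothetical, simplified stand-in for a schema field (not the real lance-core type).
#[derive(Debug, Clone)]
pub struct Field {
    pub name: String,
    /// Position of this field within the clustering key.
    /// `None` means the field is not part of the key (assumed representation).
    pub unenforced_clustering_key_position: Option<u32>,
}

impl Field {
    /// Membership is derived from the position, so no separate boolean is stored.
    pub fn is_unenforced_clustering_key(&self) -> bool {
        self.unenforced_clustering_key_position.is_some()
    }
}

/// Collect the clustering key fields, ordered by their position.
pub fn unenforced_clustering_key(fields: &[Field]) -> Vec<&Field> {
    let mut key: Vec<&Field> = fields
        .iter()
        .filter(|f| f.is_unenforced_clustering_key())
        .collect();
    // Sort by position so the key order is independent of schema order.
    key.sort_by_key(|f| f.unenforced_clustering_key_position);
    key
}
```

With only a position stored, a field cannot be flagged as a key member while lacking an order, which is the inconsistency the separate boolean would have allowed.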
(force-pushed from 415fb4f to 1118bfc)
we should probably also add getUnenforcedPrimaryKey() and getUnenforcedClusteringKey() in LanceSchema
A similar comment applies to the Python bindings.
(force-pushed from 1118bfc to ad6157f)
Thanks Jack! Added
(force-pushed from ad6157f to 3304e87)
Add clustering key metadata to the Lance schema, following the same pattern as unenforced_primary_key. Clustering keys hint at the physical ordering of data within a table, enabling query engine optimizations such as storage-partitioned joins (SPJ).

Changes across all layers:
- Protobuf: two new fields (bool + uint32 position) in Field message
- Rust core: field struct, constants, Arrow metadata parsing
- Rust schema: ordered field collection method
- Protobuf serialization: round-trip support
- Java JNI + LanceField: constructor and getters
- Python bindings + type stubs: is/position methods
- Format docs: clustering key metadata section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
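The "Arrow metadata parsing" item in the commit message could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR. The metadata key string is hypothetical, a plain `HashMap<String, String>` stands in for Arrow field metadata, and treating `0` as "not part of the key" is an assumed convention.

```rust
use std::collections::HashMap;

/// Hypothetical metadata key; the constant actually used in the PR may differ.
pub const CLUSTERING_KEY_POSITION_META_KEY: &str =
    "lance-schema:unenforced-clustering-key:position";

/// Parse the clustering key position from string-valued field metadata.
///
/// Returns `Ok(None)` when the key is absent or the value is `0`
/// (assumed to mean "not part of the clustering key"), and an error
/// when the value is not a valid unsigned integer.
pub fn parse_clustering_key_position(
    metadata: &HashMap<String, String>,
) -> Result<Option<u32>, String> {
    match metadata.get(CLUSTERING_KEY_POSITION_META_KEY) {
        None => Ok(None),
        Some(raw) => {
            let pos = raw
                .trim()
                .parse::<u32>()
                .map_err(|e| format!("invalid clustering key position {raw:?}: {e}"))?;
            Ok(if pos == 0 { None } else { Some(pos) })
        }
    }
}
```

Returning an error for malformed values, rather than silently ignoring them, keeps a typo in the metadata from quietly dropping a field out of the key.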
(force-pushed from 3304e87 to 417ddb4)
Summary

Adds `unenforced_clustering_key` metadata to the Lance schema format, mirroring the existing `unenforced_primary_key` pattern.

Changes across all layers:
- `unenforced_clustering_key` (bool) + `unenforced_clustering_key_position` (uint32), fields 14-15
- `is_unenforced_clustering_key()` / `unenforced_clustering_key_position()`

Motivation
This was discussed in the lance-spark SPJ PR (lance-format/lance-spark#445). Rather than using custom table properties, embedding clustering key info in the schema metadata follows the established pattern and avoids migration issues.
Test plan

- `cargo check -p lance-core -p lance-file` passes
- `cargo test -p lance-core -p lance-file` passes (all tests including existing primary key tests)
- `cargo clippy -p lance-core -p lance-file --tests -- -D warnings` clean

🤖 Generated with Claude Code