Pass dynamic filter predicate to ConnectorSplitSource#getNextBatch #29206

Open
chenjian2664 wants to merge 2 commits into trinodb:master from chenjian2664:jack/add-domain-to-page-source

@chenjian2664 chenjian2664 commented Apr 22, 2026

Description

Move dynamic filter waiting logic from individual connectors into the engine layer by adding a TupleDomain<ColumnHandle> dynamicFilterPredicate parameter to ConnectorSplitSource#getNextBatch, so connectors receive the resolved predicate per batch without needing to hold a DynamicFilter and manage wait/timeout logic themselves.

Previously, IcebergSplitSource (and DeltaLakeSplitSource) each duplicated the same pattern: hold a DynamicFilter, poll isAwaitable()/isBlocked(), manage a Stopwatch against a connector-specific timeout, and only then call into Iceberg's planning API. This logic now lives entirely in ConnectorAwareSplitSource, which waits for dynamic filter completion (controlled by a new dynamic_filtering_wait_timeout system session property backed by DynamicFilterConfig) and passes dynamicFilter.getCurrentPredicate() to the connector on each batch call.
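As a rough illustration of the engine-side wait described above, here is a minimal sketch. It does not use the Trino classes: `FilterView` is a hypothetical stand-in for the `DynamicFilter` methods named in this description, a `String` stands in for `TupleDomain<ColumnHandle>`, a list of strings stands in for a split batch, and the `elapsed`/`waitTimeout` arguments stand in for the stopwatch and the `dynamic_filtering_wait_timeout` value. The actual code in `ConnectorAwareSplitSource` may be structured differently.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class EngineSideWaitSketch {
    /** Hypothetical stand-in for the DynamicFilter methods mentioned in the description. */
    interface FilterView {
        boolean isAwaitable();

        CompletableFuture<?> isBlocked();

        // String is an assumption standing in for TupleDomain<ColumnHandle>
        String getCurrentPredicate();
    }

    /**
     * While the filter can still narrow and the timeout has not elapsed, return an
     * empty batch once the filter advances or the remaining time runs out.
     * Otherwise call the connector with the current predicate.
     */
    static CompletableFuture<List<String>> nextBatch(
            FilterView filter,
            Duration elapsed,
            Duration waitTimeout,
            Function<String, CompletableFuture<List<String>>> connector)
    {
        long millisLeft = waitTimeout.minus(elapsed).toMillis();
        if (filter.isAwaitable() && millisLeft > 0) {
            return filter.isBlocked()
                    .thenApply(ignored -> List.<String>of())
                    .completeOnTimeout(List.of(), millisLeft, TimeUnit.MILLISECONDS);
        }
        // Filter is complete or the wait budget is spent: the connector sees the predicate
        return connector.apply(filter.getCurrentPredicate());
    }
}
```

In this sketch the connector callback never sees the filter object itself, only the predicate value, which is the separation of concerns this PR aims for.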

The change is split into two commits:

  1. Engine/SPI: Add getNextBatch(int, TupleDomain<ColumnHandle>) as the new primary method (default delegates to the deprecated getNextBatch(int)); move wait/timeout ownership into ConnectorAwareSplitSource; update ClassLoaderSafeConnectorSplitSource to forward the predicate.
  2. Iceberg: Override the new method in IcebergSplitSource; remove DynamicFilter/Stopwatch fields; remove iceberg.dynamic-filtering.wait-timeout config property (marked @DefunctConfig) and the dynamic_filtering_wait_timeout connector session property; update tests and docs.
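The default bridge from the first commit can be sketched roughly as follows. This is a simplified model, not the SPI source: `SketchSplitSource`, `LegacySource`, and `PredicateAwareSource` are hypothetical names, a `String` stands in for `TupleDomain<ColumnHandle>`, and a list of strings stands in for a split batch. Whether the deprecated method gets a default body in the real interface is not stated here; the sketch gives it one only so that a source can override the new method alone.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class SplitSourceBridgeSketch {
    /** Hypothetical, simplified model of the two getNextBatch overloads. */
    interface SketchSplitSource {
        @Deprecated
        default CompletableFuture<List<String>> getNextBatch(int maxSize) {
            // Assumption: a throwing default, so new sources need not implement it
            throw new UnsupportedOperationException("override getNextBatch(int, predicate)");
        }

        // New primary method: by default it drops the predicate and delegates
        default CompletableFuture<List<String>> getNextBatch(int maxSize, String dynamicFilterPredicate) {
            return getNextBatch(maxSize);
        }
    }

    /** A source that only knows the deprecated method keeps working through the bridge. */
    static final class LegacySource implements SketchSplitSource {
        @Override
        public CompletableFuture<List<String>> getNextBatch(int maxSize) {
            return CompletableFuture.completedFuture(List.of("legacy-split"));
        }
    }

    /** A source that overrides the new method receives the predicate on every batch. */
    static final class PredicateAwareSource implements SketchSplitSource {
        @Override
        public CompletableFuture<List<String>> getNextBatch(int maxSize, String dynamicFilterPredicate) {
            return CompletableFuture.completedFuture(List.of("split[" + dynamicFilterPredicate + "]"));
        }
    }
}
```

The engine would call only the two-argument overload, so a legacy source silently ignores the predicate while an updated source can use it for pruning.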

Additional Changes

  • Connectors that still override the deprecated getNextBatch(int) continue to work via the default bridge — no changes required in other connectors.
  • BaseIcebergConnectorTest and TestIcebergDynamicPartitionPruningTest updated to set the engine-level dynamic_filtering_wait_timeout system property instead of the removed Iceberg-specific one.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

  ## SPI
  * Pass the resolved dynamic filter predicate to connectors per batch. ({issue}`29206`)

  ## Iceberg connector
  * {{breaking}} Remove `iceberg.dynamic-filtering.wait-timeout` configuration property.
    Use the engine-level `dynamic-filtering.wait-timeout` instead. ({issue}`29206`)

@cla-bot cla-bot Bot added the cla-signed label Apr 22, 2026
@github-actions github-actions Bot added docs iceberg Iceberg connector labels Apr 22, 2026
@chenjian2664 chenjian2664 marked this pull request as draft April 22, 2026 07:00
@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch from 6805608 to da416b1 Compare April 22, 2026 07:48
@github-actions github-actions Bot added bigquery BigQuery connector mongodb MongoDB connector cassandra Cassandra connector pinot Pinot connector prometheus Prometheus connector labels Apr 22, 2026
@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch from da416b1 to 4c7cf8a Compare April 22, 2026 08:12
@github-actions github-actions Bot added the hudi Hudi connector label Apr 22, 2026
@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch 4 times, most recently from 6f38f96 to 0648ae0 Compare April 23, 2026 03:43
@chenjian2664 chenjian2664 marked this pull request as ready for review April 23, 2026 09:47
@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch from 07fc49b to 0648ae0 Compare April 24, 2026 07:34
@wendigo wendigo requested a review from raunaqmorarka April 24, 2026 15:19
@chenjian2664 chenjian2664 requested a review from ebyhr April 27, 2026 06:44

ebyhr commented Apr 27, 2026

Could you rebase on master to resolve conflicts?

Add TupleDomain `dynamicFilterPredicate` parameter to
`ConnectorSplitSource#getNextBatch` so connectors receive the resolved
predicate per batch without needing to hold a `DynamicFilter` and manage
the wait logic themselves. `ConnectorAwareSplitSource` now owns the
dynamic filter wait timeout and passes getCurrentPredicate() down on
each batch call.
@chenjian2664 chenjian2664 force-pushed the jack/add-domain-to-page-source branch from 0648ae0 to 8c6f835 Compare April 27, 2026 07:56
@ebyhr ebyhr left a comment

I hope @raunaqmorarka chimes in on this PR.

return dynamicFilteringWaitTimeout;
}

@Config("dynamic-filtering.wait-timeout")

@raunaqmorarka Do you know why the DF timeout and the relevant logic were implemented on the connectors' side? (To avoid an SPI change to ConnectorSplitSource.getNextBatch, or to allow different values per catalog?)
