feat(csharp/src/Drivers/Databricks): Add option to enable using direct results for statements#2737
Conversation
… results for queries
c076812 to
56cd239
Compare
CurtHagenlocher
left a comment
There was a problem hiding this comment.
Thanks! We need to mollify the nullability checker, and I've added a few suggestions for fairly superficial changes.
| { | ||
| // The initial response has result data so we don't need to poll | ||
| metadata = DirectResults.ResultSetMetadata; | ||
| } else { |
There was a problem hiding this comment.
| } else { | |
| } | |
| else | |
| { |
|
|
||
| if (!string.IsNullOrEmpty(DirectResults.OperationStatus?.DisplayMessage)) | ||
| { | ||
| throw new HiveServer2Exception(DirectResults.OperationStatus.DisplayMessage) |
There was a problem hiding this comment.
| throw new HiveServer2Exception(DirectResults.OperationStatus.DisplayMessage) | |
| throw new HiveServer2Exception(DirectResults.OperationStatus!.DisplayMessage) |
I think the null-forgiveness operator is required here because the nullability analysis doesn't know about string.IsNullOrEmpty.
| statement.QueryTimeout = QueryTimeoutSeconds; | ||
| } | ||
|
|
||
| public TSparkDirectResults? DirectResults { get; set; } |
There was a problem hiding this comment.
| public TSparkDirectResults? DirectResults { get; set; } | |
| protected TSparkDirectResults? DirectResults { get; set; } |
| private bool _applySSPWithQueries = false; | ||
| private bool _enableDirectResults = true; | ||
|
|
||
| internal static TSparkGetDirectResults defaultGetDirectResults = new(){ |
There was a problem hiding this comment.
| internal static TSparkGetDirectResults defaultGetDirectResults = new(){ | |
| internal static TSparkGetDirectResults defaultGetDirectResults = new() | |
| { |
| /// Checks if direct results are available. | ||
| /// </summary> | ||
| /// <returns>True if direct results are available and contain result data, false otherwise.</returns> | ||
| public bool HasDirectResults() |
There was a problem hiding this comment.
| public bool HasDirectResults() | |
| public bool HasDirectResults => DirectResults?.ResultSet != null && DirectResults?.ResultSetMetadata != null; |
(This is a good candidate to be a property instead of a method.)
| this.statement = statement; | ||
| this.schema = schema; | ||
| this.isLz4Compressed = isLz4Compressed; | ||
|
|
There was a problem hiding this comment.
same here, maybe start a new direct result reader. make sure doing one thing in one class
There was a problem hiding this comment.
maybe not, after a 2nd look at the code
|
@CurtHagenlocher |
Now that #2678 has been checked in there's a merge conflict that will need to be resolved. |
| if (_statement.HasDirectResults && _statement.DirectResults?.ResultSet?.Results?.ResultLinks?.Count > 0) | ||
| { | ||
| // Yield execution so the download queue doesn't get blocked before downloader is started | ||
| await Task.Yield(); |
There was a problem hiding this comment.
Without this line, there's a deadlock because the _downloadQueue is full, and there's no queue consumer since the downloader hasn't started yet
| return baseHandler; | ||
| } | ||
|
|
||
| protected internal override bool AreResultsAvailableDirectly() => _enableDirectResults; |
There was a problem hiding this comment.
Another good place for a property:
| protected internal override bool AreResultsAvailableDirectly() => _enableDirectResults; | |
| protected internal override bool AreResultsAvailableDirectly => _enableDirectResults; |
Sorry; I missed this in my previous review.
There was a problem hiding this comment.
This is overriding AreResultsAvailableDirectly in HiveServer2Connection/SparkConnection, where it is a function. Should I update the field in both parent classes to be a property as well?
CurtHagenlocher
left a comment
There was a problem hiding this comment.
Thanks again! Sorry; I missed another method that should be a property in my previous review. Looks like we're good after that change.
| get { return _directResults; } | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
nit: extra blank line
…t results for statements (apache#2737) - Add option to set EnableDirectResults, which sends getDirectResults in the Thrift execute statement request - If getDirectResults is set in the request, then directResults is set on the response containing initial results (equivalent to the server calling GetOperationStatus, GetResultSetMetadata, FetchResults, and CloseOperation) - If directResults is set on the response, don't poll for the operation status - We already set getDirectResults on requests for metadata commands, just not the execute statement request Tested E2E using `dotnet test --filter CloudFetchE2ETest` ``` [xUnit.net 00:00:00.11] Starting: Apache.Arrow.Adbc.Tests.Drivers.Databricks [xUnit.net 00:01:27.27] Finished: Apache.Arrow.Adbc.Tests.Drivers.Databricks Apache.Arrow.Adbc.Tests.Drivers.Databricks test net8.0 succeeded (87.7s) Test summary: total: 8, failed: 0, succeeded: 8, skipped: 0, duration: 87.7s Build succeeded in 89.1s ```
Tested E2E using
dotnet test --filter CloudFetchE2ETest