fix(onnx): use HEURISTIC cudnn_conv_algo_search for ORT GPU session#17970
Open
scyyh11 wants to merge 3 commits into PaddlePaddle:main from
Conversation
The ONNX Runtime CUDA EP option was set to "DEFAULT", which pins cuDNN to CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM. For many of PP-OCRv5 server rec's conv configs, cuDNN rejects this algo and ORT falls back to a non-cuDNN slow path (visible as "OP Conv(...) running in Fallback mode. May be extremely slow." warnings), making steady-state per-shape latency ~50x higher than necessary.

"HEURISTIC" picks a supported algo per node via cudnnGetConvolutionForwardAlgorithm_v7, matching the fix landed in PaddleX#5057 for the ultra-infer C++ ORT backend.

Benchmark on PP-OCRv5 server rec ONNX (RTX 2080 Ti, onnxruntime-gpu 1.20.0, CUDA 12.6 / cuDNN 9), shapes [B,3,48,W] with columns giving (B,W):

| mode | cold total | steady (1,160) | steady (4,320) | steady (8,480) |
| --- | --- | --- | --- | --- |
| DEFAULT | 4556 ms | 206.05 ms | 209.31 ms | 220.40 ms |
| HEURISTIC | 1488 ms | 4.27 ms | 9.54 ms | 24.96 ms |
| EXHAUSTIVE | 1424 ms | 4.17 ms | 9.73 ms | 24.84 ms |

HEURISTIC matches EXHAUSTIVE quality without the per-new-shape benchmark cost EXHAUSTIVE incurs on dynamic-shape OCR workloads.

Refs: PaddleOCR#17959, PaddleX#5057

Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Thanks for your contribution!
Summary
`tools/infer/utility.py` builds an ONNX Runtime CUDA EP session with `cudnn_conv_algo_search="DEFAULT"`. `DEFAULT` pins cuDNN to `CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM`. cuDNN rejects this algo for many of PP-OCRv5 server rec's conv configs, so ORT falls back to a non-cuDNN slow path (visible at runtime as `OP Conv(...) running in Fallback mode. May be extremely slow.` warnings). Steady-state per-shape latency ends up ~50× higher than it should be.

This PR switches the option to `HEURISTIC`, which asks cuDNN to pick a supported algo per node via `cudnnGetConvolutionForwardAlgorithm_v7`. Same fix pattern as PaddleX#5057, but for the Python `--use_onnx --use_gpu` path here (PaddleX#5057 only patches ultra-infer's C++ ORT backend).

Closes/refs #17959.
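For reference, the change boils down to one CUDA EP provider option. A minimal standalone sketch of an equivalent session setup (the model path is hypothetical and the actual edit lives inside `tools/infer/utility.py`, whose surrounding code differs):

```python
import onnxruntime as ort

# CUDA EP provider options; cudnn_conv_algo_search accepts
# "EXHAUSTIVE", "HEURISTIC", or "DEFAULT".
cuda_options = {
    "device_id": 0,
    # "DEFAULT" pins cuDNN to IMPLICIT_PRECOMP_GEMM and triggers the
    # fallback path described above; "HEURISTIC" queries
    # cudnnGetConvolutionForwardAlgorithm_v7 for a supported algo per node.
    "cudnn_conv_algo_search": "HEURISTIC",
}

sess = ort.InferenceSession(
    "rec_model.onnx",  # hypothetical path to the PP-OCRv5 server rec model
    providers=[("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"],
)
```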
Benchmark
PP-OCRv5 server rec ONNX, RTX 2080 Ti, `onnxruntime-gpu==1.20.0`, CUDA 12.6 / cuDNN 9. Cold = 16 unique shapes `[B,3,48,W]` with B ∈ {1,2,4,8}, W ∈ {90,160,320,480} (each run first-time). Steady = 5 warmup + 100 timed runs at a fixed shape. `HEURISTIC` matches `EXHAUSTIVE` runtime quality without the per-new-shape kernel-search cost that hurts dynamic-shape OCR (the original symptom in #17959).
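A minimal sketch of how the steady-state numbers can be measured, assuming a `sess` built as in the snippet above; the warmup/timed counts match the setup described here, but the helper itself is illustrative:

```python
import time
import numpy as np

def bench_steady(sess, shape, warmup=5, iters=100):
    """Mean per-run latency (ms) at a fixed shape: warmup runs, then timed runs."""
    x = np.random.rand(*shape).astype(np.float32)
    name = sess.get_inputs()[0].name
    for _ in range(warmup):
        sess.run(None, {name: x})
    t0 = time.perf_counter()
    for _ in range(iters):
        sess.run(None, {name: x})
    return (time.perf_counter() - t0) / iters * 1000.0

# e.g. the (1,160) column corresponds to shape [B,3,48,W] = [1,3,48,160]
print(bench_steady(sess, (1, 3, 48, 160)))
```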
Test plan

`python -c "import ast; ast.parse(open('tools/infer/utility.py').read())"` confirms the file still parses.