
fix(onnx): use HEURISTIC cudnn_conv_algo_search for ORT GPU session #17970

Open
scyyh11 wants to merge 3 commits into PaddlePaddle:main from scyyh11:fix/onnx-cudnn-conv-algo-heuristic

Conversation

@scyyh11 scyyh11 commented Apr 24, 2026

Summary

tools/infer/utility.py builds an ONNX Runtime CUDA EP session with cudnn_conv_algo_search="DEFAULT". DEFAULT pins cuDNN to CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM. cuDNN rejects this algo for many of PP-OCRv5 server rec's conv configs, so ORT falls back to a non-cuDNN slow path (visible at runtime as "OP Conv(...) running in Fallback mode. May be extremely slow." warnings). Steady-state per-shape latency ends up ~50× higher than necessary.

This PR switches the option to HEURISTIC, which asks cuDNN to pick a supported algo per node via cudnnGetConvolutionForwardAlgorithm_v7. Same fix pattern as PaddleX#5057, but for the Python --use_onnx --use_gpu path here (PaddleX#5057 only patches ultra-infer's C++ ORT backend).
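For reference, the change follows ONNX Runtime's standard CUDA EP provider-options pattern. A minimal sketch (the model path is illustrative, not the exact tools/infer/utility.py code):

```python
import onnxruntime as ort

# "HEURISTIC" asks cuDNN to pick a supported conv algo per node via
# cudnnGetConvolutionForwardAlgorithm_v7; "DEFAULT" pins one algo and
# triggers the fallback described above when cuDNN rejects it.
providers = [
    ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "HEURISTIC"}),
    "CPUExecutionProvider",
]
sess = ort.InferenceSession("rec_model.onnx", providers=providers)
```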

Closes/refs #17959.

Benchmark

PP-OCRv5 server rec ONNX, RTX 2080 Ti, onnxruntime-gpu==1.20.0, CUDA 12.6 / cuDNN 9. Cold = 16 unique shapes [B,3,48,W] with B∈{1,2,4,8}, W∈{90,160,320,480}, each timed on its first run. Steady = 5 warmup + 100 timed runs at a fixed shape.

| mode | cold total | cold avg | steady (1,3,48,160) | steady (4,3,48,320) | steady (8,3,48,480) |
| --- | --- | --- | --- | --- | --- |
| DEFAULT | 4556 ms | 284.7 ms | 206.05 ms | 209.31 ms | 220.40 ms |
| HEURISTIC | 1488 ms | 93.0 ms | 4.27 ms | 9.54 ms | 24.96 ms |
| EXHAUSTIVE | 1424 ms | 89.0 ms | 4.17 ms | 9.73 ms | 24.84 ms |

HEURISTIC matches EXHAUSTIVE's steady-state performance without the per-new-shape kernel-search cost that EXHAUSTIVE incurs on dynamic-shape OCR workloads (the original symptom in #17959). A reproduction sketch follows.
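A minimal sketch of the timing harness described above (the model filename and random inputs are placeholders; the shape grid matches the benchmark setup):

```python
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "ppocrv5_server_rec.onnx",  # placeholder path for the exported rec model
    providers=[("CUDAExecutionProvider", {"cudnn_conv_algo_search": "HEURISTIC"})],
)
inp = sess.get_inputs()[0].name

# Cold: first run of each of the 16 unique shapes triggers per-shape algo selection.
cold = 0.0
for b in (1, 2, 4, 8):
    for w in (90, 160, 320, 480):
        x = np.random.rand(b, 3, 48, w).astype(np.float32)
        t0 = time.perf_counter()
        sess.run(None, {inp: x})
        cold += time.perf_counter() - t0
print(f"cold total: {cold * 1000:.0f} ms")

# Steady: 5 warmup + 100 timed runs at a fixed shape.
x = np.random.rand(1, 3, 48, 160).astype(np.float32)
for _ in range(5):
    sess.run(None, {inp: x})
t0 = time.perf_counter()
for _ in range(100):
    sess.run(None, {inp: x})
print(f"steady avg: {(time.perf_counter() - t0) / 100 * 1000:.2f} ms")
```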

Test plan

- `python -c "import ast; ast.parse(open('tools/infer/utility.py').read())"` — the file still parses
- One-line option change; no API or default-behavior change for non-ONNX users
- CI green


paddle-bot Bot commented Apr 24, 2026

Thanks for your contribution!

@scyyh11 scyyh11 requested a review from Bobholamovic April 27, 2026 04:41
