Describe the feature request
Description:
ONNX Runtime supports int4/uint4 quantization mainly on x86 (via AVX2/AVX-512 kernels), but lacks optimized kernels for IBM Power Systems (PowerPC). This limits efficient inference of 4-bit quantized models on that architecture.
Proposal:
- Add VSX + MMA optimized kernels (e.g., GEMM/MatMul, dot products)
- Extend MLAS (or equivalent) with PowerPC paths
- Support existing packed int4 formats
Notes:
- VSX can handle unpacking; MMA can accelerate matrix multiply/accumulation
- The approach can mirror existing optimized implementations (e.g., the x86 paths in MLAS, or ggml's PowerPC support)
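To make the unpacking step concrete, here is a minimal scalar reference, assuming the common packed layout of two int4 values per byte with the even-indexed element in the low nibble (as in ONNX Runtime's packed int4 tensors). A VSX kernel would process 16 bytes at a time with vector shifts and masks; this only pins down the expected semantics, it is not the proposed kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference for unpacking weight-only int4 data stored two values
// per byte (low nibble first). Each 4-bit value is sign-extended to int8,
// giving the range [-8, 7].
std::vector<int8_t> UnpackInt4(const uint8_t* packed, size_t count) {
    std::vector<int8_t> out(count);
    for (size_t i = 0; i < count; ++i) {
        uint8_t byte = packed[i / 2];
        int v = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
        // Sign-extend the 4-bit value.
        out[i] = static_cast<int8_t>(v >= 8 ? v - 16 : v);
    }
    return out;
}
```

A vectorized VSX version would replace the per-element branch with a mask/shift/compare sequence over whole registers, but must produce the same values as this baseline.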
Questions:
- Any plans for non-x86 int4 support?
- Preferred integration point (MLAS vs EP)?
Thanks!
Describe scenario use case
Running LLM inference on IBM Power Systems using 4-bit quantized models (e.g., weight-only int4). Without native int4/uint4 support in ONNX Runtime, deployments must fall back to int8 or higher precision, leading to increased memory bandwidth usage and reduced throughput.
Enabling int4 with VSX/MMA would allow efficient execution of quantized GEMM/dot-product workloads, improving performance and reducing memory footprint for large models on PowerPC-based systems.