fix(i2_s): resolve to_float UB + guard BLAS I2_S routing by JasonOA888 · Pull Request #533 · microsoft/BitNet

JasonOA888 · 2026-04-06T03:09:24Z

Fixes #468, Fixes #512

Bug 1: to_float callback UB (missing scale param)

The type_traits entry for GGML_TYPE_I2_S registers dequantize_row_i2_s as a ggml_to_float_t callback, but the signatures don't match: the function takes 4 parameters (x, y, n, i2_scale), while the typedef declares only 3. The cast silences the compiler, but calling through the mismatched pointer is undefined behavior: the callee reads its 4th argument from a register the caller never set, so i2_scale is garbage and the dequantized weights are wrong on ARM and potentially on x86.

Fix: Added dequantize_row_i2_s_wrapper() that matches the ggml_to_float_t signature and calls the real dequantizer.

Bug 2: BLAS routes I2_S through generic MUL_MAT (segfault on Apple Silicon)

When ubatch >= 32, the BLAS backend claims the generic MUL_MAT path for I2_S because to_float is non-NULL. But I2_S stores an external scale outside the per-row payload, which the generic BLAS dequantize path does not handle. This causes segfaults on Apple Silicon Metal+BLAS.

Fix: Explicitly reject GGML_TYPE_I2_S from the BLAS MUL_MAT support check so I2_S uses its specialized kernel path.

Changes

llama.cpp submodule updated with fixes (commit aa603a3):

ggml-quants.c: Added dequantize_row_i2_s_wrapper() with correct 3-param signature
ggml-quants.h: Declared the wrapper function
ggml.c: Use wrapper instead of cast in type_traits
ggml-blas.cpp: Reject GGML_TYPE_I2_S from both MUL_MAT and OUT_PROD support checks

Submodule URL changed to JasonOA888/llama.cpp so the fix commit is publicly reachable (previous PR pointed to a local-only commit).

Testing

Wrapper correctly routes to the real dequantizer
BLAS guard is minimal (single condition addition, no perf impact)
Specialized I2_S kernels (gemv/gemm/vec_dot) unaffected

JasonOA888 force-pushed the fix/issue-468-i2-s-v2 branch 4 times, most recently from 819bb55 to 226b2ab, on April 6, 2026 03:36

Commit a6fdd51: fix(i2_s): resolve to_float UB + guard BLAS I2_S routing (Fixes microsoft#468, Fixes microsoft#512)

JasonOA888 force-pushed the fix/issue-468-i2-s-v2 branch from 226b2ab to a6fdd51 on April 6, 2026 03:52

JasonOA888 wants to merge 1 commit into microsoft:main from JasonOA888:fix/issue-468-i2-s-v2
