Good callout on the asymmetric K/V norm issue with Qwen, Priyanka — that alone changes the deployment math for a lot of production stacks.
The strategic question nags at me: if this is genuinely worth billions in inference savings, why did Google publish it? Google is a chip buyer, not a seller. You publish when the PR value exceeds the strategic value.
And the RaBitQ controversy adds context. Jianyang Gao, whose RaBitQ method TurboQuant builds on, has been documenting benchmark inconsistencies: RaBitQ was evaluated on a single CPU core while TurboQuant ran on an A100 GPU. If the 6x number needs an asterisk, the downstream deployment math changes significantly.
Meanwhile Goldman Sachs is forecasting 49% semiconductor revenue growth and $700B in AI hardware by Q4 2026. Efficiency doesn't shrink demand in AI; it feeds it.
All of this means we'll have plenty to write about on our Substacks for the foreseeable future! :)