A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms
Beihang University 2409


Quantization granularity refers to how weights or activations are partitioned, with each partition assigned its own element of the scaling factor and zero-point. It therefore determines how fine-grained the rescaling and the zero-point shift are. Figure 2 shows five fundamental types of quantization granularity: tensor-wise, token-wise, channel-wise, group-wise, and element-wise.
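The five granularities differ only in which elements share one (scale, zero-point) pair. The sketch below makes that concrete. It is a minimal pure-Python illustration, not the survey's method, and it rests on several assumptions: asymmetric min/max uniform quantization, a range extended to include 0, rows treated as tokens and columns as channels (the activation view; for weight matrices, "channel" usually means output channel, which may be a row depending on layout), and illustrative choices of 4 bits and group_size=2. The helper names (quant_params, fake_quant, partitions, quantize) are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def quant_params(values: List[float], bits: int) -> Tuple[float, int]:
    """Asymmetric min/max scale and zero-point for one partition (assumed scheme)."""
    qmax = 2 ** bits - 1
    lo = min(min(values), 0.0)  # extend range to include 0 so 0 is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0  # guard all-zero partitions
    zero_point = round(-lo / scale)
    return scale, zero_point

def fake_quant(x: float, scale: float, zero_point: int, bits: int) -> float:
    """Quantize to an integer code in [0, 2^bits - 1], then dequantize back to float."""
    q = min(max(round(x / scale) + zero_point, 0), 2 ** bits - 1)
    return (q - zero_point) * scale

def partitions(rows: int, cols: int, granularity: str,
               group_size: int) -> List[List[Tuple[int, int]]]:
    """Index sets that share one (scale, zero-point) pair."""
    if granularity == "tensor":   # one pair for the whole matrix
        return [[(i, j) for i in range(rows) for j in range(cols)]]
    if granularity == "token":    # one pair per row
        return [[(i, j) for j in range(cols)] for i in range(rows)]
    if granularity == "channel":  # one pair per column
        return [[(i, j) for i in range(rows)] for j in range(cols)]
    if granularity == "group":    # contiguous groups of group_size within each row
        return [[(i, j) for j in range(s, min(s + group_size, cols))]
                for i in range(rows) for s in range(0, cols, group_size)]
    if granularity == "element":  # one pair per element
        return [[(i, j)] for i in range(rows) for j in range(cols)]
    raise ValueError(f"unknown granularity: {granularity}")

def quantize(w: Matrix, granularity: str, bits: int = 4,
             group_size: int = 2) -> Tuple[Matrix, int]:
    """Fake-quantize w; also return how many (scale, zero-point) pairs were needed."""
    rows, cols = len(w), len(w[0])
    out = [[0.0] * cols for _ in range(rows)]
    parts = partitions(rows, cols, granularity, group_size)
    for part in parts:
        scale, zp = quant_params([w[i][j] for i, j in part], bits)
        for i, j in part:
            out[i][j] = fake_quant(w[i][j], scale, zp, bits)
    return out, len(parts)

W = [[0.1, -0.2, 3.0, 0.05],
     [-1.5, 0.4, 0.3, -0.1]]

for g in ["tensor", "token", "channel", "group", "element"]:
    wq, n_pairs = quantize(W, g)
    err = max(abs(a - b) for ra, rb in zip(W, wq) for a, b in zip(ra, rb))
    print(f"{g:8s} pairs={n_pairs}  max abs error={err:.4f}")
```

For this 2×4 matrix, the number of stored pairs grows from 1 (tensor-wise) to 2, 4, 4, and 8 (element-wise). The outlier 3.0 stretches the single tensor-wise range, so small values collapse to 0. Finer partitions isolate it at the cost of more scale and zero-point storage, which is the trade-off the granularity choice controls.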