Paper

Paper link

MIXED PRECISION TRAINING (Baidu, NVIDIA)

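Motivation [3]

To shorten training time and reduce the memory a network uses during training, while keeping the accuracy of the trained model on par, the industry has proposed a growing number of mixed precision training methods.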

automatic mixed precision [10]

Mixes fp32 (single precision) and fp16 (half precision).
Advantages
- Allows larger batch sizes and models, and speeds up training;
- Model performance does not degrade significantly.

fp32 (single) vs. fp16 (half) [10]

fp32 (single precision) vs. fp16 (half precision)
- fp16's dynamic range is sufficient, but gradients (used for the weight update) must be scaled to avoid floating-point underflow in fp16;
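
The underflow problem and the scaling fix can be sketched with nothing but the Python standard library. This is only an illustration under stated assumptions: `to_fp16` is a hypothetical helper that rounds through the IEEE 754 half-precision format via `struct`, the gradient value and `LOSS_SCALE` are made up, and a Python float stands in for the higher-precision copy. In real training the loss is usually scaled before the backward pass so that the gradients come out scaled; here the gradient is scaled directly for brevity.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


LOSS_SCALE = 1024.0  # assumed power-of-two scale factor, chosen for illustration
true_grad = 1e-8     # hypothetical small gradient value

# Without scaling: 1e-8 is far below the smallest positive fp16 value
# (2**-24, about 6e-8), so it rounds to zero and the update is lost.
unscaled_fp16 = to_fp16(true_grad)

# With scaling: the scaled value is representable in fp16; it is then
# unscaled in higher precision before being used for the weight update.
scaled_fp16 = to_fp16(true_grad * LOSS_SCALE)
recovered_grad = scaled_fp16 / LOSS_SCALE

print("fp16 without scaling:", unscaled_fp16)
print("recovered after scaling:", recovered_grad)
```

The recovered value is not exact, since fp16 still rounds the scaled number, but it stays close to the original gradient instead of vanishing.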
fp16 is fast and memory-efficient:
- higher compute throughput (8x)
- higher memory throughput (2x)
- smaller GPU memory footprint (1/2x)
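
The 1/2x storage figure follows directly from the element sizes (2 bytes for fp16, 4 bytes for fp32). A minimal sketch, assuming a hypothetical tensor of `NUM_VALUES` elements packed with the standard `struct` module; it says nothing about the compute or bandwidth figures, which depend on the hardware:

```python
import struct

NUM_VALUES = 1000             # hypothetical tensor size
values = [0.5] * NUM_VALUES   # 0.5 is exactly representable in both formats

# "<" selects standard sizes with no padding: "f" is fp32, "e" is fp16.
fp32_bytes = struct.pack(f"<{NUM_VALUES}f", *values)
fp16_bytes = struct.pack(f"<{NUM_VALUES}e", *values)

print(len(fp32_bytes))  # 4000
print(len(fp16_bytes))  # 2000
```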