QAT (Quantization Aware Training), Part 2 [Detailed Explanation]

1. Recommendations for QAT (Quantization Aware Training)

Quantization Aware Training is based on the Straight Through Estimator (STE) derivative approximation. The name "quantization aware training" is somewhat misleading and does not reflect the underlying assumption: if anything, the STE approximation makes training "unaware" of quantization.
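To make the STE assumption concrete, here is a minimal PyTorch sketch (an illustration, not code from the article) of fake quantization with a straight-through backward pass, assuming symmetric int8 quantization with a fixed scale:

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale):
        # Quantize to the int8 grid, then dequantize back to float
        # ("fake" quantization: values are rounded but stay in float).
        q = torch.clamp(torch.round(x / scale), -128, 127)
        return q * scale

    @staticmethod
    def backward(ctx, grad_output):
        # STE: pretend round() has derivative 1, so the incoming gradient
        # passes straight through -- the approximation that makes training
        # "unaware" of quantization.
        return grad_output, None

x = torch.randn(4, requires_grad=True)
y = FakeQuantSTE.apply(x, torch.tensor(0.1))
y.sum().backward()
print(x.grad)  # all ones: the rounding step is invisible to the gradient
```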

After calibration is done, Quantization Aware Training simply amounts to selecting a training schedule and continuing to train the calibrated model. It usually does not need to fine-tune for very long: we typically use around 10% of the original training schedule, starting at 1% of the initial training learning rate, with a cosine annealing schedule that follows the decreasing half of a cosine period down to 1% of the initial fine-tuning learning rate (i.e. 0.01% of the initial training learning rate).
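As a concrete sketch of that schedule (the original learning rate and epoch count below are assumed placeholders, not values from the article), PyTorch's CosineAnnealingLR maps onto it directly:

```python
import torch

original_lr, original_epochs = 0.1, 300   # assumed values for the original run
ft_lr = 0.01 * original_lr                # start at 1% of the training LR
ft_epochs = int(0.1 * original_epochs)    # fine-tune for ~10% of the schedule

model = torch.nn.Linear(10, 10)           # stand-in for the calibrated model
optimizer = torch.optim.SGD(model.parameters(), lr=ft_lr)
# Decreasing half of a cosine period, ending at 1% of the fine-tuning LR
# (i.e. 0.01% of the original training LR).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=ft_epochs, eta_min=0.01 * ft_lr)

for epoch in range(ft_epochs):
    # ... one epoch of fine-tuning on the calibrated model goes here ...
    scheduler.step()
```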

Some recommendations:

Quantization Aware Training is essentially a discrete numerical optimization problem and is not yet mathematically solved. Based on our experience:

• For the STE approximation to work well, it is better to use a small learning rate. A large learning rate is more likely to amplify the variance introduced by the STE approximation and destroy the trained network.
• Do not change the quantization representation (scale) during training, at least not too frequently. Changing the scale every step is effectively like changing the data format (e8m7, e5m10, e3m4, etc.) every step, which can easily hurt convergence; see the sketch after this list for freezing the scales once calibration is done.
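As a sketch of the second recommendation, assuming NVIDIA's pytorch-quantization toolkit (whose TensorQuantizer modules store the scale as an amax value), the calibrated scales can be locked before fine-tuning begins:

```python
from pytorch_quantization import nn as quant_nn

def freeze_quantizer_scales(model):
    """Keep fake quantization active but stop updating amax (and the scale)."""
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            module.disable_calib()  # amax is no longer updated by calibration
            module.enable_quant()   # fake quantization still runs in forward
```

With the scales frozen this way, every tensor sees the same effective data format at every step, so only the weights move during fine-tuning.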
