Tensorflow Model Optimization (TF模型优化)

最新推荐文章于 2024-03-25 09:55:40 发布

ssmixi

最新推荐文章于 2024-03-25 09:55:40 发布

阅读量987

点赞数

分类专栏：深度学习_Tensorflow学习系列

本文链接：https://blog.csdn.net/ssmixi/article/details/119001236

版权

深度学习_Tensorflow学习系列专栏收录该内容

9 篇文章 0 订阅

订阅专栏

TF模型优化

一、TF模型优化概述

1. 为什么要优化？

Reducing latency and cost for inference for both cloud and edge devices (e.g. mobile, IoT).
Deploying models on edge devices with restrictions on processing, memory and/or power-consumption.
Reducing payload size for over-the-air model updates.
Enabling execution on hardware restricted-to or optimized-for fixed-point operations.
Optimizing models for special purpose hardware accelerators.

2. 主要方法

1）Quantization

Quantized models are those where we represent the models with lower precision, such as 8-bit integers as opposed to 32-bit float. Lower precision is a requirement to leverage certain hardware.

2) Sparsity and pruning

Sparse models are those where connections in between operators (i.e. neural network layers) have been pruned, introducing zeros to the parameter tensors.

3) Clustering

Clustered models are those where the original model’s parameters are replaced with a smaller number of unique values.

4) Collaborative optimization

This enables you to benefit from combining several model compression techniques and simultaneously achieve improved accuracy through quantization aware training.

二、Weight Pruning

1. Trim insignificant weights (修剪无关紧要的权重)

理论论文：To prune, or not to prune: exploring the efficacy ofpruning for model compression

权重修剪在训练过程中逐渐将模型权重归零，以实现模型稀疏性。稀疏模型更容易压缩，我们可以在推理过程中跳过零值以改善延迟。

方法

三、Quantization

1. Quantization aware training

理论论文：uantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
使用方法：方法

2. Post-training quantization

四、Weight Clustering

Clustering, or weight sharing, reduces the number of unique weight values in a model, leading to benefits for deployment. It first groups the weights of each layer into N clusters, then shares the cluster’s centroid value for all the weights belonging to the cluster.

This technique brings improvements via model compression. Future framework support can unlock memory footprint improvements that can make a crucial difference for deploying deep learning models on embedded systems with limited resources.

理论论文：Deep Compression: Compressing Deep Neural Networks With Pruning, Trained Quantization and Huffman Coding
使用方法：方法

五、Collaborative Optimization

Collaborative optimization is an overarching process that encompasses various techniques to produce a model that, at deployment, exhibits the best balance of target characteristics such as inference speed, model size and accuracy.

The idea of collaborative optimizations is to build on individual techniques by applying them one after another to achieve the accumulated optimization effect. Various combinations of the following optimizations are possible:

![在这里插入图片描述](https://img-blog.csdnimg.cn/img_convert/9aef1330e0e87d32226bb378c1339bfc.png![在这里插入图片描述](https://img-blog.csdnimg.cn/img_convert/3a1d2e9eb112b835b8f850ab8680636d.png- 使用方法：方法

ssmixi

关注

0
点赞
踩
3

收藏

觉得还不错? 一键收藏
0
评论
Tensorflow Model Optimization (TF模型优化)

TF模型优化一、TF模型优化概述1. 为什么要优化？2. 主要方法1）Quantization2) Sparsity and pruning3) Clustering4) Collaborative optimization二、Weight Pruning1. Trim insignificant weights (修剪无关紧要的权重)三、Quantization1. Quantization aware training2. Post-training quantization四、Weight Clust
复制链接

扫一扫