【mnn】 A brief walkthrough of the offline model quantization code

农夫山泉2号

Article link: https://blog.csdn.net/u011622208/article/details/122255982

mnn, offline quantization

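1. Introduction

mnn's offline quantization works on mnn's own model representation, so a model from another framework must first be converted to an mnn model and only then quantized.

Here the weights are quantized with MAX_ABS and the activations with KL divergence, using symmetric int8 quantization.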
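
To make these terms concrete, here is a minimal sketch of symmetric int8 quantization with a MAX_ABS scale. The helper names (maxAbsScale, quantize, dequantize) and the clamp value of 127 are assumptions made for illustration; they are not mnn's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// MAX_ABS scale: the largest magnitude in `values` maps to clampValue.
float maxAbsScale(const std::vector<float>& values, float clampValue = 127.0f) {
    assert(clampValue > 0.0f);
    float maxAbs = 0.0f;
    for (float v : values) maxAbs = std::max(maxAbs, std::fabs(v));
    return maxAbs / clampValue;
}

// Symmetric quantization: no zero point, values are rounded and clamped
// to [-clampValue, clampValue].
int8_t quantize(float x, float scale, float clampValue = 127.0f) {
    if (scale == 0.0f) return 0;  // all-zero input, nothing to represent
    float q = std::round(x / scale);
    q = std::min(std::max(q, -clampValue), clampValue);
    return static_cast<int8_t>(q);
}

// Map an int8 value back to float.
float dequantize(int8_t q, float scale) {
    return static_cast<float>(q) * scale;
}
```

With values whose largest magnitude is 2.54, the scale is about 0.02, so 1.0 maps to 50 and anything beyond ±2.54 saturates at ±127.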

2. Code

2.1 Reading and parsing the mnn model

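// Fragment from inside a function; it needs <fstream>, <sstream>, <memory>
// and <cstring> in addition to the MNN and flatbuffers headers.

// Load the .mnn file and unpack it into the mutable object form (MNN::NetT).
std::unique_ptr<MNN::NetT> netT;
{
    std::ifstream input(modelFile, std::ios::binary);  // the model file is binary
    std::ostringstream outputOs;
    outputOs << input.rdbuf();
    netT = MNN::UnPackNet(outputOs.str().c_str());
}

// temp build net for inference
flatbuffers::FlatBufferBuilder builder(1024);
auto offset = MNN::Net::Pack(builder, netT.get());
builder.Finish(offset);
int size      = builder.GetSize();
auto ocontent = builder.GetBufferPointer();

// model buffer for creating mnn Interpreter
// (uint8_t[] so that memory from new[] is released with delete[])
std::unique_ptr<uint8_t[]> modelForInference(new uint8_t[size]);
memcpy(modelForInference.get(), ocontent, size);

// second copy of the same buffer, unpacked again below
std::unique_ptr<uint8_t[]> modelOriginal(new uint8_t[size]);
memcpy(modelOriginal.get(), ocontent, size);

netT.reset();
netT = MNN::UnPackNet(modelOriginal.get());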

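2.2 Creating the Calibration data loader
Calibration is the main flow of the whole quantization. Its setup can be summarized as:

fake quant weight: fake-quantize the original model, i.e. quantize its weights to int8 with MAX_ABS and then dequantize them from int8 back to float. This makes the activation ranges collected afterwards more accurate, since they are measured with the weight quantization error already in place.
Store the tensors of the fake-quantized model and of the float model in two separate maps.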
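
The fake-quant step can be sketched roughly as below: each output channel gets its own MAX_ABS scale, and every weight is rounded to the int8 grid and immediately mapped back to float. The function name fakeQuantizeWeights, the [outputChannel][weightsPerChannel] layout and the clamp value 127 are assumptions; this is not mnn's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fake-quantize `weights` in place and return the per-channel scales.
// Assumed layout: [outputChannel][weights.size() / outputChannel].
std::vector<float> fakeQuantizeWeights(std::vector<float>& weights,
                                       std::size_t outputChannel,
                                       float clampValue = 127.0f) {
    assert(outputChannel > 0 && weights.size() % outputChannel == 0);
    const std::size_t perChannel = weights.size() / outputChannel;
    std::vector<float> scales(outputChannel, 0.0f);
    for (std::size_t c = 0; c < outputChannel; ++c) {
        float* w = weights.data() + c * perChannel;
        float maxAbs = 0.0f;
        for (std::size_t i = 0; i < perChannel; ++i) {
            maxAbs = std::max(maxAbs, std::fabs(w[i]));
        }
        if (maxAbs == 0.0f) continue;  // all-zero channel: leave as is, scale stays 0
        const float scale = maxAbs / clampValue;
        scales[c] = scale;
        for (std::size_t i = 0; i < perChannel; ++i) {
            // float -> int8 grid (round and clamp) -> float again
            float q = std::round(w[i] / scale);
            q = std::min(std::max(q, -clampValue), clampValue);
            w[i] = q * scale;
        }
    }
    return scales;
}
```

After this call the weights are still float, but they only take values that an int8 weight times its channel scale can represent, so the rounding error per weight is at most half a scale step.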

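2.3 Offline quantization
Overall flow:

Feed images to the fake-quantized model and update the maximum and minimum of every tensor.
Compute the distribution of every tensor: discretize the activation values into 2048 bins and build their histogram.
Use KL divergence to compute a threshold for every tensor, then convert that threshold into the scale used to convert between float and int8.
Write the quantization parameters (tensor scales, int8 weights and so on) back into the model. Note that the float weights have to be cleared here.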
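
The histogram, threshold and scale steps above can be sketched as follows. This is a simplified KL-divergence threshold search written for illustration, not a copy of mnn's code: the function names, the 128 target bins and the clamp value 127 are assumptions, and mnn's own implementation differs in its details.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Histogram of |x| over [0, maxAbs], split into `bins` equal-width buckets.
std::vector<double> absHistogram(const std::vector<float>& values, float maxAbs, int bins) {
    assert(bins > 0 && maxAbs > 0.0f);
    std::vector<double> hist(bins, 0.0);
    const float binsPerUnit = static_cast<float>(bins) / maxAbs;
    for (float v : values) {
        const int idx = std::min(static_cast<int>(std::fabs(v) * binsPerUnit), bins - 1);
        hist[idx] += 1.0;
    }
    return hist;
}

// Returns the index of the last bin kept by the clipping threshold with the lowest KL.
int klThresholdBin(const std::vector<double>& hist, int targetBins = 128) {
    const int bins = static_cast<int>(hist.size());
    assert(targetBins > 0 && bins >= targetBins);
    int bestBin = bins - 1;
    double bestKl = std::numeric_limits<double>::infinity();
    for (int i = targetBins; i <= bins; ++i) {
        // P: the first i bins, with all clipped mass folded into the last kept bin.
        std::vector<double> p(hist.begin(), hist.begin() + i);
        for (int j = i; j < bins; ++j) p[i - 1] += hist[j];
        // Q: merge the first i bins into targetBins groups, then spread each
        // group's mass evenly over the bins of that group that are non-zero in P.
        std::vector<double> q(i, 0.0);
        const double width = static_cast<double>(i) / targetBins;
        for (int g = 0; g < targetBins; ++g) {
            const int begin = static_cast<int>(g * width);
            const int end = (g == targetBins - 1) ? i : static_cast<int>((g + 1) * width);
            double mass = 0.0;
            int nonZero = 0;
            for (int j = begin; j < end; ++j) {
                mass += hist[j];
                if (p[j] > 0.0) ++nonZero;
            }
            if (nonZero == 0) continue;
            for (int j = begin; j < end; ++j) {
                if (p[j] > 0.0) q[j] = mass / nonZero;
            }
        }
        double sumP = 0.0, sumQ = 0.0;
        for (int j = 0; j < i; ++j) { sumP += p[j]; sumQ += q[j]; }
        if (sumP <= 0.0 || sumQ <= 0.0) continue;
        // KL(P || Q) on the normalized distributions; eps keeps the log finite.
        const double eps = 1e-12;
        double kl = 0.0;
        for (int j = 0; j < i; ++j) {
            if (p[j] <= 0.0) continue;
            const double pn = p[j] / sumP;
            const double qn = std::max(q[j] / sumQ, eps);
            kl += pn * std::log(pn / qn);
        }
        if (kl < bestKl) { bestKl = kl; bestBin = i - 1; }
    }
    return bestBin;
}

// Convert the chosen bin into a float threshold (bin center), then into the
// scale such that float = scale * int8.
float scaleFromThresholdBin(int thresholdBin, float maxAbs, int bins, float clampValue = 127.0f) {
    const float threshold = (static_cast<float>(thresholdBin) + 0.5f) * maxAbs / static_cast<float>(bins);
    return threshold / clampValue;
}
```

In a calibration loop this would be used per tensor: accumulate the histogram over all calibration images with the tensor's recorded maximum absolute value, call klThresholdBin once, and store the result of scaleFromThresholdBin as that tensor's scale.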

// Attach one TensorDescribe carrying the int8 quantization info to each calibrated tensor.
for (const auto& iter : _scales) {
    std::unique_ptr<MNN::TensorDescribeT> describe(new MNN::TensorDescribeT);
    describe->index = _tensorIdx[iter.first];
    describe->quantInfo.reset(new MNN::TensorQuantInfoT);
    describe->quantInfo->scale = iter.second;
    describe->quantInfo->type = MNN::DataType_DT_INT8;
    describe->quantInfo->min = -1 * _featureClampValue;
    describe->quantInfo->max = 1 * _featureClampValue;
    // 1. extraTensorDescribe is an attribute added by quantization; where is it used?
    _originalModel->extraTensorDescribe.emplace_back(std::move(describe));
}

// 2. symmetric quantization of the float weights, one scale per output channel
SymmetricQuantizeWeight(param->weight.data(), weightSize, quantizedWeight.data(), quantizedWeightScale.data(), outputChannel, _weightClampValue);

// 3. encode the int8 weights and their scales into quanParameter
param->quanParameter = IDSTEncoder::encode(param->weight, quantizedWeightScale, weightSize/channles, channles, false, quantizedWeight.data(), -_weightClampValue);
param->quanParameter->scaleIn = inputScale;
param->quanParameter->scaleOut = outputScale;
// relu6 is replaced by plain relu in the quantized op
if (param->common->relu6) {
    param->common->relu  = true;
    param->common->relu6 = false;
}
param->weight.clear();          // 4. clear the original float weights

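Finally, everything is written back to the model.

Summary

The code as a whole is clear and well structured. How these quantization parameters are used afterwards can only be understood by digging into the mnn framework itself; a walkthrough of that framework code will follow later.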
