ONNX: automatically inserting Q/DQ operators before Conv and ConvTranspose

数字游民在放牛

Last modified: 2025-01-10 16:57:17

First published: 2024-10-31 19:57:07

Article link: https://blog.csdn.net/niudaniuworking/article/details/143415753

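Common problems:

1. The scale tensor of the weight is reported as empty. Cause: the tensor was never added to the graph initializers.

graph.initializer.append(x_scale)  # x_scale is the scale tensor referenced by the node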

2. After inserting a node, the model check reports that the node does not have a given attribute, for example:

"onnx.onnx_cpp2py_export.checker.ValidationError: Unrecognized attribute: axis for operator QuantizeLinear"

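Cause: the model's opset version is too low or too high for this attribute. First look up in the ONNX operator documentation which opset versions define the attribute you want to insert (for QuantizeLinear, axis was introduced in opset 13), convert the model to the version you need, and only then insert the attribute. Conversion method (see the reference link):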

import onnx
from onnx import version_converter

# Preprocessing: load the model to be converted.
model_path = "path/to/the/model.onnx"
original_model = onnx.load(model_path)

print(f"The model before conversion:\n{original_model}")

# A full list of supported adapters can be found here:
# https://github.com/onnx/onnx/blob/main/onnx/version_converter.py#L21
# Apply the version conversion on the original model.
target_version = 13  # example value: replace with the opset version you need
converted_model = version_converter.convert_version(original_model, target_version)

print(f"The model after conversion:\n{converted_model}")

3. At runtime the model reports:

"void onnxruntime::PrepareForQDQ(const TensorShape&, const Tensor&, const Tensor*, int64_t, int64_t, int64_t&, int64_t&, int64_t&) scale.Shape().NumDimensions() == 1 && scale.Shape()[0] == broadcast_dim was false. For per axis quantization, scale must be 1D tensor with size 256"

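Cause: Q/DQ uses axis=1 by default, so axis has to be set to the channel dimension of the tensor being quantized. For example, a Conv weight has the layout [out_channels, in_channels/group, kH, kW], so quantizing per output channel requires axis=0, and the scale must have exactly as many elements as that dimension.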
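
As a rough illustration of the axis rule, the sketch below picks the weight axis per operator type and derives a per-channel scale whose length matches that axis. The axis table assumes per-output-channel quantization. The symmetric max-abs scale and the name per_channel_scale are assumptions for the example, not something the original script does.

```python
import numpy as np

# Assumed convention: quantize weights per output channel.
# Conv weight layout:          [out_channels, in_channels / group, kH, kW] -> axis 0
# ConvTranspose weight layout: [in_channels, out_channels / group, kH, kW] -> axis 1
WEIGHT_CHANNEL_AXIS = {"Conv": 0, "ConvTranspose": 1}


def per_channel_scale(weight, op_type, qmax=127):
    """Return (axis, scale) where scale has one entry per channel along axis."""
    axis = WEIGHT_CHANNEL_AXIS[op_type]
    reduce_axes = tuple(i for i in range(weight.ndim) if i != axis)
    max_abs = np.abs(weight).max(axis=reduce_axes)
    # Guard against all-zero channels so the scale is never zero.
    scale = np.maximum(max_abs, 1e-8) / qmax
    return axis, scale.astype(np.float32)


conv_w = np.random.rand(256, 64, 3, 3).astype(np.float32)
conv_axis, conv_scale = per_channel_scale(conv_w, "Conv")

deconv_w = np.random.rand(64, 32, 3, 3).astype(np.float32)
deconv_axis, deconv_scale = per_channel_scale(deconv_w, "ConvTranspose")

print(conv_axis, conv_scale.shape, deconv_axis, deconv_scale.shape)
```

The runtime error above appears when the scale length and the size of the dimension selected by axis disagree, which is what this pairing avoids.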

4. Script for automatically adding Q/DQ

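import numpy as np
import onnx
from onnx import helper

def insert_weight_qdq(model, quant_type, graph, i, node_index, input_name, dim, axis=None):
    # quant_type is the op type to insert: "QuantizeLinear" or "DequantizeLinear".
    quant_output_name = quant_type + "_" + str(node_index) + "_weight"
    # Placeholder values: random scales, and zero points that all truncate to 0.
    # Replace them with real quantization parameters.
    tensor = np.random.rand(dim)
    x_scale = helper.make_tensor(
        name="x_scale" + quant_output_name,
        data_type=onnx.TensorProto.FLOAT,
        dims=[dim],
        vals=tensor.tolist(),
    )
    x_zero_point = helper.make_tensor(
        name="x_zero_point" + quant_output_name,
        data_type=onnx.TensorProto.INT8,
        dims=[dim],
        vals=tensor.astype(np.int8).tolist(),
    )
    # Register both tensors as initializers, otherwise they are reported as empty.
    graph.initializer.append(x_scale)
    graph.initializer.append(x_zero_point)
    quant_input_node = helper.make_node(
        quant_type,
        inputs=[input_name, "x_scale" + quant_output_name, "x_zero_point" + quant_output_name],
        outputs=[quant_output_name],
        name=quant_output_name + "_output",
    )
    # Optional: set the channel axis explicitly (added parameter; it requires an
    # opset in which the operator defines "axis"). When omitted, the default axis=1 applies.
    if axis is not None:
        quant_input_node.attribute.append(helper.make_attribute("axis", axis))
    graph.node.insert(i, quant_input_node)
    return quant_output_name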

Further reading:

https://chromewebstore.google.com/detail/redium/aapiedkipcbeplicbbicchmdmpinhjdl?pli=1

https://medium.com/@DeeperAndCheaper/quantization-yolov8-qat-x2-speed-up-on-your-jetson-orin-nano-2-how-to-achieve-the-best-qat-c6069fb83ab7