Converting a Keras model to ONNX for inference: notes on the process

weixin_49525852

Last modified 2022-02-17 15:49:21

Tags: keras pytorch deep learning

First published 2022-02-17 13:08:39

Article link: https://blog.csdn.net/weixin_49525852/article/details/122978233

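Earlier I successfully converted a PyTorch BERT model to ONNX and TensorRT for inference. Under TensorRT in particular, inference was roughly 3-4 times faster (an estimate; it depends on the GPU and on the batch_size setting). I followed this article: https://blog.csdn.net/HUSTHY/article/details/118444462

Below I record how I converted a Keras model to ONNX. My own experience is limited, so what follows is only my understanding and a record of the pitfalls I hit; there are surely better approaches.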

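1. After loading the pretrained Keras model, inference has to go through model.predict(x, y), and I wanted to call model(x, y) directly instead. I first tried rewriting the source code and hit this error: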

RuntimeError: Detected a call to `Model.predict` inside a `tf.function`. 
`Model.predict` is a high-level endpoint that manages its own `tf.function`. 
Please move the call to `Model.predict` outside of all enclosing `tf.function`s. 
Note that you can call a `Model` directly on `Tensor`s inside a `tf.function` like: `model(x)`.

After some searching I learned that the call has to be wrapped in tf.function. The approach I used as a reference:

import numpy as np
import tensorflow as tf

def build_model():
    from tensorflow.keras.layers import Input, Multiply
    from tensorflow.keras.models import Model
    
    inputs = x = Input(shape=(4, 4, 3), name='image')
    x = Multiply()([x, x])

    model = Model(inputs, x)
    return model

# Create an array of all two's
_input = np.full(shape=(1, 4, 4, 3), fill_value=2, dtype='float32')

_model = build_model()
print(_model.predict(_input))
# correctly prints [[[[4. 4. 4.] ....

# wrap model in a concrete function
@tf.function
def model_func(inputs):
    return _model(inputs)

print(model_func(_input))

2. Converting the Keras model to ONNX

Reference code:

tensorflow-onnx/getting_started.py at master · onnx/tensorflow-onnx (github.com): https://github.com/onnx/tensorflow-onnx/blob/master/examples/getting_started.py

My conversion code:

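import tensorflow as tf
import tf2onnx

output_model_path = 'keras_onnx_cpu.onnx'

# Two rank-2 float32 inputs; both dimensions are left dynamic
input_signature = [tf.TensorSpec([None, None], tf.float32), tf.TensorSpec([None, None], tf.float32)]

# `model` must be a tf.function-wrapped callable (see step 1)
onnx_model, _ = tf2onnx.convert.from_function(model, input_signature, opset=13, output_path=output_model_path)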

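3. Loading the ONNX model and running inference

The loading code from the GitHub example above raised an error for me:

sess = ort.InferenceSession(onnx_model.SerializeToString())

So I load from the saved file instead: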

import onnxruntime as ort

# Load from the saved file rather than from the in-memory proto
sess = ort.InferenceSession(output_model_path)
res = sess.run(None, {'token_ids': token_ids.astype('float32'), 'segment_ids': segment_ids.astype('float32')})
print(res[0])
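The feed dictionary above only works if the keys `token_ids` and `segment_ids` match the input names the exported graph actually declares. As a hedged sketch (not part of the original code), the helper below lists what a session reports through `get_inputs()`. `describe_inputs` is a hypothetical helper name, and the commented usage assumes the `keras_onnx_cpu.onnx` file from step 2 exists.

```python
def describe_inputs(sess):
    """Return (name, type, shape) for every input the session declares.

    `sess` is expected to be an onnxruntime.InferenceSession; only its
    get_inputs() method is used here.
    """
    return [(arg.name, arg.type, arg.shape) for arg in sess.get_inputs()]


# Usage (assumes onnxruntime is installed and the model file exists):
# import onnxruntime as ort
# sess = ort.InferenceSession('keras_onnx_cpu.onnx')
# for name, ort_type, shape in describe_inputs(sess):
#     print(name, ort_type, shape)
```

Printing these before building the feed dictionary makes a name or dtype mismatch visible before `sess.run` is called.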

If loading then fails with this error:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Failed to load model with error: Unknown model file format version.

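The fix I found online: the onnx and onnxruntime versions are probably mismatched. The suggestion was to install onnx==1.8.0 and onnxruntime==1.10.0, re-convert the model, and run inference again. In my case, switching to onnx==1.8.0 and onnxruntime==1.6.0 made the model load successfully.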
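Since this fix depends on which package versions are installed, it can help to print them before and after reinstalling. The following is a minimal sketch using only the standard library (Python 3.8+). `installed_versions` is a hypothetical helper, and the default package list is my assumption about what is relevant here.

```python
from importlib import metadata


def installed_versions(packages=("onnx", "onnxruntime", "onnxruntime-gpu", "tf2onnx")):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running it in the environment used for conversion and again in the one used for inference shows whether the two actually differ.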

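4. GPU acceleration

When I previously converted the PyTorch model to ONNX, I used onnxruntime-gpu (1.2.0), and the converted model could run with GPU acceleration.

This conversion needs opset>=13, and onnxruntime-gpu 1.2.0 is too old for that. The CUDA and cuDNN versions on my server do not support a newer onnxruntime-gpu, so I could only run CPU inference with onnxruntime. Reference on installing and using the GPU build of onnxruntime (x1131230123's blog, CSDN): https://blog.csdn.net/x1131230123/article/details/120422132

Update: in an environment with a newer CUDA (11.1) I installed onnx==1.10.1 and onnxruntime-gpu==1.8.1 and ran ONNX inference on the GPU; the speed-up was even larger.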
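Whether a session runs on the GPU or the CPU depends on which execution providers the installed build offers. The sketch below is my own illustration, not the code used in this post. It prefers CUDA when it is available and otherwise falls back to CPU. `choose_providers` and `make_session` are hypothetical names, and passing `providers=` to `InferenceSession` assumes a reasonably recent onnxruntime.

```python
def choose_providers(available):
    """Prefer CUDA, then CPU, keeping only providers that are actually available."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available]


def make_session(model_path):
    """Create an InferenceSession on the best provider the installed build offers."""
    # Imported lazily so choose_providers can be used without onnxruntime installed
    import onnxruntime as ort

    providers = choose_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)


# Usage (assumes the model file from step 2):
# sess = make_session('keras_onnx_cpu.onnx')
# print(sess.get_providers())  # shows which providers the session ended up with
```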

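5. Inference speed comparison

Running my own data on a GPU server, the ONNX model on CPU was 2-3 times faster than the original Keras model on GPU, and the ONNX model on GPU was about 6 times faster than the original Keras model on GPU.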
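The figures above come from my own runs. For anyone wanting to reproduce a comparison like this, here is a minimal timing sketch. It is not the script used for those numbers: `mean_latency` is a hypothetical helper, and the warm-up and run counts are arbitrary defaults.

```python
import time


def mean_latency(fn, warmup=3, runs=20):
    """Average wall-clock seconds per call of fn(), measured after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


# Usage (sess, feed and keras_model are assumed to be defined elsewhere):
# onnx_s = mean_latency(lambda: sess.run(None, feed))
# keras_s = mean_latency(lambda: keras_model.predict([token_ids, segment_ids]))
# print(f"speed-up: {keras_s / onnx_s:.1f}x")
```

Both models should be timed on the same inputs and batch size, since the ratio depends on both.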

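6. Possible next steps

If the need arises later, the ONNX model could be converted further into a TensorRT model, which should improve inference speed again.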