DPU-PYNQ Usage Notes

XS30

Category: FPGA. Tags: Vitis AI, DPU, FPGA, quantization

Article link: https://blog.csdn.net/u014798590/article/details/121281335

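This article documents in detail how to quantize and compile a model with Vitis AI on Ubuntu, and how to run DPU inference on the PYNQ platform. The quantization part covers inspecting the model structure, choosing a calibration dataset, and quantization code examples for PyTorch and TensorFlow. The compilation part covers the compilation methods for the different frameworks, such as the sh scripts for TensorFlow and PyTorch. On the PYNQ side, inference runs directly from the xmodel without depending on the original deep learning framework, and the results and performance figures of an MNIST model are shown.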

Ubuntu side

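Tips:
1. When running Vitis AI on Ubuntu, it is best to place your project folder inside the Vitis AI directory, then right-click to open a terminal in the current path and enter ./docker_run.sh xilinx/vitis-ai
2. When training a PyTorch model, note that the PyTorch version supported by Vitis AI is <= 1.4; the version installed inside the framework is 1.4.
3. When saving a PyTorch model with a newer version (above 1.7), note that the pth file produced by newer PyTorch versions is a compressed archive. To obtain a pth file in the format saved by older versions, add torch.save(model.state_dict(), model_cp, _use_new_zipfile_serialization=False) to the model-saving code. Otherwise, a "version too high" message appears during quantization. For the several ways of saving a PyTorch model, see this article.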
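As a quick way to tell which of the two pth formats from tip 3 a file uses, the following minimal sketch checks whether the file is a zip archive with Python's standard `zipfile` module. The helper name `uses_zip_serialization` and the two stand-in files are made up for illustration; they are not real checkpoints, and the check only assumes that the newer format is a zip container, as the tip describes.

```python
import os
import tempfile
import zipfile


def uses_zip_serialization(checkpoint_path):
    """Return True if the file is a zip archive (the newer pth container)."""
    return zipfile.is_zipfile(checkpoint_path)


# Self-contained demonstration with two stand-in files (not real checkpoints).
with tempfile.TemporaryDirectory() as tmp:
    zip_style = os.path.join(tmp, "new_format.pth")
    legacy_style = os.path.join(tmp, "legacy_format.pth")

    # Stand-in for a checkpoint written with the zip-based format.
    with zipfile.ZipFile(zip_style, "w") as archive:
        archive.writestr("archive/data.pkl", b"placeholder")

    # Stand-in for a legacy checkpoint: plain bytes, no zip container.
    with open(legacy_style, "wb") as handle:
        handle.write(b"\x80\x02placeholder pickle bytes")

    zip_result = uses_zip_serialization(zip_style)
    legacy_result = uses_zip_serialization(legacy_style)

print(zip_result, legacy_result)
```

Running the same check on your own pth file before starting the quantizer can save a round trip into the Docker container.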

Then select the version that matches your model's framework.

Viewing the model structure

You can use the Netron application on Ubuntu to view the model structure (Summary).

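Quantization (quantizing)

The official documentation gives a flow chart for quantization. The input is a floating-point model. The preprocessing stage mainly folds and removes unneeded nodes, after which the model's weights/biases and activations are quantized to the given bit width.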
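A common instance of the "folding" step is merging a batch-normalization layer into the preceding layer's weights and bias. The sketch below shows the arithmetic for a single scalar channel. It assumes that batch-norm folding is the kind of folding meant here, and the function names and numeric values are illustrative rather than taken from Vitis AI.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN parameters into the weight and bias of the preceding layer."""
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


def layer_then_batchnorm(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Reference path: linear layer followed by a separate batch norm."""
    y = weight * x + bias
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


# Illustrative values for one channel.
params = dict(weight=0.8, bias=0.1, gamma=1.5, beta=-0.2, mean=0.3, var=4.0)
folded_weight, folded_bias = fold_batchnorm(**params)

x = 2.0
reference = layer_then_batchnorm(x, **params)
folded = folded_weight * x + folded_bias

# Both paths should agree up to floating-point rounding.
print(abs(reference - folded) < 1e-9)
```

After folding, only one weight and one bias per channel remain to be quantized, which is why this step comes before quantization.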

To capture activation statistics and improve the accuracy of quantized models, the Vitis AI quantizer must run several iterations of inference to calibrate the activations. A calibration image dataset input is, therefore, required. Generally, the quantizer works well with 100–1000 calibration images. Because there is no need for back propagation, the un-labeled dataset is sufficient.

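In short, to improve the accuracy of the final model, the Vitis AI quantizer has to run several inference iterations to calibrate the activations, so it needs an image dataset as input. Because no backpropagation is involved, an unlabeled dataset is enough. For the principles of quantization, see this article.
Labels are not needed because calibration only uses the dataset to obtain the min and max of the activations, from which S and Z are computed.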
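To make the role of min, max, S and Z concrete, here is a minimal sketch of generic affine (asymmetric) 8-bit quantization, taking S as the scale and Z as the zero point. This is textbook arithmetic under that assumption. The scheme the Vitis AI quantizer actually uses may differ (for example, it may restrict scales to powers of two), and all names and sample values below are illustrative.

```python
def compute_scale_zero_point(x_min, x_max, num_bits=8):
    """Derive scale S and zero point Z for unsigned affine quantization."""
    q_min, q_max = 0, 2 ** num_bits - 1
    # Widen the range so that the real value 0.0 is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    if scale == 0.0:
        scale = 1.0  # degenerate case: all observed values are zero
    zero_point = int(round(q_min - x_min / scale))
    zero_point = max(q_min, min(q_max, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Map a real value to an integer code, clamped to the valid range."""
    q = int(round(x / scale)) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to an approximate real value."""
    return scale * (q - zero_point)


# Calibration only needs the observed activation values, never labels.
calibration_batches = [[-0.9, 0.1, 0.4], [0.2, 1.6, -0.3], [0.0, 0.7, 1.1]]
observed_min = min(min(batch) for batch in calibration_batches)
observed_max = max(max(batch) for batch in calibration_batches)

scale, zero_point = compute_scale_zero_point(observed_min, observed_max)
value = 0.4
restored = dequantize(quantize(value, scale, zero_point), scale, zero_point)
print(scale, zero_point, restored)
```

The round-trip error of any in-range value is at most half a quantization step, so a tighter observed range gives a smaller S and a more accurate quantized model.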

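Quantization requires the trained floating-point model and a calibration dataset (100-1000 images).
In TensorFlow, the code below is used. As it shows, it follows the steps of the flow chart described above.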

import os

import cv2
import numpy as np
import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

float_model = tf.keras.models.load_model('float_model.h5')
quantizer = vitis_quantize.VitisQuantizer(float_model)

# load datasets start
# Assumes 422 images named 0.jpg ... 421.jpg, each already 224x224x3.
image_dir = r'\xilinx\jupyter_notebooks\Image\1'
num_images = 422
imgs = np.ones(shape=[num_images, 224, 224, 3], dtype=int)
for i in range(num_images):
    path = os.path.join(image_dir, str(i) + '.jpg')
    img = cv2.imread(path)  # returns None if the file cannot be read
    imgs[i, :, :, :] = img
test_img = imgs / 255  # scale pixel values to [0, 1]
# load datasets end

quantized_model = quantizer.quantize_model(calib_dataset=test_img)
# args.output and args.name come from the script's argument parser (not shown)
quantized_model.save(os.path.join(args.output, args.name + '.h5'))
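When the available image folder holds more than the 100-1000 images recommended for calibration, a reproducible random subset is usually enough. The sketch below uses only the standard library. The helper name `pick_calibration_subset`, the seed, and the file names are hypothetical.

```python
import random


def pick_calibration_subset(image_paths, max_images=1000, seed=0):
    """Return at most max_images paths, chosen reproducibly."""
    paths = sorted(image_paths)
    if len(paths) <= max_images:
        return paths
    rng = random.Random(seed)  # fixed seed so reruns pick the same images
    return sorted(rng.sample(paths, max_images))


# Illustrative file names; in practice these would come from os.listdir or glob.
all_paths = [f"{i}.jpg" for i in range(5000)]
subset = pick_calibration_subset(all_paths, max_images=500)
print(len(subset), len(set(subset)))
```

Fixing the seed keeps the calibration set stable between runs, which makes accuracy differences between quantization attempts easier to attribute.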

PyTorch workflow
The code flow in PyTorch is similar; see this page for details.
1. Import the vai_q_pytorch module

from pytorch_nndct.apis import torch_quantizer, dump_xmodel

2. Create the quantizer with the input it needs, and get the converted model

input = torch.randn([batch_size, 3, 224, 224])
quantizer = torch_quantizer(quant_mode, model, (input))
quant_model = quantizer.quant_model

3. Run the neural network forward with the converted model.

acc1_gen, acc5_gen, loss_gen = evaluate(quant_model, val_loader, loss_fn)

4. Output the quantization result and deploy the model.

if quant_mode == 'calib':
  quantizer.export_quant_config()
if deploy:
  quantizer.export_xmodel()
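Step 4 branches on `quant_mode` and `deploy`, which suggests the script is run more than once: a calibration pass first, then a second pass that exports the xmodel. The sketch below shows one way such flags could be parsed with `argparse`. The option names, the defaults, and the rule that `--deploy` requires `--quant_mode test` are assumptions made for illustration, not something confirmed by the code above.

```python
import argparse


def parse_quant_args(argv=None):
    """Parse hypothetical command-line flags for a quantization script."""
    parser = argparse.ArgumentParser(description="Hypothetical quantization driver")
    parser.add_argument("--quant_mode", choices=["calib", "test"], default="calib")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--deploy", action="store_true")
    args = parser.parse_args(argv)
    # Assumed rule: exporting the xmodel only makes sense after calibration.
    if args.deploy and args.quant_mode != "test":
        parser.error("--deploy is only meaningful together with --quant_mode test")
    return args


# First pass: calibrate. Second pass: evaluate the quantized model and export.
calib_args = parse_quant_args(["--quant_mode", "calib"])
deploy_args = parse_quant_args(["--quant_mode", "test", "--batch_size", "1", "--deploy"])
print(calib_args.quant_mode, deploy_args.quant_mode, deploy_args.deploy)
```

The parsed values would then be passed to `torch_quantizer` and to the `if quant_mode == 'calib'` / `if deploy` branches shown in step 4.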

The GitHub example uses it as follows, which differs little from the TF2 flow.

import torch
from pytorch_nndct.apis import torch_quantizer, dump_xmodel
from common import *

def quantize(build_dir,quant_mode,batchsize):

  dset_dir = build_dir + '/dataset'
  float_model = build_dir + '/float_model'
  quant_model = build_dir + '/quant_model'


  # use GPU if available   
  if (torch.cuda.device_count() > 0):
    print('You have',torch.cuda.device_count(),'CUDA devices available')
    for i in range(torch.cuda.device_count()):
      print(' Device',str(i),': ',torch.cuda.get_device_name(i))
    print('Selecting device 0..')
    device = torch.device('cuda:0')
  else:
    print('No CUDA devices available..selecting CPU')
    device = torch.device('cpu')
