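DPU on PYNQ-Z2 Series, 2.2 Using DNNDK: Quantizing a Model with the decent Tool

Quantizing a model with the decent tool

This post uses the resnet50 model shipped with DNNDK as an example to show how to quantize a model with the decent tool.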

TensorFlow

1. Freeze the model

freeze_graph \
--input_graph=./float_graph/resnet50v1.pb \
--input_checkpoint=./float_graph/resnet50v1.ckpt \
--input_binary=true \
--output_graph=./frozen_resnet50v1.pb \
--output_node_names=resnet_v1_50/predictions/Reshape_1

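The pb and ckpt files used here are provided by DNNDK. The input and output node names are fixed when the model is defined, so for a custom model change them to match your own definition.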

2. Validate the frozen model

The content of evaluate_frozen_graph.sh is as follows:

#!/bin/sh
set -e
# Please set your imagenet validation dataset path here,
IMAGE_DIR=/media/DATASET/imagenet2012/val/
IMAGE_LIST=/media/DATASET/imagenet2012/val.txt

EVAL_BATCHES=1000
BATCH_SIZE=50

python3 eval.py \
  --input_frozen_graph ./frozen_resnet50v1.pb \
  --input_node input \
  --output_node resnet_v1_50/predictions/Reshape_1 \
  --eval_batches $EVAL_BATCHES \
  --batch_size $BATCH_SIZE \
  --eval_image_dir $IMAGE_DIR \
  --eval_image_list $IMAGE_LIST \
  --gpu 0

The script calls eval.py, whose main part is shown below:

def eval(input_graph_def, input_node, output_node):
    """Evaluate classification network graph_def's accuracy, need evaluation dataset"""
    tf.import_graph_def(input_graph_def,name = '')

    # Get input tensors
    input_tensor = tf.get_default_graph().get_tensor_by_name(input_node+':0')
    input_labels = tf.placeholder(tf.float32,shape = [None,FLAGS.class_num])

    # Calculate accuracy
    output = tf.get_default_graph().get_tensor_by_name(output_node+':0')
    prediction = tf.reshape(output, [FLAGS.batch_size, FLAGS.class_num])
    correct_labels = tf.argmax(input_labels, 1)
    top1_prediction = tf.nn.in_top_k(prediction, correct_labels, k = 1)
    top5_prediction = tf.nn.in_top_k(prediction, correct_labels, k = 5)
    top1_accuracy = tf.reduce_mean(tf.cast(top1_prediction,'float'))
    top5_accuracy = tf.reduce_mean(tf.cast(top5_prediction,'float'))

    # Start evaluation
    print("Start Evaluation for {} Batches...".format(FLAGS.eval_batches))
    with tf.Session() as sess:
        progress = ProgressBar()
        top1_sum_acc = 0
        top5_sum_acc = 0
        for iter in progress(range(0,FLAGS.eval_batches)):
            input_data = eval_input(iter, FLAGS.eval_image_dir, FLAGS.eval_image_list, FLAGS.class_num, FLAGS.batch_size)
            images = input_data['input']
            # img = input_data['input']
            # images = np.array(img)
            labels = input_data['labels']
            feed_dict = {input_tensor: images, input_labels: labels}
            top1_acc, top5_acc = sess.run([top1_accuracy, top5_accuracy],feed_dict)
            top1_sum_acc += top1_acc
            top5_sum_acc += top5_acc
    final_top1_acc = top1_sum_acc/FLAGS.eval_batches
    final_top5_acc = top5_sum_acc/FLAGS.eval_batches
    print("Accuracy: Top1: {}, Top5: {}".format(final_top1_acc, final_top5_acc))

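Most of this is fixed and needs no changes. Only this line
input_data = eval_input(iter, FLAGS.eval_image_dir, FLAGS.eval_image_list, FLAGS.class_num, FLAGS.batch_size)
has to be adapted. It reads the images and their labels, preprocesses the images into the tensor that is fed to the network, and returns the matching labels. eval_input is defined as follows: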

import cv2
from sklearn import preprocessing

def eval_input(iter, eval_image_dir, eval_image_list, class_num, eval_batch_size):
    images = []
    labels = []
    # Read the list file; each line is "image_name label_id".
    with open(eval_image_list) as f:
        lines = f.readlines()
    for index in range(0, eval_batch_size):
        curline = lines[iter * eval_batch_size + index]
        image_name, label_id = curline.split(' ')
        image = cv2.imread(eval_image_dir + image_name)
        image = preprocess(image)
        images.append(image)
        labels.append(int(label_id))
    # One-hot encode the labels once, after the whole batch has been collected.
    lb = preprocessing.LabelBinarizer()
    lb.fit(range(0, class_num))
    labels = lb.transform(labels)
    return {"input": images, "labels": labels}
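To make the indexing in eval_input concrete, here is a minimal sketch of how one batch is sliced out of the list file and one-hot encoded. It assumes each line has the form "image_name label_id", as the split in eval_input implies. parse_batch and one_hot are hypothetical helper names, and the one-hot step only stands in for LabelBinarizer rather than reproducing it exactly. The sample lines are made up.

```python
def parse_batch(lines, batch_index, batch_size):
    """Return (image_name, label_id) pairs for one batch of list-file lines."""
    start = batch_index * batch_size
    end = start + batch_size
    if end > len(lines):
        raise IndexError("list file has %d lines, batch needs %d" % (len(lines), end))
    pairs = []
    for line in lines[start:end]:
        image_name, label_id = line.split()
        pairs.append((image_name, int(label_id)))
    return pairs


def one_hot(label_id, class_num):
    """Encode a label as a 0/1 list of length class_num."""
    return [1 if i == label_id else 0 for i in range(class_num)]


# Hypothetical list-file content, not taken from the real val.txt.
lines = ["a.JPEG 2\n", "b.JPEG 0\n", "c.JPEG 1\n", "d.JPEG 2\n"]
batch = parse_batch(lines, batch_index=1, batch_size=2)
labels = [one_hot(label, class_num=3) for _, label in batch]
print(batch)   # [('c.JPEG', 1), ('d.JPEG', 2)]
print(labels)  # [[0, 1, 0], [0, 0, 1]]
```

With EVAL_BATCHES=1000 and BATCH_SIZE=50, the evaluation script consumes 1000 * 50 = 50000 lines, which is the size of the ImageNet 2012 validation set. A shorter list file would make the indexing run past the end.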

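The most important part here is preprocess(image).
The image preprocessing used for validation and quantization must be identical to the preprocessing used when the model was trained. That point is important enough to say three times. Judging from the scripts shipped with DNNDK, the resnet50v1 preprocessing subtracts 103.939, 116.779 and 123.68 from the R, G and B channels respectively and converts RGB to BGR. The preprocess function called in eval_input should therefore do the same: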

import cv2
import numpy as np

def preprocess(img):
    # cv2.imread returns uint8; convert to float32 before subtracting the means.
    img = np.array(img, dtype=np.float32)

    # Resize so that the shorter side becomes 256, keeping the aspect ratio.
    height, width, _ = img.shape
    new_height = height * 256 // min(img.shape[:2])
    new_width = width * 256 // min(img.shape[:2])
    img = cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_CUBIC)

    # Crop the central 224x224 region.
    height, width, _ = img.shape
    startx = width//2 - (224//2)
    starty = height//2 - (224//2)
    img = img[starty:starty+224,startx:startx+224]
    assert img.shape[0] == 224 and img.shape[1] == 224, (img.shape, height, width)

    # Subtract the per-channel means.
    img[:,:,0] -= 123.68
    img[:,:,1] -= 116.779
    img[:,:,2] -= 103.939

    return img
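The resize and crop arithmetic in preprocess can be checked without any image data. The sketch below repeats the same integer arithmetic for a given input size. resize_then_crop_box is a hypothetical helper name and the 375x500 input size is only an example.

```python
def resize_then_crop_box(height, width, short_side=256, crop=224):
    """Return the resized (height, width) and the crop box (top, left, bottom, right)."""
    shorter = min(height, width)
    new_height = height * short_side // shorter
    new_width = width * short_side // shorter
    top = new_height // 2 - crop // 2
    left = new_width // 2 - crop // 2
    return (new_height, new_width), (top, left, top + crop, left + crop)


# Example: an image that is 375 pixels high and 500 pixels wide.
size, box = resize_then_crop_box(375, 500)
print(size)  # (256, 341)
print(box)   # (16, 58, 240, 282)
```

The shorter side always ends up at 256, so the 224x224 crop fits inside the resized image for any input size.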

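Two points to note:

  • cv2.imread already returns the image in BGR order, so no RGB2BGR conversion is needed.
  • cv2.imread returns uint8 data, which must be converted to float32 before the means are subtracted.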
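The second point is easy to get wrong silently, so here is a small NumPy sketch of what happens when the subtraction is done in uint8 instead of float32. The pixel value 100 and the rounded mean 124 are made-up illustration values.

```python
import numpy as np

# A single hypothetical pixel value, stored the way cv2.imread returns it.
pixel_u8 = np.array([100], dtype=np.uint8)

# Subtracting within uint8 wraps around modulo 256 instead of going negative.
wrapped = pixel_u8 - np.array([124], dtype=np.uint8)

# Casting to float32 first keeps the sign of the centered value.
centered = pixel_u8.astype(np.float32) - 123.68

print(wrapped)   # [232]
print(centered)  # roughly [-23.68]
```

No error is raised in the uint8 case, which is why the conversion has to come first in preprocess.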

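Then run evaluate_frozen.sh. The accuracy I obtained was Top1: 0.7355, Top5: 0.9147, which indicates that the preprocessing is correct. Note that the log captured below reports different figures (Top1 0.6144, Top5 0.8406).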

root@3f231e40c7cd:/mnt/nvidia/host_x86/models/tensorflow/resnet50# sh evaluate_frozen.sh
WARNING:tensorflow:From eval.py:55: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
Start Evaluation for 1000 Batches...
2020-03-08 17:18:32.296229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:04:00.0
totalMemory: 10.76GiB freeMemory: 10.60GiB
2020-03-08 17:18:32.296285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2020-03-08 17:18:32.668404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-08 17:18:32.668467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2020-03-08 17:18:32.668478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2020-03-08 17:18:32.668634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10232 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:04:00.0, compute capability: 7.5)
100% |###################################################################################################################################################################################################|
Accuracy: Top1: 0.6144000029563904, Top5: 0.8405999964475632
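The Top1 and Top5 figures printed above are per-batch accuracies averaged over all batches. The following plain Python sketch shows that computation on made-up scores. in_top_k here is a hypothetical stand-in for tf.nn.in_top_k and may treat ties differently from TensorFlow.

```python
def in_top_k(scores, label, k):
    """True if `label` is among the k highest-scoring class indices."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return label in ranked[:k]


def batch_accuracy(batch_scores, batch_labels, k):
    """Fraction of samples in one batch whose label is in the top k."""
    hits = [in_top_k(s, l, k) for s, l in zip(batch_scores, batch_labels)]
    return sum(hits) / len(hits)


# Two hypothetical batches of two samples each, four classes.
batches = [
    ([[0.1, 0.6, 0.2, 0.1], [0.5, 0.1, 0.3, 0.1]], [1, 2]),
    ([[0.2, 0.2, 0.5, 0.1], [0.4, 0.3, 0.2, 0.1]], [2, 3]),
]
top1 = sum(batch_accuracy(s, l, 1) for s, l in batches) / len(batches)
top2 = sum(batch_accuracy(s, l, 2) for s, l in batches) / len(batches)
print(top1, top2)  # 0.5 0.75
```

Averaging per-batch accuracies equals the overall accuracy only when every batch has the same size, which is the case in eval.py because the batch size is fixed.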

3. Quantize the model

decent_q quantize \
  --input_frozen_graph ./frozen_resnet50v1.pb \
  --input_nodes input \
  --input_shapes ?,224,224,3 \
  --output_nodes resnet_v1_50/predictions/Reshape_1 \
  --input_fn input_fn.calib_input \
  --method 1 \
  --gpu 0 \
  --calib_iter 27 \
  --output_dir ./quantize_results \
  --weight_bit 8 \
  --activation_bit 8
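The command refers to input_fn.calib_input without showing it. As far as I understand the decent_q interface, this function receives the calibration iteration index and returns a dict that maps the input node name to a batch of preprocessed images. Treat that contract as an assumption and check it against the DNNDK user guide. In the sketch below, make_calib_input, load_image, fake_load and the batch size are all hypothetical.

```python
def make_calib_input(image_names, load_image, batch_size, input_node="input"):
    """Build a calib_input(iter) function over a fixed list of image names.

    load_image is a hypothetical callable that reads one image and applies
    the same preprocess() that was used for evaluation.
    """
    def calib_input(iter):
        start = iter * batch_size
        names = image_names[start:start + batch_size]
        if len(names) < batch_size:
            raise IndexError("not enough calibration images for iteration %d" % iter)
        return {input_node: [load_image(name) for name in names]}
    return calib_input


# Stand-in loader so the sketch runs without any image files.
def fake_load(name):
    return "preprocessed:" + name


names = ["img%d.JPEG" % i for i in range(6)]
calib_input = make_calib_input(names, fake_load, batch_size=2)
print(calib_input(1))  # {'input': ['preprocessed:img2.JPEG', 'preprocessed:img3.JPEG']}
```

In a real input_fn.py, load_image would wrap cv2.imread plus the preprocess function from step 2. Assuming the function is called once per iteration, --calib_iter 27 needs at least 27 times the batch size in calibration images.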

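4. Compile the model

  • Prepare the dcf file
    In the section on integrating the DPU IP in Vivado we mentioned saving the hwh file. In DNNDK, run dlet on that file to generate the dcf file.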
dlet -f pynq_dpu.hwh
[DLet]Generate DPU DCF file dpu-11111530-111530-201911111530-1530-30.dcf successfully.
  • Compile
dnnc --parser=tensorflow \
     --frozen_pb=./quantize_results/deploy_model.pb \
     --output_dir=dnnc_output \
     --dcf=pynqz2.dcf \
     --mode=normal \
     --cpu_arch=arm32 \
     --net_name=resnet50v1

After a while the following output appears:

[DNNC][Warning] layer [resnet_v1_50_SpatialSqueeze] (type: Squeeze) is not supported in DPU, deploy it in CPU instead.
[DNNC][Warning] layer [resnet_v1_50_predictions_Softmax] (type: Softmax) is not supported in DPU, deploy it in CPU instead.

DNNC Kernel topology "resnet50v1_kernel_graph.jpg" for network "resnet50v1"
DNNC kernel list info for network "resnet50v1"
                               Kernel ID : Name
                                       0 : resnet50v1_0
                                       1 : resnet50v1_1

                             Kernel Name : resnet50v1_0
--------------------------------------------------------------------------------
                             Kernel Type : DPUKernel
                               Code Size : 0.99MB
                              Param Size : 24.35MB
                           Workload MACs : 6964.51MOPS
                         IO Memory Space : 2.25MB
                              Mean Value : 0, 0, 0,
                              Node Count : 58
                            Tensor Count : 59
                    Input Node(s)(H*W*C)
            resnet_v1_50_conv1_Conv2D(0) : 224*224*3
                   Output Node(s)(H*W*C)
           resnet_v1_50_logits_Conv2D(0) : 1*1*1000


                             Kernel Name : resnet50v1_1
--------------------------------------------------------------------------------
                             Kernel Type : CPUKernel
                    Input Node(s)(H*W*C)
             resnet_v1_50_SpatialSqueeze : 1*1*1000
                   Output Node(s)(H*W*C)
        resnet_v1_50_predictions_Softmax : 1*1*1000

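It is worth explaining why two kernels were produced but only one elf file was generated. In the ResNet50v1 network, everything from the input node resnet_v1_50_conv1_Conv2D to the resnet_v1_50_logits_Conv2D node runs on the DPU. The squeeze and softmax operations that follow are not supported by the DPU, so the data has to be read out at resnet_v1_50_logits_Conv2D and the squeeze and softmax steps written by hand on the CPU. Since we are only doing classification here, though, the softmax result is not actually needed: let the DPU compute up to resnet_v1_50_logits_Conv2D, then sort its output directly to get the classification result.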
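Skipping softmax is safe for ranking because softmax is strictly increasing in each logit, so it never changes the order of the classes. The sketch below checks this on made-up logits. softmax and top_k are small hypothetical helpers written for this illustration, not part of the DNNDK API.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(values, k):
    """Indices of the k largest values, highest first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


# Hypothetical logits for a 5-class problem.
logits = [1.5, -0.3, 4.2, 0.0, 2.7]
print(top_k(logits, 3))           # [2, 4, 0]
print(top_k(softmax(logits), 3))  # [2, 4, 0]
```

Softmax is still needed on the CPU side if the application wants calibrated probabilities rather than just the predicted class.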
