Implementing DenseNet and YOLO Models on the Ultra96 Development Board



CSDN does not allow uploading a DOC file directly, and I have not had time to tidy everything up in detail; apologies to readers!

Preface
FPGAs offer far better performance per watt for AI workloads than GPUs, but their complexity and poor portability have held them back. In this project we successfully ported two recent models, DenseNet and YOLOv3, to the Ultra96 development board. We trained a DenseNet model on the CIFAR-10 dataset and quantized and compiled it, and we converted a trained YOLOv3 Darknet model into a Caffe model, which was likewise quantized and compiled. The whole process is documented here in detail, to explore and promote bringing new models into FPGA AI toolchains.

1. Project Hardware and Software Environment
1.1 The Ultra96v2 development board
[Figure: the Ultra96v2 development board]
1.2 Hardware setup
The system is built around a standalone Ultra96; it requires a camera, a monitor, a mini-DP adapter, and a keyboard and mouse. The Ultra96 is connected as shown below:
[Figures: Ultra96 board connections]
1.3 Software setup
Training, quantization, and cross-compilation were all done in a GPU-backed Ubuntu 18.04 environment on a dual-boot machine, with Vitis AI and the Xilinx GPU docker image installed, using Python, TensorFlow 1.15.2, and Keras 2.2.4.

2. The DenseNet Model and Our Changes
DenseNet is a relatively recent model that builds dense connections from all earlier layers to each later layer. It outperforms ResNet with fewer parameters and lower compute cost, which earned it the CVPR 2017 best-paper award.
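Concretely, each layer receives the concatenation of the feature maps of all preceding layers. A minimal dense-block sketch in Keras (illustrative only, not the project's DenseNetX.py; it already uses the Conv->BN->ReLU ordering adopted in section 2.2):

from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, Concatenate

def dense_block(x, num_layers, k):
    # each layer adds k new feature maps (the growth rate) and sees the
    # concatenation of the outputs of every layer before it
    for _ in range(num_layers):
        y = Conv2D(k, 3, padding='same', use_bias=False)(x)
        y = BatchNormalization()(y)
        y = Activation('relu')(y)
        x = Concatenate()([x, y])
    return x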
The DPU is Xilinx's own neural-network accelerator architecture, and some of DenseNet's layer definitions are not accepted by it, so the model cannot be quantized as-is; the following changes are required.
2.1 Swapping the optimizer
We use RMSProp in place of Stochastic Gradient Descent (SGD); in train.py the SGD line is commented out:
'''
Optimizer
RMSprop used in this example.
SGD with Nesterov momentum was used in original paper
'''
#opt = SGD(lr=learnrate, momentum=0.9, nesterov=True)
opt = RMSprop(lr=learnrate)
2.2 Reordering layers
To satisfy the DPU, we change the ordering of the BatchNorm, ReLU activation, and convolution layers from BN->ReLU->Conv to Conv->BN->ReLU.
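As a sketch, inside each convolution block the change looks like this (assuming the functional-style Keras code used throughout the project):

from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation

def conv_block(net, filters):
    # pre-activation order from the original paper -- rejected by the DPU:
    #   net = BatchNormalization()(net)
    #   net = Activation('relu')(net)
    #   net = Conv2D(filters, 3, padding='same', use_bias=False)(net)
    # DPU-friendly order used in this project:
    net = Conv2D(filters, 3, padding='same', use_bias=False)(net)
    net = BatchNormalization()(net)
    net = Activation('relu')(net)
    return net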

2.3 Setting the first convolution layer's parameters
The original paper targets the ImageNet dataset: the first convolution uses a 7x7 kernel with stride 2, followed by a max-pooling layer:
# Use this for IMAGENET
# first convolutional layer + BN + ReLU
net = Conv2D((2*k), 7, strides=2, use_bias=False)(input_layer)
net = BatchNormalization()(net)
net = Activation('relu')(net)
# max pooling layer
net = MaxPooling2D(3, 2)(net)

Our DenseNetX.py targets CIFAR-10 instead, so the first convolution uses a 3x3 kernel with stride 1, and the pooling layer is omitted.

# Use this for CIFAR-10, CIFAR-100
# first convolutional layer + BN + ReLU (Imagenet style)
net = Conv2D((2*k),3,strides=1,use_bias=False,kernel_initializer='he_uniform',kernel_regularizer=l2(weight_decay),padding='same')(input_layer)
net = BatchNormalization()(net)

net = Activation('relu')(net)

2.4 Replacing the GlobalAveragePooling2D layer
DenseNet uses GlobalAveragePooling2D, which the DPU does not provide, so we replace GlobalAveragePooling2D with AveragePooling2D + Flatten.

# pool size = input feature map size and set strides 
# stride = input feature map width
h = K.int_shape(net)[1]
w = K.int_shape(net)[2]
net = AveragePooling2D((h,w), strides=w, padding='same')(net)
net = Flatten()(net)
net = Dense(classes, kernel_initializer='he_normal')(net)
output_layer = Activation('softmax')(net)
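The substitution is exact: a single average-pooling window covering the whole feature map computes the same per-channel mean as global average pooling. A quick numpy check (a sketch, not project code):

import numpy as np

x = np.random.rand(1, 8, 8, 16).astype(np.float32)  # an NHWC feature map
gap = x.mean(axis=(1, 2))                           # what GlobalAveragePooling2D computes
pool_flat = x.reshape(1, 8 * 8, 16).mean(axis=1)    # AveragePooling2D((8,8)) + Flatten
print(np.allclose(gap, pool_flat))                  # True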

2.5 Top layers of the floating-point H5 model
model.summary()

3. Training the DenseNet Model

3.1 Project directory

3.2 Setting up the training environment and parameters
source ./0_setenv.sh
This script sets up the necessary directories and parameters: learning rate, input resolution, channels, batch size, and so on.

export EPOCHS=160
For the first run a high epoch count (160) is advisable, to reach good accuracy.
export BATCHSIZE=150
This value depends directly on how much memory your machine has.
Later runs can start from a previously saved model; we modified the training script so that it saves checkpoints and can resume training.

3.3 Downloading and installing the training dataset
We use the CIFAR-10 dataset. ImageNet would need about 200 GB for the dataset alone and weeks per training run, which students cannot afford, so we chose the smaller CIFAR-10 image set; one training run takes about two days, which is manageable.

The downloaded file lands in the current working directory inside docker:
john@john-wang:/workspace/$
The call that downloads the dataset:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
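The usual preprocessing around that call looks like this (a sketch; the exact steps in train.py may differ):

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype('float32') / 255.0           # scale pixels to [0, 1]
x_test = x_test.astype('float32') / 255.0
y_train = tf.keras.utils.to_categorical(y_train, 10)  # one-hot labels for the softmax
y_test = tf.keras.utils.to_categorical(y_test, 10)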

Rename the file to cifar-10-batches-py.tar.gz:
john@john-wang:/workspace$ sudo mv cifar-10-python.tar.gz cifar-10-batches-py.tar.gz
Unpack it:
john@john-wang:/workspace$ sudo tar -zxvf ./cifar-10-batches-py.tar.gz
Create the target directory:
john@john-wang:/workspace$ mkdir -p ~/.keras/datasets
Put it under ~/.keras/datasets/:
john@john-wang:/workspace$ sudo cp -r cifar-10-batches-py ~/.keras/datasets
(vitis-ai-tensorflow) john@john-wang:/workspace$ ll ~/.keras/datasets
total 166520
drwxr-xr-x 3 john vitis-ai-users 4096 Aug 1 19:57 ./
drwxr-xr-x 3 john vitis-ai-users 4096 Aug 1 19:11 ../
drwxr-xr-x 2 john vitis-ai-users 4096 Jun 4 2009 cifar-10-batches-py/
-rw-r--r-- 1 john vitis-ai-users 170498071 Aug 1 19:57 cifar-10-batches-py.tar.gz

3.4 Training script
3.4.1 Setting the callback parameters
Save the best model as training progresses:
chkpt_call = ModelCheckpoint(filepath=keras_hdf5+"epoch.{epoch:03d}.val_acc.{val_acc:.2f}.h5", monitor='val_acc', verbose=1, save_best_only=True)

tb_call = TensorBoard(log_dir=tboard, batch_size=batchsize, update_freq='epoch')

lr_scheduler_call = LearningRateScheduler(schedule=step_decay, verbose=1)

lr_plateau_call = ReduceLROnPlateau(factor=np.sqrt(0.1), cooldown=0, patience=5, min_lr=0.5e-6)

callbacks_list = [tb_call, lr_scheduler_call, lr_plateau_call, chkpt_call]
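step_decay itself is not shown in the post; a typical schedule compatible with LearningRateScheduler would be (a sketch, the starting rate and drop boundaries are assumptions):

import math

def step_decay(epoch):
    # start from the learning rate set in 0_setenv.sh and cut it 10x
    # every 50 epochs
    initial_lr = 0.001
    return initial_lr * math.pow(0.1, math.floor(epoch / 50.0))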

Loading the frozen model k_model.h5 raised:
ValueError: ('Unrecognized keyword arguments:', dict_keys(['ragged']))
Older versions of the Input layer do not support the ragged argument. This is a TensorFlow 1.x issue: open the file named at the bottom of the traceback (the TF 1.x source) and comment out the check:
#if kwargs:
#    raise ValueError('Unrecognized keyword arguments:', kwargs.keys())
3.4.2 Resuming training from a saved model
We wrote the following snippet to load the latest saved model and continue training. It also works out the initial epoch of the resumed run, so that the schedule sets the correct learning rate. Training is very time-consuming and we need to watch the metrics as we go, so an intentional or accidental interruption must not force training to restart from scratch.
# excerpt from the training script; assumes os, load_model and densenetx
# are imported at the top of the file
model_path = keras_hdf5
listfile = [i for i in os.listdir(model_path) if i.endswith("h5")]
print("listfile = {}".format(listfile))
# sort the checkpoint files, then reverse so the newest epoch comes first
listfile.sort()
listfile = listfile[::-1]
print("listfile = {}".format(listfile))
if len(listfile) != 0:
    # load the most recent saved model
    model_path = model_path + listfile[0]
    model = load_model(model_path)
    # file names look like epoch.NNN.val_acc.X.XX.h5, so field 1 is the epoch
    initial_epoch = int(listfile[0].split(".")[1])
    print("initial_epoch = %d" % int(initial_epoch))
else:
    # no saved model: call the densenetx subroutine to build a fresh model
    # and start from epoch 0
    model = densenetx(input_shape=(input_height, input_width, input_chan), classes=10, theta=0.5, drop_rate=0.2, k=12, convlayers=[16, 16, 16])
    initial_epoch = 0

3.5 Running the Training
3.5.1 Training process
Enter the Vitis-AI GPU docker image:

(vitis-ai-tensorflow) john@john-wang:/workspace/DenseNet3/trainrestore/trainrestore$ source setenv.sh
(vitis-ai-tensorflow) john@john-wang:/workspace/DenseNet3/trainrestore/trainrestore$ ./trainrestore.sh
Note that the first command is source setenv.sh, not ./setenv.sh: running a script with ./ creates a subshell, so all of the variable assignments would live only in that subshell and be unavailable to trainrestore.sh.

Start training:
batchsize was lowered to 50 to fit my machine's 16 GB of RAM.

3.5.2 Training results
Training finished after roughly two days and nights; TensorBoard shows the training curves and results:
john@john-wang:~/Vitis-AI_1.2/DenseNet3/trainrestore/trainrestore/build$ tensorboard --logdir=tb_logs
tb_logs is the directory TensorBoard logs to; run the command from its parent directory, otherwise the full path is required.

3.6 Converting the Keras model to TensorFlow
The Vitis AI tools cannot consume Keras checkpoints directly; they need a TensorFlow-compatible frozen model. The conversion takes two steps (sketched below):

  1. Convert the HDF5 file to a TensorFlow checkpoint.
  2. Convert the TensorFlow checkpoint to a frozen model in binary protobuf format.
    ./2_keras2tf.sh
    The generated 'frozen_graph.pb' model is placed in the ./files/build/freeze folder.
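Under the hood, 2_keras2tf.sh does roughly the following (a minimal TF 1.x sketch; the file names are placeholders):

import tensorflow as tf
from tensorflow.python.framework import graph_util
from tensorflow.keras import backend as K
from tensorflow.keras.models import load_model

K.set_learning_phase(0)                      # inference mode: no dropout/BN updates
model = load_model('k_model.h5')             # the trained Keras checkpoint
sess = K.get_session()
frozen = graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(),
    [out.op.name for out in model.outputs])  # bake the variables into constants
tf.train.write_graph(frozen, './build/freeze', 'frozen_graph.pb', as_text=False)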

3.7 Evaluating the frozen model
Run the frozen-model evaluation:
./3_eval_frozen.sh

4. DenseNet Quantization and Evaluation
4.1 Producing the quantized model
./4_quant.sh

4.2 Quantized model layer definitions
4.2.1 A script to dump a PB model's layers
To analyze the quantized model, we wrote a small script that prints every layer definition in a PB model.

john@john-wang:~/Vitis-AI_1.2/DenseNet3/build/freeze$ python3 getlayers.py frozen_graph.pb

The script, getlayers.py:

import os, sys
from tensorflow.python.platform import gfile
import tensorflow as tf
def get_all_layernames(pb_file_path):
    #get all layers name 
    sess = tf.Session()
    with gfile.FastGFile(pb_file_path, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        sess.graph.as_default()
        tf.import_graph_def(graph_def, name='')
        tensor_name_list = [tensor.name for tensor in tf.get_default_graph().as_graph_def().node]
        for tensor_name in tensor_name_list:
            print(tensor_name)
if __name__ == '__main__':
    if len(sys.argv) == 2:
        get_all_layernames(sys.argv[1])

4.2.2 Top layers of the PB model

Compared with the floating-point model's top layers from section 2.5, the quantized model's top has lost its softmax layer; the fully connected layer's output is emitted directly.
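On the board, the softmax therefore has to be applied on the CPU after the fully connected layer's raw output is read back. A minimal numpy version (a sketch; DNNDK also offers its own softmax helper):

import numpy as np

def cpu_softmax(logits):
    # numerically stable softmax over the FC layer's raw outputs
    e = np.exp(logits - np.max(logits))
    return e / e.sum()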
4.3 Evaluating the quantized model
./5_eval_quant.sh

5. Building the DenseNet DPU and Compiling the Kernel
5.1 DPU selection and design
5.1.1 Input design files
The DPU2304 has more compute than the DPU1600 and still fits on the Ultra96; it can run at a 300 MHz DPU clock. Higher clocks are unstable because of a weakness in the board's power design.

The prj_config file:

The Ultra96.json file:

5.1.2 DPU build script

5.1.3 Vivado block design

5.1.4 DPU2304 performance

5.2 Kernel compile command
BOARD=Ultra96
MODEL_NAME=tf_densenet_imagenet_30_30_7.7G
MODEL_UNZIP=${MODEL_NAME}
vai_c_tensorflow \
    --frozen_pb .../deploy_model.pb \
    --arch /opt/v.../${BOARD}/${BOARD}.json \
    --output_dir .../${MODEL}

5.3 DPU Kernel

6. DenseNet Files for the Board
6.1 Building the board file set
./7_make_target.sh

6.2 Board project folder

6.3 Board source files
test.py calls utils.py, which in turn calls runcf.py.
6.3.1 DPU main program
def run(image_folder, shortsize, KERNEL_CONV, KERNEL_CONV_INPUT, KERNEL_FC_OUTPUT, inputscale):
    start = time.time()
    # list the images in the folder
    listimage = [i for i in os.listdir(image_folder) if i.endswith("jpg")]
    # sort them
    listimage.sort()
    fo = open(resultname, "w")
    n2cube.dpuOpen()
    kernel = n2cube.dpuLoadKernel(KERNEL_CONV)
    task = n2cube.dpuCreateTask(kernel, 0)
    height, width, inputchannel, mean = parameter(task, KERNEL_CONV_INPUT)
    outsize = n2cube.dpuGetOutputTensorSize(task, KERNEL_FC_OUTPUT)
    outputchannel = n2cube.dpuGetOutputTensorChannel(task, KERNEL_FC_OUTPUT)
    conf = n2cube.dpuGetOutputTensorAddress(task, KERNEL_FC_OUTPUT)
    inputscale = n2cube.dpuGetInputTensorScale(task, KERNEL_CONV_INPUT)
    outputscale = n2cube.dpuGetOutputTensorScale(task, KERNEL_FC_OUTPUT)
    imagenumber = len(listimage)
    # print the image count
    print("\nimagenumber = %d\n" % imagenumber)
    softlist = []
    correct = 0
    wrong = 0
    for i in range(imagenumber):
        print(f"i = {i+1}")
        print(listimage[i])
        path = image_folder + listimage[i]
        img = cv2.imread(path)
        # image pre-processing
        imageRun = predict_label(img, task, inputscale, mean, height, width, inputchannel, shortsize, KERNEL_CONV_INPUT)
        input_len = len(imageRun)
        # run the DPU task
        softmax, listimage[i] = run_dpu_task(outsize, task, outputchannel, conf, outputscale, listimage[i], imageRun, KERNEL_CONV_INPUT, KERNEL_FC_OUTPUT)
        correct, wrong = TopK(softmax, listimage[i], fo, correct, wrong)
    fo.close()
    accuracy = correct / imagenumber
    print('Correct:', correct, ' Wrong:', wrong, ' Accuracy:', accuracy)
    n2cube.dpuDestroyTask(task)
    n2cube.dpuDestroyKernel(kernel)
    n2cube.dpuClose()
    end = time.time()
    total_time = end - start
    print('\nAll processing time: {} seconds.'.format(total_time))
    print('\n{} ms per frame\n'.format(1000 * total_time / imagenumber))  # 1000 ms per second
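parameter() is a small helper that is not shown in the post; it plausibly wraps the n2cube tensor-geometry queries like this (a sketch; the mean values are assumptions):

def parameter(task, KERNEL_CONV_INPUT):
    # query the input tensor geometry from the loaded DPU task
    height = n2cube.dpuGetInputTensorHeight(task, KERNEL_CONV_INPUT)
    width = n2cube.dpuGetInputTensorWidth(task, KERNEL_CONV_INPUT)
    inputchannel = n2cube.dpuGetInputTensorChannel(task, KERNEL_CONV_INPUT)
    mean = (125.3, 123.0, 113.9)  # assumed CIFAR-10 channel means
    return height, width, inputchannel, mean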

6.3.2 Computing accuracy
def TopK(softmax, imagename, fo, correct, wrong):
    for i in range(top):
        # predicted class
        num = np.argmax(softmax)
        prediction = classes[num]
        # the ground-truth class is encoded in the file name before the '_'
        ground_truth, _ = imagename.split('_')
        # write each image's prediction to the results file
        fo.write(imagename + ' p: ' + prediction + ' g: ' + ground_truth + ' : ' + str(softmax[num]) + '\n')
        if (ground_truth == prediction):
            correct += 1
        else:
            wrong += 1
    return correct, wrong

6.4 Board .elf file

7. Running DenseNet on the Ultra96
Running test.py shows that the project uses the DPU2304 and that accuracy over the 10000-image test set is 91.5%; speed is not optimized in this program.

8. YOLOv3 Software Environment
Chapter 8 uses an Ubuntu 18.04 GPU Caffe environment; it is not done inside the Vitis-AI docker image.
8.1 Installing the project files

cd ~/Vitis-AI_1.2/yolov3
bash -v tutorial.sh
tutorial.sh unpacks all of the compressed archives.
$ tar -xvf caffe-master.tar.gz
$ tar -xvf darknet_origin.tar.gz
$ cat yolov3_deploy.tar.gz.part* > yolov3_deploy.tar.gz
$ tar -xvf yolov3_deploy.tar.gz
$ rm yolov3_deploy.tar.gz.part*
$ cd example_yolov3/5_file_for_test
$ tar -xvf calib_data.tar
$ cd ../../

Then run:
$ find . -type f -name "*.txt" -print0 | xargs -0 dos2unix
$ find . -type f -name "*.data" -print0 | xargs -0 dos2unix
$ find . -type f -name "*.cfg" -print0 | xargs -0 dos2unix
$ find . -type f -name "*.names" -print0 | xargs -0 dos2unix

find -print0 appends a NUL character after each result instead of the default newline, and xargs -0 makes xargs split its input on NUL.
Background on find -print0 and xargs -0: https://www.cnblogs.com/liuyihua1992/p/9689314.html

dos2unix needs to be installed first:
john@john-wang:~/Vitis-AI_1.2/YOLOv3$ sudo apt-get install dos2unix

8.2 Installing the Caffe environment
YOLOv3 requires converting the Darknet model into a Caffe model, which needs Python 2.7. The Vitis AI image does not ship Python 2.7, so either install Python 2.7 inside the image or install Caffe on Ubuntu 18.04; I chose the latter.
Run the build commands in this order:
cd caffe-master/
make clean
make -j
make pycaffe
make distribute

Install the dependencies:
sudo apt-get install libopencv-dev
sudo cp /usr/lib/x86_64-linux-gnu/pkgconfig/opencv.pc /usr/lib/pkgconfig/
https://www.cnblogs.com/cumtchw/p/12984073.html
sudo apt-get install libboost-all-dev
sudo apt-get install libgflags-dev
sudo apt-get install libblas-dev
sudo apt-get install libhdf5-serial-dev
sudo apt install liblmdb-dev
sudo apt install libatlas-base-dev
sudo apt-get install libgoogle-glog-dev
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev

In Makefile.config, find the following lines and append the hdf5 serial include/library paths (shown here already appended):
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial

john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ make

The following three steps resolve the build errors (the full logs appear in the raw notes at the end of this post):

  1. Confirm opencv is installed correctly:
    $ pkg-config --modversion opencv
    My opencv version: 3.2
  2. In the Makefile, find the LIBRARIES line (just before PYTHON_LIBRARIES := boost_python python2.7) and extend it:
    LIBRARIES += glog gflags protobuf leveldb snappy lmdb boost_system hdf5_hl hdf5 m opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs
  3. In Makefile.config, uncomment OPENCV_VERSION := 3;
    https://blog.csdn.net/qq_32694235/article/details/90581313

With these fixes, make succeeds.

The environment variables have to be set again before every conversion:

john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ export CAFFE_ROOT=~/Vitis-AI_1.2/YOLOv3/caffe-master
john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ export PYTHONPATH=$CAFFE_ROOT/distribute/python:/usr/local/lib/python2.7/dist-packages/numpy/core/include/:$PYTHONPATH
john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ export LD_LIBRARY_PATH=$CAFFE_ROOT/distribute/lib:$LD_LIBRARY_PATH

To check which wheel tags this pip supports: import pip; print(pip.pep425tags.get_supported())
john@john-wang:/usr/local/lib/python2.7$ sudo pip install numpy-1.15.0-cp27-cp27mu-manylinux1_x86_64.whl
john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ sudo apt-get install python-skimage
john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ sudo apt-get install python-protobuf

Test whether caffe is installed correctly:

john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ python -c "import caffe; print caffe.__file__"

$ python ../yolo_convert.py 0_model_darknet/yolov3.cfg 0_model_darknet/yolov3.weights 1_model_caffe/v3.prototxt 1_model_caffe/v3.caffemodel

$ python ../yolo.py 0_model_darknet/yolov3-tiny.cfg 0_model_darknet/yolov3-tiny.weights 1_model_caffe/v3-tiny.prototxt 1_model_caffe/v3-tiny.caffemodel

9. YOLOv3 Quantization
Chapters 9 and 10 are done inside the Vitis-AI GPU docker image on Ubuntu 18.04.
9.1 Calibration dataset
We use the calib.txt format from the DNNDK calibration stage:
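Each line of calib.txt names one calibration image followed by a label field; the label is not used for detection-model calibration but must be present. A small generator sketch (the file and directory names are assumptions based on section 8.1):

import os

calib_dir = '5_file_for_test/calib_data'
with open('5_file_for_test/calib.txt', 'w') as fo:
    for idx, name in enumerate(sorted(os.listdir(calib_dir))):
        if name.endswith('.jpg'):
            fo.write('%s %d\n' % (name, idx))  # "<image> <label>", label unused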

9.2 Modifying the floating-point model
Edit the v3.prototxt file:
• comment out the five input lines generated by the converter
• add an ImageData layer that feeds the calibration images during the TRAIN phase
name: "Darknet2Caffe"
#####Comment the following five lines generated by the converter#####
#input: "data"
#input_dim: 1
#input_dim: 3
#input_dim: 608
#input_dim: 608
#####Change the input data layer to ImageData and modify root_folder/source before running DECENT#####
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: false
    yolo_height: 608  #change height according to Darknet model
    yolo_width: 608   #change width according to Darknet model
  }
  image_data_param {
    source: "/PATH_TO/5_file_for_test/calib.txt"         #change path accordingly
    root_folder: "/PATH_TO/5_file_for_test/calib_data/"  #change path accordingly
    batch_size: 1
    shuffle: false
  }
}

#####No changes to the layers below#####

9.3 Caffe quantization commands
$ cd example_yolov3
$ cp 1_model_caffe/v3.caffemodel ./2_model_for_quantize/

YOLOv3 quantization command:
(vitis-ai-caffe) john@john-virtual-machine:/workspace/YOLOv3/example_yolov3$ vai_q_caffe quantize -model 2_model_for_quantize/v3.prototxt -weights 2_model_for_quantize/v3.caffemodel -sigmoided_layers layer81-conv,layer93-conv,layer105-conv -output_dir 3_model_after_quantize

YOLOv3-tiny quantization command:
(vitis-ai-caffe) john@john-virtual-machine:/workspace/YOLOv3/example_yolov3$ vai_q_caffe quantize -model 2_model_for_quantize/v3-tiny.prototxt -weights 2_model_for_quantize/v3-tiny.caffemodel -sigmoided_layers layer15-conv,layer22-conv -output_dir 3_model_after_quantize

The quantized model is produced successfully:

10. YOLOv3 Compilation
10.1 Modifying the quantized model
./3_model_after_quantize/deploy.prototxt:
layer {
  name: "data"
  type: "Input"
  top: "data"
  #####Comment out the following five lines#####
  #transform_param {
  #  mirror: false
  #  yolo_height: 608
  #  yolo_width: 608
  #}
  #####Nothing changes in the layers below#####
  input_param {
    shape {
      dim: 1
      dim: 3
      dim: 608
      dim: 608
    }
  }
}

10.2 Kernel compile command
BOARD=Ultra96
MODEL_NAME=tf_yolov3_voc_416_416_65.63G
MODEL_UNZIP=${MODEL_NAME}
vai_c_tensorflow \
    --frozen_pb .../deploy_model.pb \
    --arch /opt/v.../${BOARD}/${BOARD}.json \
    --output_dir .../${MODEL}

10.3 DPU Kernel
tf_yolov3_voc_416_416_65.63G
YOLOv3 compilation results:

11. YOLOv3 Files for the Board
11.1 Board project folder

11.2 Board source code
'''Model post-processing'''
# max_boxes = 20: detect at most 20 boxes per class per image
def eval(yolo_outputs, image_shape, max_boxes=20):
    score_thresh = 0.2
    nms_thresh = 0.45
    class_names = get_class(classes_path)
    # split the 9 anchor boxes into 3 groups, assigned to the 13x13, 26x26
    # and 52x52 feature maps output by the yolo model
    anchors = get_anchors(anchors_path)
    anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    boxes = []
    box_scores = []
    input_shape = np.shape(yolo_outputs[0])[1:3]
    input_shape = np.array(input_shape) * 32
    # process each of the 3 feature maps
    for i in range(len(yolo_outputs)):
        _boxes, _box_scores = boxes_and_scores(yolo_outputs[i], anchors[anchor_mask[i]], len(class_names), input_shape, image_shape)
        boxes.append(_boxes)
        box_scores.append(_box_scores)
    boxes = np.concatenate(boxes, axis=0)
    box_scores = np.concatenate(box_scores, axis=0)
    # build a mask that filters out scores below the threshold, keeping
    # only values above it
    mask = box_scores >= score_thresh
    boxes_ = []
    scores_ = []
    classes_ = []
    for c in range(len(class_names)):
        class_boxes_np = boxes[mask[:, c]]
        class_box_scores_np = box_scores[:, c]
        class_box_scores_np = class_box_scores_np[mask[:, c]]
        # keep the boxes and scores selected by nms_index_np and record class c
        nms_index_np = nms_boxes(class_boxes_np, class_box_scores_np)
        class_boxes_np = class_boxes_np[nms_index_np]
        class_box_scores_np = class_box_scores_np[nms_index_np]
        classes_np = np.ones_like(class_box_scores_np, dtype=np.int32) * c
        boxes_.append(class_boxes_np)
        scores_.append(class_box_scores_np)
        classes_.append(classes_np)
    # concatenate the kept boxes into an (?, 4) array and the confidences
    # into an (?, 1) array
    boxes_ = np.concatenate(boxes_, axis=0)
    scores_ = np.concatenate(scores_, axis=0)
    classes_ = np.concatenate(classes_, axis=0)
    return boxes_, scores_, classes_
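nms_boxes, called above, is not shown in the post; a greedy IoU-based version consistent with the (top, left, bottom, right) box layout would look like this (a sketch):

import numpy as np

def nms_boxes(boxes, scores, iou_thresh=0.45):
    # greedy non-maximum suppression: keep the highest-scoring box, drop
    # every remaining box that overlaps it by more than iou_thresh, repeat
    top, left = boxes[:, 0], boxes[:, 1]
    bottom, right = boxes[:, 2], boxes[:, 3]
    areas = (bottom - top) * (right - left)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        yy1 = np.maximum(top[i], top[order[1:]])
        xx1 = np.maximum(left[i], left[order[1:]])
        yy2 = np.minimum(bottom[i], bottom[order[1:]])
        xx2 = np.minimum(right[i], right[order[1:]])
        inter = np.maximum(0.0, yy2 - yy1) * np.maximum(0.0, xx2 - xx1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]
    return np.array(keep, dtype=np.int32)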

"""DPU Kernel Name for tf_yolov3_voc"""
KERNEL_CONV = "tf_yolov3"

"""DPU IN/OUT Name for tf_yolov3_voc"""
CONV_INPUT_NODE = "conv2d_1_convolution"
CONV_OUTPUT_NODE1 = "conv2d_59_convolution"
CONV_OUTPUT_NODE2 = "conv2d_67_convolution"
CONV_OUTPUT_NODE3 = "conv2d_75_convolution"

if __name__ == "__main__":
    """Attach to the DPU driver and prepare for running"""
    n2cube.dpuOpen()
    """Create the DPU kernel for tf_yolov3_voc"""
    kernel = n2cube.dpuLoadKernel(KERNEL_CONV)
    """Create the DPU task for tf_yolov3_voc"""
    task = n2cube.dpuCreateTask(kernel, 0)
    image_folder = "/home/xilinx/jupyter_notebooks/Cityscapes/JPEGImages/"
    """Load images for the DPU"""
    listimage = [i for i in os.listdir(image_folder) if i.endswith("jpg")]
    listimage.sort()
    print("\nYou can press ESC to quit")
    imagenumber = len(listimage)
    print("\nimagenumber = %d\n" % imagenumber)
    cv2.namedWindow("Display", cv2.WINDOW_AUTOSIZE)
    for i in range(imagenumber):
        print(listimage[i])
        path = image_folder + listimage[i]
        img = cv2.imread(path)
        image = cv2.imread(path)
        image_ho, image_wo, _ = image.shape
        image_size = image.shape[:2]
        image_data = pre_process(image, (416, 416))
        image_data = np.array(image_data, dtype=np.float32)
        input_len = n2cube.dpuGetInputTensorSize(task, CONV_INPUT_NODE)
        """Set the input tensor"""
        n2cube.dpuSetInputTensorInHWCFP32(task, CONV_INPUT_NODE, image_data, input_len)
        """Run the model on the DPU"""
        n2cube.dpuRunTask(task)
        """Get the output tensors"""
        # The three output feature maps (the final 3 effective layers) have
        # shapes (1,13,13,75), (1,26,26,75) and (1,52,52,75). The last
        # dimension is 75 because this model is based on the VOC dataset,
        # which has 20 classes. YOLOv3 has 9 anchors in total and 3 outputs;
        # each output uses 3 of the anchors, so every position predicts 3
        # boxes. Each box carries tx, ty, tw, th, an objectness confidence,
        # and the per-class probabilities, so the last dimension is 3 x 25.
        conv_sbbox_size = n2cube.dpuGetOutputTensorSize(task, CONV_OUTPUT_NODE1)
        conv_out1 = n2cube.dpuGetOutputTensorInHWCFP32(task, CONV_OUTPUT_NODE1, conv_sbbox_size)
        conv_out1 = np.reshape(conv_out1, (1, 13, 13, 75))
        conv_mbbox_size = n2cube.dpuGetOutputTensorSize(task, CONV_OUTPUT_NODE2)
        conv_out2 = n2cube.dpuGetOutputTensorInHWCFP32(task, CONV_OUTPUT_NODE2, conv_mbbox_size)
        conv_out2 = np.reshape(conv_out2, (1, 26, 26, 75))
        conv_lbbox_size = n2cube.dpuGetOutputTensorSize(task, CONV_OUTPUT_NODE3)
        conv_out3 = n2cube.dpuGetOutputTensorInHWCFP32(task, CONV_OUTPUT_NODE3, conv_lbbox_size)
        conv_out3 = np.reshape(conv_out3, (1, 52, 52, 75))
        yolo_outputs = [conv_out1, conv_out2, conv_out3]
        """Get model classification information"""
        classes_path = "./image/voc_classes.txt"
        class_names = get_class(classes_path)
        """Get model anchor values"""
        anchors_path = "./model_data/yolo_anchors.txt"
        anchors = get_anchors(anchors_path)
        """Post-processing"""
        out_boxes, out_scores, out_classes = eval(yolo_outputs, image_size)
        items = []
        draws = []
        # for each box i of each detected class c, prepare the drawing data
        for i, c in reversed(list(enumerate(out_classes))):
            # class name
            predicted_class = class_names[c]
            # bounding box
            box = out_boxes[i]
            # box confidence score
            score = out_scores[i]
            top, left, bottom, right = box
            # round the top/left coordinates to integers, clipped at 0
            top = max(0, np.floor(top + 0.5).astype('int32'))
            left = max(0, np.floor(left + 0.5).astype('int32'))
            # round the bottom/right coordinates, clipped to the image size
            bottom = min(image_ho, np.floor(bottom + 0.5).astype('int32'))
            right = min(image_wo, np.floor(right + 0.5).astype('int32'))
            draw = [left, top, right, bottom, score, c]
            item = [predicted_class, score, left, top, right, bottom]
            draws.append(draw)
            items.append(item)

        image_id = 0
        # call the box-drawing helper
        image_result = draw_bbox(image, draws, class_names)
        # enlarge the image 2x for easier viewing
        height, width = img.shape[:2]
        size = (int(width * 2), int(height * 2))
        image_result = cv2.resize(image_result, size, interpolation=cv2.INTER_LINEAR)
        cv2.imshow("Display", image_result)
        if cv2.waitKey(2000) == 27:
            break
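pre_process is likewise defined elsewhere; YOLOv3 inputs are normally letterboxed to 416x416, so a plausible sketch is:

import cv2
import numpy as np

def pre_process(image, size):
    # resize with preserved aspect ratio and pad to the target size
    ih, iw = image.shape[:2]
    w, h = size
    scale = min(w / iw, h / ih)
    nw, nh = int(iw * scale), int(ih * scale)
    resized = cv2.resize(image, (nw, nh))
    canvas = np.full((h, w, 3), 128, dtype=np.uint8)  # grey letterbox padding
    dy, dx = (h - nh) // 2, (w - nw) // 2
    canvas[dy:dy + nh, dx:dx + nw, :] = resized
    image_data = canvas[..., ::-1].astype(np.float32) / 255.0  # BGR -> RGB, scale
    return image_data[np.newaxis, ...]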

11.3 Images and Live Demo
11.3.1 Image recognition results
[Figures: detection results on sample images]

11.3.2 Live recognition results
[Figures: live detection through the camera]

Both yolov3 and yolov3-tiny live under the YOLOv3 directory on my dual-boot machine. The rest of this post is the raw log from building and debugging the Caffe conversion environment.

sudo apt-get install libopencv-dev
After installing it this way, building darknet still reported: No package 'opencv' found. Locate the pkg-config file and copy it to where pkg-config looks:
john@john-wang:~/Vitis-AI_1.2/YOLOv3$ sudo find / -name opencv*.pc
/usr/lib/x86_64-linux-gnu/pkgconfig/opencv.pc

john@john-wang:~/Vitis-AI_1.2/YOLOv3$ sudo cp /usr/lib/x86_64-linux-gnu/pkgconfig/opencv.pc /usr/lib/pkgconfig/
https://www.cnblogs.com/cumtchw/p/12984073.html

cd caffe-master/
make clean
make -j
make pycaffe
make distribute

src/caffe/internal_thread.cpp:1:10: fatal error: boost/thread.hpp: No such file or directory
#include <boost/thread.hpp>
^~~~~~~~~~~~~~~~~~
compilation terminated.
src/caffe/util/benchmark.cpp:1:10: fatal error: boost/date_time/posix_time/posix_time.hpp: No such file or directory
#include <boost/date_time/posix_time/posix_time.hpp>
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ sudo apt-get install libboost-all-dev

sudo apt-get install libgflags-dev
sudo apt-get install libblas-dev
apt-get install libhdf5-serial-dev
In Makefile.config, find the following lines and append the hdf5 serial paths:
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial
sudo apt install liblmdb-dev
sudo apt install libatlas-base-dev
sudo apt-get install libgoogle-glog-dev
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev

CXX src/caffe/transformers/yolo_transformer.cpp
AR -o .build_release/lib/libcaffe.a
LD -o .build_release/lib/libcaffe.so.1.0.0
/usr/bin/ld: cannot find -lsnappy
collect2: error: ld returned 1 exit status
Makefile:573: recipe for target '.build_release/lib/libcaffe.so.1.0.0' failed
make: *** [.build_release/lib/libcaffe.so.1.0.0] Error 1
apt-cache search snappy
john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ sudo apt-get install libsnappy-dev

john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ make
LD -o .build_release/lib/libcaffe.so.1.0.0
CXX tools/upgrade_net_proto_text.cpp
CXX/LD -o .build_release/tools/upgrade_net_proto_text.bin
.build_release/lib/libcaffe.so: undefined reference to `cvLoadImage'
.build_release/lib/libcaffe.so: undefined reference to `cv::imread(cv::String const&, int)'


  1. Make sure opencv is installed properly:
    $ pkg-config --modversion opencv
    my opencv version: 3.2
  2. In the Makefile, find the LIBRARIES line (just before PYTHON_LIBRARIES := boost_python python2.7) and extend it:
    LIBRARIES += glog gflags protobuf leveldb snappy lmdb boost_system hdf5_hl hdf5 m opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs
  3. In Makefile.config, uncomment OPENCV_VERSION := 3;
    https://blog.csdn.net/qq_32694235/article/details/90581313

john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ make pycaffe
john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ make distribute

Before every conversion, re-export:

john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ export CAFFE_ROOT=~/Vitis-AI_1.2/YOLOv3/caffe-master
john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ export LD_LIBRARY_PATH=$CAFFE_ROOT/distribute/lib:$LD_LIBRARY_PATH
john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ export PYTHONPATH=$CAFFE_ROOT/distribute/python:/usr/local/lib/python2.7/dist-packages/numpy/core/include/:$PYTHONPATH

import pip; print(pip.pep425tags.get_supported())
john@john-wang:/usr/local/lib/python2.7$ sudo pip install numpy-1.15.0-cp27-cp27mu-manylinux1_x86_64.whl

john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ sudo apt-get install python-skimage
john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ sudo apt-get install python-protobuf

Verification:
john@john-wang:~/Vitis-AI_1.2/YOLOv3/caffe-master$ python -c "import caffe; print caffe.__file__"
john@john-wang:~/Vitis-AI_1.2/YOLOv3/example_yolov3$ bash 0_convert.sh
vai_q_caffe quantize -model 2_model_for_quantize/v3.prototxt -weights 2_model_for_quantize/v3.caffemodel -sigmoided_layers layer81-conv,layer93-conv,layer105-conv -output_dir 3_model_after_quantize

#Assuming the "decent" tool is already in the PATH
$ decent quantize -model 2_model_for_quantize/v3.prototxt \
      -weights 2_model_for_quantize/v3.caffemodel \
      -gpu 0 \
      -sigmoided_layers layer81-conv,layer93-conv,layer105-conv \
      -output_dir 3_model_after_quantize \
      -method 1
(vitis-ai-caffe) john@john-wang:/workspace/YOLOv3/example_yolov3$ ./2_quantizetiny.sh
W1101 07:07:01.257980 59 net.cpp:876] Force copying param 4 weights from layer 'layer0-bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 1 1 1 (1).
W1101 07:07:01.258191 59 net.cpp:876] Force copying param 4 weights from layer 'layer2-bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 1 1 1 (1).
W1101 07:07:01.258407 59 net.cpp:876] Force copying param 4 weights from layer 'layer4-bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 1 1 1 (1).
W1101 07:07:01.258666 59 net.cpp:876] Force copying param 4 weights from layer 'layer6-bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 1 1 1 (1).
W1101 07:07:01.259093 59 net.cpp:876] Force copying param 4 weights from layer 'layer8-bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 1 1 1 (1).
W1101 07:07:01.260679 59 net.cpp:876] Force copying param 4 weights from layer 'layer10-bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 1 1 1 (1).
W1101 07:07:01.265461 59 net.cpp:876] Force copying param 4 weights from layer 'layer12-bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 1 1 1 (1).
W1101 07:07:01.265913 59 net.cpp:876] Force copying param 4 weights from layer 'layer13-bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 1 1 1 (1).
W1101 07:07:01.267045 59 net.cpp:876] Force copying param 4 weights from layer 'layer14-bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 1 1 1 (1).
W1101 07:07:01.267408 59 net.cpp:876] Force copying param 4 weights from layer 'layer18-bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 1 1 1 (1).
W1101 07:07:01.268255 59 net.cpp:876] Force copying param 4 weights from layer 'layer21-bn'; shape mismatch. Source param shape is 1 (1); target param shape is 1 1 1 1 (1).


MODEL_UNZIP NAME: cf_yolov3_voc_608_608_65.42G
MODEL NAME: yolov3
FRAMEWORK cf
[DLet]Generate DPU DCF file dpu-03-26-2020-13-30.dcf successfully.


  • VITIS_AI Compilation - Xilinx Inc.

Kernel topology "yolov3_kernel_graph.jpg" for network "yolov3"
kernel list info for network "yolov3"
Kernel ID : Name
0 : yolov3

                         Kernel Name : yolov3

                         Kernel Type : DPUKernel
                           Code Size : 2.66MB
                          Param Size : 59.06MB
                       Workload MACs : 140691.89MOPS
                     IO Memory Space : 17.98MB
                          Mean Value : 0, 0, 0, 
                  Total Tensor Count : 82
            Boundary Input Tensor(s)   (H*W*C)
                           data:0(0) : 608*608*3

           Boundary Output Tensor(s)   (H*W*C)
                   layer81_conv:0(0) : 19*19*255
                   layer93_conv:0(1) : 38*38*255
                  layer105_conv:0(2) : 76*76*255

                    Total Node Count : 81
                       Input Node(s)   (H*W*C)
                      layer0_conv(0) : 608*608*3

                      Output Node(s)   (H*W*C)
                     layer81_conv(0) : 19*19*255
                     layer93_conv(0) : 38*38*255
                    layer105_conv(0) : 76*76*255


MODEL_UNZIP NAME: cf_yolotiny_voc_416_416_11.2G
MODEL NAME: yolotiny
FRAMEWORK cf
[DLet]Generate DPU DCF file dpu-03-26-2020-13-30.dcf successfully.


  • VITIS_AI Compilation - Xilinx Inc.

Kernel topology "yolotiny_kernel_graph.jpg" for network "yolotiny"
kernel list info for network "yolotiny"
Kernel ID : Name
0 : yolotiny

                         Kernel Name : yolotiny

                         Kernel Type : DPUKernel
                           Code Size : 0.13MB
                          Param Size : 8.44MB
                       Workload MACs : 5564.96MOPS
                     IO Memory Space : 1.16MB
                          Mean Value : 0, 0, 0, 
                  Total Tensor Count : 19
            Boundary Input Tensor(s)   (H*W*C)
                           data:0(0) : 416*416*3

           Boundary Output Tensor(s)   (H*W*C)
                   layer15_conv:0(0) : 13*13*255
                   layer22_conv:0(1) : 26*26*255

                    Total Node Count : 17
                       Input Node(s)   (H*W*C)
                      layer0_conv(0) : 416*416*3

                      Output Node(s)   (H*W*C)
                     layer15_conv(0) : 13*13*255
                     layer22_conv(0) : 26*26*255

yolov3-tiny cfg file

On line 93, replace this:
[maxpool]
size = 2
With this:
[maxpool]
size = 1

Edit the main.cc file inside the yolov3_deploy/src folder and make the following changes:

At line 63, add:
#include <opencv2/opencv.hpp>

At line 239, change:
const vector<string> outputs_node = {"layer81_conv", "layer93_conv", "layer105_conv"};
to:
const vector<string> outputs_node = {"layer15_conv", "layer22_conv"};

At line 395, change:
DPUKernel *kernel = dpuLoadKernel("yolo");
to:
DPUKernel *kernel = dpuLoadKernel("yolo_tiny");

In the Makefile, at line 69, change:
MODEL = $(CUR_DIR)/model/dpu_yolo.elf
to:
MODEL = $(CUR_DIR)/model/dpu_yolotiny.elf

john@john-wang:~/Vitis-AI_1.2/YOLOv3/example_yolov3$ bash 0_convert.sh
john@john-wang:~/Vitis-AI_1.2/YOLOv3/example_yolov3$ bash 0_convertiny.sh

(vitis-ai-caffe) john@john-wang:/workspace/YOLOv3/example_yolov3$ bash 2_quantizetiny.sh


vector<vector<float>> applyNMS(vector<vector<float>>& boxes, int classes, const float thres) {
    vector<pair<int, float>> order(boxes.size());
    vector<vector<float>> result;

for(int k = 0; k < classes; k++) {
    for (size_t i = 0; i < boxes.size(); ++i) {
        order[i].first = i;
        boxes[i][4] = k;
        order[i].second = boxes[i][6 + k];
    }
    sort(order.begin(), order.end(),
         [](const pair<int, float> &ls, const pair<int, float> &rs) { return ls.second > rs.second; });

    vector<bool> exist_box(boxes.size(), true);

    for (size_t _i = 0; _i < boxes.size(); ++_i) {
        size_t i = order[_i].first;
        if (!exist_box[i]) continue;
        if (boxes[i][6 + k] < CONF) {
            exist_box[i] = false;
            continue;
        }
        /* add a box as result */
        result.push_back(boxes[i]);

        for (size_t _j = _i + 1; _j < boxes.size(); ++_j) {
            size_t j = order[_j].first;
            if (!exist_box[j]) continue;
            float ovr = cal_iou(boxes[j], boxes[i]);
            if (ovr >= thres) exist_box[j] = false;
        }
    }
}

return result;

}

An error encountered while testing the Caffe model:


./1_test_caffe.sh: line 7: 149 Floating point exception(core dumped) ./../caffe-master/build/examples/yolo/yolov3_detect.bin 1_model_caffe/v3.prototxt 1_model_caffe/v3.caffemodel 5_file_for_test/image.txt -out_file 5_file_for_test/yolov3_caffe_result.txt -confidence_threshold 0.005 -classes 80 -anchorCnt 3

john@john-wang:~/Vitis-AI_1.2/YOLOv3/example_yolov3$ ./1_test_caffe.sh

detection.jpg

john@john-wang:~/Vitis-AI_1.2/YOLOv3/example_yolov3$ ./1_test_caffetiny.sh

