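tkDNN is a Deep Neural Network library built with cuDNN and TensorRT primitives, specifically designed to work on NVIDIA Jetson boards. It has been tested on TK1 (branch cudnn2), TX1, TX2, AGX Xavier, Nano and several discrete GPUs. The main goal of this project is to exploit NVIDIA boards as much as possible to obtain the best inference performance. It does not support training.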
This branch works on every NVIDIA GPU that supports the dependencies:
CUDA 10.0
CUDNN 7.6.3
TENSORRT 6.0.1
OPENCV 3.4
yaml-cpp 0.5.2 (sudo apt install libyaml-cpp-dev)
About OpenCV
To compile and install OpenCV4 with contrib, use the script install_OpenCV4.sh. It downloads and compiles OpenCV in the Download folder.
bash scripts/install_OpenCV4.sh
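When using an OpenCV build without contrib, comment out the definition of OPENCV_CUDACONTRIB in include/tkDNN/DetectionNN.h. When it is commented out, the preprocessing of the networks is computed on the CPU; otherwise it is computed on the GPU. In the latter case a few milliseconds are saved in the end-to-end latency.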
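The switch above is a plain compile-time macro, so the CPU or GPU preprocessing path is chosen when tkDNN is built, not at run time. A minimal sketch of that mechanism; only the macro name comes from the text, and preprocessingDevice is a hypothetical helper, not part of tkDNN.

```cpp
#include <cassert>
#include <string>

// Same macro name as in include/tkDNN/DetectionNN.h: comment out the next
// line to mimic an OpenCV build without CUDA and contrib.
#define OPENCV_CUDACONTRIB

// Hypothetical helper: reports which preprocessing path the preprocessor
// selected for this build.
std::string preprocessingDevice() {
#ifdef OPENCV_CUDACONTRIB
    return "GPU";  // the CUDA-enabled OpenCV path would be compiled here
#else
    return "CPU";  // the plain OpenCV path would be compiled here
#endif
}
```

Because the choice is made by the preprocessor, changing the define requires recompiling tkDNN.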
How to compile this repo
git clone https://github.com/ceccocats/tkDNN
cd tkDNN
mkdir build
cd build
cmake ..
make
Workflow
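Steps needed to run inference with tkDNN on a custom neural network:
1. Build and train a neural network model with your favorite framework.
2. Export the weights and bias of each layer and save them in a binary file (one .bin file per layer).
3. Export the output of each layer and save it in a binary file (one .bin file per layer).
4. Create a new test and define the network layer by layer, using the extracted weights and outputs to check the results.
5. Do inference.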
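Step 4 boils down to comparing, layer by layer, the output computed by your tkDNN test with the reference output exported in step 3. A minimal sketch of that comparison, assuming both outputs are already loaded as flat float32 arrays of equal length; the function name countMismatches and the 1e-3 tolerance are illustrative choices, not tkDNN's.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns how many elements of `computed` differ from
// `reference` by more than `tol` in absolute value.
std::size_t countMismatches(const std::vector<float>& computed,
                            const std::vector<float>& reference,
                            float tol = 1e-3f) {
    assert(computed.size() == reference.size());  // same layer, same shape
    std::size_t bad = 0;
    for (std::size_t i = 0; i < computed.size(); ++i) {
        if (std::fabs(computed[i] - reference[i]) > tol) ++bad;
    }
    return bad;
}
```

A layer whose mismatch count is zero can be considered correct; the first layer with a non-zero count is usually where the definition of the network diverges from the original.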
How to export weights
Weights are essential for any network to run inference. Each test needs a folder organized as follows (inside the build folder):
test_nn
|---- layers/ (folder containing a binary file for each layer with the corresponding weights and bias)
|---- debug/ (folder containing a binary file for each layer with the corresponding outputs)
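The exact byte layout of each .bin file is defined by the export tools described below and is not documented here. The sketch below only assumes raw float32 values in host byte order with no header, which is enough to inspect or regenerate such a file; writeFloatBin and readFloatBin are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Writes `data` to `path` as raw float32 bytes (assumed layout: no header).
void writeFloatBin(const std::string& path, const std::vector<float>& data) {
    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open " + path);
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size() * sizeof(float)));
}

// Reads back every float32 value stored in `path`.
std::vector<float> readFloatBin(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open at end
    if (!in) throw std::runtime_error("cannot open " + path);
    const std::streamsize bytes = static_cast<std::streamsize>(in.tellg());
    if (bytes < 0 || bytes % static_cast<std::streamsize>(sizeof(float)) != 0)
        throw std::runtime_error(path + " is not a whole number of float32 values");
    in.seekg(0);
    std::vector<float> data(static_cast<std::size_t>(bytes) / sizeof(float));
    in.read(reinterpret_cast<char*>(data.data()), bytes);
    return data;
}
```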
1) Export weights from darknet
To export the weights of networks defined in the darknet framework, use this fork of darknet (https://git.hipert.unimore.it/fgatti/darknet) and follow these steps to obtain correct debug and layers folders, ready for tkDNN.
git clone https://git.hipert.unimore.it/fgatti/darknet.git
cd darknet
make
mkdir layers debug
./darknet export <path-to-cfg-file> <path-to-weights> layers
N.B. If you also want to debug, compile for CPU (leave GPU=0 in the Makefile).
Darknet Parser
tkDNN implements an easy parser for darknet cfg files; a network can be converted with tk::dnn::darknetParser:
// example of parsing yolo4
// the second argument is the layers folder exported from darknet (here yolov4/layers)
tk::dnn::Network *net = tk::dnn::darknetParser("yolov4.cfg", "yolov4/layers", "coco.names");
net->print();
All models from darknet are now parsed directly from the cfg; you still need to export the weights with the tool described above.
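tk::dnn::darknetParser does the real work. To show why darknet cfg files are easy to parse, here is a minimal, independent sketch (not tkDNN's implementation) that splits a cfg into its [section] blocks and key=value options. It assumes darknet's convention that lines starting with '#' or ';' are comments; CfgSection and parseDarknetCfg are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One [section] of a darknet .cfg file, e.g. [convolutional].
struct CfgSection {
    std::string type;                            // text between the brackets
    std::map<std::string, std::string> options;  // key=value pairs
};

// Removes leading and trailing spaces, tabs and carriage returns.
static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// Splits a darknet cfg stream into sections in file order.
std::vector<CfgSection> parseDarknetCfg(std::istream& in) {
    std::vector<CfgSection> sections;
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#' || line[0] == ';') continue;  // comment
        if (line.front() == '[' && line.back() == ']') {
            sections.push_back({line.substr(1, line.size() - 2), {}});
        } else if (!sections.empty()) {
            const auto eq = line.find('=');
            if (eq == std::string::npos) continue;  // ignore malformed lines
            sections.back().options[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
        }
    }
    return sections;
}
```

Pass an std::ifstream opened on yolov4.cfg, or an std::istringstream holding cfg text; the first section is normally [net], followed by one section per layer.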
Supported layers
convolutional maxpool avgpool shortcut upsample route reorg region yolo
Supported activations
relu leaky mish
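Before exporting a new darknet model, it can help to check its cfg against the two lists above. A hypothetical pre-flight check: the [net] section is skipped because it holds global settings rather than a layer, and "linear" is treated as "no activation", which is an assumption not stated in this README.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Names copied from the "Supported layers" and "Supported activations" lists.
const std::set<std::string> kSupportedLayers = {
    "convolutional", "maxpool", "avgpool", "shortcut", "upsample",
    "route", "reorg", "region", "yolo"};
const std::set<std::string> kSupportedActivations = {"relu", "leaky", "mish"};

// Hypothetical helper: returns every layer type or activation that is not in
// the lists above, in the order encountered.
std::vector<std::string> unsupportedNames(const std::vector<std::string>& layerTypes,
                                          const std::vector<std::string>& activations) {
    std::vector<std::string> missing;
    for (const auto& t : layerTypes)
        if (t != "net" && kSupportedLayers.count(t) == 0) missing.push_back(t);
    for (const auto& a : activations)
        if (a != "linear" && kSupportedActivations.count(a) == 0) missing.push_back(a);
    return missing;
}
```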
Run the demo
This is an example using yolov4.
To run object detection, first create the .rt file by running:
rm yolo4_fp32.rt # be sure to delete(or move) old tensorRT files
./test_yolo4 # run the yolo test (is slow)
If you run into problems during creation, try checking the error with TensorRT debugging enabled, in this way:
cmake .. -DDEBUG=True
make
Once you have successfully created your rt file, run the demo:
./demo yolo4_fp32.rt ../demo/yolo_test.mp4 y
In general, the demo program takes 7 parameters:
./demo <network-rt-file> <path-to-video> <kind-of-network> <number-of-classes> <n-batches> <show-flag> <conf-thresh>
where
<network-rt-file> is the rt file generated by a test
<path-to-video> is the path to a video file or a camera input
<kind-of-network> is the type of network. Three types are currently supported: y (YOLO family), c (CenterNet family) and m (MobileNet-SSD family)
<number-of-classes> is the number of classes the network is trained on
<n-batches> is the number of batches to use in inference (N.B. you should first export TKDNN_BATCHSIZE set to the required n-batches and create the rt file for the network again).
<show-flag> if set to 0, the demo will not show the visualization but will save the video into result.mp4 (if n-batches == 1)
<conf-thresh> is the confidence threshold for the detector. Only bounding boxes with confidence greater than conf-thresh will be displayed.
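The effect of <conf-thresh> can be sketched as a simple filter over the detector's boxes. The Detection struct and filterByConfidence are hypothetical and only illustrate the rule stated above (strictly greater than the threshold); tkDNN's own box type may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical detection record.
struct Detection {
    float x, y, w, h;  // box in pixels
    int cls;           // class index, 0 .. number-of-classes - 1
    float prob;        // detector confidence
};

// Keeps only detections whose confidence is strictly greater than confThresh.
std::vector<Detection> filterByConfidence(const std::vector<Detection>& dets,
                                          float confThresh) {
    std::vector<Detection> kept;
    for (const auto& d : dets)
        if (d.prob > confThresh) kept.push_back(d);
    return kept;
}
```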
N.B. By default, FP32 inference is used.
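The precision is selected through the TKDNN_MODE environment variable used in the sections below. A minimal sketch of that selection logic, assuming an unset or unrecognised value falls back to FP32; Precision and precisionFromEnv are hypothetical names, not tkDNN's code.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

enum class Precision { FP32, FP16, INT8 };

// Hypothetical helper: maps TKDNN_MODE to a precision, defaulting to FP32.
Precision precisionFromEnv() {
    const char* mode = std::getenv("TKDNN_MODE");
    if (mode == nullptr) return Precision::FP32;  // variable not set
    const std::string m(mode);
    if (m == "FP16") return Precision::FP16;
    if (m == "INT8") return Precision::INT8;
    return Precision::FP32;  // assumption: unknown values fall back to FP32
}
```

Since the mode is read when the .rt file is built, an old .rt file keeps the precision it was built with, which is why the commands below delete it first.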
FP16 inference
To run the object detection demo with FP16 inference, follow these steps (example with yolov3):
export TKDNN_MODE=FP16  # set the half-precision floating point optimization
rm yolo3_fp16.rt # be sure to delete(or move) old tensorRT files
./test_yolo3 # run the yolo test (is slow)
./demo yolo3_fp16.rt ../demo/yolo_test.mp4 y
N.B. Using FP16 inference will lead to some errors in the results (in the first or second decimal).
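The size of these errors follows from the format: IEEE 754 half precision stores 10 fraction bits, so the gap between neighbouring representable values grows with magnitude. For a normal value x rounded to nearest:

```latex
x \in [2^{e}, 2^{e+1}) \;\Rightarrow\; \Delta x = 2^{e-10}, \qquad
|\mathrm{fl}(x) - x| \le \tfrac{1}{2}\,\Delta x = 2^{e-11}, \qquad
\frac{|\mathrm{fl}(x) - x|}{|x|} \le 2^{-11} \approx 4.9 \times 10^{-4}
```

For example, a box coordinate around 100 pixels lies in [64, 128), so e = 6 and the spacing is 2^{-4} = 0.0625: a single rounding can already move it by up to 0.03125, in the first decimal. Values in [8, 16) have spacing 2^{-7} ≈ 0.0078. This bound covers one rounding only; errors accumulated across many layers can be larger.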
INT8 inference
To run the object detection demo with INT8 inference, three environment variables need to be set:
export TKDNN_MODE=INT8: set the 8-bit integer optimization
export TKDNN_CALIB_IMG_PATH=/path/to/calibration/image_list.txt : image_list.txt has, on each line, the absolute path to a calibration image
export TKDNN_CALIB_LABEL_PATH=/path/to/calibration/label_list.txt: label_list.txt has, on each line, the absolute path to a calibration label
You should provide image_list.txt and label_list.txt built from training images. However, if you want to quickly test INT8 inference, you can run (from the root folder of this repo)
bash scripts/download_validation.sh COCO
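to automatically download the COCO2017 validation set (inside the demo folder) and create the required files. Use BDD instead of COCO to download the BDD validation set. A complete example using yolo3 and the COCO dataset would then be: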
export TKDNN_MODE=INT8
export TKDNN_CALIB_LABEL_PATH=../demo/COCO_val2017/all_labels.txt
export TKDNN_CALIB_IMG_PATH=../demo/COCO_val2017/all_images.txt
rm yolo3_int8.rt # be sure to delete(or move) old tensorRT files
./test_yolo3 # run the yolo test (is slow)
./demo yolo3_int8.rt ../demo/yolo_test.mp4 y
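If you build the two list files from your own training images instead of using the download script, a sketch like the following can generate them. It assumes the same layout that the mAP demo below expects (root/images/NAME.jpg paired with root/labels/NAME.txt) and skips images without a label; writeCalibrationLists is a hypothetical helper (C++17, std::filesystem).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Writes one absolute image path per line to `imageList` and the matching
// absolute label path to `labelList`. Returns the number of pairs written.
std::size_t writeCalibrationLists(const fs::path& root,
                                  const fs::path& imageList,
                                  const fs::path& labelList) {
    std::vector<fs::path> images;
    for (const auto& entry : fs::directory_iterator(root / "images"))
        if (entry.is_regular_file() && entry.path().extension() == ".jpg")
            images.push_back(fs::absolute(entry.path()));
    std::sort(images.begin(), images.end());  // deterministic order

    std::ofstream imgOut(imageList), lblOut(labelList);
    std::size_t written = 0;
    for (const auto& img : images) {
        const fs::path lbl = fs::absolute(root / "labels" / (img.stem().string() + ".txt"));
        if (!fs::exists(lbl)) continue;  // keep the two files line-aligned
        imgOut << img.string() << '\n';
        lblOut << lbl.string() << '\n';
        ++written;
    }
    return written;
}
```

Keeping only complete pairs means line N of image_list.txt always matches line N of label_list.txt.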
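N.B.
Using INT8 inference will lead to some errors in the results.
The test will be slower: this is due to INT8 calibration, which may take some time to complete.
INT8 calibration requires TensorRT version 6.0 or later.
By default, only 100 images are used to create the calibration table (set in the code).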
BatchSize bigger than 1
export TKDNN_BATCHSIZE=2
# build tensorRT files
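This will create a TensorRT file with the desired maximum batch size. The test still runs with a batch size of 1, but the created TensorRT file can handle the desired batch size.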
Test batch Inference
This will test the network with random input and check that the output of each batch is the same.
./test_rtinference <network-rt-file> <number-of-batches>
# <number-of-batches> should be less than or equal to the max batch size of the <network-rt-file>
# example
export TKDNN_BATCHSIZE=4 # set max batch size
rm yolo3_fp32.rt # be sure to delete(or move) old tensorRT files
./test_yolo3 # build RT file
./test_rtinference yolo3_fp32.rt 4 # test with a batch size of 4
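The check performed by test_rtinference can be sketched as follows, under two assumptions not stated in this README: the same random input is copied into every batch slot, and the output buffer stores the batches one after another. batchesMatch and its tolerance are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true if every batch in `output` matches batch 0 within `tol`.
bool batchesMatch(const std::vector<float>& output, std::size_t nBatches,
                  float tol = 1e-5f) {
    assert(nBatches > 0 && output.size() % nBatches == 0);
    const std::size_t per = output.size() / nBatches;  // values per batch
    for (std::size_t b = 1; b < nBatches; ++b)
        for (std::size_t i = 0; i < per; ++i)
            if (std::fabs(output[b * per + i] - output[i]) > tol) return false;
    return true;
}
```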
mAP demo
To compute mAP, precision, recall and f1score, run the map_demo.
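map_demo computes these metrics itself; how it matches detections to ground truth (IoU threshold, per-class handling) is not covered here. For reference, the standard definitions of precision, recall and F1 score from matched counts, as a small sketch; scoresFromCounts is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

struct DetectionScores {
    double precision, recall, f1;
};

// tp = true positives, fp = false positives, fn = false negatives.
// Ratios with a zero denominator are reported as 0.
DetectionScores scoresFromCounts(std::size_t tp, std::size_t fp, std::size_t fn) {
    const double p = (tp + fp) ? static_cast<double>(tp) / (tp + fp) : 0.0;
    const double r = (tp + fn) ? static_cast<double>(tp) / (tp + fn) : 0.0;
    const double f1 = (p + r) > 0.0 ? 2.0 * p * r / (p + r) : 0.0;  // harmonic mean
    return {p, r, f1};
}
```

For example, 8 true positives, 2 false positives and 4 false negatives give precision 0.8, recall 2/3 and F1 = 16/22 ≈ 0.727.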
A validation set is needed. To download COCO_val2017 (80 classes), run (from the root folder):
bash scripts/download_validation.sh COCO
To download Berkeley_val (10 classes), run (from the root folder):
bash scripts/download_validation.sh BDD
To compute the map, the following parameters are needed:
./map_demo <network rt> <network type [y|c|m]> <labels file path> <config file path>
where
<network rt>: rt file of the chosen network on which to compute the mAP.
<network type [y|c|m]>: type of network. Currently only y (YOLO), c (CenterNet) and m (MobileNet) are allowed
<labels file path>: path to a text file containing all the paths of the ground-truth labels. All the ground-truth labels must be in a folder called 'labels'. The folder containing 'labels' must also contain a folder 'images' with all the ground-truth images, having the same names as the labels. For example, if there is a label path/to/labels/000001.txt, there should be a corresponding image path/to/images/000001.jpg.
<config file path>: path to a yaml file with the parameters needed for the mAP computation, similar to demo/config.yaml
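The label-to-image convention above can be expressed as a small path transformation. A sketch using C++17 std::filesystem, assuming the images use the .jpg extension as in the example; imageForLabel is a hypothetical helper.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Maps path/to/labels/NAME.txt to path/to/images/NAME.jpg.
fs::path imageForLabel(const fs::path& label) {
    assert(label.parent_path().filename().string() == "labels");
    fs::path image = label.parent_path().parent_path() / "images" / label.filename();
    image.replace_extension(".jpg");
    return image;
}
```

Running it over every line of the labels file and checking fs::exists on the result is a quick way to find missing images before starting map_demo.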
Example:
cd build
./map_demo dla34_cnet_FP32.rt c ../demo/COCO_val2017/all_labels.txt ../demo/config.yaml
This demo also creates a json file named net_name_COCO_res.json containing all the detections computed. The detections are in COCO format, the correct format to submit the results to CodaLab COCO detection challenge.
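Each entry of a COCO detection results file is an object with image_id, category_id, bbox as [x, y, width, height] (top-left corner, in pixels) and score. A sketch of formatting one such entry; cocoResultEntry is a hypothetical helper and is not claimed to be how map_demo writes its file.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Formats one detection as a COCO results JSON object.
std::string cocoResultEntry(long imageId, int categoryId,
                            double x, double y, double w, double h, double score) {
    std::ostringstream os;  // default formatting: 6 significant digits
    os << "{\"image_id\":" << imageId
       << ",\"category_id\":" << categoryId
       << ",\"bbox\":[" << x << ',' << y << ',' << w << ',' << h << ']'
       << ",\"score\":" << score << '}';
    return os.str();
}
```

The COCO category ids are not the contiguous 0-79 class indices a network usually outputs, so a lookup table from class index to category id is needed before writing the entries; the entries are then joined with commas inside a JSON array.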