NVIDIA JetPack and TensorRT: Image Classification with TensorFlow Models Using NVIDIA TensorRT

TensorFlow->TensorRT Image Classification

This repository contains examples, scripts, and code for image classification using TensorFlow models (from the TensorFlow slim model zoo) converted to TensorRT. Converting TensorFlow models to TensorRT offers significant performance gains on the Jetson TX2, as shown in the table below.

Models

The table below shows various details related to pretrained models ported from the TensorFlow slim model zoo.

| Model | Input Size | TensorRT (TX2 / Half) | TensorRT (TX2 / Float) | TensorFlow (TX2 / Float) | Input Name | Output Name | Preprocessing Fn. |
|---|---|---|---|---|---|---|---|
| inception_v1 | 224x224 | 7.98ms | 12.8ms | 27.6ms | input | InceptionV1/Logits/SpatialSqueeze | inception |
| inception_v3 | 299x299 | 26.3ms | 46.1ms | 98.4ms | input | InceptionV3/Logits/SpatialSqueeze | inception |
| inception_v4 | 299x299 | 52.1ms | 88.2ms | 176ms | input | InceptionV4/Logits/Logits/BiasAdd | inception |
| inception_resnet_v2 | 299x299 | 53.0ms | 98.7ms | 168ms | input | InceptionResnetV2/Logits/Logits/BiasAdd | inception |
| resnet_v1_50 | 224x224 | 15.7ms | 27.1ms | 63.9ms | input | resnet_v1_50/SpatialSqueeze | vgg |
| resnet_v1_101 | 224x224 | 29.9ms | 51.8ms | 107ms | input | resnet_v1_101/SpatialSqueeze | vgg |
| resnet_v1_152 | 224x224 | 42.6ms | 78.2ms | 157ms | input | resnet_v1_152/SpatialSqueeze | vgg |
| resnet_v2_50 | 299x299 | 27.5ms | 44.4ms | 92.2ms | input | resnet_v2_50/SpatialSqueeze | inception |
| resnet_v2_101 | 299x299 | 49.2ms | 83.1ms | 160ms | input | resnet_v2_101/SpatialSqueeze | inception |
| resnet_v2_152 | 299x299 | 74.6ms | 124ms | 230ms | input | resnet_v2_152/SpatialSqueeze | inception |
| mobilenet_v1_0p25_128 | 128x128 | 2.67ms | 2.65ms | 15.7ms | input | MobilenetV1/Logits/SpatialSqueeze | inception |
| mobilenet_v1_0p5_160 | 160x160 | 3.95ms | 4.00ms | 16.9ms | input | MobilenetV1/Logits/SpatialSqueeze | inception |
| mobilenet_v1_1p0_224 | 224x224 | 12.9ms | 12.9ms | 24.4ms | input | MobilenetV1/Logits/SpatialSqueeze | inception |
| vgg_16 | 224x224 | 38.2ms | 79.2ms | 171ms | input | vgg_16/fc8/BiasAdd | vgg |

The times recorded include data transfer to GPU, network execution, and data transfer back from GPU. Time does not include preprocessing. See scripts/test_tf.py, scripts/test_trt.py, and src/test/test_trt.cu for implementation details.
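To make the gains concrete, the following sketch computes speedup ratios (TensorFlow time divided by TensorRT time) for three of the models listed above, using their published timings. The dictionary layout and the helper name `speedup` are illustrative choices and are not part of the project.

```python
# Latencies in milliseconds for three of the models listed above:
# (TensorRT half, TensorRT float, TensorFlow float), all on the Jetson TX2.
TIMINGS_MS = {
    "inception_v1": (7.98, 12.8, 27.6),
    "mobilenet_v1_1p0_224": (12.9, 12.9, 24.4),
    "vgg_16": (38.2, 79.2, 171.0),
}


def speedup(baseline_ms, optimized_ms):
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


for name, (trt_half, trt_float, tf_float) in TIMINGS_MS.items():
    print(
        "{}: {:.2f}x (half), {:.2f}x (float)".format(
            name, speedup(tf_float, trt_half), speedup(tf_float, trt_float)
        )
    )
```

For inception_v1, for instance, this gives roughly 3.5x at half precision and 2.2x at float precision. For mobilenet_v1_1p0_224 the half and float ratios are identical, because the two TensorRT timings are the same.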

Setup

Flash the Jetson TX2 using JetPack 3.2. Be sure to install

CUDA 9.0

OpenCV4Tegra

cuDNN

TensorRT 3.0

Install pip on the Jetson TX2:

sudo apt-get install python-pip

Install TensorFlow on Jetson TX2.

Download the TensorFlow 1.5.0 pip wheel from here. This build of TensorFlow is provided as a convenience for the purposes of this project.

Install TensorFlow using pip:

sudo pip install tensorflow-1.5.0rc0-cp27-cp27mu-linux_aarch64.whl

Install uff exporter on Jetson TX2.

Download TensorRT 3.0.4 for Ubuntu 16.04 and CUDA 9.0 tar package from https://developer.nvidia.com/nvidia-tensorrt-download.

Extract the archive:

tar -xzf TensorRT-3.0.4.Ubuntu-16.04.3.x86_64.cuda-9.0.cudnn7.0.tar.gz

Install the uff Python package using pip:

sudo pip install TensorRT-3.0.4/uff/uff-0.2.0-py2.py3-none-any.whl

Clone and build this project:

git clone --recursive https://github.com/NVIDIA-Jetson/tf_to_trt_image_classification.git

cd tf_to_trt_image_classification

mkdir build

cd build

cmake ..

make

cd ..

Download models and create frozen graphs

Run the following bash script to download all of the pretrained models.

source scripts/download_models.sh

If there are any models you don't want to use, simply remove the URL from the model list in scripts/download_models.sh.

Next, because the TensorFlow models are provided in checkpoint format, we must convert them to frozen graphs for optimization with TensorRT. Run the scripts/models_to_frozen_graphs.py script.

python scripts/models_to_frozen_graphs.py

If you removed any models in the previous step, you must add 'exclude': True to the corresponding item in the NETS dictionary located in scripts/model_meta.py.

If you are following the instructions for executing engines below, you may also need some sample images. Run the following script to download a few images from ImageNet.

source scripts/download_images.sh
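As a sketch of the 'exclude' flag in the NETS dictionary of scripts/model_meta.py mentioned earlier in this section, the snippet below shows how such a flag could be used to skip models that were not downloaded. Only the 'exclude' key comes from the text; the other keys, the two-entry dictionary, and the helper `active_nets` are assumptions made for illustration and may not match the real file.

```python
# Hypothetical excerpt in the spirit of the NETS dictionary in
# scripts/model_meta.py. Only the 'exclude' key comes from the text above;
# the other keys are assumptions made for this illustration.
NETS = {
    "inception_v1": {
        "input_name": "input",
        "output_name": "InceptionV1/Logits/SpatialSqueeze",
    },
    "vgg_16": {
        "input_name": "input",
        "output_name": "vgg_16/fc8/BiasAdd",
        "exclude": True,  # checkpoint was not downloaded, so skip this model
    },
}


def active_nets(nets):
    """Return the names of models that are not marked as excluded."""
    return [name for name, meta in nets.items() if not meta.get("exclude", False)]


print(active_nets(NETS))
```

Treating a missing 'exclude' key as False means that entries for models you kept need no changes.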

Convert frozen graph to TensorRT engine

Run the scripts/convert_plan.py script from the root directory of the project, referencing the models table for relevant parameters. For example, to convert the Inception V1 model, run the following:

python scripts/convert_plan.py data/frozen_graphs/inception_v1.pb data/plans/inception_v1.plan input 224 224 InceptionV1/Logits/SpatialSqueeze 1 0 float

The inputs to the convert_plan.py script are

frozen graph path

output plan path

input node name

input height

input width

output node name

max batch size

max workspace size

data type (float or half)

This script assumes single-input, single-output image models and may not work out of the box for models other than those in the table above.
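The argument order above can be captured in a small helper that assembles the command for any row of the models table. This is a minimal sketch: `build_convert_command` is a hypothetical function, not part of the project, and it assumes the data/frozen_graphs and data/plans locations used in the Inception V1 example.

```python
def build_convert_command(model_name, height, width, output_name,
                          data_type="float", max_batch_size=1,
                          max_workspace_size=0, input_name="input"):
    """Return the argument list for scripts/convert_plan.py.

    The data/frozen_graphs and data/plans locations follow the example
    command above; the function itself is a hypothetical helper.
    """
    if data_type not in ("float", "half"):
        raise ValueError("data_type must be 'float' or 'half'")
    return [
        "python", "scripts/convert_plan.py",
        "data/frozen_graphs/{}.pb".format(model_name),  # frozen graph path
        "data/plans/{}.plan".format(model_name),        # output plan path
        input_name,
        str(height),
        str(width),
        output_name,
        str(max_batch_size),
        str(max_workspace_size),
        data_type,
    ]


command = build_convert_command(
    "inception_v1", 224, 224, "InceptionV1/Logits/SpatialSqueeze")
print(" ".join(command))
```

With the defaults shown, the printed line reproduces the Inception V1 example command. The list can be passed to `subprocess.check_call` from the project root to actually run the conversion.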

Execute TensorRT engine

Call the examples/classify_image program from the root directory of the project, referencing the models table for relevant parameters. For example, to run the Inception V1 model converted as above:

./build/examples/classify_image/classify_image data/images/gordon_setter.jpg data/plans/inception_v1.plan data/imagenet_labels_1001.txt input InceptionV1/Logits/SpatialSqueeze inception

For reference, the inputs to the example program are

input image path

plan file path

labels file (one label per line, line number corresponds to index in output)

input node name

output node name

preprocessing function (either vgg or inception)
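The two preprocessing names can be illustrated with a short sketch. It assumes they follow the conventions of the TF-slim preprocessing modules of the same names: "vgg" subtracts a per-channel RGB mean, and "inception" rescales pixel values to the range [-1, 1]. The project's own implementation may differ in detail, and resizing and cropping are omitted here. The function names are illustrative.

```python
# Per-channel RGB means used by the TF-slim VGG preprocessing.
# Assumption: the project's "vgg" option follows the same convention.
VGG_MEANS_RGB = (123.68, 116.78, 103.94)


def preprocess_vgg(pixel):
    """Subtract the per-channel mean from one (R, G, B) pixel in [0, 255]."""
    return tuple(value - mean for value, mean in zip(pixel, VGG_MEANS_RGB))


def preprocess_inception(pixel):
    """Rescale one (R, G, B) pixel from [0, 255] to [-1, 1]."""
    return tuple(2.0 * (value / 255.0 - 0.5) for value in pixel)


PREPROCESS_FNS = {"vgg": preprocess_vgg, "inception": preprocess_inception}

white = (255.0, 255.0, 255.0)
for name in ("vgg", "inception"):
    print(name, PREPROCESS_FNS[name](white))
```

Using the wrong function for a model usually does not raise an error; it feeds the network inputs in an unexpected range, which tends to degrade predictions. Matching the "Preprocessing Fn." column of the models table avoids this.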

We provide two image label files in the data folder. Some of the TensorFlow models were trained with an additional "background" class, causing the model to have 1001 outputs instead of 1000. To determine the number of outputs for each model, reference the NETS variable in scripts/model_meta.py.
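Because the line number in the label file corresponds to the output index, the label file must have exactly as many lines as the model has outputs. The following minimal sketch shows that mapping with a length check. The helper `top_k` and the toy data are illustrative and are not taken from the project.

```python
def top_k(scores, labels, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    if len(scores) != len(labels):
        raise ValueError(
            "model has {} outputs but label file has {} lines".format(
                len(scores), len(labels)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(labels[i], scores[i]) for i in ranked[:k]]


# Toy data standing in for a real label file and a real output vector.
labels = ["background", "tench", "goldfish", "great white shark"]
scores = [0.01, 0.10, 0.85, 0.04]
print(top_k(scores, labels, k=2))
```

With a real model, `labels` would be read from the chosen label file, one entry per line. Pairing a 1001-output model with a 1000-line label file would shift every prediction by one class, which is why the sketch fails loudly on a length mismatch.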

Benchmark all models

To benchmark all of the models, first convert all of the models that you downloaded above into TensorRT engines. Run the following script to convert all models:

python scripts/frozen_graphs_to_plans.py

If you want to change parameters related to TensorRT optimization, edit the scripts/frozen_graphs_to_plans.py file. Next, to benchmark all of the models, run the scripts/test_trt.py script:

python scripts/test_trt.py

Once finished, the timing results will be stored at data/test_output_trt.txt. If you also want to benchmark the TensorFlow models, run:

python scripts/test_tf.py

The results will be stored at data/test_output_tf.txt. This benchmarking script loads an example image as input, so make sure you have downloaded the sample images as described above.
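The measurement convention used for the models table (transfer to the GPU, execution, and transfer back are timed; preprocessing is not) can be sketched as a generic timing loop. This is not the code in scripts/test_trt.py or scripts/test_tf.py. The names `average_latency_ms` and `dummy_infer` are hypothetical, the warm-up and run counts are arbitrary, and the sketch assumes Python 3 for `time.perf_counter`.

```python
import time


def average_latency_ms(infer, batch, warmup=5, runs=50):
    """Average wall-clock time of infer(batch) in milliseconds."""
    for _ in range(warmup):
        infer(batch)  # warm-up runs are not timed
    start = time.perf_counter()
    for _ in range(runs):
        infer(batch)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / runs


def dummy_infer(batch):
    """Stand-in for a real engine call; sums the inputs."""
    return sum(batch)


preprocessed = [0.0] * 1000  # preprocessing happens before timing starts
print("{:.4f} ms".format(average_latency_ms(dummy_infer, preprocessed)))
```

For a real engine, `infer` would need to block until the results have been copied back to host memory. Otherwise asynchronous GPU execution would make the measured time too short.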
