Training Different Networks with the TensorFlow Deep Learning Framework: Methods & Examples

fp32 train

python3 benchmark_cnn_tf_v1.14_compatible/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model=resnet50 --batch_size=32 --num_gpus=1 --num_epochs=90

fp16 train

python3 benchmark_cnn_tf_v1.14_compatible/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model=resnet50 --use_fp16=true --fp16_enable_auto_loss_scale=true --batch_size=32 --num_gpus=1 --num_epochs=90

fp32 inference

python3 benchmark_cnn_tf_v1.14_compatible/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model=resnet50 --batch_size=1 --num_gpus=1 --forward_only  --num_batches=500

fp16 inference

python3 benchmark_cnn_tf_v1.14_compatible/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --use_fp16=true --fp16_enable_auto_loss_scale=true --model=resnet50 --batch_size=1 --num_gpus=1 --forward_only --num_batches=500
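The four runs above differ only in a handful of flags, so they can be generated from one helper. The following is a minimal sketch rather than part of tf_cnn_benchmarks: the function name build_benchmark_cmd is hypothetical, and it only reuses the script path and flags that appear in the commands above.

```python
import shlex

# Script path copied from the commands above.
SCRIPT = "benchmark_cnn_tf_v1.14_compatible/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py"


def build_benchmark_cmd(fp16=False, inference=False, model="resnet50", num_gpus=1):
    """Return the argv list for one of the four runs (hypothetical helper)."""
    cmd = ["python3", SCRIPT, f"--model={model}", f"--num_gpus={num_gpus}"]
    if fp16:
        # Both fp16 runs above pass these two flags together.
        cmd += ["--use_fp16=true", "--fp16_enable_auto_loss_scale=true"]
    if inference:
        # Inference runs: batch size 1, forward pass only, 500 batches.
        cmd += ["--batch_size=1", "--forward_only", "--num_batches=500"]
    else:
        # Training runs: batch size 32 for 90 epochs.
        cmd += ["--batch_size=32", "--num_epochs=90"]
    return cmd


if __name__ == "__main__":
    for fp16 in (False, True):
        for inference in (False, True):
            print(shlex.join(build_benchmark_cmd(fp16, inference)))
```

Passing the returned list to subprocess.run would launch the corresponding run; printing it first makes it easy to compare against the commands listed above.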

Large-scale testing

Single GPU

HIP_VISIBLE_DEVICES=0 python3 tensorflow_synthetic_benchmark.py --model=ResNet50 --batch-size=128 --num-iters=500

Multiple GPUs

mpirun -np ${num_gpu} --hostfile hostfile --bind-to none scripts-run/single_process.sh
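The contents of scripts-run/single_process.sh are not shown here. A common pattern for such a per-process launcher is to give each MPI rank its own GPU before TensorFlow starts. The sketch below illustrates that idea only; it assumes Open MPI (which exports OMPI_COMM_WORLD_LOCAL_RANK to each process), and the function name pin_process_to_gpu is hypothetical.

```python
import os


def pin_process_to_gpu(environ=None):
    """Expose one GPU to this process, chosen by its local MPI rank.

    Assumption: launched by Open MPI's mpirun, which sets
    OMPI_COMM_WORLD_LOCAL_RANK. Falls back to GPU 0 otherwise.
    """
    environ = os.environ if environ is None else environ
    local_rank = int(environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    # Must happen before TensorFlow initializes the GPU runtime.
    environ["HIP_VISIBLE_DEVICES"] = str(local_rank)
    return local_rank


if __name__ == "__main__":
    print("local rank:", pin_process_to_gpu())
```

Horovod-based scripts often select the device inside TensorFlow instead of through the environment; which approach the actual script uses is not stated in this article.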

References

https://github.com/tensorflow/benchmarks/tree/cnn_tf_v1.14_compatible/scripts/tf_cnn_benchmarks
https://github.com/horovod/horovod/tree/master/examples/tensorflow

Classification Acc

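This test case verifies the accuracy of the ResNet50 image classification model on the ROCm platform. The procedure is as follows.

Load environment variables

Download the models repository from the official TensorFlow GitHub.
Set the Python path:

export PYTHONPATH=$PYTHONPATH:/path/to/tensorflow/model

The ROCm platform uses MIOpen for acceleration; the following variable settings can be used as a reference: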

export MIOPEN_DEBUG_DISABLE_FIND_DB=1  

export MIOPEN_USER_DB_PATH=/path/to/{miopen_save_dir}  

export LD_LIBRARY_PATH=/path/to/devtoolset7:$LD_LIBRARY_PATH

Running the example

The test can run on a single GPU or on several GPUs. The 4-GPU command is:

cd official/resnet  

python3 imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=512 --num_gpus=4
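When scaling this command to a different number of GPUs, it helps to check how the batch is divided. The sketch below assumes that --batch_size is the global batch size and is split evenly across GPUs, which is my understanding of the official models code but is not stated in this article; the function name per_gpu_batch_size is hypothetical.

```python
def per_gpu_batch_size(global_batch_size, num_gpus):
    """Return the batch size each GPU processes (hypothetical helper).

    Assumption: the global batch is divided evenly across GPUs, so it
    must be a multiple of num_gpus.
    """
    if num_gpus <= 1:
        return global_batch_size
    if global_batch_size % num_gpus != 0:
        raise ValueError(
            f"batch size {global_batch_size} is not divisible by {num_gpus} GPUs"
        )
    return global_batch_size // num_gpus


if __name__ == "__main__":
    # The 4-GPU command above: 512 images per step across 4 GPUs.
    print(per_gpu_batch_size(512, 4))
```

Under that assumption, the command above processes 128 images per GPU per step.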

References

https://github.com/tensorflow/models/tree/r1.13.0/official/resnet

Objection-Faster-rcnn/SSD

An object detection program that supports Faster R-CNN and SSD

Environment setup

(1) Upgrade pip

pip3 install --upgrade pip -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

(2) Install the dependencies with pip3 install -r requirements.txt; the file lists the following packages:

protobuf
pillow
lxml
jupyter
matplotlib
Cython
contextlib2
gast

(3) Install horovod

git clone --recursive https://github.com/ROCmSoftwarePlatform/horovod.git
cd horovod
HOROVOD_WITH_TENSORFLOW=1 python3 setup.py install

(4) Install cocoapi

git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make
cp -r pycocotools /path/to/Objection/research

(5) Install slim

cd slim
python3 setup.py install

(6) Upgrade pandas

pip3 install --upgrade pandas -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

(7) Protobuf compilation

wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protobuf.zip
./bin/protoc /path/to/Objection/research/object_detection/protos/*.proto --python_out=.

(8) Add PYTHONPATH

export LD_LIBRARY_PATH=/public/home/tianlh/tool/devtoolset7:$LD_LIBRARY_PATH
export PYTHONPATH=$PYTHONPATH:/path/to/Objection/research:/path/to/Objection/research/slim
export LD_LIBRARY_PATH=/path/to/Objection/research/slim:$LD_LIBRARY_PATH
If Python fails to import modules, clear the caches and set the environment variables again:
rm -rf ~/.cache/*
rm -rf ${MIOPEN_USER_DB_PATH:?}/*
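Recursive deletes driven by a path variable are easy to get wrong: a stray space or an unset variable can point the command at the wrong directory. As an alternative, the cache cleanup can be done from Python with a few guards. This is a minimal sketch; the function name clear_dir_contents is hypothetical, and the set of refused paths (empty path, filesystem root, home directory) is my own choice.

```python
import shutil
from pathlib import Path


def clear_dir_contents(path):
    """Delete everything inside `path` but keep the directory itself.

    Returns the number of top-level entries removed. Refuses an empty
    path, the filesystem root, and the home directory.
    """
    if not path:
        raise ValueError("refusing to clean an empty path")
    root = Path(path).expanduser().resolve()
    if root == Path(root.anchor) or root == Path.home().resolve():
        raise ValueError(f"refusing to clean {root}")
    if not root.is_dir():
        return 0  # nothing to clean
    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed
```

For example, clear_dir_contents(os.environ.get("MIOPEN_USER_DB_PATH", "")) raises ValueError instead of deleting anything when the variable is not set.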

2. Create the dataset

wget -O object_detection/dataset_tools/create_coco_tf_record.py https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/dataset_tools/create_coco_tf_record.py
python3 object_detection/dataset_tools/create_coco_tf_record.py --logtostderr \
--train_image_dir="/path/to/COCO2017/images/train2017" \
--val_image_dir="/path/to/COCO2017/images/val2017" \
--train_annotations_file="/path/to/COCO2017/annotations/instances_train2017.json" \
--val_annotations_file="/path/to/COCO2017/annotations/instances_val2017.json" \
--output_dir="/path/to/COCO2017-TF/"

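3. Download a pretrained model

Choose a model from the detection model zoo:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
wget *
tar -zxf *.tar.gz
(In the wget command, * stands for the download link of the chosen model.)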

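4. Check the dataset and pretrained model paths

Edit samples/configs/ssd_inception_v2_coco.config (for example with vim) and replace every "PATH_TO_BE_CONFIGURED" with the actual dataset and pretrained model paths.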
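Editing the placeholders by hand is easy to get partly wrong, so a small script can fill them in and confirm that none remain. The sketch below is only an illustration: the function names are hypothetical, and it assumes the checkpoint placeholder appears as "PATH_TO_BE_CONFIGURED/model.ckpt" while all other placeholders point into one data directory. Check this against the actual config before relying on it.

```python
PLACEHOLDER = "PATH_TO_BE_CONFIGURED"


def find_placeholders(config_text):
    """Return (line_number, line) pairs that still contain the placeholder."""
    return [
        (number, line.strip())
        for number, line in enumerate(config_text.splitlines(), 1)
        if PLACEHOLDER in line
    ]


def fill_placeholders(config_text, checkpoint_dir, data_dir):
    """Replace placeholders (assumed layout, see the note above).

    The checkpoint placeholder gets checkpoint_dir; every other
    occurrence gets data_dir.
    """
    text = config_text.replace(
        PLACEHOLDER + "/model.ckpt", checkpoint_dir.rstrip("/") + "/model.ckpt"
    )
    return text.replace(PLACEHOLDER, data_dir.rstrip("/"))
```

A typical use would read the config file, call fill_placeholders, write the result back, and then check that find_placeholders returns an empty list.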

5. Run the tests

SSD

Single GPU

python3 legacy/train.py --pipeline_config_path=samples/configs/ssd_inception_v2_coco.config --train_dir=result/ssd_inceptionV2_1gpu --num_clones=1 --ps_tasks=0 --alsologtostderr

4 GPUs

numactl --cpunodebind=0,1,2,3 --membind=0,1,2,3 python3 legacy/train.py --pipeline_config_path=samples/configs/ssd_inception_v2_coco.config --train_dir=result/ssd_inceptionV2_4gpu --num_clones=4 --ps_tasks=1 --alsologtostderr

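Faster R-CNN

Parameter	Description	Example
PIPELINE_CONFIG_PATH	Path to the training config	faster_rcnn_inception_v2_coco.config
CPKT_PATH	Directory where output files are saved	-
NUM_CLONES	Number of GPUs used for computation	-
PS_TASKS	Number of parameter server tasks	-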

WORK_DIR=`pwd`
python3 ${WORK_DIR}/models/research/object_detection/legacy/train.py --pipeline_config_path=${PIPELINE_CONFIG_PATH} --train_dir=${CPKT_PATH} --num_clones=${NUM_CLONES} --ps_tasks=${PS_TASKS} --alsologtostderr

References

https://github.com/tensorflow/models/tree/master/research/object_detection

Objection-Mask R-CNN

Training the Mask R-CNN model with TensorFlow

Environment preparation

1) Install the required packages

* Install TensorFlow 1.15 in a ROCm 3.3 environment
* Install pycocotools
  pip3 install pycocotools -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
* Upgrade pandas
  pip3 install -U pandas -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
* Install dllogger
  git clone --recursive https://github.com/NVIDIA/dllogger.git
  cd dllogger
  python3 setup.py install

2) Data processing (train and val)

cd dataset/
git clone http://github.com/tensorflow/models tf-models
cd tf-models/research
wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protobuf.zip
./bin/protoc object_detection/protos/*.proto --python_out=.

Return to the dataset directory, open create_coco_tf_record.py (for example with vim), and comment out lines 310 and 316.

PYTHONPATH="tf-models:tf-models/research" python3 create_coco_tf_record.py \
  --logtostderr \
  --include_masks \
  --train_image_dir=/path/to/COCO2017/images/train2017 \
  --val_image_dir=/path/to/COCO2017/images/val2017 \
  --train_object_annotations_file=/path/to/COCO2017/annotations/instances_train2017.json \
  --val_object_annotations_file=/path/to/COCO2017/annotations/instances_val2017.json \
  --train_caption_annotations_file=/path/to/COCO2017/annotations/captions_train2017.json \
  --val_caption_annotations_file=/path/to/COCO2017/annotations/captions_val2017.json \
  --output_dir=coco2017_tfrecord

This generates the coco2017_tfrecord folder.

3) Download the pretrained models

After downloading, the model files are laid out as follows:

weights/
>mask-rcnn/1555659850/  
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/saved_model.pb 
>>variables/  
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/variables/variables.data-00000-of-00001  
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/variables/variables.index  
>resnet/
>>extracted_from_maskrcnn/
>>resnet-nhwc-2018-02-07/  
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/checkpoint 
>>>model.ckpt-112603/  
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.data-00000-of-00001  
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.index  
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.meta  
>>resnet-nhwc-2018-10-14/

Testing

Single-GPU training

python3 scripts/benchmark_training.py --gpus {1,4,8} --batch_size {2,4}  
python3 scripts/benchmark_training.py --gpus 1 --batch_size 2 --model_dir save_model --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights

Multi-GPU training

python3 scripts/benchmark_training.py --gpus 2 --batch_size 4 --model_dir save_model_2dcu --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights

Inference

python3 scripts/benchmark_inference.py --batch_size 2 --model_dir save_model --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights

References

https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Segmentation/MaskRCNN

Objection-YOLOv3

YOLOv3, a landmark object detection network

Test procedure

1) Pretrained model/weights:

Use the weights pretrained on COCO

$ cd checkpoint  
$ wget https://github.com/YunYang1994/tensorflow-yolov3/releases/download/v1.0/yolov3_coco.tar.gz

If the network is slow, download the archive manually and upload it to the compute environment.

$ tar -xvf yolov3_coco.tar.gz  
$ cd ..  
$ python3 convert_weight.py --train_from_coco

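Additional Python packages may need to be installed at this point.

2) Training:

Check the parameters in core/config.py, such as the batch size and the data paths.

* Single GPU:
  python3 train.py
* Multiple GPUs:
  mpirun -np 2 -H localhost:2 python3 train_hvd.py
* Multiple nodes:
  mpirun -np 4 -H b02r1n02:2,b02r1n04:2 python3 train_hvd.py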
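In the mpirun commands above, the value of -np equals the sum of the per-host slot counts given to -H. A small helper can keep the two consistent when the host list changes. This is a minimal sketch; the function name build_mpirun_cmd is hypothetical, and it uses only the mpirun options that appear above.

```python
def build_mpirun_cmd(hosts, script="train_hvd.py"):
    """Build an mpirun argv list from a {hostname: gpu_slots} mapping.

    -np is the total number of slots, and -H lists host:slots pairs
    in the order given.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    total = sum(hosts.values())
    host_arg = ",".join(f"{name}:{slots}" for name, slots in hosts.items())
    return ["mpirun", "-np", str(total), "-H", host_arg, "python3", script]


if __name__ == "__main__":
    # Reproduces the multi-node command above.
    print(" ".join(build_mpirun_cmd({"b02r1n02": 2, "b02r1n04": 2})))
```

Calling it with {"localhost": 2} reproduces the single-node multi-GPU command.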

3) Inference

$ python3 evaluate.py  
$ cd mAP  
$ python3 main.py -na

References

https://github.com/YunYang1994/tensorflow-yolov3

Segmentation-Unet_Industrial

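This test case benchmarks training and inference of the UNet_Industrial image segmentation model with the TensorFlow framework on the ROCm platform. The procedure is as follows.

Test procedure

Download the dataset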

DAGM2007
The data is laid out as follows:

raw_images/
    private/
        Class1
        Class2
        ......
        Class10
    public/
        Class1
        Class1_def
        ......
        Class6
        Class6_def
zip_files/
    private/
        Class1.zip
        Class2.zip
        ......
        Class10.zip
    public/
        Class1.zip
        Class1_def.zip
        ......
        Class6.zip
        Class6_def.zip
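Before launching a benchmark it can save time to confirm that the extracted dataset matches the layout listed above. The sketch below checks only the raw_images tree and assumes the elided entries follow the visible naming pattern (Class1 to Class10 under private, Class1 to Class6 plus their _def variants under public); the function names are hypothetical.

```python
from pathlib import Path


def expected_dagm_dirs():
    """Relative directories implied by the listing above (assumed pattern)."""
    private = [f"raw_images/private/Class{i}" for i in range(1, 11)]
    public = []
    for i in range(1, 7):
        public.append(f"raw_images/public/Class{i}")
        public.append(f"raw_images/public/Class{i}_def")
    return private + public


def missing_dagm_dirs(dataset_root):
    """Return the expected directories that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in expected_dagm_dirs() if not (root / rel).is_dir()]
```

An empty list from missing_dagm_dirs("/path/to/{DAGM2007_dir}") means every expected class directory is present.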

Commands

Training performance benchmark

./scripts/benchmarking/DGX1v_trainbench_{FP16, FP32, FP32AMP, FP32FM}_{1, 4, 8}GPU.sh <path to result repository> <path to dataset> <DAGM2007 classID (1-10)>
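The braces in the script name above are a template: one precision mode combined with one GPU count. The following sketch expands it into concrete file names. The helper names are hypothetical, and whether every combination actually exists as a script in a given checkout is an assumption that should be verified.

```python
from itertools import product

# Values taken from the template above.
PRECISIONS = ("FP16", "FP32", "FP32AMP", "FP32FM")
GPU_COUNTS = (1, 4, 8)


def trainbench_script(precision, num_gpus):
    """Return the script path for one precision / GPU-count combination."""
    if precision not in PRECISIONS:
        raise ValueError(f"unknown precision: {precision}")
    if num_gpus not in GPU_COUNTS:
        raise ValueError(f"unsupported GPU count: {num_gpus}")
    return f"./scripts/benchmarking/DGX1v_trainbench_{precision}_{num_gpus}GPU.sh"


def all_trainbench_scripts():
    """Expand the full template into its 12 candidate script paths."""
    return [trainbench_script(p, n) for p, n in product(PRECISIONS, GPU_COUNTS)]
```

Each path still takes the three positional arguments shown in the template: the result directory, the dataset directory, and the DAGM2007 class ID.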

Example: single-GPU training on Class 1

./scripts/DGX1v_trainbench_FP32_4GPU.sh /path/to/{save_dir} /path/to/{DAGM2007_dir} 1

For a multi-GPU run, add the mpirun command inside DGX1v_trainbench_FP32_4GPU.sh.

Inference performance benchmark

./scripts/benchmarking/DGX1v_evalbench_FP16_1GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>

Training

./UNet_FP32_1GPU.sh <path to result repository> <path to dataset> <DAGM2007 classID (1-10)>

Inference

./UNet_FP32_EVAL.sh <path to result repository> <path to dataset> <DAGM2007 classID (1-10)>

References

https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Segmentation/UNet_Industrial

Segmentation-vnet

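This test case measures the training and inference performance of the VNet image segmentation model on the ROCm platform. It has been verified with ROCm 3.3 and TensorFlow 1.15.0. The procedure is as follows.

Test procedure

Download the dataset

Medical Segmentation Decathlon (MSD)

Install the required packages

Install SimpleITK by downloading the whl package and installing it with pip:

wget https://files.pythonhosted.org/packages/f8/d8/53338c34f71020725ffb3557846c80af96c29c03bc883551a2565aa68a7c/SimpleITK-1.2.4-cp36-cp36m-manylinux1_x86_64.whl
pip3 install SimpleITK-1.2.4-cp36-cp36m-manylinux1_x86_64.whl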

Commands

benchmark

Single-GPU training benchmark

python3 examples/vnet_benchmark.py \
--data_dir /path/to/{MSD_Task04_Hippocampus_dir} \
--model_dir /path/to/{model_save_dir} \
--mode train \
--gpus 1 \
--batch_size 8

4-GPU training benchmark

python3 examples/vnet_benchmark.py \
--data_dir /path/to/{MSD_Task04_Hippocampus_dir} \
--model_dir /path/to/{model_save_dir} \
--mode train \
--gpus 4 \
--batch_size 32

Inference benchmark

python3 examples/vnet_benchmark.py \
--data_dir /path/to/{MSD_Task04_Hippocampus_dir} \
--model_dir /path/to/{model_save_dir} \
--mode predict \
--gpus 1 \
--batch_size 8

Training example

python3 examples/vnet_train.py \
--data_dir /path/to/{MSD_Task04_Hippocampus_dir} \
--model_dir /path/to/{model_save_dir} \
--mode train \
--gpus 1 \
--batch_size 260 \
--epochs 1

Inference example

python3 examples/vnet_predict.py \
--data_dir /path/to/{MSD_Task04_Hippocampus_dir} \
--model_dir /path/to/{model_save_dir} \
--batch_size 4

References

https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Segmentation/VNet

Author: 技术瘾君子1573
