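The sections below are organized by compute framework and describe how to train different networks with each one.
Code: ModelZoo / resnet50_tensorflow · GitLab
Example: 光源 (Find source, find chance.)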
Classification bench
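- Code for training image-classification networks with the TensorFlow framework. It uses the official TensorFlow benchmark program, and the dataset is ImageNet.
Running the tests
- The test code has two parts: a basic performance test and a large-scale performance test.
Basic benchmark
- After creating the TensorFlow runtime environment, take the resnet50 network as an example and measure its performance at different precisions with num_gpus=1. The test covers training (batch_size=32) and inference (batch_size=1 in the commands below).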
fp32 train
python3 benchmark_cnn_tf_v1.14_compatible/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model=resnet50 --batch_size=32 --num_gpus=1 --num_epochs=90
fp16 train
python3 benchmark_cnn_tf_v1.14_compatible/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model=resnet50 --use_fp16=true --fp16_enable_auto_loss_scale=true --batch_size=32 --num_gpus=1 --num_epochs=90
fp32 inference
python3 benchmark_cnn_tf_v1.14_compatible/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model=resnet50 --batch_size=1 --num_gpus=1 --forward_only --num_batches=500
fp16 inference
python3 benchmark_cnn_tf_v1.14_compatible/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --use_fp16=true --fp16_enable_auto_loss_scale=true --model=resnet50 --batch_size=1 --num_gpus=1 --forward_only --num_batches=500
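The four commands above differ only in the precision flags and in the training/inference flags. As a minimal sketch, they can be generated from one place so the matrix stays consistent. The helper names `build_command` and `all_commands` are hypothetical; every flag is copied from the commands above, and nothing is launched.

```python
import shlex

# Script path exactly as used in the commands above.
SCRIPT = ("benchmark_cnn_tf_v1.14_compatible/scripts/"
          "tf_cnn_benchmarks/tf_cnn_benchmarks.py")


def build_command(precision, mode):
    """Return the argv list for one (precision, mode) combination."""
    if precision not in ("fp32", "fp16"):
        raise ValueError("precision must be 'fp32' or 'fp16'")
    if mode not in ("train", "inference"):
        raise ValueError("mode must be 'train' or 'inference'")
    cmd = ["python3", SCRIPT, "--model=resnet50", "--num_gpus=1"]
    if precision == "fp16":
        # Same fp16 flags as in the fp16 commands above.
        cmd += ["--use_fp16=true", "--fp16_enable_auto_loss_scale=true"]
    if mode == "train":
        cmd += ["--batch_size=32", "--num_epochs=90"]
    else:
        cmd += ["--batch_size=1", "--forward_only", "--num_batches=500"]
    return cmd


def all_commands():
    """Map (precision, mode) to its argv list for the whole 2x2 matrix."""
    return {(p, m): build_command(p, m)
            for m in ("train", "inference")
            for p in ("fp32", "fp16")}


if __name__ == "__main__":
    for (precision, mode), cmd in all_commands().items():
        print(f"# {precision} {mode}")
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run` unchanged once the runtime environment is set up.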
Large-scale test
Single GPU
HIP_VISIBLE_DEVICES=0 python3 tensorflow_synthetic_benchmark.py --model=ResNet50 --batch-size=128 --num-iters=500
Multiple GPUs
mpirun -np ${num_gpu} --hostfile hostfile --bind-to none scripts-run/single_process.sh
References
https://github.com/tensorflow/benchmarks/tree/cnn_tf_v1.14_compatible/scripts/tf_cnn_benchmarks
https://github.com/horovod/horovod/tree/master/examples/tensorflow
Classification Acc
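This test case verifies the accuracy of the ResNet50 image-classification model on the ROCm platform. The procedure is as follows.
Load environment variables
Download the models repository from the official TensorFlow GitHub
Set the Python path: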
export PYTHONPATH=$PYTHONPATH:/path/to/tensorflow/model
The ROCm platform uses MIOpen for acceleration. The following variable settings can be used as a reference:
export MIOPEN_DEBUG_DISABLE_FIND_DB=1
export MIOPEN_USER_DB_PATH=/path/to/{miopen_save_dir}
export LD_LIBRARY_PATH=/path/to/devtoolset7:$LD_LIBRARY_PATH
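When the run is launched from a Python driver instead of an interactive shell, the same variables can be prepared as an environment dictionary. This is a sketch only: `miopen_env` is a hypothetical helper, and the directory arguments stand in for the `/path/to/...` placeholders above.

```python
import os


def miopen_env(miopen_db_dir, devtoolset_lib, base_env=None):
    """Return a copy of the environment with the MIOpen settings applied."""
    env = dict(os.environ if base_env is None else base_env)
    env["MIOPEN_DEBUG_DISABLE_FIND_DB"] = "1"
    env["MIOPEN_USER_DB_PATH"] = miopen_db_dir
    # Prepend the devtoolset directory, without leaving a trailing
    # separator when LD_LIBRARY_PATH was previously unset.
    previous = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = (
        devtoolset_lib + os.pathsep + previous if previous else devtoolset_lib
    )
    return env


if __name__ == "__main__":
    env = miopen_env("/path/to/miopen_save_dir", "/path/to/devtoolset7")
    for key in ("MIOPEN_DEBUG_DISABLE_FIND_DB", "MIOPEN_USER_DB_PATH",
                "LD_LIBRARY_PATH"):
        print(key, "=", env[key])
```

The returned dictionary can be passed as `env=` to `subprocess.run`, which leaves the calling shell untouched.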
Run example
The test can run on a single GPU or on multiple GPUs. The 4-GPU command is:
cd official/resnet
python3 imagenet_main.py --data_dir=/path/to/{ImageNet-tensorflow_data_dir} --model_dir=/path/to/{model_save_dir} --batch_size=512 --num_gpus=4
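A quick sanity check for the batch size, assuming that `--batch_size` is the global batch size shared by all GPUs (this is how the flag help in the official models repository describes it, but it is worth confirming for the checked-out branch). The helper name is hypothetical.

```python
def per_gpu_batch_size(global_batch_size, num_gpus):
    """Split a global batch evenly across GPUs, rejecting uneven splits."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if global_batch_size % num_gpus != 0:
        raise ValueError("batch size must be divisible by the number of GPUs")
    return global_batch_size // num_gpus


if __name__ == "__main__":
    # Values taken from the 4-GPU command above.
    print(per_gpu_batch_size(512, 4))
```

Under that assumption, the 4-GPU command above gives each GPU 128 images per step.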
Reference
https://github.com/tensorflow/models/tree/r1.13.0/official/resnet
Objection-Faster-rcnn/SSD
- Object-detection program with support for Faster R-CNN and SSD
1. Environment setup
(1) Upgrade pip
pip3 install --upgrade pip -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
(2) Install the Python dependencies
pip3 install -r requirements.txt
requirements.txt lists the following packages:
protobuf
pillow
lxml
jupyter
matplotlib
Cython
contextlib2
gast
(3) Install horovod
git clone --recursive https://github.com/ROCmSoftwarePlatform/horovod.git
cd horovod
HOROVOD_WITH_TENSORFLOW=1 python3 setup.py install
(4) Install cocoapi
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make
cp -r pycocotools /path/to/Objection/research
(5) Install slim
cd slim
python3 setup.py install
(6) Upgrade pandas
pip3 install --upgrade pandas -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
(7) Compile the protobuf files
wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protobuf.zip
./bin/protoc /path/to/Objection/research/object_detection/protos/*.proto --python_out=.
(8) Set PYTHONPATH and LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/public/home/tianlh/tool/devtoolset7:$LD_LIBRARY_PATH
export PYTHONPATH=$PYTHONPATH:/path/to/Objection/research:/path/to/Objection/research/slim
export LD_LIBRARY_PATH=/path/to/Objection/research/slim:$LD_LIBRARY_PATH
If Python fails to import modules, clear the caches and set the environment variables again
rm -rf ~/.cache/*
rm -rf "${MIOPEN_USER_DB_PATH:?}"/*
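Recursive deletes are easy to get wrong: a stray space before `*`, or an unset variable, changes what gets removed. A guarded cleanup can be sketched in Python as follows. `clear_directory` is a hypothetical helper, not part of any tool mentioned here. It empties a directory, keeps the directory itself, and refuses an empty path or the filesystem root.

```python
import os
import shutil
import tempfile


def clear_directory(path):
    """Delete everything inside `path`; return the number of removed entries."""
    if not path:
        raise ValueError("refusing to clear an empty path")
    target = os.path.abspath(os.path.expanduser(path))
    if target == os.path.abspath(os.sep):
        raise ValueError("refusing to clear the filesystem root")
    if not os.path.isdir(target):
        return 0  # nothing to clean
    removed = 0
    for name in os.listdir(target):
        entry = os.path.join(target, name)
        if os.path.isdir(entry) and not os.path.islink(entry):
            shutil.rmtree(entry)
        else:
            os.remove(entry)
        removed += 1
    return removed


if __name__ == "__main__":
    # Demonstration on a throwaway directory rather than a real cache.
    with tempfile.TemporaryDirectory() as demo:
        os.makedirs(os.path.join(demo, "sub"))
        open(os.path.join(demo, "file.txt"), "w").close()
        print(clear_directory(demo), os.listdir(demo))
```

For the caches above it would be called as `clear_directory("~/.cache")` and `clear_directory(os.environ.get("MIOPEN_USER_DB_PATH", ""))`; the second call raises if the variable is unset.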
2. Create the dataset
wget https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/dataset_tools/create_coco_tf_record.py
python3 object_detection/dataset_tools/create_coco_tf_record.py --logtostderr \
--train_image_dir="/path/to/COCO2017/images/train2017" \
--val_image_dir="/path/to/COCO2017/images/val2017" \
--train_annotations_file="/path/to/COCO2017/annotations/instances_train2017.json" \
--val_annotations_file="/path/to/COCO2017/annotations/instances_val2017.json" \
--output_dir="/path/to/COCO2017-TF/"
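3. Download a pretrained model
Pick a model from the detection model zoo page and download its archive:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
wget <URL of the chosen model archive>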
tar -zxf *.tar.gz
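4. Check the dataset and pretrained-model paths
vim samples/configs/ssd_inception_v2_coco.config
Replace every "PATH_TO_BE_CONFIGURED" in this file with the actual dataset and pretrained-model paths.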
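The placeholder can also be replaced by a script, which makes it easy to confirm that none is left behind. The sketch below is illustrative only: `SAMPLE_CONFIG` is a simplified, hypothetical excerpt rather than the real file, and the rule "checkpoint lines get the checkpoint directory, every other line gets the data directory" is an assumption to check against the actual config.

```python
PLACEHOLDER = "PATH_TO_BE_CONFIGURED"

# Simplified, hypothetical excerpt; the real config contains many more fields.
SAMPLE_CONFIG = """fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt\""""


def count_placeholders(config_text):
    """Return how many unconfigured paths remain in the config text."""
    return config_text.count(PLACEHOLDER)


def fill_placeholders(config_text, checkpoint_dir, data_dir):
    """Replace the placeholder line by line.

    Lines mentioning fine_tune_checkpoint receive checkpoint_dir;
    all other lines receive data_dir (an assumed convention).
    """
    filled = []
    for line in config_text.splitlines():
        target = checkpoint_dir if "fine_tune_checkpoint" in line else data_dir
        filled.append(line.replace(PLACEHOLDER, target))
    return "\n".join(filled)


if __name__ == "__main__":
    result = fill_placeholders(SAMPLE_CONFIG, "/path/to/pretrained",
                               "/path/to/COCO2017-TF")
    print(result)
    print("placeholders left:", count_placeholders(result))
```

Running `count_placeholders` on the edited file before training gives a cheap check that no path was missed.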
5. Run the tests
SSD
Single GPU
python3 legacy/train.py --pipeline_config_path=samples/configs/ssd_inception_v2_coco.config --train_dir=result/ssd_inceptionV2_1gpu --num_clones=1 --ps_tasks=0 --alsologtostderr
4 GPUs
numactl --cpunodebind=0,1,2,3 --membind=0,1,2,3 python3 legacy/train.py --pipeline_config_path=samples/configs/ssd_inception_v2_coco.config --train_dir=result/ssd_inceptionV2_4gpu --num_clones=4 --ps_tasks=1 --alsologtostderr
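Faster R-CNN
Parameter | Description | Example |
---|---|---|
PIPELINE_CONFIG_PATH | Path to the training config | faster_rcnn_inception_v2_coco.config |
CPKT_PATH | Directory where output files are saved | - |
NUM_CLONES | Number of GPUs used for computation | - |
PS_TASKS | Number of parameter-server tasks | - |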
WORK_DIR=`pwd`
python3 ${WORK_DIR}/models/research/object_detection/legacy/train.py --pipeline_config_path=${PIPELINE_CONFIG_PATH} --train_dir=${CPKT_PATH} --num_clones=${NUM_CLONES} --ps_tasks=${PS_TASKS} --alsologtostderr
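The command above expands four shell variables, and an unset one silently becomes an empty flag value. A small wrapper can fail early instead. This is a sketch under assumptions: `train_command` is a hypothetical helper, and the parameters are read from a dictionary such as `os.environ`.

```python
import os

# The four parameters from the table above.
REQUIRED = ("PIPELINE_CONFIG_PATH", "CPKT_PATH", "NUM_CLONES", "PS_TASKS")


def train_command(env, work_dir):
    """Build the legacy/train.py argv list, failing if a parameter is unset."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("unset parameters: " + ", ".join(missing))
    script = os.path.join(
        work_dir, "models", "research", "object_detection", "legacy", "train.py")
    return [
        "python3", script,
        f"--pipeline_config_path={env['PIPELINE_CONFIG_PATH']}",
        f"--train_dir={env['CPKT_PATH']}",
        # int() rejects non-numeric values before the job is launched.
        f"--num_clones={int(env['NUM_CLONES'])}",
        f"--ps_tasks={int(env['PS_TASKS'])}",
        "--alsologtostderr",
    ]


if __name__ == "__main__":
    demo_env = {
        "PIPELINE_CONFIG_PATH": "faster_rcnn_inception_v2_coco.config",
        "CPKT_PATH": "result/faster_rcnn_1gpu",  # hypothetical output directory
        "NUM_CLONES": "1",
        "PS_TASKS": "0",
    }
    print(" ".join(train_command(demo_env, os.getcwd())))
```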
References
https://github.com/tensorflow/models/tree/master/research/object_detection
Objection-Mask R-CNN
- Training the Mask R-CNN model with TensorFlow
Environment preparation
1) Install the toolkits
* Install tensorflow1.15 in the rocm3.3 environment
* Install pycocotools
pip3 install pycocotools -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
* Update pandas
pip3 install -U pandas -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
* Install dllogger
git clone --recursive https://github.com/NVIDIA/dllogger.git
cd dllogger
python3 setup.py install
2) Data processing (train and val)
cd dataset/
git clone http://github.com/tensorflow/models tf-models
cd tf-models/research
wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protobuf.zip
./bin/protoc object_detection/protos/*.proto --python_out=.
Return to the dataset directory and edit create_coco_tf_record.py
cd ../..
vim create_coco_tf_record.py
Comment out lines 310 and 316
PYTHONPATH="tf-models:tf-models/research" python3 create_coco_tf_record.py \
--logtostderr \
--include_masks \
--train_image_dir=/path/to/COCO2017/images/train2017 \
--val_image_dir=/path/to/COCO2017/images/val2017 \
--train_object_annotations_file=/path/to/COCO2017/annotations/instances_train2017.json \
--val_object_annotations_file=/path/to/COCO2017/annotations/instances_val2017.json \
--train_caption_annotations_file=/path/to/COCO2017/annotations/captions_train2017.json \
--val_caption_annotations_file=/path/to/COCO2017/annotations/captions_val2017.json \
--output_dir=coco2017_tfrecord
This creates the coco2017_tfrecord folder
3) Download the pretrained models
The resulting model file layout is as follows:
weights/
>mask-rcnn/1555659850/
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/saved_model.pb
>>variables/
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/variables/variables.data-00000-of-00001
https://storage.googleapis.com/cloud-tpu-checkpoints/mask-rcnn/1555659850/variables/variables.index
>resnet/
>>extracted_from_maskrcnn/
>>resnet-nhwc-2018-02-07/
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/checkpoint
>>>model.ckpt-112603/
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.data-00000-of-00001
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.index
https://storage.googleapis.com/cloud-tpu-checkpoints/retinanet/resnet50-checkpoint-2018-02-07/model.ckpt-112603.meta
>>resnet-nhwc-2018-10-14/
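The layout above pairs each download URL with a location under `weights/`. A sketch of that mapping as a download plan is shown below. The URLs are copied from the listing; the local placement follows the tree as printed, and the nesting of the `model.ckpt-112603.*` files in particular is an assumption to verify against the training scripts. `download_plan` is a hypothetical helper and downloads nothing.

```python
import posixpath

BASE = "https://storage.googleapis.com/cloud-tpu-checkpoints/"
CKPT = "retinanet/resnet50-checkpoint-2018-02-07/"

# (path relative to BASE, local directory under weights/)
FILES = [
    ("mask-rcnn/1555659850/saved_model.pb", "mask-rcnn/1555659850"),
    ("mask-rcnn/1555659850/variables/variables.data-00000-of-00001",
     "mask-rcnn/1555659850/variables"),
    ("mask-rcnn/1555659850/variables/variables.index",
     "mask-rcnn/1555659850/variables"),
    (CKPT + "checkpoint", "resnet/resnet-nhwc-2018-02-07"),
    (CKPT + "model.ckpt-112603.data-00000-of-00001",
     "resnet/resnet-nhwc-2018-02-07/model.ckpt-112603"),
    (CKPT + "model.ckpt-112603.index",
     "resnet/resnet-nhwc-2018-02-07/model.ckpt-112603"),
    (CKPT + "model.ckpt-112603.meta",
     "resnet/resnet-nhwc-2018-02-07/model.ckpt-112603"),
]


def download_plan(weights_dir="weights"):
    """Return (url, local_path) pairs; nothing is downloaded here."""
    plan = []
    for remote, local_dir in FILES:
        name = posixpath.basename(remote)
        plan.append((BASE + remote,
                     posixpath.join(weights_dir, local_dir, name)))
    return plan


if __name__ == "__main__":
    for url, path in download_plan():
        print(path, "<-", url)
```

Each pair can then be fetched with `urllib.request.urlretrieve(url, path)` after creating the parent directory with `os.makedirs(..., exist_ok=True)`.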
Test
Single-GPU training
python3 scripts/benchmark_training.py --gpus {1,4,8} --batch_size {2,4}
python3 scripts/benchmark_training.py --gpus 1 --batch_size 2 --model_dir save_model --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights
Multi-GPU training
python3 scripts/benchmark_training.py --gpus 2 --batch_size 4 --model_dir save_model_2dcu --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights
Inference
python3 scripts/benchmark_inference.py --batch_size 2 --model_dir save_model --data_dir /public/home/tianlh/AI-application/Tensorflow/MaskRCNN_tf2/dataset/coco2017_tfrecord --weights_dir weights
References
https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Segmentation/MaskRCNN
Objection-YOLOv3
- YOLOv3, a single-stage object-detection network
Test procedure
1) Pretrained model/weights:
Use the weights pretrained on COCO
$ cd checkpoint
$ wget https://github.com/YunYang1994/tensorflow-yolov3/releases/download/v1.0/yolov3_coco.tar.gz
If the network is slow, download the archive manually and upload it to the compute environment
$ tar -xvf yolov3_coco.tar.gz
$ cd ..
$ python3 convert_weight.py --train_from_coco
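Additional Python packages may need to be installed
2) Training:
Check the parameters in core/config.py, such as the batch size and data paths
* Single GPU: python3 train.py
* Multiple GPUs: mpirun -np 2 -H localhost:2 python3 train_hvd.py
* Multiple nodes: mpirun -np 4 -H b02r1n02:2,b02r1n04:2 python3 train_hvd.py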
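In the mpirun commands above, the `-np` value equals the total number of slots listed after `-H` (2 for `localhost:2`, 4 for `b02r1n02:2,b02r1n04:2`). A sketch that derives one from the other is shown below. The helper names are hypothetical, and treating a host without `:N` as one slot is an assumption about the MPI implementation in use.

```python
def total_slots(host_spec):
    """Sum the slot counts of a 'host:slots,host:slots' specification."""
    total = 0
    for entry in host_spec.split(","):
        host, sep, slots = entry.strip().partition(":")
        if not host:
            raise ValueError(f"empty host name in {host_spec!r}")
        # A host listed without ':N' is counted as one slot (assumption).
        total += int(slots) if sep else 1
    return total


def horovod_command(host_spec, script="train_hvd.py"):
    """Build the mpirun argv list with -np matching the host specification."""
    return ["mpirun", "-np", str(total_slots(host_spec)),
            "-H", host_spec, "python3", script]


if __name__ == "__main__":
    # Host names taken from the multi-node example above.
    print(" ".join(horovod_command("b02r1n02:2,b02r1n04:2")))
```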
3) Inference
$ python3 evaluate.py
$ cd mAP
$ python3 main.py -na
References
https://github.com/YunYang1994/tensorflow-yolov3
Segmentation-Unet_Industrial
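This use case benchmarks training and inference of the Unet_Industrial image-segmentation model with the TensorFlow framework on the ROCm platform. The test procedure is as follows.
Test procedure
Download the dataset
DAGM2007
The data layout is as follows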
raw_images
    private
        Class1
        Class2
        ......
        Class10
    public
        Class1
        Class1_def
        ......
        Class6
        Class6_def
zip_files
    private
        Class1.zip
        Class2.zip
        ......
        Class10.zip
    public
        Class1.zip
        Class1_def.zip
        ......
        Class6.zip
        Class6_def.zip
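Before starting a benchmark, the extracted dataset can be checked against the layout listed above. The sketch below assumes the elided entries follow the visible pattern: Class1 to Class10 under `private`, and Class1 to Class6 with a matching `_def` directory under `public`. Both function names are hypothetical.

```python
import os


def expected_dagm_dirs():
    """List the raw_images directories implied by the layout above."""
    private = [f"raw_images/private/Class{i}" for i in range(1, 11)]
    public = []
    for i in range(1, 7):
        public.append(f"raw_images/public/Class{i}")
        public.append(f"raw_images/public/Class{i}_def")
    return private + public


def missing_dirs(dataset_root):
    """Return the expected directories that do not exist under dataset_root."""
    return [d for d in expected_dagm_dirs()
            if not os.path.isdir(os.path.join(dataset_root, d))]


if __name__ == "__main__":
    missing = missing_dirs("/path/to/DAGM2007_dir")
    print(f"{len(missing)} of {len(expected_dagm_dirs())} directories missing")
```

An empty result from `missing_dirs` means the dataset path can be passed to the benchmark scripts.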
Commands
Training performance benchmark
./scripts/benchmarking/DGX1v_trainbench_{FP16, FP32, FP32AMP, FP32FM}_{1, 4, 8}GPU.sh <path to result repository> <path to dataset> <DAGM2007 classID (1-10)>
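The brace notation above stands for one script per precision and GPU count. A sketch that expands the template and assembles a full command line is shown below. The function names are hypothetical, and the expansion only follows the naming pattern; it does not guarantee that every combination ships in the repository.

```python
from itertools import product

PRECISIONS = ("FP16", "FP32", "FP32AMP", "FP32FM")
GPU_COUNTS = (1, 4, 8)


def trainbench_script(precision, gpus):
    """Return the script path for one precision and GPU count."""
    if precision not in PRECISIONS:
        raise ValueError(f"unknown precision: {precision}")
    if gpus not in GPU_COUNTS:
        raise ValueError(f"unsupported GPU count: {gpus}")
    return f"./scripts/benchmarking/DGX1v_trainbench_{precision}_{gpus}GPU.sh"


def all_trainbench_scripts():
    """Expand the template into every precision and GPU-count combination."""
    return [trainbench_script(p, n) for p, n in product(PRECISIONS, GPU_COUNTS)]


def trainbench_command(precision, gpus, results_dir, dataset_dir, class_id):
    """Build the argv list: script, result directory, dataset, class ID."""
    if not 1 <= class_id <= 10:
        raise ValueError("DAGM2007 class ID must be between 1 and 10")
    return [trainbench_script(precision, gpus),
            results_dir, dataset_dir, str(class_id)]


if __name__ == "__main__":
    for script in all_trainbench_scripts():
        print(script)
```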
Example: single-GPU training on Class 1
./scripts/benchmarking/DGX1v_trainbench_FP32_1GPU.sh /path/to/{save_dir} /path/to/{DAGM2007_dir} 1
For multi-GPU runs, add the mpirun command inside DGX1v_trainbench_FP32_4GPU.sh.
Inference performance benchmark
./scripts/benchmarking/DGX1v_evalbench_FP16_1GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
Training
./UNet_FP32_1GPU.sh <path to result repository> <path to dataset> <DAGM2007 classID (1-10)>
Inference
./UNet_FP32_EVAL.sh <path to result repository> <path to dataset> <DAGM2007 classID (1-10)>
Reference
https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Segmentation/UNet_Industrial
Segmentation-vnet
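This use case tests the training and inference performance of the VNet image-segmentation model on the ROCm platform. It has been verified with rocm3.3 and tensorflow 1.15.0. The test procedure is as follows.
Test procedure
Download the dataset
Medical Segmentation Decathlon (MSD)
Install the toolkits
Install SimpleITK by downloading the whl package and installing it with pip
wget https://files.pythonhosted.org/packages/f8/d8/53338c34f71020725ffb3557846c80af96c29c03bc883551a2565aa68a7c/SimpleITK-1.2.4-cp36-cp36m-manylinux1_x86_64.whl
pip3 install SimpleITK-1.2.4-cp36-cp36m-manylinux1_x86_64.whl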
Commands
Benchmark
Single-GPU training benchmark
python3 examples/vnet_benchmark.py \
--data_dir /path/to/{MSD_Task04_Hippocampus_dir} \
--model_dir /path/to/{model_save_dir} \
--mode train \
--gpus 1 \
--batch_size 8
4-GPU training benchmark
python3 examples/vnet_benchmark.py \
--data_dir /path/to/{MSD_Task04_Hippocampus_dir} \
--model_dir /path/to/{model_save_dir} \
--mode train \
--gpus 4 \
--batch_size 32
Inference benchmark
python3 examples/vnet_benchmark.py \
--data_dir /path/to/{MSD_Task04_Hippocampus_dir} \
--model_dir /path/to/{model_save_dir} \
--mode predict \
--gpus 1 \
--batch_size 8
Training example
python3 examples/vnet_train.py \
--data_dir /path/to/{MSD_Task04_Hippocampus_dir} \
--model_dir /path/to/{model_save_dir} \
--mode train \
--gpus 1 \
--batch_size 260 \
--epochs 1
Inference example
python3 examples/vnet_predict.py \
--data_dir /path/to/{MSD_Task04_Hippocampus_dir} \
--model_dir /path/to/{model_save_dir} \
--batch_size 4
Reference
https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Segmentation/VNet