Training YOLOv3 with TLT

Training

Official code:

!tlt-train yolo -e $SPECS_DIR/yolo_train_resnet18_kitti.txt \
                -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                -k $KEY \
                -m $USER_EXPERIMENT_DIR/pretrained_resnet18/tlt_pretrained_object_detection_vresnet18/resnet_18.hdf5 \
                --gpus 1

But it fails with an error:

ValueError: Dimension 0 in both shapes must be equal, but are 3072 and 512. Shapes are [3072,176] and [512,176]. for 'Assign_557' (op: 'Assign') with input shapes: [3072,176], [512,176].
Traceback (most recent call last):
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo/utils/model_io.py", line 101, in load_model_as_pretrain
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py", line 1047, in set_weights
    str(weights)[:50] + '...')
ValueError: You called `set_weights(weights)` on layer "conv1" with a  weight list of length 2, but the layer was expecting 1 weights. Provided weights: [array([[[[-1.46843329e-01, -3.80116850e-02,  2.28...

During handling of the above exception, another exception occurred:
...
ValueError: Dimension 0 in both shapes must be equal, but are 3072 and 512. Shapes are [3072,176] and [512,176]. for 'Assign_557' (op: 'Assign') with input shapes: [3072,176], [512,176].
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun.real detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[32056,1],0]
  Exit code:    1
--------------------------------------------------------------------------

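At first I suspected the input image size was wrong, but the error persisted after I changed it. The traceback fails inside load_model_as_pretrain, which suggests the pretrained weights do not match the network built from the spec file. People online who use TLT YOLOv3 use darknet53 as the backbone, so I switched the backbone to darknet19 and modified the spec file accordingly: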

random_seed: 42
yolov3_config {
  big_anchor_shape: "[(114.94, 60.67), (159.06, 114.59), (297.59, 176.38)]"
  mid_anchor_shape: "[(42.99, 31.91), (79.57, 31.75), (56.80, 56.93)]"
  small_anchor_shape: "[(15.60, 13.88), (30.25, 20.25), (20.67, 49.63)]"
  matching_neutral_box_iou: 0.7
  arch: "darknet"
  nlayers: 19
  arch_conv_blocks: 2
  loss_loc_weight: 0.8
  loss_neg_obj_weights: 100.0
  loss_class_weights: 1.0
  freeze_bn: false
  #freeze_blocks: 0
  force_relu: false
}
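As a quick sanity check on the anchor settings above, the three anchor strings can be parsed and compared by area. This is only a sketch: it assumes each pair describes one anchor box in pixels and that the small, mid, and big groups are meant to be ordered by box area. The names ANCHORS and anchor_areas are illustrative and not part of TLT.

```python
import ast

# Anchor strings copied from the spec file above.
ANCHORS = {
    "small": "[(15.60, 13.88), (30.25, 20.25), (20.67, 49.63)]",
    "mid": "[(42.99, 31.91), (79.57, 31.75), (56.80, 56.93)]",
    "big": "[(114.94, 60.67), (159.06, 114.59), (297.59, 176.38)]",
}


def anchor_areas(spec_value):
    """Parse one anchor string and return the area of each anchor box."""
    pairs = ast.literal_eval(spec_value)  # list of 2-tuples
    return [a * b for a, b in pairs]


areas = {name: anchor_areas(value) for name, value in ANCHORS.items()}

# Assumption: every anchor in a smaller group should be smaller than
# every anchor in the next larger group.
groups_are_ordered = (
    max(areas["small"]) < min(areas["mid"])
    and max(areas["mid"]) < min(areas["big"])
)

for name in ("small", "mid", "big"):
    print(name, [round(a, 1) for a in areas[name]])
print("groups ordered by area:", groups_are_ordered)
```

If the groups were not ordered, it would be worth re-checking which anchors were assigned to which field before training.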

Training then succeeded:

Total params: 40,787,478
Trainable params: 40,755,318
Non-trainable params: 32,160
__________________________________________________________________________________________________
2021-11-03 10:34:38,060 [INFO] iva.yolo.scripts.train: Number of images in the training dataset:	 70594
Epoch 1/1
2021-11-03 10:34:48.559269: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-11-03 10:34:48.737791: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x76f5790
2021-11-03 10:34:48.737920: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-11-03 10:34:48.873716: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-11-03 10:34:48.874435: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-11-03 10:34:48.879691: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-11-03 10:34:49.059903: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x7473870
2021-11-03 10:34:49.060035: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-11-03 10:34:49.198278: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-11-03 10:34:49.199151: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7

Testing

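The model trained with darknet19-yolov3 is 163 MB. Running the test on 2× 1080 Ti stalled so badly that it could not proceed; it only succeeded later on a machine with 8× 1080 Ti.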

tlt-evaluate yolo -e specs/yolo_train_darknet19_car.txt -m /workspace/tlt-experiments/backup-yolov3/weights/yolo_darknet19_epoch_015.tlt -k <key>
Producing predictions:   0%|                                                                                                   | 0/11491 [00:00<?, ?it/s]2021-11-04 02:53:11.116402: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-11-04 02:53:12.000667: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
Producing predictions: 100%|███████████████████████████████████████████████████████████████████████████████████████| 11491/11491 [07:00<00:00, 27.36it/s]
Start multi-thread per-image matching
Start to calculate AP for each class
*******************************
car           AP    0.878
              mAP   0.878
*******************************
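The throughput shown by the progress bar can be cross-checked from the log above, which reports 11491 images in 07:00. A minimal sketch of that arithmetic follows; the variable names are illustrative, and the elapsed time is only known to the second.

```python
# Numbers taken from the tlt-evaluate progress bar above.
NUM_IMAGES = 11491
ELAPSED_SECONDS = 7 * 60  # "07:00"

images_per_second = NUM_IMAGES / ELAPSED_SECONDS
ms_per_image = 1000.0 / images_per_second

print(f"{images_per_second:.2f} images/s")
print(f"{ms_per_image:.1f} ms per image")
```

The result agrees with the 27.36 it/s printed in the log, so the evaluation itself was not slow once it ran on the 8-GPU machine.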

Pruning

tlt-prune -m /workspace/tlt-experiments/backup-yolov3/weights/yolo_darknet19_epoch_015.tlt -o /workspace/tlt-experiments/backup-yolov3/weights/yolo_darknet19_epoch_015-pruned.tlt -eq union -pth 0.3 -k <>

After pruning, the 163 MB model shrinks to only 2.9 MB.
Testing the pruned 2.9 MB model on 2× 1080 Ti still stalled and could not proceed...
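To put the size reduction in numbers, here is a small sketch using the two file sizes quoted above. It assumes both figures are in megabytes; the variable names are illustrative.

```python
# File sizes quoted above, assumed to be in MB.
UNPRUNED_MB = 163.0
PRUNED_MB = 2.9

remaining_percent = PRUNED_MB / UNPRUNED_MB * 100
reduction_factor = UNPRUNED_MB / PRUNED_MB

print(f"pruned model keeps {remaining_percent:.1f}% of the original size")
print(f"about {reduction_factor:.0f}x smaller")
```

Since the much smaller model still stalled on the 2-GPU machine, model size alone does not seem to explain the problem there.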

Retraining

tlt-train yolo -e specs/yolo_retrain_darknet19_car.txt -r /workspace/tlt-experiments/backup-yolov3/retrain_weigths -m /workspace/tlt-experiments/backup-yolov3/weights/yolo_darknet19_epoch_015-pruned.tlt -k <> --gpus 8

Inference visualization

tlt-infer yolo -e specs/yolo_train_darknet19_car.txt -o /workspace/tlt-experiments/tlt_infer_testing -i /workspace/tlt-experiments/test_img/ -m /workspace/tlt-experiments/backup-yolov3/retrain_weigths/weights/yolo_darknet19_epoch_001.tlt -k <>

Model export

tlt-export yolo -m /workspace/tlt-experiments/backup-yolov3/retrain_weigths/weights/yolo_darknet19_epoch_001.tlt -o /workspace/tlt-experiments/backup-yolov3/retrain_weigths/weights/yolo_darknet19_epoch_001.etlt -e specs/yolo_retrain_darknet19_car.txt -k <> --cal_image_dir /workspace/tlt-experiments/image_2 --data_type int8 --batch_size 16 --batches 10 --cal_cache_file /workspace/tlt-experiments/cal.bin --cal_data_file /workspace/tlt-experiments/cal.tensorfile
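The export command above requests INT8 calibration with --batch_size 16 and --batches 10. As far as I understand, this means batch_size × batches images are read from --cal_image_dir, so the directory should contain at least that many. The sketch below computes that number and checks a directory; the function names and the list of image suffixes are assumptions for illustration, not part of tlt-export.

```python
from pathlib import Path

# Values from the tlt-export command above.
BATCH_SIZE = 16
BATCHES = 10


def images_needed(batch_size, batches):
    """Number of calibration images consumed (assumed: batch_size * batches)."""
    return batch_size * batches


def has_enough_images(image_dir, needed, suffixes=(".jpg", ".jpeg", ".png")):
    """Return True if image_dir holds at least `needed` image files."""
    count = sum(
        1 for p in Path(image_dir).iterdir() if p.suffix.lower() in suffixes
    )
    return count >= needed


needed = images_needed(BATCH_SIZE, BATCHES)
print("calibration images needed:", needed)
```

For example, has_enough_images("/workspace/tlt-experiments/image_2", needed) could be run inside the container before starting the export.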

DeepStream deployment
source1.txt

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=1

[tiled-display]
enable=1
rows=1
columns=1
width=800
height=544
gpu-id=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
num-sources=1
uri=file://../test1.mp4
gpu-id=0


[streammux]
gpu-id=0
batch-size=1
batched-push-timeout=40000
## Set muxer output width and height
width=800
height=544


[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=3
sync=1
source-id=0
gpu-id=0
container=2
codec=1
bitrate=2000000
output-file=../model/result_yolov3.mp4

[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial

[primary-gie]
enable=1
gpu-id=0
# Modify as necessary
# GPU engine file
# model-engine-file= ../model/resnet18_detector.etlt_b1_gpu0_int8.engine
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=0;1;0;1
bbox-border-color1=1;0;0;1
#bbox-border-color2=0;0;1;1 # Blue
#bbox-border-color3=0;1;0;1
gie-unique-id=1
config-file=config_infer_primary_car.txt

[tracker]
enable=0
tracker-width=640
tracker-height=384
#ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_iou.so
#ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_nvdcf.so
ll-lib-file=/opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_mot_klt.so
#ll-config-file required for DCF/IOU only
#ll-config-file=../deepstream-app/tracker_config.yml
#ll-config-file=iou_config.txt
gpu-id=0
#enable-batch-process applicable to DCF only
enable-batch-process=1

[tests]
file-loop=1

config_infer_primary_car.txt
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=cWtsZjE4OGtscGJuYW4xaHZwa2hpamtzZm06Mzk2YTcwMjAtNDM1ZS00YmI0LTlhNGItNDU3ZjIzZjkzNWJh
tlt-encoded-model=/home/nx/face_mask_det/face-mask-detection/model/weights/yolo_darknet19_epoch_001.etlt
#tlt-encoded-model=…/model/resnet18_detector.etlt

labelfile-path=labels_car.txt

# GPU Engine File
# model-engine-file=/home/nx/face_mask_det/face-mask-detection/model/model.step-17649.etlt_b1_gpu0_int8.engine
# DLA Engine File
# model-engine-file=/home/nvidia/detectnet_v2_models/detectnet_4K-fddb-12/resnet18_RGB960_detector_fddb_12_int8.etlt_b1_dla0_int8.engine
input-dims=3;544;800;0
uff-input-blob-name=input_1
batch-size=1
model-color-format=0
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
#int8-calib-file=../model/calibration.bin
num-detected-classes=2
cluster-mode=1
interval=0
gie-unique-id=1
network-type=0
classifier-threshold=0.9
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid

[class-attrs-0]
pre-cluster-threshold=0.6
group-threshold=1
eps=0.3
#minBoxes=1
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

labels_car.txt

car
default
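The number of lines in the label file should match num-detected-classes in config_infer_primary_car.txt. A minimal sketch of that check, assuming the label file contains exactly the two lines shown above (the names LABELS_TEXT and NUM_DETECTED_CLASSES are illustrative):

```python
# Label file content as listed above (assumed to be the complete file).
LABELS_TEXT = """car
default
"""

# Value of num-detected-classes in config_infer_primary_car.txt.
NUM_DETECTED_CLASSES = 2

# One label per non-empty line.
labels = [line.strip() for line in LABELS_TEXT.splitlines() if line.strip()]
labels_match = len(labels) == NUM_DETECTED_CLASSES

print(labels)
print("label count matches num-detected-classes:", labels_match)
```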