README



C3D Description

The C3D model is widely used for 3D vision tasks. The construction of the C3D network is similar to that of common 2D ConvNets; the main difference is that C3D uses 3D operations such as Conv3D, while 2D ConvNets are an entirely 2D architecture. To learn more about the C3D network, you can read the original paper Learning Spatiotemporal Features with 3D Convolutional Networks.

Model Architecture

The C3D net has 8 convolution layers, 5 max-pooling layers, and 2 fully connected layers, followed by a softmax output layer. All 3D convolution kernels are 3 × 3 × 3 with stride 1 in both the spatial and temporal dimensions. The 3D pooling layers are denoted pool1 to pool5. All pooling kernels are 2 × 2 × 2, except for pool1, which is 1 × 2 × 2. Each fully connected layer has 4096 output units.
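For orientation, below is a minimal sketch of that topology written against the MindSpore 2.x API (nn.Conv3d, nn.MaxPool3d). It is an illustration only, not the repository's src/c3d_model.py, and the class name C3DSketch is ours:

import mindspore.nn as nn

class C3DSketch(nn.Cell):
    def __init__(self, num_classes=101):
        super().__init__()
        layers, in_ch = [], 3
        # stages: (conv output channels, pooling kernel); all convs are 3x3x3, stride 1
        for out_chs, pool in [((64,), (1, 2, 2)),        # conv1   + pool1 (spatial only)
                              ((128,), (2, 2, 2)),       # conv2   + pool2
                              ((256, 256), (2, 2, 2)),   # conv3a/b + pool3
                              ((512, 512), (2, 2, 2)),   # conv4a/b + pool4
                              ((512, 512), (2, 2, 2))]:  # conv5a/b + pool5
            for out_ch in out_chs:
                layers += [nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=1,
                                     pad_mode='pad', padding=1),
                           nn.ReLU()]
                in_ch = out_ch
            layers.append(nn.MaxPool3d(kernel_size=pool, stride=pool))
        self.features = nn.SequentialCell(layers)
        self.flatten = nn.Flatten()
        # for 3 x 16 x 112 x 112 inputs the feature map here is 512 x 1 x 3 x 3 = 4608;
        # the paper-faithful model pads pool5 spatially, giving 512 x 1 x 4 x 4 = 8192
        self.classifier = nn.SequentialCell([
            nn.Dense(4608, 4096), nn.ReLU(), nn.Dropout(p=0.5),
            nn.Dense(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5),
            nn.Dense(4096, num_classes)])

    def construct(self, x):  # x: (N, 3, 16, 112, 112)
        return self.classifier(self.flatten(self.features(x)))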

Dataset

Dataset used: UCF101

  • Description: UCF101 is an action recognition dataset of realistic action videos collected from YouTube, with 101 action categories. It extends the UCF50 dataset, which has 50 action categories.

  • Dataset size: 13320 videos

  • Data format: rar

    • Note: data will be processed in dataset_preprocess.py
  • Data Content Structure

.
└─ucf101                                    // contains 101 class folders
  ├── ApplyEyeMakeup                        // contains 145 videos
  │   ├── v_ApplyEyeMakeup_g01_c01.avi      // video file
  │   ├── v_ApplyEyeMakeup_g01_c02.avi      // video file
  │    ...
  ├── ApplyLipstick                         // contains 114 videos
  │   ├── v_ApplyLipstick_g01_c01.avi       // video file
  │   ├── v_ApplyLipstick_g01_c02.avi       // video file
  │    ...
  ├── ucfTrainTestlist                      // contains category files
  │   ├── classInd.txt                      // Category file.
  │   ├── testlist01.txt                    // split file
  │   ├── trainlist01.txt                   // split file
  ...

Environment Requirements

Quick Start

After installing MindSpore via the official website, you can start training and evaluation as follows:

  • Processing raw data files
# Convert video into image.
bash run_dataset_preprocess.sh UCF101 [RAR_FILE_PATH] 1

# for example: bash run_dataset_preprocess.sh UCF101 /Data/UCF101/UCF101.rar 1
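The conversion itself is ordinary video decoding. As a rough, hypothetical illustration of the core step a script like dataset_preprocess.py performs (paths and frame naming below are ours, not the script's):

import os
import cv2  # pip install opencv-python

def video_to_frames(video_path, out_dir):
    """Decode one .avi clip into numbered JPEG frames."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    count = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{count:06d}.jpg"), frame)
        count += 1
    cap.release()
    return count

# e.g. video_to_frames("/Data/UCF101/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi",
#                      "/Data/UCF101_img/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01")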

Refer to c3d.yaml; it provides the parameter configurations needed for a quick start.

  • Run on Ascend
cd scripts
# run training example
bash run_standalone_train_ascend.sh
# run distributed training example
bash run_distribute_train_ascend.sh [RANK_TABLE_FILE]
# run evaluation example
bash run_standalone_eval_ascend.sh [CKPT_FILE_PATH]
  • Run on GPU
cd scripts
# run training example
bash run_standalone_train_gpu.sh [CONFIG_PATH] [DEVICE_ID]
# run distributed training example
bash run_distribute_train_gpu.sh [CONFIG_PATH]
# run evaluation example
bash run_standalone_eval_gpu.sh [CKPT_FILE_PATH] [CONFIG_PATH]

Script Description

Script and Sample Code

.
└─c3d_mindspore
  ├── README.md                           // descriptions about C3D
  ├── scripts
  │   ├──run_dataset_preprocess.sh        // shell script for preprocessing dataset
  │   ├──run_ckpt_convert.sh              // shell script for converting pytorch ckpt file to pickle file on GPU
  │   ├──run_distribute_train_ascend.sh   // shell script for distributed training on Ascend
  │   ├──run_distribute_train_gpu.sh      // shell script for distributed training on GPU
  │   ├──run_infer_310.sh                 // shell script for inference on Ascend 310
  │   ├──run_standalone_train_ascend.sh   // shell script for training on Ascend
  │   ├──run_standalone_train_gpu.sh      // shell script for training on GPU
  │   ├──run_standalone_eval_ascend.sh    // shell script for testing on Ascend
  │   ├──run_standalone_eval_gpu.sh       // shell script for testing on GPU
  ├── src
  │   ├──dataset.py                       // dataset creation
  │   ├──evalcallback.py                  // evaluation callback
  │   ├──lr_schedule.py                   // learning rate scheduler
  │   ├──transform.py                     // dataset transforms
  │   ├──loss.py                          // loss
  │   ├──utils.py                         // general components (callback functions)
  │   ├──c3d_model.py                     // C3D model
  │   ├── utils
  │   │   ├──config.py                    // parameter configuration
  │   │   ├──resized_mean.py              // generate resized mean file
  │   │   ├──dataset_preprocess.py        // preprocess dataset
  │   │   ...
  │   ├── tools
  │   │   ├──ckpt_convert.py              // convert pytorch ckpt file to pickle file
  ├── requirements.txt                    // requirements configuration
  ├── export.py                           // convert mindspore ckpt file to MINDIR file
  ├── train.py                            // training script
  ├── infer.py                            // evaluation script

Script Parameters

Parameters for both training and evaluation can be set in c3d.yaml (Ascend) and c3d_gpu.yaml (GPU).

  • config for C3D, UCF101 dataset
# ==============================================================================
# model architecture
model_name: "c3d"   # model name 

# global config
device_target: "GPU"
dataset_sink_mode: False
context:            # runtime context
    mode: 0         # 0: Graph mode; 1: PyNative mode
    device_target: "GPU"
    save_graphs: False
    device_id: 3

# model settings for each part of the network
model:
    type: C3D
    in_d: 16
    in_h: 112
    in_w: 112
    in_channel: 3
    kernel_size: [3, 3, 3]
    head_channel: [4096, 4096]
    num_classes: 101
    keep_prob: [0.5, 0.5, 1.0]

# learning rate for training process
learning_rate:
    lr_scheduler: "exponential"
    lr: 0.003
    lr_epochs: [15, 30, 75]
    steps_per_epoch: 596
    warmup_epochs: 1
    max_epoch: 150
    lr_gamma: 0.1

# optimizer for training process
optimizer:
    type: 'SGD'
    momentum: 0.9
    weight_decay: 0.0005
    loss_scale: 1.0

# train loss
loss:       
    type: SoftmaxCrossEntropyWithLogits
    sparse: True
    reduction: "mean"

# training setups, including the pretrained model
train:       
    pre_trained: False
    pretrained_model: ""
    ckpt_path: "./output/"
    epochs: 150
    save_checkpoint_epochs: 5
    save_checkpoint_steps: 1875
    keep_checkpoint_max: 30
    run_distribute: False

# evaluation setups
eval:
    pretrained_model: ".vscode/ms_ckpts/c3d_20220912.ckpt"

# infer setups
infer:
    pretrained_model: ".vscode/ms_ckpts/c3d_20220912.ckpt"
    batch_size: 1
    image_path: ""
    normalize: True
    output_dir: "./infer_output"

# export the model checkpoint to other file formats
export:       
    pretrained_model: ""
    batch_size: 64
    image_height: 112
    image_width: 112
    input_channel: 3
    file_name: "c3d"
    file_format: "MINDIR"

# dataloader and data augmentation setups
data_loader:
    train:
        dataset:
            type: UCF101
            path: "/home/publicfile/UCF101_splits"  # Path to data root dir
            split: "train"
            batch_size: 16
            seq: 16
            seq_mode: "average"
            num_parallel_workers: 6
            shuffle: True
        map:        # data augmentation
            operations:
                - type: VideoResize
                  size: [128, 171]
                - type: VideoRescale
                  shift: "src/example/c3d/resized_mean_sports1m.npy" # mean file
                - type: VideoRandomCrop
                  size: [112, 112]
                - type: VideoRandomHorizontalFlip
                  prob: 0.5
                - type: VideoReOrder
                  order: [3, 0, 1, 2]
            input_columns: ["video"]

    eval:
        dataset:
            type: UCF101
            path: "/home/publicfile/UCF101_splits"  # Path to data root dir
            split: "test"
            batch_size: 16
            seq: 16
            seq_mode: "average"
            num_parallel_workers: 1
            shuffle: False
        map:
            operations:
                - type: VideoResize
                  size: [128, 171]
                - type: VideoRescale
                  shift: "src/example/c3d/resized_mean_sports1m.npy"  # mean file
                - type: VideoCenterCrop
                  size: [112, 112]
                - type: VideoReOrder
                  order: [3, 0, 1, 2]
            input_columns: ["video"]
    group_size: 1
# ==============================================================================
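To make the learning_rate block concrete: with lr_scheduler "exponential", the per-step schedule plausibly warms up for warmup_epochs and then decays lr by lr_gamma at each boundary in lr_epochs. The sketch below is our reading of those parameters, not the repository's src/lr_schedule.py (the warmup shape in particular is an assumption):

def build_lr(lr=0.003, lr_epochs=(15, 30, 75), steps_per_epoch=596,
             warmup_epochs=1, max_epoch=150, lr_gamma=0.1):
    """Return one learning-rate value per training step."""
    warmup_steps = steps_per_epoch * warmup_epochs
    lrs = []
    for step in range(steps_per_epoch * max_epoch):
        if step < warmup_steps:
            lrs.append(lr * (step + 1) / warmup_steps)      # linear warmup (assumed)
        else:
            epoch = step // steps_per_epoch
            n_decays = sum(epoch >= e for e in lr_epochs)   # boundaries already passed
            lrs.append(lr * lr_gamma ** n_decays)
    return lrs  # MindSpore optimizers accept such a per-step list as learning_rate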

Training Process

Training

Training on Ascend
# enter scripts directory
cd scripts
# training
bash run_standalone_train_ascend.sh

The command above will run in the background; you can view the results through the file train.log.

After training, you'll get some checkpoint files under the script folder by default. The loss values will look as follows:

  • train.log for HMDB51
epoch: 1 step: 223, loss is 2.8705792
epoch time: 74139.530 ms, per step time: 332.464 ms
epoch: 2 step: 223, loss is 1.8403366
epoch time: 60084.907 ms, per step time: 269.439 ms
epoch: 3 step: 223, loss is 1.4866445
epoch time: 61095.684 ms, per step time: 273.972 ms
...
epoch: 29 step: 223, loss is 0.3037338
epoch time: 60436.915 ms, per step time: 271.018 ms
epoch: 30 step: 223, loss is 0.2176594
epoch time: 60130.695 ms, per step time: 269.644 ms
  • train.log for UCF101
epoch: 1 step: 596, loss is 0.53118783
epoch time: 170693.634 ms, per step time: 286.399 ms
epoch: 2 step: 596, loss is 0.51934457
epoch time: 150388.783 ms, per step time: 252.330 ms
epoch: 3 step: 596, loss is 0.07241724
epoch time: 151548.857 ms, per step time: 254.277 ms
...
epoch: 29 step: 596, loss is 0.034661677
epoch time: 150932.542 ms, per step time: 253.243 ms
epoch: 30 step: 596, loss is 0.0048465515
epoch time: 150760.797 ms, per step time: 252.954 ms
Training on GPU

Note: if you encounter a message like
"Bad performance attention, it takes more than 25 seconds to fetch and send a batch of data into device, which might result GetNext timeout problem."
please change the parameter dataset_sink_mode to False.

# enter scripts directory
cd scripts
# training
bash run_standalone_train_gpu.sh [CONFIG_PATH] [DEVICE_ID]

The above shell script will run training in the background. You can view the results through the file ./train[X].log. The loss values will look as follows:

  • train.log for UCF101
epoch: 1 step: 1192, loss is 0.8381556
epoch time: 593197.024 ms, per step time: 301.297 ms
epoch: 2 step: 1192, loss is 0.5701107
epoch time: 576058.976 ms, per step time: 260.542 ms
epoch: 3 step: 1192, loss is 0.1724325
epoch time: 578041.281 ms, per step time: 235.868 ms
...
epoch: 99 step: 1192, loss is 6.3519354e-05
epoch time: 573493.252 ms, per step time: 225.237 ms
epoch: 100 step: 1192, loss is 4.852382e-05
epoch time: 575237.743 ms, per step time: 229.164 ms

Distributed Training

Distributed training on Ascend

Notes:
For RANK_TABLE_FILE, refer to Link, and the device_ip can be obtained as described in Link. For large models like InceptionV4, it is better to export an environment variable, export HCCL_CONNECT_TIMEOUT=600, to extend the HCCL connection-checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could time out, since compilation time increases as model size grows.

# enter scripts directory
cd scripts
# distributed training
bash run_distribute_train_ascend.sh [RANK_TABLE_FILE]

The above shell script will run distribute training in the background. You can view the results through the file ./train[X].log. The loss value will be achieved as follows:

  • train0.log for UCF101
epoch: 1 step: 596, loss is 0.51830626
epoch time: 82401.300 ms, per step time: 138.257 ms
epoch: 2 step: 596, loss is 0.5527372
epoch time: 30820.129 ms, per step time: 51.712 ms
epoch: 3 step: 596, loss is 0.007791209
epoch time: 30809.803 ms, per step time: 51.694 ms
...
epoch: 29 step: 596, loss is 7.510604e-05
epoch time: 30809.334 ms, per step time: 51.694 ms
epoch: 30 step: 596, loss is 0.13138217
epoch time: 30819.966 ms, per step time: 51.711 ms
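For reference, the per-process setup that such a launch script typically wraps looks roughly like this in MindSpore 2.x (a hedged sketch; the repository's train.py is the authoritative implementation):

import mindspore as ms
from mindspore.communication import init, get_rank, get_group_size

ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")
init()  # consumes the RANK_TABLE_FILE / HCCL settings prepared by the launcher
ms.set_auto_parallel_context(parallel_mode=ms.ParallelMode.DATA_PARALLEL,
                             gradients_mean=True,
                             device_num=get_group_size())
# each process then shards the dataset, e.g. by passing
# num_shards=get_group_size(), shard_id=get_rank() when building the dataset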
Distributed training on GPU
# enter scripts directory
cd scripts
# distributed training
# edit run_distribute_train_gpu.sh to set start_device_id
bash run_distribute_train_gpu.sh [CONFIG_PATH]
  • train_distributed.log for UCF101
epoch: 1 step: 149, loss is 0.97137051820755
epoch: 1 step: 149, loss is 1.1462825536727905
epoch: 1 step: 149, loss is 1.484191656112671
epoch: 1 step: 149, loss is 0.639738142490387
epoch: 1 step: 149, loss is 1.1133722066879272
epoch: 1 step: 149, loss is 1.5043989419937134
epoch: 1 step: 149, loss is 1.2063453197479248
epoch: 1 step: 149, loss is 1.3174564838409424
epoch time: 183002.444 ms, per step time: 1228.204 ms
epoch time: 183388.214 ms, per step time: 1230.793 ms
epoch time: 183560.571 ms, per step time: 1231.950 ms
epoch time: 183881.357 ms, per step time: 1234.103 ms
epoch time: 184225.004 ms, per step time: 1236.409 ms
epoch time: 184383.710 ms, per step time: 1237.475 ms
epoch time: 184501.011 ms, per step time: 1238.262 ms
epoch time: 184885.520 ms, per step time: 1240.842 ms
epoch: 2 step: 149, loss is 0.10039880871772766
epoch: 2 step: 149, loss is 0.5981963276863098
epoch: 2 step: 149, loss is 0.4604840576648712
epoch: 2 step: 149, loss is 0.215419739484787
epoch: 2 step: 149, loss is 0.2556331753730774
epoch: 2 step: 149, loss is 0.03653889149427414
epoch: 2 step: 149, loss is 1.4467300176620483
epoch: 2 step: 149, loss is 1.0422033071517944
epoch time: 53143.686 ms, per step time: 356.669 ms
epoch time: 52175.739 ms, per step time: 350.173 ms
epoch time: 54300.036 ms, per step time: 364.430 ms
epoch time: 53026.808 ms, per step time: 355.885 ms
epoch time: 52941.203 ms, per step time: 355.310 ms
epoch time: 53144.090 ms, per step time: 356.672 ms
epoch time: 53896.009 ms, per step time: 361.718 ms
epoch time: 53584.895 ms, per step time: 359.630 ms
...

Evaluation Process

Evaluation

Evaluating on Ascend
  • evaluation on dataset when running on Ascend

Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to be the absolute full path, e.g., “username/ckpt_0/c3d-hmdb51-0-30_223.ckpt”.

# enter scripts directory
cd scripts
# eval
bash run_standalone_eval_ascend.sh [CKPT_FILE_PATH]

The above command will run in the background. You can view the results through the file eval.log. The accuracy on the test dataset will be as follows:

  • eval.log for UCF101
start create network
pre_trained model: username/ckpt_0/c3d-ucf101-0-30_596.ckpt
setep: 1/237, acc: 0.625
setep: 21/237, acc: 1.0
setep: 41/237, acc: 0.5625
setep: 61/237, acc: 1.0
setep: 81/237, acc: 0.6875
setep: 101/237, acc: 1.0
setep: 121/237, acc: 0.5625
setep: 141/237, acc: 0.5
setep: 161/237, acc: 1.0
setep: 181/237, acc: 1.0
setep: 201/237, acc: 0.75
setep: 221/237, acc: 1.0
eval result: top_1 79.381%
Evaluating on GPU
  • evaluation on dataset when running on GPU

Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to be the absolute full path, e.g., “./results/xxxx-xx-xx_time_xx_xx_xx/ckpt_0/0-30_223.ckpt”.

# enter scripts directory
cd scripts
# eval
bash run_standalone_eval_gpu.sh [CKPT_FILE_PATH] [CONFIG_PATH]
  • eval.log for UCF101
start create network
pre_trained model: ./results/2021-11-02_time_07_30_42/ckpt_0/0-85_223.ckpt
setep: 1/237, acc: 0.75
setep: 21/237, acc: 1.0
setep: 41/237, acc: 0.625
setep: 61/237, acc: 1.0
setep: 81/237, acc: 0.875
setep: 101/237, acc: 1.0
setep: 121/237, acc: 0.9375
setep: 141/237, acc: 0.5625
setep: 161/237, acc: 1.0
setep: 181/237, acc: 1.0
setep: 201/237, acc: 0.5625
setep: 221/237, acc: 1.0
eval result: top_1 80.412%
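The final top_1 line is accuracy accumulated over all test batches. A minimal sketch of that computation (our illustration; the repository computes it inside its evaluation script):

import numpy as np

def top1_accuracy(logits, labels):
    """logits: (N, num_classes) stacked outputs; labels: (N,) class indices."""
    return float((np.argmax(logits, axis=1) == labels).mean())

# print(f"eval result: top_1 {100 * top1_accuracy(logits, labels):.3f}%")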

Inference Process

Export MindIR

python export.py --ckpt_file [CKPT_PATH] --mindir_file_name [FILE_NAME] --file_format [FILE_FORMAT] --num_classes [NUM_CLASSES] --batch_size [BATCH_SIZE]
  • The ckpt_file parameter is mandatory.
  • file_format should be one of ["AIR", "MINDIR"].
  • NUM_CLASSES is the total number of classes in the dataset: 51 for HMDB51 and 101 for UCF101.
  • BATCH_SIZE: since MINDIR currently does not support dynamic shapes, this network only supports inference with a batch_size of 1.
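Internally, an export like this boils down to a few MindSpore calls. A hedged sketch (the class name and paths are assumptions based on the repository layout, not verified against its scripts):

import numpy as np
import mindspore as ms
from src.c3d_model import C3D  # model class name assumed from c3d.yaml (type: C3D)

net = C3D(num_classes=101)
ms.load_checkpoint("c3d-ucf101.ckpt", net=net)  # checkpoint path is illustrative
# batch_size fixed to 1: MINDIR inference here does not support dynamic shapes
dummy = ms.Tensor(np.zeros((1, 3, 16, 112, 112), np.float32))
ms.export(net, dummy, file_name="c3d", file_format="MINDIR")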

Infer on Ascend310

Before performing inference, the MINDIR file must be exported by the export.py script. We only provide an example of inference using the MINDIR model.

# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATASET] [NEED_PREPROCESS] [DEVICE_ID]
  • DATASET must be ‘HMDB51’ or ‘UCF101’.
  • NEED_PREPROCESS indicates whether preprocessing is needed; its value is 'y' or 'n'.
  • DEVICE_ID is optional; the default value is 0.

Result

The inference result is saved in the current path; you can find the accuracy results in the acc.log file.

Visualization


Model Description

Performance

Evaluation Performance
  • C3D for UCF101
| Parameters | Ascend | GPU |
| ---------- | ------ | --- |
| Model Version | C3D | C3D |
| Resource | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 | V100 |
| Uploaded Date | 09/22/2021 (month/day/year) | 11/06/2021 (month/day/year) |
| MindSpore Version | 1.2.0 | 1.5.0 |
| Dataset | UCF101 | UCF101 |
| Training Parameters | epoch = 30, batch_size = 16 | epoch = 150, batch_size = 8 |
| Optimizer | SGD | SGD |
| Loss Function | Max_SoftmaxCrossEntropyWithLogits | Max_SoftmaxCrossEntropyWithLogits |
| Speed | 1pc: 253.372 ms/step | 1pc: 237.128 ms/step |
| Top_1 | 1pc: 80.33% | 1pc: 80.138% |
| Total time | 1pc: 1.31 hours | 1pc: 4 hours |
| Parameters (M) | 78 | 78 |

| Method | Accuracy (%) |
| ------ | ------------ |
| Imagenet + linear SVM | 68.8 |
| iDT w/ BoW + linear SVM | 76.2 |
| Deep networks | 65.4 |
| Spatial stream network | 72.6 |
| LRCN | 71.1 |
| LSTM composite model | 75.8 |
| C3D (1 net) + linear SVM | 82.3 |
| C3D (3 nets) + linear SVM | 85.2 |
| iDT w/ Fisher vector | 87.9 |
| Temporal stream network | 83.7 |
| Two-stream networks | 88.0 |
| LRCN | 82.9 |
| LSTM composite model | 84.3 |
| Conv. pooling on long clips | 88.2 |
| LSTM on long clips | 88.6 |
| Multi-skip feature stacking | 89.1 |
| C3D (3 nets) + iDT + linear SVM | 90.4 |

From the original C3D paper: action recognition results on UCF101, comparing C3D with baselines and the state-of-the-art methods of the time. Notably, the authors use C3D features with a linear SVM to obtain better results than C3D alone.

Description of Random Situation

We set the random seed to 666 in default_config.yaml and default_config_gpu.yaml.

ModelZoo Homepage

Please check the official homepage.
