README

最新推荐文章于 2024-07-10 11:32:19 发布

深埋罐头的章鱼

最新推荐文章于 2024-07-10 11:32:19 发布

阅读量155

点赞数

文章标签：人工智能计算机视觉算法

本文链接：https://blog.csdn.net/weixin_42404050/article/details/129018215

版权

C3D模型是一种用于3D视觉任务的网络，类似于2D卷积网络但使用3D操作。文章详细介绍了C3D模型的结构、UCF101数据集、训练和评估过程，以及在Ascend和GPU上的分布式训练设置。还提供了训练和评估的脚本参数以及性能指标。

摘要由CSDN通过智能技术生成

Contents

C3D Description

C3D model is widely used for 3D vision task. The construct of C3D network is similar to the common 2D ConvNets, the main difference is that C3D use 3D operations like Conv3D while 2D ConvNets are anentirely 2D architecture. To know more information about C3D network, you can read the original paper Learning Spatiotemporal Features with 3D Convolutional Networks.

Model Architecture

C3D net has 8 convolution, 5 max-pooling, and 2 fully connected layers, followed by a softmax output layer. All 3D convolution kernels are 3 × 3 × 3 with stride 1 in both spatial and temporal dimensions. The 3D pooling layers are denoted from pool1 to pool5. All pooling kernels are 2 × 2 × 2, except for pool1 is 1 × 2 × 2. Each fully connected layer has 4096 output units.

Dataset

Dataset used: UCF101

Description: UCF101 is an action recognition data set of realistic action videos, collected from YouTube, having 101 action categories. This data set is an extension of UCF50 data set which has 50 action categories.
Dataset size：13320 videos
- Note：Use the official Train/Test Splits(UCF101TrainTestSplits).
Data format：rar
- Note：Data will be processed in dataset_preprocess.py
Data Content Structure

.
└─ucf101                                    // contains 101 file folder
  ├── ApplyEyeMakeup                        // contains 145 videos
  │   ├── v_ApplyEyeMakeup_g01_c01.avi      // video file
  │   ├── v_ApplyEyeMakeup_g01_c02.avi      // video file
  │    ...
  ├── ApplyLipstick                         // contains 114 image files
  │   ├── v_ApplyLipstick_g01_c01.avi       // video file
  │   ├── v_ApplyLipstick_g01_c02.avi       // video file
  │    ...
  ├── ucfTrainTestlist                      // contains category files
  │   ├── classInd.txt                      // Category file.
  │   ├── testlist01.txt                    // split file
  │   ├── trainlist01.txt                   // split file
  ...

Environment Requirements

Hardware
- Prepare hardware environment with Ascend or GPU(Nvidia).
Framework
- MindSpore
For more information, please check the resources below：
- MindSpore Tutorials
- MindSpore Python API

Quick Start

After installing MindSpore via the official website, you can start training and evaluation as follows:

Processing raw data files

# Convert video into image.
bash run_dataset_preprocess.sh UCF101 [RAR_FILE_PATH] 1

# for example: bash run_dataset_preprocess.sh UCF101 /Data/UCF101/UCF101.rar 1

Download pretrained model from c3d.ckpt

Refer to c3d.yaml. We support some parameter configurations for quick start.

Run on Ascend

cd scripts
# run training example
bash run_standalone_train_ascend.sh
# run distributed training example
bash run_distribute_train_ascend.sh [RANK_TABLE_FILE]
# run evaluation example
bash run_standalone_eval_ascend.sh [CKPT_FILE_PATH]

Run on GPU

cd scripts
# run training example
bash run_standalone_train_gpu.sh [CONFIG_PATH] [DEVICE_ID]
# run distributed training example
bash run_distribute_train_gpu.sh [CONFIG_PATH]
# run evaluation example
bash run_standalone_eval_gpu.sh [CKPT_FILE_PATH] [CONFIG_PATH]

Script Description

Script and Sample Code

.
└─c3d_mindspore
  ├── README.md                           // descriptions about C3D
  ├── scripts
  │   ├──run_dataset_preprocess.sh       // shell script for preprocessing dataset
  │   ├──run_ckpt_convert.sh             // shell script for converting pytorch ckpt file to pickle file on GPU
  │   ├──run_distribute_train_ascend.sh  // shell script for distributed training on Ascend
  │   ├──run_distribute_train_gpu.sh  // shell script for distributed training on GPU
  │   ├──run_infer_310.sh                // shell script for inference on 310
  │   ├──run_standalone_train_ascend.sh  // shell script for training on Ascend
  │   ├──run_standalone_train_gpu.sh  // shell script for training on GPU
  │   ├──run_standalone_eval_ascend.sh   // shell script for testing on Ascend
  │   ├──run_standalone_eval_gpu.sh   // shell script for testing on GPU
  ├── src
  │
  │   ├──dataset.py                    // creating dataset
  │   ├──evalcallback.py               // evalcallback
  │   ├──lr_schedule.py                // learning rate scheduler
  │   ├──transform.py                  // handle dataset
  │   ├──loss.py                       // loss
  │   ├──utils.py                      // General components (callback function)
  │   ├──c3d_model.py                  // Unet3D model
          ├── utils
          │   ├──config.py             // parameter configuration
          │   ├──resized_mean.py     // device adapter
          |   |--dataset_preprocess.py
          │   ...
          ├── tools
          │   ├──ckpt_convert.py       // convert pytorch ckpt file to pickle file
          │   ├── // preprocess dataset
  ├── requirements.txt                 // requirements configuration
  ├── export.py                        // convert mindspore ckpt file to MINDIR file
  ├── train.py                         // evaluation script
  ├── infer.py                         // training script

Script Parameters

Parameters for both training and evaluation can be set in c3d.yaml

Parameters for both training and evaluation can be set in c3d_gpu.yaml

config for C3D, UCF101 dataset

# ==============================================================================
# model architecture
model_name: "c3d"   # model name 

#global config
device_target: "GPU"
dataset_sink_mode: False
context:            # runtime
    mode: 0         #0--Graph Mode; 1--Pynative Mode
    device_target: "GPU"
    save_graphs: False
    device_id: 3

# model settings of every parts
model:
    type: C3D
    in_d: 16
    in_h: 112
    in_w: 112
    in_channel: 3
    kernel_size: [3, 3, 3]
    head_channel: [4096, 4096]
    num_classes: 101
    keep_prob: [0.5, 0.5, 1.0]

# learning rate for training process
learning_rate:
    lr_scheduler: "exponential"
    lr: 0.003
    lr_epochs: [15, 30, 75]
    steps_per_epoch: 596
    warmup_epochs: 1
    max_epoch: 150
    lr_gamma: 0.1

# optimizer for training process
optimizer:
    type: 'SGD'
    momentum: 0.9
    weight_decay: 0.0005
    loss_scale: 1.0

# train loss
loss:       
    type: SoftmaxCrossEntropyWithLogits
    sparse: True
    reduction: "mean"

# trainning setups, including pretrain model
train:       
    pre_trained: False
    pretrained_model: ""
    ckpt_path: "./output/"
    epochs: 150
    save_checkpoint_epochs: 5
    save_checkpoint_steps: 1875
    keep_checkpoint_max: 30
    run_distribute: False

# evaluation setups
eval:
    pretrained_model: ".vscode/ms_ckpts/c3d_20220912.ckpt"

# infer setups
infer:
    pretrained_model: ".vscode/ms_ckpts/c3d_20220912.ckpt"
    batch_size: 1
    image_path: ""
    normalize: True
    output_dir: "./infer_output"

# export model into ckpt in other format  
export:       
    pretrained_model: ""
    batch_size: 64
    image_height: 112
    image_width: 112
    input_channel: 3
    file_name: "c3d"
    file_formate: "MINDIR"

# dataloader and data augmentation setups
data_loader:
    train:
        dataset:
            type: UCF101
            path: "/home/publicfile/UCF101_splits"  # Path to data root dir
            split: "train"
            batch_size: 16
            seq: 16
            seq_mode: "average"
            num_parallel_workers: 6
            shuffle: True
        map:        # data augmentation
            operations:
                - type: VideoResize
                  size: [128, 171]
                - type: VideoRescale
                  shift: "src/example/c3d/resized_mean_sports1m.npy" # mean file
                - type: VideoRandomCrop
                  size: [112, 112]
                - type: VideoRandomHorizontalFlip
                  prob: 0.5
                - type: VideoReOrder
                  order: [3, 0, 1, 2]
            input_columns: ["video"]

    eval:
        dataset:
            type: UCF101
            path: "/home/publicfile/UCF101_splits"  # Path to data root dir
            split: "test"
            batch_size: 16
            seq: 16
            seq_mode: "average"
            num_parallel_workers: 1
            shuffle: False
        map:
            operations:
                - type: VideoResize
                  size: [128, 171]
                - type: VideoRescale
                  shift: "src/example/c3d/resized_mean_sports1m.npy"  # mean file
                - type: VideoCenterCrop
                  size: [112, 112]
                - type: VideoReOrder
                  order: [3, 0, 1, 2]
            input_columns: ["video"]
    group_size: 1
# ==============================================================================

Training Process

Training

Training on Ascend

# enter scripts directory
cd scripts
# training
bash run_standalone_train_ascend.sh

The python command above will run in the background, you can view the results through the file eval.log.

After training, you’ll get some checkpoint files under the script folder by default. The loss value will be achieved as follows:

train.log for HMDB51

epoch: 1 step: 223, loss is 2.8705792
epoch time: 74139.530 ms, per step time: 332.464 ms
epoch: 2 step: 223, loss is 1.8403366
epoch time: 60084.907 ms, per step time: 269.439 ms
epoch: 3 step: 223, loss is 1.4866445
epoch time: 61095.684 ms, per step time: 273.972 ms
...
epoch: 29 step: 223, loss is 0.3037338
epoch time: 60436.915 ms, per step time: 271.018 ms
epoch: 30 step: 223, loss is 0.2176594
epoch time: 60130.695 ms, per step time: 269.644 ms

train.log for UCF101

epoch: 1 step: 596, loss is 0.53118783
epoch time: 170693.634 ms, per step time: 286.399 ms
epoch: 2 step: 596, loss is 0.51934457
epoch time: 150388.783 ms, per step time: 252.330 ms
epoch: 3 step: 596, loss is 0.07241724
epoch time: 151548.857 ms, per step time: 254.277 ms
...
epoch: 29 step: 596, loss is 0.034661677
epoch time: 150932.542 ms, per step time: 253.243 ms
epoch: 30 step: 596, loss is 0.0048465515
epoch time: 150760.797 ms, per step time: 252.954 ms

Training on GPU

Notes:If you occur a problem with the information:
“Bad performance attention, it takes more than 25 seconds to fetch and send a batch of data into device, which might result GetNext timeout problem.“
Please change the Parameter “dataset_sink_mode” to False

# enter scripts directory
cd scripts
# training
bash run_standalone_train_gpu.sh [CONFIG_PATH] [DEVICE_ID]

The above shell script will run distribute training in the background. You can view the results through the file ./train[X].log. The loss value will be achieved as follows:

train.log for UCF101

epoch: 1 step: 1192, loss is 0.8381556
epoch time: 593197.024 ms, per step time: 301.297 ms
epoch: 2 step: 1192, loss is 0.5701107
epoch time: 576058.976 ms, per step time: 260.542 ms
epoch: 3 step: 1192, loss is 0.1724325
epoch time: 578041.281 ms, per step time: 235.868 ms
...
epoch: 99 step: 1192, loss is 6.3519354e-05
epoch time: 573493.252 ms, per step time: 225.237 ms
epoch: 100 step: 1192, loss is 4.852382e-05
epoch time: 575237.743 ms, per step time: 229.164 ms

Distributed Training

Distributed training on Ascend

Notes:
RANK_TABLE_FILE can refer to Link , and the device_ip can be got as Link. For large models like InceptionV4, it’s better to export an external environment variable export HCCL_CONNECT_TIMEOUT=600 to extend hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could be timeout since compiling time increases with the growth of model size.

# enter scripts directory
cd scripts
# distributed training
bash run_distribute_train_ascend.sh [RANK_TABLE_FILE]

The above shell script will run distribute training in the background. You can view the results through the file ./train[X].log. The loss value will be achieved as follows:

train0.log for UCF101

epoch: 1 step: 596, loss is 0.51830626
epoch time: 82401.300 ms, per step time: 138.257 ms
epoch: 2 step: 596, loss is 0.5527372
epoch time: 30820.129 ms, per step time: 51.712 ms
epoch: 3 step: 596, loss is 0.007791209
epoch time: 30809.803 ms, per step time: 51.694 ms
...
epoch: 29 step: 596, loss is 7.510604e-05
epoch time: 30809.334 ms, per step time: 51.694 ms
epoch: 30 step: 596, loss is 0.13138217
epoch time: 30819.966 ms, per step time: 51.711 ms

Distributed training on GPU

# enter scripts directory
cd scripts
# distributed training
vim run_distribute_train_gpu.sh to set start_device_id
bash run_distribute_train_gpu.sh [CONFIG_PATH]

train_distributed.log for UCF101

epoch: 1 step: 149, loss is 0.97137051820755
epoch: 1 step: 149, loss is 1.1462825536727905
epoch: 1 step: 149, loss is 1.484191656112671
epoch: 1 step: 149, loss is 0.639738142490387
epoch: 1 step: 149, loss is 1.1133722066879272
epoch: 1 step: 149, loss is 1.5043989419937134
epoch: 1 step: 149, loss is 1.2063453197479248
epoch: 1 step: 149, loss is 1.3174564838409424
epoch time: 183002.444 ms, per step time: 1228.204 ms
epoch time: 183388.214 ms, per step time: 1230.793 ms
epoch time: 183560.571 ms, per step time: 1231.950 ms
epoch time: 183881.357 ms, per step time: 1234.103 ms
epoch time: 184225.004 ms, per step time: 1236.409 ms
epoch time: 184383.710 ms, per step time: 1237.475 ms
epoch time: 184501.011 ms, per step time: 1238.262 ms
epoch time: 184885.520 ms, per step time: 1240.842 ms
epoch: 2 step: 149, loss is 0.10039880871772766
epoch: 2 step: 149, loss is 0.5981963276863098
epoch: 2 step: 149, loss is 0.4604840576648712
epoch: 2 step: 149, loss is 0.215419739484787
epoch: 2 step: 149, loss is 0.2556331753730774
epoch: 2 step: 149, loss is 0.03653889149427414
epoch: 2 step: 149, loss is 1.4467300176620483
epoch: 2 step: 149, loss is 1.0422033071517944
epoch time: 53143.686 ms, per step time: 356.669 ms
epoch time: 52175.739 ms, per step time: 350.173 ms
epoch time: 54300.036 ms, per step time: 364.430 ms
epoch time: 53026.808 ms, per step time: 355.885 ms
epoch time: 52941.203 ms, per step time: 355.310 ms
epoch time: 53144.090 ms, per step time: 356.672 ms
epoch time: 53896.009 ms, per step time: 361.718 ms
epoch time: 53584.895 ms, per step time: 359.630 ms
...

Evaluation Process

Evaluation

Evaluating on Ascend

evaluation on dataset when running on Ascend

Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to be the absolute full path, e.g., “username/ckpt_0/c3d-hmdb51-0-30_223.ckpt”.

# enter scripts directory
cd scripts
# eval
bash run_standalone_eval_ascend.sh [CKPT_FILE_PATH]

The above python command will run in the background. You can view the results through the file “eval.log”. The accuracy of the test dataset will be as follows:

eval.log for UCF101

start create network
pre_trained model: username/ckpt_0/c3d-ucf101-0-30_596.ckpt
setep: 1/237, acc: 0.625
setep: 21/237, acc: 1.0
setep: 41/237, acc: 0.5625
setep: 61/237, acc: 1.0
setep: 81/237, acc: 0.6875
setep: 101/237, acc: 1.0
setep: 121/237, acc: 0.5625
setep: 141/237, acc: 0.5
setep: 161/237, acc: 1.0
setep: 181/237, acc: 1.0
setep: 201/237, acc: 0.75
setep: 221/237, acc: 1.0
eval result: top_1 79.381%

Evaluating on GPU

evaluation on dataset when running on GPU

Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to be the absolute full path, e.g., “./results/xxxx-xx-xx_time_xx_xx_xx/ckpt_0/0-30_223.ckpt”.

# enter scripts directory
cd scripts
# eval
bash run_standalone_eval_gpu.sh [CKPT_FILE_PATH] [CONFIG_PATH]

eval.log for UCF101

start create network
pre_trained model: ./results/2021-11-02_time_07_30_42/ckpt_0/0-85_223.ckpt
setep: 1/237, acc: 0.75
setep: 21/237, acc: 1.0
setep: 41/237, acc: 0.625
setep: 61/237, acc: 1.0
setep: 81/237, acc: 0.875
setep: 101/237, acc: 1.0
setep: 121/237, acc: 0.9375
setep: 141/237, acc: 0.5625
setep: 161/237, acc: 1.0
setep: 181/237, acc: 1.0
setep: 201/237, acc: 0.5625
setep: 221/237, acc: 1.0
eval result: top_1 80.412%

Inference Process

Export MindIR

python export.py --ckpt_file [CKPT_PATH] --mindir_file_name [FILE_NAME] --file_format [FILE_FORMAT] --num_classes [NUM_CLASSES] --batch_size [BATCH_SIZE]

ckpt_file parameter is mandotory.
file_format should be in [“AIR”, “MINDIR”].
NUM_CLASSES Number of total classes in the dataset, 51 for HMDB51 and 101 for UCF101.
BATCH_SIZE Since currently mindir does not support dynamic shapes, this network only supports inference with batch_size of 1.

Infer on Ascend310

Before performing inference, the mindir file must be exported by export.py script. We only provide an example of inference using MINDIR model.

# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATASET] [NEED_PREPROCESS] [DEVICE_ID]

DATASET must be ‘HMDB51’ or ‘UCF101’.
NEED_PREPROCESS means weather need preprocess or not, it’s value is ‘y’ or ‘n’.
DEVICE_ID is optional, default value is 0.

result

Inference result is saved in current path, you can find result like this in acc.log file.

visulization

请添加图片描述

Model Description

Performance

Evaluation Performance

C3D for UCF101

Parameters	Ascend	GPU
Model Version	C3D	C3D
Resource	Ascend 910; CPU 2.60GHz, 192cores; Memory 755G; OS Euler2.8	V100
uploaded Date	09/22/2021 (month/day/year)	11/06/2021 (month/day/year)
MindSpore Version	1.2.0	1.5.0
Dataset	UCF101	UCF101
Training Parameters	epoch = 30, batch_size = 16	epoch = 150, batch_size = 8
Optimizer	SGD	SGD
Loss Function	Max_SoftmaxCrossEntropyWithLogits	Max_SoftmaxCrossEntropyWithLogits
Speed	1pc: 253.372ms/step	1pc:237.128ms/step
Top_1	1pc: 80.33%	1pc:80.138%
Total time	1pc: 1.31hours	1pc:4hours
Parameters (M)	78	78


Method	Accuracy (%)
Imagenet + linear SVM	68.8
iDT w/ BoW + linear SVM	76.2
Deep networks	65.4
Spatial stream network	72.6
LRCN	71.1
LSTM composite model	75.8
C3D (1 net) + linear SVM	82.3
C3D (3 nets) + linear SVM	85.2
iDT w/ Fisher vector	87.9
Temporal stream network	83.7
Two-stream networks	88.0
LRCN	82.9
LSTM composite model	84.3
Conv. pooling on long clips 、	88.2
LSTM on long clips	88.6
Multi-skip feature stacking	89.1
C3D (3 nets) + iDT + linear SVM	90.4

Original paper of C3D:Action recognition results on UCF101. C3D compared with baselines and current state-of-the-art methods. specially，they use C3D+SVM to get higher result than C3D only.