[VideoMAE V1] Reproduction Notes

V04_1898

Last modified: 2024-02-02 15:03:34

First published: 2024-02-02 11:30:23

Article link: https://blog.csdn.net/V04_1898/article/details/135989335

VideoMAE:

[NeurIPS 2022] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Paper: https://arxiv.org/abs/2203.12602v3

Code: https://github.com/MCG-NJU/VideoMAE

Papers with Code: https://paperswithcode.com/paper/videomae-masked-autoencoders-are-data-1

Environment setup

Follow the INSTALL.md from the source repository:

# VideoMAE Installation

The codebase is mainly built with following libraries:

- Python 3.6 or higher

- [PyTorch](https://pytorch.org/) and [torchvision](https://github.com/pytorch/vision). <br>
  We can successfully reproduce the main results under two settings below:<br>
  Tesla **A100** (40G): CUDA 11.1 + PyTorch 1.8.0 + torchvision 0.9.0<br>
  Tesla **V100** (32G): CUDA 10.1 + PyTorch 1.6.0 + torchvision 0.7.0<br>
  Install command for the CUDA 11.1 setting: `conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge`
- [timm==0.4.8/0.4.12](https://github.com/rwightman/pytorch-image-models)

- [deepspeed==0.5.8](https://github.com/microsoft/DeepSpeed)

  `DS_BUILD_OPS=1 pip install deepspeed`

- [TensorboardX](https://github.com/lanpa/tensorboardX)

- [decord](https://github.com/dmlc/decord)

- [einops](https://github.com/arogozhnikov/einops)

### Note:
 1. We recommend you to use **`PyTorch >= 1.8.0`**.
 2. We observed accidental interrupt in the last epoch when conducted the pre-training experiments on V100 GPUs (PyTorch 1.6.0). This interrupt is caused by the scheduler of learning rate. We naively set  `--epochs 801` to walk away from issue :)

Create the environment
Python 3.6 to 3.8 is recommended; with a newer Python version, installing torch 1.8.0 raises an error.

conda create -n videomae python=3.8
conda activate videomae

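Install torch
To make sure multi-GPU training works, install the same version as the original setup, torch 1.8.0.
The install commands are listed on the PyTorch "previous versions" page: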

# CUDA 11.1
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html

# CUDA 10.2
pip install torch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0
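Once either command finishes, it is worth confirming that the installed build actually sees the GPUs before moving on. The following is a minimal sketch, assuming it is run inside the videomae environment; `describe_torch` is a hypothetical helper name, and it falls back to a stub result when torch cannot be imported.

```python
def describe_torch():
    """Summarize the installed torch build; returns {'installed': False} if torch is missing."""
    try:
        import torch
    except (ImportError, OSError):
        # ImportError: torch is not installed; OSError: a shared library failed to load.
        return {"installed": False}
    return {
        "installed": True,
        "torch": torch.__version__,        # e.g. 1.8.0+cu111 for the CUDA 11.1 wheel
        "cuda_build": torch.version.cuda,  # CUDA version the wheel targets (None on CPU builds)
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back as False on a GPU machine, the wheel and the driver most likely do not match, and the other CUDA variant of the install command is worth trying.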

Install timm

pip install timm==0.4.12

Install deepspeed
Running DS_BUILD_OPS=1 pip install deepspeed raised an error for me, so I installed deepspeed with the plain command below, which works in my tests.

pip install deepspeed
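As far as I know, a plain pip install skips pre-building DeepSpeed's C++/CUDA ops and compiles them on first use instead, which leaves less to go wrong at install time. DeepSpeed also ships a `ds_report` command that prints which ops are compatible with the current environment. Below is a small sketch that runs it when it is on PATH; the wrapper name `run_ds_report` is hypothetical.

```python
import shutil
import subprocess


def run_ds_report(timeout=120):
    """Run DeepSpeed's ds_report CLI and return its text output, or None if it is not on PATH."""
    exe = shutil.which("ds_report")
    if exe is None:
        return None
    result = subprocess.run(
        [exe],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr so warnings appear in the same report
        universal_newlines=True,    # text mode; this spelling also works on Python 3.6
        timeout=timeout,
    )
    return result.stdout


if __name__ == "__main__":
    report = run_ds_report()
    if report is None:
        print("ds_report not found; is deepspeed installed in this environment?")
    else:
        print(report)
```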

Install the other libraries

pip install TensorboardX decord einops

opencv also needs to be installed

pip install opencv-python
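With everything installed, the versions can be compared against the ones used in this walkthrough. This is a sketch only: `EXPECTED` and `check_environment` are hypothetical names, the pins are the ones mentioned above (packages installed without a pin are only checked for presence), and `importlib.metadata` requires Python 3.8 or newer, which matches the environment created earlier.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from this walkthrough; None means any installed version is accepted.
EXPECTED = {
    "torch": "1.8.0",
    "torchvision": "0.9.0",
    "timm": "0.4.12",
    "deepspeed": None,
    "tensorboardX": None,
    "decord": None,
    "einops": None,
    "opencv-python": None,
}


def installed_version(name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED, lookup=installed_version):
    """Return a list of human-readable problems; an empty list means everything matched."""
    problems = []
    for name, wanted in expected.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed")
            continue
        # Drop the local suffix so that '1.8.0+cu111' matches the pin '1.8.0'.
        if wanted is not None and found.split("+")[0] != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "environment matches the walkthrough")
```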

Modify pretrain.sh in the scripts directory as needed:

# Set the path to save checkpoints
OUTPUT_DIR='/data/wyyy/k400_videomae_pretrain_base_patch16_224_frame_16x4_tube_mask_ratio_0.9_e800'
# Set the path to Kinetics train set.
DATA_PATH='./train.csv'

# batch_size can be adjusted according to number of GPUs
# this script is for 64 GPUs (8 nodes x 8 GPUs)
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 \
        --master_port 12320 --nnodes=8  \
        --node_rank=$1 --master_addr=$2 \
        run_mae_pretraining.py \
        --data_path ${DATA_PATH} \
        --mask_type tube \
        --mask_ratio 0.9 \
        --model pretrain_videomae_base_patch16_224 \
        --decoder_depth 4 \
        --batch_size 32 \
        --num_frames 16 \
        --sampling_rate 4 \
        --opt adamw \
        --opt_betas 0.9 0.95 \
        --warmup_epochs 40 \
        --save_ckpt_freq 20 \
        --epochs 801 \
        --log_dir ${OUTPUT_DIR} \
        --output_dir ${OUTPUT_DIR}
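The script comment notes that the batch size depends on the number of GPUs and that this launch line targets 64 of them. When reproducing on fewer GPUs, it helps to see how much the global batch shrinks. The arithmetic below is a sketch: it assumes `--batch_size` is applied per process (one process per GPU), and the linear learning-rate scaling by `total_batch / 256` together with the base value `1.5e-4` are assumptions that should be checked against `run_mae_pretraining.py` before relying on them.

```python
def total_batch_size(per_gpu_batch, gpus_per_node, num_nodes):
    """Global batch size when every process uses the same per-GPU batch."""
    return per_gpu_batch * gpus_per_node * num_nodes


def scaled_lr(base_lr, total_batch, reference_batch=256):
    """Linear scaling rule (an assumption here): lr grows in proportion to the global batch."""
    return base_lr * total_batch / reference_batch


if __name__ == "__main__":
    # Values from the script above: --batch_size 32, --nproc_per_node=8, --nnodes=8.
    original = total_batch_size(32, 8, 8)
    # A single machine with 8 GPUs and the same per-GPU batch.
    single_node = total_batch_size(32, 8, 1)
    print("original global batch:", original)
    print("single-node global batch:", single_node)
    print("assumed lr at original batch:", scaled_lr(1.5e-4, original))
    print("assumed lr on one node:", scaled_lr(1.5e-4, single_node))
```

On a single machine the same launcher would typically be called with `--nnodes=1`, `--node_rank=0` and a local `--master_addr`, with `--nproc_per_node` set to the number of GPUs actually present.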