VideoMAE:
[NeurIPS 2022] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Paper: https://arxiv.org/abs/2203.12602v3
Code: https://github.com/MCG-NJU/VideoMAE
Papers with Code: https://paperswithcode.com/paper/videomae-masked-autoencoders-are-data-1
Environment setup
Follow the INSTALL.md in the source repository:
# VideoMAE Installation
The codebase is mainly built with the following libraries:
- Python 3.6 or higher
- [PyTorch](https://pytorch.org/) and [torchvision](https://github.com/pytorch/vision). <br>
We can successfully reproduce the main results under two settings below:<br>
Tesla **A100** (40G): CUDA 11.1 + PyTorch 1.8.0 + torchvision 0.9.0<br>
Tesla **V100** (32G): CUDA 10.1 + PyTorch 1.6.0 + torchvision 0.7.0
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
- [timm==0.4.8/0.4.12](https://github.com/rwightman/pytorch-image-models)
- [deepspeed==0.5.8](https://github.com/microsoft/DeepSpeed)
`DS_BUILD_OPS=1 pip install deepspeed`
- [TensorboardX](https://github.com/lanpa/tensorboardX)
- [decord](https://github.com/dmlc/decord)
- [einops](https://github.com/arogozhnikov/einops)
### Note:
1. We recommend you to use **`PyTorch >= 1.8.0`**.
2. We observed an accidental interruption in the last epoch when conducting the pre-training experiments on V100 GPUs (PyTorch 1.6.0). This interruption is caused by the learning-rate scheduler. We naively set `--epochs 801` to work around the issue :)
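The recommendation of PyTorch >= 1.8.0 can be checked programmatically before starting a long run. The following is a minimal sketch, not part of the VideoMAE codebase; the helper names `version_tuple` and `meets_minimum` are made up for illustration. It compares version strings by their numeric parts and ignores a local build suffix such as `+cu111`.

```python
import re


def version_tuple(version: str) -> tuple:
    """Turn '1.8.0+cu111' into (1, 8, 0); the '+cu111' build suffix is ignored."""
    public = version.split("+", 1)[0]
    parts = [int(p) for p in re.findall(r"\d+", public)[:3]]
    # Pad short versions such as '1.8' so that tuple comparison is well defined.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "1.8.0") -> bool:
    """Return True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    for candidate in ["1.6.0", "1.8.0+cu111", "1.10.0"]:
        print(candidate, meets_minimum(candidate))
```

In a real environment the installed version string would come from `torch.__version__`, for example `meets_minimum(torch.__version__)`.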
- Create the environment
Python 3.6 to 3.8 is recommended; with a newer Python version, installing torch reports an error.
conda create -n videomae python==3.8
conda activate videomae
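- Install torch
To make sure multi-GPU training works properly, install the same version as the original setup, torch 1.8.0.
Find the install command on the PyTorch previous-versions page: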
# CUDA 11.1
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
# CUDA 10.2
pip install torch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0
- Install timm
pip install timm==0.4.12
- Install deepspeed
Running `DS_BUILD_OPS=1 pip install deepspeed` reported an error for me, so I installed deepspeed with the plain command instead, which worked in my tests:
pip install deepspeed
- Install the other libraries
pip install TensorboardX decord einops
- opencv also needs to be installed
pip install opencv-python
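After the install steps above, it can help to confirm that every package is importable in the `videomae` environment. The sketch below is an illustration, not something shipped with VideoMAE; the function name `missing_modules` is made up. It uses import names rather than pip names: `opencv-python` is imported as `cv2`, and TensorboardX as `tensorboardX`.

```python
from importlib.util import find_spec

# Import names of the packages installed in the steps above.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "timm",
    "deepspeed",
    "tensorboardX",
    "decord",
    "einops",
    "cv2",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

This only checks that the modules can be located; it does not verify versions or that CUDA is usable.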
Edit pretrain.sh under scripts as needed:
# Set the path to save checkpoints
OUTPUT_DIR='/data/wyyy/k400_videomae_pretrain_base_patch16_224_frame_16x4_tube_mask_ratio_0.9_e800'
# Set the path to Kinetics train set.
DATA_PATH='./train.csv'
# batch_size can be adjusted according to number of GPUs
# this script is for 64 GPUs (8 nodes x 8 GPUs)
OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 \
--master_port 12320 --nnodes=8 \
--node_rank=$1 --master_addr=$2 \
run_mae_pretraining.py \
--data_path ${DATA_PATH} \
--mask_type tube \
--mask_ratio 0.9 \
--model pretrain_videomae_base_patch16_224 \
--decoder_depth 4 \
--batch_size 32 \
--num_frames 16 \
--sampling_rate 4 \
--opt adamw \
--opt_betas 0.9 0.95 \
--warmup_epochs 40 \
--save_ckpt_freq 20 \
--epochs 801 \
--log_dir ${OUTPUT_DIR} \
--output_dir ${OUTPUT_DIR}
nnodes: number of machines (nodes)
nproc_per_node: number of GPUs per machine
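When adapting the script to fewer machines, it is useful to see how these two launcher options combine with `--batch_size`. The sketch below assumes that `--batch_size` is a per-GPU value, which should be verified against run_mae_pretraining.py; the function names are made up for illustration.

```python
def world_size(nnodes: int, nproc_per_node: int) -> int:
    """Total number of training processes, one per GPU."""
    return nnodes * nproc_per_node


def total_batch_size(nnodes: int, nproc_per_node: int, batch_size_per_gpu: int) -> int:
    """Effective batch size, assuming --batch_size is applied per GPU."""
    return world_size(nnodes, nproc_per_node) * batch_size_per_gpu


if __name__ == "__main__":
    # Values from the script above: 8 nodes x 8 GPUs, --batch_size 32.
    print(world_size(8, 8), total_batch_size(8, 8, 32))
    # A single machine with 8 GPUs and the same --batch_size.
    print(world_size(1, 8), total_batch_size(1, 8, 32))
```

Under this assumption the original 64-GPU setting corresponds to an effective batch size of 2048, while a single 8-GPU machine with the same `--batch_size` gives 256, so other hyperparameters may need to be revisited when the GPU count changes.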