1. Prepare the video dataset
Create a new data folder under the mmaction2 project root and place your action recognition dataset inside it, organized as follows:
/mmaction2
--configs
--data
  --myucf101
    --videos (action videos, one subdirectory per class)
      --ApplyEyeMakeup
        v_ApplyEyeMakeup_g01_c01.avi
        v_ApplyEyeMakeup_g01_c02.avi
        ...
        v_ApplyEyeMakeup_g25_c07.avi
      --ApplyLipstick
        v_ApplyLipstick_g01_c01.avi
        v_ApplyLipstick_g01_c02.avi
        ...
        v_ApplyLipstick_g25_c04.avi
      ...
      --YoYo
        v_YoYo_g01_c01.avi
        v_YoYo_g01_c02.avi
        ...
        v_YoYo_g25_c05.avi
    --txt (video annotation files)
      classInd.txt
      trainlist.txt
      testlist.txt
    --rawframes (extracted video frames)
--demo
--docker
...
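The dataset skeleton above can be created with a short snippet (a minimal sketch; `data/myucf101` is the dataset root used throughout this guide):

```python
import os

# Create the data/myucf101 skeleton described above.
for sub in ('videos', 'txt', 'rawframes'):
    os.makedirs(os.path.join('data', 'myucf101', sub), exist_ok=True)
```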
The classInd.txt file under the txt directory stores the class list, with indices starting from 1:
1 ApplyEyeMakeup
2 ApplyLipstick
...
101 YoYo
The trainlist.txt and testlist.txt files store the video paths, one per line:
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c02.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c03.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c04.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c05.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c06.avi
ApplyEyeMakeup/v_ApplyEyeMakeup_g02_c01.avi
...
...
YoYo/v_YoYo_g07_c04.avi
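Writing these annotation files by hand is tedious. The sketch below scans the videos/ directory and produces the classInd.txt lines plus a combined class/video list, which you then split into trainlist.txt and testlist.txt yourself; `build_annotation_lines` is a hypothetical helper, not part of mmaction2:

```python
import os

def build_annotation_lines(videos_root):
    """Scan a two-level videos/ directory and return the lines for
    classInd.txt and a combined class/video list."""
    classes = sorted(d for d in os.listdir(videos_root)
                     if os.path.isdir(os.path.join(videos_root, d)))
    # classInd.txt indices start from 1
    class_ind = ['{} {}'.format(i + 1, name) for i, name in enumerate(classes)]
    video_list = []
    for name in classes:
        for fname in sorted(os.listdir(os.path.join(videos_root, name))):
            if fname.endswith(('.avi', '.mp4')):
                video_list.append('{}/{}'.format(name, fname))
    return class_ind, video_list
```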
2. Convert to a raw-frame dataset
2.1 Extract frames from the videos
Edit the mmaction2/tools/data/ucf101/extract_rgb_frames_opencv.sh script so that it points at your own dataset directories:
# change the command to:
python build_rawframes.py ../../data/myucf101/videos/ ../../data/myucf101/rawframes/ --task rgb --level 2 --ext avi --use-opencv
Run the shell script to extract the frames:
bash extract_rgb_frames_opencv.sh
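After extraction it is worth verifying that every video directory actually contains frames. A small sketch, assuming the default img_xxxxx.jpg naming that build_rawframes.py uses for RGB frames:

```python
import os

def count_rawframes(rawframes_dir):
    """Return {class/video: frame count} for a level-2 rawframes layout."""
    counts = {}
    for cls in sorted(os.listdir(rawframes_dir)):
        cls_dir = os.path.join(rawframes_dir, cls)
        if not os.path.isdir(cls_dir):
            continue
        for video in sorted(os.listdir(cls_dir)):
            video_dir = os.path.join(cls_dir, video)
            if os.path.isdir(video_dir):
                counts['{}/{}'.format(cls, video)] = len(
                    [f for f in os.listdir(video_dir)
                     if f.startswith('img_') and f.endswith('.jpg')])
    return counts
```

Videos with a count of 0 usually failed to decode and should be removed from the list files.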
2.2 Generate the list files
Edit the dataset directory in the mmaction2/tools/data/ucf101/generate_rawframes_filelist.sh script:
# change the command to:
PYTHONPATH=. python tools/data/build_file_list.py myucf101 data/myucf101/rawframes/ --level 2 --format rawframes --shuffle
2.2.1 Edit mmaction2/tools/data/build_file_list.py in the following three places
1) Import the parse function for your own dataset
from tools.data.parse_file_list import (parse_directory, parse_diving48_splits,
                                        parse_hmdb51_split,
                                        parse_jester_splits,
                                        parse_kinetics_splits,
                                        parse_mit_splits, parse_mmit_splits,
                                        parse_sthv1_splits, parse_sthv2_splits,
                                        parse_ucf101_splits,
                                        parse_myucf101_splits)  # parse function for our own dataset
2) Add your dataset name to the argument choices
parser.add_argument(
    'dataset',
    type=str,
    choices=[
        'ucf101', 'kinetics400', 'kinetics600', 'kinetics700', 'thumos14',
        'sthv1', 'sthv2', 'mit', 'mmit', 'activitynet', 'hmdb51', 'jester',
        'diving48', 'myucf101'
    ],
    help='dataset to be built file list')
3) Add a dispatch branch for the new dataset (the existing 'ucf101' branch keeps calling parse_ucf101_splits)
    if args.dataset == 'ucf101':
        splits = parse_ucf101_splits(args.level)
    elif args.dataset == 'myucf101':
        splits = parse_myucf101_splits(args.level)
2.2.2 Edit mmaction2/tools/data/parse_file_list.py and add the parse function for the new dataset
Add a parse_myucf101_splits function to parse_file_list.py, modeled on the existing parse_ucf101_splits; only the file paths (and the number of splits, since we have a single train/test split instead of the three official UCF-101 splits) change:
def parse_myucf101_splits(level):
    class_index_file = 'data/myucf101/txt/classInd.txt'
    train_file = 'data/myucf101/txt/trainlist.txt'
    test_file = 'data/myucf101/txt/testlist.txt'

    with open(class_index_file, 'r') as fin:
        class_index = [x.strip().split() for x in fin]
    # labels are zero-based, so subtract 1 from the 1-based indices
    class_mapping = {x[1]: int(x[0]) - 1 for x in class_index}

    def line_to_map(line):
        items = line.strip().split()
        video = osp.splitext(items[0])[0]
        if level == 1:
            video = osp.basename(video)
            label = items[0]
        elif level == 2:
            video = osp.join(
                osp.basename(osp.dirname(video)), osp.basename(video))
            label = class_mapping[osp.dirname(items[0])]
        return video, label

    # a single split, so only *_split_1_* files are generated
    with open(train_file, 'r') as fin:
        train_list = [line_to_map(x) for x in fin]
    with open(test_file, 'r') as fin:
        test_list = [line_to_map(x) for x in fin]
    return [(train_list, test_list)]
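For a level-2 line such as `ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi`, line_to_map strips the extension and maps the class directory to a zero-based label. A standalone sketch of that level-2 logic (`line_to_map_level2` is a hypothetical name for illustration):

```python
import os.path as osp

def line_to_map_level2(line, class_mapping):
    # Same logic as the level == 2 branch of line_to_map above.
    items = line.strip().split()
    video = osp.splitext(items[0])[0]
    video = osp.join(osp.basename(osp.dirname(video)), osp.basename(video))
    label = class_mapping[osp.dirname(items[0])]
    return video, label

mapping = {'ApplyEyeMakeup': 0, 'ApplyLipstick': 1}
print(line_to_map_level2('ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi', mapping))
# ('ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01', 0)
```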
Then run the mmaction2/tools/data/ucf101/generate_rawframes_filelist.sh script; it generates the corresponding myucf101_train_split_1_rawframes.txt and myucf101_val_split_1_rawframes.txt files under mmaction2/data/myucf101.
3. Model training
Taking TSM as an example, first copy mmaction2/configs/recognition/tsm/tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb.py, rename the copy to my_tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb.py, and modify it as follows:
# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'data/myucf101/rawframes/'
data_root_val = 'data/myucf101/rawframes/'
ann_file_train = 'data/myucf101/myucf101_train_split_1_rawframes.txt'
ann_file_val = 'data/myucf101/myucf101_val_split_1_rawframes.txt'
ann_file_test = 'data/myucf101/myucf101_val_split_1_rawframes.txt'
file_client_args = dict(io_backend='disk')
train_pipeline = [
    # dict(type='DecordInit', **file_client_args),
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.875, 0.75, 0.66),
        random_crop=False,
        max_wh_scale_gap=1,
        num_fixed_crops=13),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='PackActionInputs')
]
val_pipeline = [
    # dict(type='DecordInit', **file_client_args),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=8,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='PackActionInputs')
]
test_pipeline = [
    # dict(type='DecordInit', **file_client_args),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=8,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='TenCrop', crop_size=224),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='PackActionInputs')
]
train_dataloader = dict(
    batch_size=4,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=dict(img=data_root),
        pipeline=train_pipeline))
val_dataloader = dict(
    batch_size=4,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=dict(img=data_root_val),
        pipeline=val_pipeline,
        test_mode=True))
test_dataloader = dict(
    batch_size=1,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=dict(img=data_root_val),
        pipeline=test_pipeline,
        test_mode=True))
val_evaluator = dict(type='AccMetric')
test_evaluator = val_evaluator
default_hooks = dict(checkpoint=dict(interval=3, max_keep_ckpts=3))
train_cfg = dict(
    type='EpochBasedTrainLoop', max_epochs=50, val_begin=1, val_interval=1)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
param_scheduler = [
    dict(type='LinearLR', start_factor=0.1, by_epoch=True, begin=0, end=5),
    dict(
        type='MultiStepLR',
        begin=0,
        end=50,
        by_epoch=True,
        milestones=[25, 45],
        gamma=0.1)
]
optim_wrapper = dict(
    constructor='TSMOptimWrapperConstructor',
    paramwise_cfg=dict(fc_lr5=True),
    optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001),
    clip_grad=dict(max_norm=20, norm_type=2))
load_from = 'weights/tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb_20220831-64d69186.pth'
auto_scale_lr = dict(enable=False, base_batch_size=128)
Here, load_from is added so that training starts from the pretrained weights; the dataset paths are changed to point at our own data; and train_pipeline, val_pipeline, test_pipeline, train_dataloader, val_dataloader and test_dataloader are modified so that training runs on raw frames instead of videos.
Next, set num_classes in mmaction2/configs/_base_/models/tsm_r50.py to the correct number of classes for your dataset.
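For reference, the cls_head section of tsm_r50.py looks roughly like the fragment below (field values may differ across mmaction2 versions); only num_classes needs to change:

```python
# excerpt from configs/_base_/models/tsm_r50.py
cls_head=dict(
    type='TSMHead',
    num_classes=101,  # <- set to the number of classes in your dataset
    in_channels=2048,
    spatial_type='avg',
    consensus=dict(type='AvgConsensus', dim=1),
    dropout_ratio=0.5,
    init_std=0.001,
    is_shift=True),
```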
Finally, launch multi-GPU training with the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash tools/dist_train.sh configs/recognition/tsm/my_tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb.py 4