Action Recognition 0-08: mmaction2 (SlowFast) - Exhaustive Source Code Walkthrough (4) - Data Loading and Preprocessing, Part 1 (Key Article)

Article link: https://blog.csdn.net/weixin_43013761/article/details/107790479

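The link below collects all of my notes on mmaction2 (SlowFast action recognition). If you spot any mistakes, please point them out and I will correct them right away. If you are interested, add me on WeChat (17575010159) to discuss technical questions. If this helped you, please remember to give it a like, as that is my greatest encouragement. My public account, with plenty of resources, is listed at the end of the article.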

Action Recognition 0-00: mmaction2 (SlowFast) - Table of Contents - Latest Exhaustive Walkthrough

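Highly recommended commercial-grade project: this is a behavior analysis project that I deployed in production, consisting of three modules (1. pedestrian detection, 2. pedestrian tracking, 3. action recognition): Behavior Analysis (Commercial Grade) 00 - Table of Contents - Latest Exhaustive Walkthrough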

Preface

Data can be loaded in two ways: RawframeDataset and VideoDataset. For example, the config I train with, my_slowfast_r50_4x16x1_256e_ucf101_rgb.py, contains the following line:

dataset_type = 'RawframeDataset'

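This line selects how the data is loaded.
1. RawframeDataset: the videos are first split into per-frame images, which are then loaded for training.
2. VideoDataset: no splitting is needed; the videos are loaded directly for training (this is the approach I prefer).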

In tools/train.py we can see the following code:

    # Build the training dataset
    datasets = [build_dataset(cfg.data.train)]

It builds the corresponding dataset according to the dataset_type parameter in cfg. My brief annotation of the build_dataset function is as follows:

def build_dataset(cfg, default_args=None):
    """Build a dataset from config dict.

    Args:
        cfg (dict): Config dict. It should at least contain the key "type".
        default_args (dict, optional): Default initialization arguments.
            Default: None.

    Returns:
        Dataset: The constructed dataset.
    """
    # If type is RepeatDataset (in my debugging run it was RawframeDataset)
    if cfg['type'] == 'RepeatDataset':
        # Wrap the inner dataset so that it is repeated cfg['times'] times
        dataset = RepeatDataset(
            build_dataset(cfg['dataset'], default_args), cfg['times'])
    else:
        # DATASETS = Registry('dataset') can be thought of as a container holding the dataset classes
        # Based on type=RawframeDataset in cfg, fetch the RawframeDataset class from DATASETS and instantiate it
        dataset = build_from_cfg(cfg, DATASETS, default_args)
    return dataset
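
To make the registry lookup concrete, here is a minimal sketch of the idea behind DATASETS and build_from_cfg. It is a simplified stand-in, not mmcv's actual implementation: ToyRegistry, toy_build_from_cfg, ToyRawframeDataset and the file name train_list.txt are hypothetical names used only for illustration.

```python
class ToyRegistry:
    """Hypothetical, simplified registry: maps a class name to the class."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self, cls):
        # Used as a decorator: store the class under its own name.
        self._module_dict[cls.__name__] = cls
        return cls

    def get(self, key):
        return self._module_dict.get(key)


TOY_DATASETS = ToyRegistry('dataset')


@TOY_DATASETS.register_module
class ToyRawframeDataset:
    def __init__(self, ann_file, pipeline, data_prefix=None):
        self.ann_file = ann_file
        self.pipeline = pipeline
        self.data_prefix = data_prefix


def toy_build_from_cfg(cfg, registry, default_args=None):
    """Read 'type' from a copy of cfg, look the class up, and instantiate it."""
    args = dict(cfg)  # copy so the caller's config is not modified
    obj_type = args.pop('type')
    obj_cls = registry.get(obj_type)
    if obj_cls is None:
        raise KeyError(f'{obj_type} is not in the {registry.name} registry')
    if default_args is not None:
        # Explicit config values win over the defaults.
        for key, value in default_args.items():
            args.setdefault(key, value)
    return obj_cls(**args)


cfg = dict(type='ToyRawframeDataset', ann_file='train_list.txt', pipeline=[])
dataset = toy_build_from_cfg(cfg, TOY_DATASETS)
print(type(dataset).__name__)  # ToyRawframeDataset
```

The point of the pattern is that the config file only needs a string such as 'RawframeDataset'; the mapping from that string to a class is done by the registry at build time.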

This eventually calls into
class RawframeDataset(BaseDataset) in mmaction/datasets/rawframe_dataset.py
or
class VideoDataset(BaseDataset) in mmaction/datasets/video_dataset.py

As we can see, both classes inherit from BaseDataset, which is implemented in mmaction/datasets/base.py under the project root. The annotated code is as follows:

BaseDataset

import copy
import os.path as osp
from abc import ABCMeta, abstractmethod

import mmcv
from torch.utils.data import Dataset

from .pipelines import Compose


class BaseDataset(Dataset, metaclass=ABCMeta):
    """Base class for datasets.

    All datasets to process video should subclass it.
    All subclasses should overwrite:

    - Methods:`load_annotations`, supporting to load information from an
    annotation file.

    - Methods:`prepare_train_frames`, providing train data.

    - Methods:`prepare_test_frames`, providing test data.

    Args:
        ann_file (str): Path to the annotation file.
        pipeline (list[dict | callable]): A sequence of data transforms.
        data_prefix (str): Path to a directory where videos are held.
            Default: None.
        test_mode (bool): Store True when building test or validation dataset.
            Default: False.
        multi_class (bool): Determines whether the dataset is a multi-class
            dataset. Default: False.
        num_classes (int): Number of classes of the dataset, used in
            multi-class datasets. Default: None.
        modality (str): Modality of data. Support 'RGB', 'Flow'.
            Default: 'RGB'.
    """

    def __init__(self,
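                 ann_file,  # path to the annotation file
                 pipeline,  # sequence of data transforms (covered in detail later)
                 data_prefix=None,  # directory that holds the videos
                 test_mode=False,  # set to True when building a test or validation dataset
                 multi_class=False,  # whether to train or test with multiple labels per sample
                 num_classes=None,  # number of classes in the dataset
                 modality='RGB'):  # data modality, 'RGB' by default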
        super().__init__()

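        # Path to the annotation file
        self.ann_file = ann_file

        # Directory that holds the videos
        self.data_prefix = osp.realpath(data_prefix) if osp.isdir(
            data_prefix) else data_prefix

        # Set to True when building a test or validation dataset
        self.test_mode = test_mode
        # Whether to train or test with multiple labels per sample
        self.multi_class = multi_class
        # Number of classes in the dataset
        self.num_classes = num_classes
        # Data modality, 'RGB' by default
        self.modality = modality
        # Sequence of data transforms
        self.pipeline = Compose(pipeline)
        # Load the video information; self.load_annotations() must be implemented by subclasses
        self.video_infos = self.load_annotations()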

    @abstractmethod
    def load_annotations(self):
        """Load the annotation according to ann_file into video_infos."""
        pass

    @abstractmethod  # model evaluation; must be implemented by subclasses
    def evaluate(self, results, metrics, logger):
        """Evaluation for the dataset.

        Args:
            results (list): Output results.
            metrics (str | sequence[str]): Metrics to be performed.
            logger (logging.Logger | None): Logger for recording.

        Returns:
            dict: Evaluation results dict.
        """
        pass

    # Dump the results to json/yaml/pickle
    def dump_results(self, results, out):
        """Dump data to json/yaml/pickle strings or files."""
        return mmcv.dump(results, out)


    def prepare_train_frames(self, idx):
        """Prepare the frames for training given the index.
        Applies the data transforms to the sample at index idx.
        """
        results = copy.deepcopy(self.video_infos[idx])
        results['modality'] = self.modality
        return self.pipeline(results)


    def prepare_test_frames(self, idx):
        """Prepare the frames for testing given the index.
        Applies the data transforms to the sample at index idx.
        """
        results = copy.deepcopy(self.video_infos[idx])
        results['modality'] = self.modality
        return self.pipeline(results)

    def __len__(self):
        """Get the size of the dataset."""
        return len(self.video_infos)

    def __getitem__(self, idx):
        """Get the sample for either training or testing given index.
        Applies the train or test transforms depending on the mode.
        """
        if self.test_mode:
            return self.prepare_test_frames(idx)
        else:
            return self.prepare_train_frames(idx)

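BaseDataset is only a base class; RawframeDataset and VideoDataset (analyzed in the next post) both inherit from it. Its methods
def prepare_train_frames(self, idx) and def prepare_test_frames(self, idx) are overridden by the subclasses, but they all call one crucial function: self.pipeline(results).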
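
The dispatch described above can be sketched without any mmaction2 or PyTorch dependency. This is only an illustration of the control flow: ToyBaseDataset, ToyRawframeDataset, toy_pipeline and the filename_tmpl key are hypothetical stand-ins, not the library's classes.

```python
import copy


class ToyBaseDataset:
    """Hypothetical stand-in mirroring only the dispatch logic of BaseDataset."""

    def __init__(self, video_infos, pipeline, test_mode=False, modality='RGB'):
        self.video_infos = video_infos
        self.pipeline = pipeline
        self.test_mode = test_mode
        self.modality = modality

    def prepare_train_frames(self, idx):
        # Deep-copy so the pipeline cannot mutate the stored annotation.
        results = copy.deepcopy(self.video_infos[idx])
        results['modality'] = self.modality
        return self.pipeline(results)

    def prepare_test_frames(self, idx):
        results = copy.deepcopy(self.video_infos[idx])
        results['modality'] = self.modality
        return self.pipeline(results)

    def __len__(self):
        return len(self.video_infos)

    def __getitem__(self, idx):
        if self.test_mode:
            return self.prepare_test_frames(idx)
        return self.prepare_train_frames(idx)


class ToyRawframeDataset(ToyBaseDataset):
    """Subclass that overrides the train hook but still ends in the pipeline."""

    def prepare_train_frames(self, idx):
        results = copy.deepcopy(self.video_infos[idx])
        results['modality'] = self.modality
        results['filename_tmpl'] = 'img_{:05}.jpg'  # illustrative extra key
        return self.pipeline(results)


def toy_pipeline(results):
    # A one-step "pipeline" that just marks the sample as processed.
    results['processed'] = True
    return results


infos = [dict(frame_dir='v_0001', total_frames=120, label=3)]
train_set = ToyRawframeDataset(infos, toy_pipeline, test_mode=False)
sample = train_set[0]
print(sample['processed'], 'processed' in infos[0])  # True False
```

Because of the deep copy, whatever the pipeline writes into results never leaks back into video_infos, so indexing the same sample twice starts from the same annotation.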

pipeline

Now let's see what pipeline actually is. Running in debug mode shows the following:

# Training mode (train)
<class 'list'>: [{'type': 'SampleFrames', 'clip_len': 16, 'frame_interval': 2, 'num_clips': 1}, {'type': 'FrameSelector'}, {'type': 'Resize', 'scale': (-1, 256)}, {'type': 'RandomResizedCrop'}, {'type': 'Resize', 'scale': (224, 224), 'keep_ratio': False}, {'type': 'Flip', 'flip_ratio': 0.5}, {'type': 'Normalize', 'mean': [123.675, 116.28, 103.53], 'std': [58.395, 57.12, 57.375], 'to_bgr': False}, {'type': 'FormatShape', 'input_format': 'NCTHW'}, {'type': 'Collect', 'keys': ['imgs', 'label'], 'meta_keys': []}, {'type': 'ToTensor', 'keys': ['imgs', 'label']}]
	# clip_len is the number of frames sampled per clip, frame_interval is the stride between sampled frames, num_clips is the number of clips sampled
	00 = {ConfigDict} {'type': 'SampleFrames', 'clip_len': 16, 'frame_interval': 2, 'num_clips': 1}
	# Frame selector
	01 = {ConfigDict} {'type': 'FrameSelector'}
	# Resize the images
	02 = {ConfigDict} {'type': 'Resize', 'scale': (-1, 256)}
	# Random crop
	03 = {ConfigDict} {'type': 'RandomResizedCrop'}
	# Resize to the given size; keep_ratio controls whether the aspect ratio is preserved
	04 = {ConfigDict} {'type': 'Resize', 'scale': (224, 224), 'keep_ratio': False}
	# Flip horizontally with the given probability
	05 = {ConfigDict} {'type': 'Flip', 'flip_ratio': 0.5}
	# Normalize with the given mean and std
	06 = {ConfigDict} {'type': 'Normalize', 'mean': [123.675, 116.28, 103.53], 'std': [58.395, 57.12, 57.375], 'to_bgr': False}
	# Reshape the training data to 'NCTHW' so it is ready for training
	07 = {ConfigDict} {'type': 'FormatShape', 'input_format': 'NCTHW'}
	# Gather the training data together with its labels
	08 = {ConfigDict} {'type': 'Collect', 'keys': ['imgs', 'label'], 'meta_keys': []}
	# Convert to PyTorch tensors
	09 = {ConfigDict} {'type': 'ToTensor', 'keys': ['imgs', 'label']}
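
To see what clip_len=16 and frame_interval=2 mean in terms of frame indices, here is a rough sketch of the sampling arithmetic. It is a simplification written for this post, not the SampleFrames source: sample_clip_indices is a hypothetical helper, it handles a single clip only, and the wrap-around for short videos is just one possible choice.

```python
import random


def sample_clip_indices(total_frames, clip_len=16, frame_interval=2, rng=None):
    """Pick clip_len frame indices spaced frame_interval apart (simplified)."""
    if rng is None:
        rng = random.Random()
    # Number of source frames one clip spans: 16 * 2 = 32 with the settings above.
    span = clip_len * frame_interval
    # Random start so that the whole clip fits; 0 if the video is shorter than span.
    start = rng.randint(0, max(total_frames - span, 0))
    # The modulo wraps around for short videos; an illustrative choice only.
    return [(start + i * frame_interval) % total_frames for i in range(clip_len)]


indices = sample_clip_indices(total_frames=120, rng=random.Random(0))
print(len(indices))             # 16
print(indices[1] - indices[0])  # 2
```

So with these settings each training sample is 16 frames that together cover a window of roughly 32 consecutive frames of the source video.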



# Test mode (test)
pipeline = {list} <class 'list'>: [{'type': 'SampleFrames', 'clip_len': 32, 'frame_interval': 2, 'num_clips': 10, 'test_mode': True}, {'type': 'FrameSelector'}, {'type': 'Resize', 'scale': (-1, 256)}, {'type': 'ThreeCrop', 'crop_size': 256}, {'type': 'Flip', 'flip_ratio': 0
	 # clip_len is the number of frames sampled per clip, frame_interval is the stride between sampled frames, num_clips is the number of clips sampled
	 00 = {ConfigDict} {'type': 'SampleFrames', 'clip_len': 32, 'frame_interval': 2, 'num_clips': 10, 'test_mode': True}
	 # Frame selector
	 01 = {ConfigDict} {'type': 'FrameSelector'}
	 # Resize the images
	 02 = {ConfigDict} {'type': 'Resize', 'scale': (-1, 256)}
	 # Crop each sampled frame three times
	 03 = {ConfigDict} {'type': 'ThreeCrop', 'crop_size': 256}
	 # flip_ratio of 0 means no horizontal flipping
	 04 = {ConfigDict} {'type': 'Flip', 'flip_ratio': 0}
	 # Normalize with the given mean and std
	 05 = {ConfigDict} {'type': 'Normalize', 'mean': [123.675, 116.28, 103.53], 'std': [58.395, 57.12, 57.375], 'to_bgr': False}
	 # Reshape the data to 'NCTHW'
	 06 = {ConfigDict} {'type': 'FormatShape', 'input_format': 'NCTHW'}
	 # Gather the data together with its labels
	 07 = {ConfigDict} {'type': 'Collect', 'keys': ['imgs', 'label'], 'meta_keys': []}
	 # Convert to PyTorch tensors
	 08 = {ConfigDict} {'type': 'ToTensor', 'keys': ['imgs']}
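
A quick way to read the test settings is to count how many views of each video reach the network. The sketch below does only that arithmetic; ncthw_shape_at_test_time is a hypothetical helper, and it assumes (my reading, not something verified in this post) that FormatShape folds the crops and clips together into the first dimension of the NCTHW layout.

```python
def ncthw_shape_at_test_time(num_clips, clip_len, num_crops, crop_size, channels=3):
    """Shape of one video's input in NCTHW layout, with crops and clips folded into N."""
    n = num_clips * num_crops  # every clip is seen once per crop
    return (n, channels, clip_len, crop_size, crop_size)


# Values taken from the test pipeline above: 10 clips, 32 frames, ThreeCrop at 256.
shape = ncthw_shape_at_test_time(num_clips=10, clip_len=32, num_crops=3, crop_size=256)
print(shape)  # (30, 3, 32, 256, 256)
```

Under that assumption a single test video turns into 30 views, which is why testing is noticeably heavier per video than training with num_clips=1 and a single 224x224 crop.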

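As shown above, almost all of the data preprocessing lives in the pipeline, which makes it a very important part of the project. All of these steps can be configured in the my_slowfast_r50_4x16x1_256e_ucf101_rgb.py config file, and nearly all of the functions that implement them are in the mmaction/datasets/pipelines directory under the project root.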
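
As a mental model for self.pipeline(results), the following sketch chains a list of config dicts into callables and applies them one after another to a results dict. It is a toy written for this post, not the Compose class from mmaction2: ToyCompose, ToyNormalize, ToyCollect and the TOY_PIPELINES lookup table are hypothetical, and the "images" are plain floats to keep it dependency-free.

```python
class ToyNormalize:
    """Subtract mean and divide by std for every value in results['imgs']."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def __call__(self, results):
        results['imgs'] = [(x - self.mean) / self.std for x in results['imgs']]
        return results


class ToyCollect:
    """Keep only the listed keys, dropping everything else."""

    def __init__(self, keys):
        self.keys = keys

    def __call__(self, results):
        return {key: results[key] for key in self.keys}


# Hypothetical lookup table standing in for a registry of transforms.
TOY_PIPELINES = {'ToyNormalize': ToyNormalize, 'ToyCollect': ToyCollect}


class ToyCompose:
    """Build transforms from config dicts and apply them in order."""

    def __init__(self, transforms):
        self.transforms = []
        for transform in transforms:
            if isinstance(transform, dict):
                args = dict(transform)  # copy so the config is left untouched
                cls = TOY_PIPELINES[args.pop('type')]
                self.transforms.append(cls(**args))
            elif callable(transform):
                self.transforms.append(transform)
            else:
                raise TypeError('transform must be a dict or a callable')

    def __call__(self, data):
        for transform in self.transforms:
            data = transform(data)
            if data is None:  # a transform may signal "skip this sample"
                return None
        return data


pipeline = ToyCompose([
    dict(type='ToyNormalize', mean=100.0, std=50.0),
    dict(type='ToyCollect', keys=['imgs', 'label']),
])
out = pipeline({'imgs': [100.0, 150.0, 50.0], 'label': 3, 'frame_dir': 'v_0001'})
print(out)  # {'imgs': [0.0, 1.0, -1.0], 'label': 3}
```

Each step receives the dict produced by the previous one, which is why the order of the entries in the config matters: here the collect step runs last and discards frame_dir once it is no longer needed.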