Temporal Convolutional Networks for Action Segmentation and Detection: Paper Notes and Keras Code

鱼吐泡泡水

Article link: https://blog.csdn.net/m0_37859875/article/details/110728672

Temporal Convolutional Networks for Action Segmentation and Detection

Lea C, Flynn M D, Vidal R, et al. Temporal convolutional networks for action segmentation and detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 156-165.

Motivation

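Action segmentation methods predict which action occurs at every frame of a video.
Detection methods output a sparse set of action segments, where each segment is defined by a start time, an end time, and a class label.
Traditional approaches split the problem into two steps:
first, extract local spatiotemporal features from the video frames; then, feed them into a temporal classifier that captures high-level temporal patterns.
Recent temporal models for the second step fall into three main groups, each with its own drawback:

Sliding window action detectors: the windows are too short to capture long-range temporal patterns.
Segmental models: capture within-segment properties but ignore long-range latent dependencies.
Recurrent models: have a limited attention span and are hard to train correctly.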

Model

Encoder-Decoder-TCN

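Encoder:
$E^{(l)}\in R^{F_l\times T_l}$ is the activation of the $l$-th encoder layer, where $F_l$ is the number of filters and $T_l$ the number of time steps at that layer.
Each layer applies a temporal convolution, a nonlinear activation $f$, and max pooling across time:
$E^{(l)}=\mathrm{max\_pooling}(f(W*E^{(l-1)}+b))$
Decoder:
$D^{(l)}\in R^{F_l\times T_l}$ is the activation of the $l$-th decoder layer.
Each layer applies upsampling, a temporal convolution, and an activation function. The per-frame class probabilities are then computed from the decoder output:
$\hat Y_t=softmax(UD_t^{(1)}+c)$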
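As a rough illustration of the encoder equation, the following pure-Python sketch applies one encoder layer to a single-channel sequence: a zero-padded temporal filter that keeps the length, a ReLU, and max pooling of width 2. The function names (`conv1d_same`, `encoder_layer`), the choice of ReLU for $f$, and the single-channel simplification are assumptions made for illustration; the actual model uses $F_l$ filters per layer.

```python
def conv1d_same(x, w, b=0.0):
    """Centered 1D filter with zero padding so the output keeps len(x).

    Assumes len(w) is odd.
    """
    half = len(w) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(w[k] * padded[t + k] for k in range(len(w))) + b
            for t in range(len(x))]


def relu(x):
    return [max(0.0, v) for v in x]


def max_pool(x, width=2):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(x[i:i + width]) for i in range(0, len(x) - width + 1, width)]


def encoder_layer(x, w, b=0.0):
    # E^(l) = max_pooling(f(W * E^(l-1) + b)), with f = ReLU
    return max_pool(relu(conv1d_same(x, w, b)))


x = [1.0, -2.0, 3.0, 0.0, -1.0, 2.0, 4.0, -3.0]
e1 = encoder_layer(x, w=[0.5, 1.0, 0.5])
print(len(x), len(e1))  # the temporal length is halved: 8 4
```

Stacking several such layers shortens the sequence by a factor of 2 each time, which is how a small number of layers can cover a long temporal span.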

Dilated-TCN

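The Dilated TCN consists of a series of blocks, each containing a sequence of $L$ convolutional layers.
$S^{(j,l)}\in R^{F_w\times T}$ denotes the activations of the $l$-th layer in the $j$-th block.
Each layer consists of a set of dilated convolutions with dilation rate $s$, a nonlinear activation $f$, and a residual connection.
The result of the dilated convolution at time $t$ is:
$\hat S_t^{(j,l)}=f(W^{(1)}S_{t-s}^{(j,l-1)}+W^{(2)}S_{t}^{(j,l-1)}+b)$
Adding the residual connection gives:
$S_t^{(j,l)}=S_t^{(j,l-1)}+V\hat S_t^{(j,l)}+\epsilon$
The outputs of the $B$ blocks are then summed through skip connections:
$Z_t^{(0)}=ReLU(\sum_{j=1}^BS_t^{(j,L)})$
The prediction at each time step $t$ is computed from $Z_t^{(1)}$, which is obtained by applying one more ReLU layer to $Z_t^{(0)}$:
$\hat Y_t=softmax(UZ_t^{(1)}+c)$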
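The two layer equations can be traced with a small pure-Python sketch. It uses scalar weights (that is, $F_w=1$), takes $f$ to be a ReLU, and treats samples before the start of the sequence as zero; all three are simplifying assumptions, and `dilated_residual_layer` is a hypothetical name used only here.

```python
def relu(v):
    return max(0.0, v)


def dilated_residual_layer(s_prev, s, w1, w2, b, v, e):
    """One causal dilated layer with scalar weights.

    s_hat[t] = f(w1 * s_prev[t - s] + w2 * s_prev[t] + b)
    out[t]   = s_prev[t] + v * s_hat[t] + e
    """
    out = []
    for t in range(len(s_prev)):
        past = s_prev[t - s] if t - s >= 0 else 0.0  # zero before the sequence start
        s_hat = relu(w1 * past + w2 * s_prev[t] + b)
        out.append(s_prev[t] + v * s_hat + e)        # residual connection
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
h = signal
for layer in range(3):  # dilation rates 1, 2, 4
    h = dilated_residual_layer(h, s=2 ** layer, w1=0.5, w2=0.5, b=0.0, v=1.0, e=0.0)
print(h)
```

Because every layer adds its input back to its output, the sequence length stays at $T$ throughout the block, in contrast to the pooling used by the ED-TCN.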
Commonalities of the two models:
[Figure: commonalities of the two models]
Differences between the two models:
ED-TCN:

Efficiently captures long-range temporal patterns;
Has a relatively small number of layers;
Each layer contains a set of long convolutional filters.

Dilated-TCN:

Is adapted from WaveNet, which was developed for speech synthesis;
Has more layers;
Each layer uses dilated filters that only operate on a small number of time steps.
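To make "more layers, each operating on a small number of time steps" concrete, the sketch below counts how many input time steps can influence a single output of a stack of dilated layers. It assumes the dilation rate doubles from one layer to the next (as with `2 ** i` in the Keras code in this post) and counts only the dilated layers; `receptive_field` is a hypothetical helper, not something from the paper or the code.

```python
def receptive_field(num_blocks, layers_per_block, causal=False):
    """Number of input time steps that can affect one output time step."""
    # A layer with dilation s reads t-s and t (causal) or t-s, t, t+s (acausal),
    # so it widens the field by s or by 2*s respectively.
    growth = 1 if causal else 2
    field = 1
    for _ in range(num_blocks):
        for l in range(layers_per_block):
            field += growth * (2 ** l)
    return field


print(receptive_field(1, 4))               # 31
print(receptive_field(1, 4, causal=True))  # 16
print(receptive_field(2, 4))               # 61
```

Each layer touches only two or three samples, yet the covered span grows exponentially with the number of layers per block.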

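Causal vs. acausal convolution:
Causal convolution: the prediction at time $t$ is a function only of data from time steps $1$ through $t$.
ED-TCN:
convolve from $X_{t-d}$ to $X_t$, where $d$ is the filter length
Dilated-TCN:
$\hat S_t^{(j,l)}=f(W^{(1)}S_{t-s}^{(j,l-1)}+W^{(2)}S_{t}^{(j,l-1)}+b)$
Acausal convolution: the prediction at time $t$ may be a function of data from any time step in the sequence.
ED-TCN:
convolve from $X_{t-\frac{d}{2}}$ to $X_{t+\frac{d}{2}}$
Dilated-TCN:
$\hat S_t^{(j,l)}=f(W^{(1)}S_{t-s}^{(j,l-1)}+W^{(2)}S_{t}^{(j,l-1)}+W^{(3)}S_{t+s}^{(j,l-1)}+b)$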
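The distinction can be checked numerically: change only a future sample and see whether the value at time $t$ moves. The sketch below evaluates the Dilated-TCN expressions before the nonlinearity, with scalar weights and zeros outside the sequence (both assumptions); `dilated_output` is a hypothetical helper.

```python
def dilated_output(x, t, s, w1, w2, w3=0.0, b=0.0, causal=True):
    """Pre-activation value at time t for the causal or acausal dilated filter."""
    def at(i):
        return x[i] if 0 <= i < len(x) else 0.0

    total = w1 * at(t - s) + w2 * at(t) + b
    if not causal:
        total += w3 * at(t + s)  # the extra W^(3) term reads a future sample
    return total


x = [1.0, 2.0, 3.0, 4.0, 5.0]
x_future_changed = x[:4] + [100.0]  # only the sample at index 4 differs

t, s = 2, 2
causal_same = (dilated_output(x, t, s, 1.0, 1.0) ==
               dilated_output(x_future_changed, t, s, 1.0, 1.0))
acausal_same = (dilated_output(x, t, s, 1.0, 1.0, 1.0, causal=False) ==
                dilated_output(x_future_changed, t, s, 1.0, 1.0, 1.0, causal=False))
print(causal_same, acausal_same)  # True False
```

The causal variant is the one to use when predictions must be made online, since it never depends on frames that have not arrived yet.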

Experiments

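The F1 score is used to evaluate both the segmentation and the detection task:
$P=\frac{TP}{TP+FP}, R=\frac{TP}{TP+FN}$
$F_1=2 \frac{P\cdot R}{P+R}$
A predicted segment counts as a true positive (TP) if its IoU score with the ground truth is above the threshold; otherwise it is a false positive (FP).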
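A simplified sketch of this metric is given below. It assumes segments are `(start, end, label)` tuples with an exclusive end, that each ground-truth segment can be matched at most once, and that unmatched ground-truth segments are false negatives; the function names and the example data are made up for illustration, and this is not the evaluation code used in the paper.

```python
def iou(a, b):
    """Temporal IoU of two (start, end) intervals, end exclusive."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def segmental_f1(pred, truth, threshold=0.5):
    """Return (precision, recall, f1) for lists of (start, end, label) segments."""
    matched = set()
    tp = fp = 0
    for p_start, p_end, p_label in pred:
        best_iou, best_idx = 0.0, None
        for idx, (g_start, g_end, g_label) in enumerate(truth):
            if g_label != p_label or idx in matched:
                continue
            score = iou((p_start, p_end), (g_start, g_end))
            if score > best_iou:
                best_iou, best_idx = score, idx
        if best_idx is not None and best_iou > threshold:
            tp += 1
            matched.add(best_idx)
        else:
            fp += 1
    fn = len(truth) - len(matched)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


truth = [(0, 10, "cut"), (10, 20, "mix"), (20, 30, "pour")]
pred = [(0, 9, "cut"), (9, 12, "mix"), (12, 30, "pour")]
precision, recall, f1 = segmental_f1(pred, truth, threshold=0.5)
print(precision, recall, f1)
```

In this example the "mix" prediction overlaps its ground-truth segment with an IoU of only 2/11, so it is counted as a false positive and the true "mix" segment as a false negative, which gives precision and recall of 2/3 each.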
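Datasets:
[Figure: overview of the datasets]
The authors report state-of-the-art results on every dataset.
See the original paper for the remaining experimental details.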

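Keras implementation of the models (excerpt). The code uses the Keras 1.x API (for example `border_mode`, `AtrousConvolution1D`, and `Merge`), so it will not run unchanged on Keras 2 or later: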

import numpy as np

from keras.models import Sequential, Model
from keras.layers import Input, Dense, TimeDistributed, merge, Merge, Lambda
from keras.layers.core import *
from keras.layers.convolutional import *
from keras.layers.recurrent import *

import tensorflow as tf
from keras import backend as K

from keras.activations import relu
from functools import partial

def channel_normalization(x):
    # Normalize each time step by its largest absolute activation across channels
    max_values = K.max(K.abs(x), 2, keepdims=True) + 1e-5
    out = x / max_values
    return out


def WaveNet_activation(x):
    # WaveNet-style gated activation: tanh(x) * sigmoid(x)
    tanh_out = Activation('tanh')(x)
    sigm_out = Activation('sigmoid')(x)
    return Merge(mode='mul')([tanh_out, sigm_out])


#  -------------------------------------------------------------
def ED_TCN(n_nodes, conv_len, n_classes, n_feat, max_len,
           loss='categorical_crossentropy', causal=False,
           optimizer="rmsprop", activation='norm_relu',
           return_param_str=False):
    n_layers = len(n_nodes)

    inputs = Input(shape=(max_len, n_feat))     # [T,F]
    model = inputs

    # ---- Encoder ----
    for i in range(n_layers):
        # Pad beginning of sequence to prevent usage of future data
        if causal: model = ZeroPadding1D((conv_len // 2, 0))(model)
        model = Convolution1D(n_nodes[i], conv_len, border_mode='same')(model)
        if causal: model = Cropping1D((0, conv_len // 2))(model)

        model = SpatialDropout1D(0.3)(model)

        if activation == 'norm_relu':
            model = Activation('relu')(model)
            model = Lambda(channel_normalization, name="encoder_norm_{}".format(i))(model)
        elif activation == 'wavenet':
            model = WaveNet_activation(model)
        else:
            model = Activation(activation)(model)

        model = MaxPooling1D(2)(model)

    # ---- Decoder ----
    for i in range(n_layers):
        model = UpSampling1D(2)(model)
        if causal: model = ZeroPadding1D((conv_len // 2, 0))(model)
        model = Convolution1D(n_nodes[-i - 1], conv_len, border_mode='same')(model)
        if causal: model = Cropping1D((0, conv_len // 2))(model)

        model = SpatialDropout1D(0.3)(model)

        if activation == 'norm_relu':
            model = Activation('relu')(model)
            model = Lambda(channel_normalization, name="decoder_norm_{}".format(i))(model)
        elif activation == 'wavenet':
            model = WaveNet_activation(model)
        else:
            model = Activation(activation)(model)

    # Output FC layer
    model = TimeDistributed(Dense(n_classes, activation="softmax"))(model)

    model = Model(input=inputs, output=model)
    model.compile(loss=loss, optimizer=optimizer, sample_weight_mode="temporal", metrics=['accuracy'])

    if return_param_str:
        param_str = "ED-TCN_C{}_L{}".format(conv_len, n_layers)
        if causal:
            param_str += "_causal"

        return model, param_str
    else:
        return model


def Dilated_TCN(num_feat, num_classes, nb_filters, dilation_depth, nb_stacks, max_len,
                activation="wavenet", tail_conv=1, use_skip_connections=True, causal=False,
                optimizer='adam', return_param_str=False):
    """
    dilation_depth : largest dilation exponent; each stack has dilation_depth + 1
                     layers with dilation rates 2**0 ... 2**dilation_depth
    nb_stacks : number of stacks.
    """

    def residual_block(x, s, i, activation):
        original_x = x

        if causal:
            x = ZeroPadding1D(((2 ** i) // 2, 0))(x)
            conv = AtrousConvolution1D(nb_filters, 2, atrous_rate=2 ** i, border_mode='same',
                                       name='dilated_conv_%d_tanh_s%d' % (2 ** i, s))(x)
            conv = Cropping1D((0, (2 ** i) // 2))(conv)
        else:
            conv = AtrousConvolution1D(nb_filters, 3, atrous_rate=2 ** i, border_mode='same',
                                       name='dilated_conv_%d_tanh_s%d' % (2 ** i, s))(x)

        conv = SpatialDropout1D(0.3)(conv)
        # x = WaveNet_activation(conv)

        if activation == 'norm_relu':
            x = Activation('relu')(conv)
            x = Lambda(channel_normalization)(x)
        elif activation == 'wavenet':
            x = WaveNet_activation(conv)
        else:
            x = Activation(activation)(conv)

        # res_x  = Convolution1D(nb_filters, 1, border_mode='same')(x)
        # skip_x = Convolution1D(nb_filters, 1, border_mode='same')(x)
        x = Convolution1D(nb_filters, 1, border_mode='same')(x)

        res_x = Merge(mode='sum')([original_x, x])

        # return res_x, skip_x
        return res_x, x

    input_layer = Input(shape=(max_len, num_feat))  # [T,F]

    skip_connections = []

    x = input_layer
    if causal:
        x = ZeroPadding1D((1, 0))(x)
        x = Convolution1D(nb_filters, 2, border_mode='same', name='initial_conv')(x)
        x = Cropping1D((0, 1))(x)
    else:
        x = Convolution1D(nb_filters, 3, border_mode='same', name='initial_conv')(x)

    for s in range(nb_stacks):
        for i in range(0, dilation_depth + 1):
            x, skip_out = residual_block(x, s, i, activation)
            skip_connections.append(skip_out)

    if use_skip_connections:
        x = Merge(mode='sum')(skip_connections)
    x = Activation('relu')(x)
    x = Convolution1D(nb_filters, tail_conv, border_mode='same')(x)
    x = Activation('relu')(x)
    x = Convolution1D(num_classes, tail_conv, border_mode='same')(x)
    x = Activation('softmax', name='output_softmax')(x)

    model = Model(input_layer, x)
    model.compile(optimizer, loss='categorical_crossentropy', sample_weight_mode='temporal')

    if return_param_str:
        param_str = "D-TCN_C{}_B{}_L{}".format(2, nb_stacks, dilation_depth)
        if causal:
            param_str += "_causal"

        return model, param_str
    else:
        return model
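One practical detail of `ED_TCN`: each encoder layer halves the temporal length with `MaxPooling1D(2)` (rounding down) and each decoder layer doubles it with `UpSampling1D(2)`, so the output length only matches `max_len` when `max_len` is divisible by `2 ** n_layers`. The sketch below traces the lengths; `ed_tcn_output_length` and `round_up_to_multiple` are hypothetical helpers, not part of the code above.

```python
def ed_tcn_output_length(max_len, n_layers):
    """Temporal length after n_layers of pooling followed by n_layers of upsampling."""
    length = max_len
    for _ in range(n_layers):
        length //= 2  # pooling with width 2 drops a trailing odd frame
    for _ in range(n_layers):
        length *= 2   # upsampling repeats every frame twice
    return length


def round_up_to_multiple(max_len, n_layers):
    """Smallest length >= max_len that is divisible by 2 ** n_layers."""
    factor = 2 ** n_layers
    return ((max_len + factor - 1) // factor) * factor


print(ed_tcn_output_length(100, 2))  # 100: 100 is divisible by 4
print(ed_tcn_output_length(10, 2))   # 8: two frames are lost
print(round_up_to_multiple(10, 2))   # 12
```

Padding the input sequences (and their labels) to such a length before building the model, and masking the padded frames through the temporal sample weights, is one way to keep inputs and outputs aligned.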

Code link: click here.
