Temporal Convolutional Networks for Action Segmentation and Detection: Paper Notes and Keras Code


Lea C, Flynn M D, Vidal R, et al. Temporal Convolutional Networks for Action Segmentation and Detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 156-165.

Motivation

Action segmentation methods predict what action is occurring at every frame of a video.
Detection methods instead output a sparse set of action segments, where each segment is defined by a start time, an end time, and an action label.
Traditional approaches decompose the problem into two steps:
first, extract local spatiotemporal features from the video frames; then feed them into a temporal classifier that captures high-level temporal patterns.
For the second step, recent temporal models mostly fall into three categories, each with its own drawbacks:

  1. Sliding window action detectors: the windows are too short to capture long-range temporal patterns;
  2. Segmental models: capture within-segment properties but ignore long-range latent dependencies between segments;
  3. Recurrent models: have a limited span of attention and are difficult to train correctly.

Model

Encoder-Decoder TCN (ED-TCN)

Encoder:
$E^{(l)} \in R^{F_l \times T_l}$ denotes the activations of the $l$-th encoder layer.
Each encoder layer applies a temporal convolution, a nonlinear activation, and max pooling:
$E^{(l)} = \text{max\_pool}(f(W * E^{(l-1)} + b))$
Decoder:
$D^{(l)} \in R^{F_l \times T_l}$
Each decoder layer applies upsampling, a convolution, and an activation function. The frame-wise prediction is:
$\hat Y_t = \text{softmax}(U D_t^{(1)} + c)$
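To make the shapes concrete, here is a minimal NumPy sketch of a single encoder layer (temporal convolution, ReLU, width-2 max pooling); the filter sizes and input below are toy values chosen for illustration, not taken from the paper:

import numpy as np

def encoder_layer(E_prev, W, b):
    # E_prev: (F_in, T);  W: (F_out, F_in, d);  b: (F_out,)
    F_out, F_in, d = W.shape
    T = E_prev.shape[1]
    Ep = np.pad(E_prev, ((0, 0), (d // 2, d // 2)))            # 'same' padding
    conv = sum(W[:, :, k] @ Ep[:, k:k + T] for k in range(d)) + b[:, None]
    act = np.maximum(conv, 0)                                  # f = ReLU
    return act.reshape(F_out, T // 2, 2).max(-1)               # max pooling halves T

E0 = np.random.randn(64, 16)                                   # F_0 = 64 features, T = 16 frames
E1 = encoder_layer(E0, 0.1 * np.random.randn(96, 64, 5), np.zeros(96))
print(E1.shape)                                                # (96, 8): T_l = T / 2^l

The decoder mirrors this: each level upsamples by 2 and convolves, so the output regains the full temporal resolution T.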

Dilated-TCN

The Dilated TCN consists of a series of blocks, each containing a sequence of $L$ convolutional layers.
$S^{(j,l)} \in R^{F_w \times T}$: the activations of the $l$-th layer in the $j$-th block.
Each layer consists of dilated convolutions with dilation rate $s$, a nonlinear activation $f$, and a residual connection.
The dilated convolution at time $t$ is:
$\hat S_t^{(j,l)} = f(W^{(1)} S_{t-s}^{(j,l-1)} + W^{(2)} S_t^{(j,l-1)} + b)$
After adding the residual connection:
$S_t^{(j,l)} = S_t^{(j,l-1)} + V \hat S_t^{(j,l)} + e$
The outputs of all $B$ blocks are then combined through a set of skip connections:
$Z_t^{(0)} = \text{ReLU}\left(\sum_{j=1}^{B} S_t^{(j,L)}\right)$
followed by a $1 \times 1$ convolution, $Z_t^{(1)} = \text{ReLU}(V_r Z_t^{(0)} + e_r)$, so the prediction at each time $t$ is:
$\hat Y_t = \text{softmax}(U Z_t^{(1)} + c)$
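As a sanity check on these equations, a minimal NumPy sketch of one causal dilated residual layer (random toy weights; $f$ = ReLU and $V$ = identity for simplicity):

import numpy as np

def dilated_residual_layer(S_prev, W1, W2, b, V, s):
    # S_prev: (F_w, T); causal dilated conv with rate s, taps at t-s and t
    shifted = np.pad(S_prev, ((0, 0), (s, 0)))[:, :S_prev.shape[1]]   # S_{t-s}
    S_hat = np.maximum(W1 @ shifted + W2 @ S_prev + b[:, None], 0)    # f = ReLU
    return S_prev + V @ S_hat                                         # residual

F_w, T = 8, 32
S = np.random.randn(F_w, T)
for s in [1, 2, 4, 8]:                        # dilation rate doubles per layer
    W1, W2 = 0.1 * np.random.randn(2, F_w, F_w)
    S = dilated_residual_layer(S, W1, W2, np.zeros(F_w), np.eye(F_w), s)
print(S.shape)                                # (8, 32): T is preserved throughout

Unlike the ED-TCN, the Dilated TCN never pools, so long-range context comes from the exponentially growing dilation rates rather than from downsampling.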
What the two models have in common is summarized in a comparison figure in the paper (omitted here).
Differences between the two models:
ED-TCN:

  1. Efficiently captures long-range temporal patterns;
  2. Has a relatively small number of layers;
  3. Each layer contains a set of long convolutional filters.

Dilated-TCN:

  1. Was developed for speech synthesis;
  2. Has more layers;
  3. Each layer uses dilated filters that only operate on a small number of time steps.

Causal vs. acausal convolutions:
Causal convolution: the prediction at time $t$ is a function only of the data from time steps 1 through $t$.
ED-TCN: convolves over $X_{t-d}$ through $X_t$
Dilated-TCN:
$\hat S_t^{(j,l)} = f(W^{(1)} S_{t-s}^{(j,l-1)} + W^{(2)} S_t^{(j,l-1)} + b)$
Acausal convolution: the prediction at time $t$ may be a function of the data at any time step in the sequence.
ED-TCN: convolves over $X_{t-\frac{d}{2}}$ through $X_{t+\frac{d}{2}}$
Dilated-TCN:
$\hat S_t^{(j,l)} = f(W^{(1)} S_{t-s}^{(j,l-1)} + W^{(2)} S_t^{(j,l-1)} + W^{(3)} S_{t+s}^{(j,l-1)} + b)$
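The Keras code below realizes causality with a pad-left / crop-right trick around a 'same' convolution. A quick NumPy check that the causal form really ignores the future (toy sizes and random weights, for illustration only):

import numpy as np

np.random.seed(0)
F_w, T, s = 4, 16, 2
W1, W2 = np.random.randn(2, F_w, F_w)
X = np.random.randn(F_w, T)

def causal_out(X):
    shifted = np.pad(X, ((0, 0), (s, 0)))[:, :T]      # taps at t-s and t only
    return np.maximum(W1 @ shifted + W2 @ X, 0)

Y = causal_out(X)
X_future = X.copy()
X_future[:, 10:] = 999.0                              # corrupt the "future"
assert np.allclose(Y[:, :10], causal_out(X_future)[:, :10])   # past outputs unchanged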

Experiments

A segmental F1 score (with an IoU threshold) is used to evaluate both the segmentation and the detection task:
$P = \frac{TP}{TP+FP}, \qquad R = \frac{TP}{TP+FN}$
$F_1 = \frac{2 \cdot P \cdot R}{P + R}$
A predicted segment counts as a true positive (TP) if its IoU with a ground-truth segment is above the threshold; otherwise it is a false positive (FP).
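A hedged sketch of how this segmental F1 metric can be computed; the function name and the (start, end, label) segment format are my own conventions, not from the authors' released evaluation code:

def segmental_f1(pred_segs, true_segs, iou_threshold=0.5):
    # pred_segs, true_segs: lists of (start, end, label) tuples
    matched = set()
    tp = 0
    for ps, pe, pl in pred_segs:
        best_iou, best_j = 0.0, None
        for j, (ts, te, tl) in enumerate(true_segs):
            if tl != pl or j in matched:
                continue                           # labels must agree; each GT matched once
            inter = max(0, min(pe, te) - max(ps, ts))
            union = max(pe, te) - min(ps, ts)
            iou = inter / union if union > 0 else 0.0
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou >= iou_threshold:              # TP if IoU is above the threshold
            tp += 1
            matched.add(best_j)
    fp, fn = len(pred_segs) - tp, len(true_segs) - tp
    prec = tp / (tp + fp) if pred_segs else 0.0
    rec = tp / (tp + fn) if true_segs else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0

print(segmental_f1([(0, 10, 'pour'), (12, 20, 'cut')],
                   [(0, 9, 'pour'), (11, 22, 'cut')]))   # both IoUs > 0.5 -> F1 = 1.0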
Datasets: see the dataset overview in the paper (figure omitted here).
The models achieve state-of-the-art results on every dataset.
See the original paper for the remaining experimental details.

Keras implementation of the two models:

import numpy as np

# NOTE: this code targets the Keras 1.x API (Merge layers, border_mode=,
# AtrousConvolution1D); these names were changed or removed in Keras 2.
from keras.models import Sequential, Model
from keras.layers import Input, Dense, TimeDistributed, Lambda, merge, Merge
from keras.layers.core import *
from keras.layers.convolutional import *
from keras.layers.recurrent import *

import tensorflow as tf
from keras import backend as K

from keras.activations import relu
from functools import partial

def channel_normalization(x):
    # Normalize each time step by its maximum absolute channel activation
    max_values = K.max(K.abs(x), 2, keepdims=True) + 1e-5
    out = x / max_values
    return out


def WaveNet_activation(x):
    # Gated activation from WaveNet: elementwise tanh(x) * sigmoid(x)
    tanh_out = Activation('tanh')(x)
    sigm_out = Activation('sigmoid')(x)
    return Merge(mode='mul')([tanh_out, sigm_out])


#  -------------------------------------------------------------
def ED_TCN(n_nodes, conv_len, n_classes, n_feat, max_len,
           loss='categorical_crossentropy', causal=False,
           optimizer="rmsprop", activation='norm_relu',
           return_param_str=False):
    n_layers = len(n_nodes)

    inputs = Input(shape=(max_len, n_feat))     # [T,F]
    model = inputs

    # ---- Encoder ----
    for i in range(n_layers):
        # Pad beginning of sequence to prevent usage of future data
        if causal: model = ZeroPadding1D((conv_len // 2, 0))(model)
        model = Convolution1D(n_nodes[i], conv_len, border_mode='same')(model)
        if causal: model = Cropping1D((0, conv_len // 2))(model)

        model = SpatialDropout1D(0.3)(model)

        if activation == 'norm_relu':
            model = Activation('relu')(model)
            model = Lambda(channel_normalization, name="encoder_norm_{}".format(i))(model)
        elif activation == 'wavenet':
            model = WaveNet_activation(model)
        else:
            model = Activation(activation)(model)

        model = MaxPooling1D(2)(model)

    # ---- Decoder ----
    for i in range(n_layers):
        model = UpSampling1D(2)(model)
        if causal: model = ZeroPadding1D((conv_len // 2, 0))(model)
        model = Convolution1D(n_nodes[-i - 1], conv_len, border_mode='same')(model)
        if causal: model = Cropping1D((0, conv_len // 2))(model)

        model = SpatialDropout1D(0.3)(model)

        if activation == 'norm_relu':
            model = Activation('relu')(model)
            model = Lambda(channel_normalization, name="decoder_norm_{}".format(i))(model)
        elif activation == 'wavenet':
            model = WaveNet_activation(model)
        else:
            model = Activation(activation)(model)

    # Output FC layer
    model = TimeDistributed(Dense(n_classes, activation="softmax"))(model)

    model = Model(input=inputs, output=model)
    model.compile(loss=loss, optimizer=optimizer, sample_weight_mode="temporal", metrics=['accuracy'])

    if return_param_str:
        param_str = "ED-TCN_C{}_L{}".format(conv_len, n_layers)
        if causal:
            param_str += "_causal"

        return model, param_str
    else:
        return model
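For reference, a minimal usage sketch of the function above (the feature dimension, layer widths, and sequence length are arbitrary; max_len should be divisible by 2^n_layers so pooling and upsampling round-trip cleanly):

import numpy as np

model, name = ED_TCN(n_nodes=[64, 96], conv_len=25, n_classes=10, n_feat=128,
                     max_len=1024, causal=False, return_param_str=True)
model.summary()   # (1024, 128) -> pooled to 256 steps -> upsampled back to 1024

X = np.random.randn(4, 1024, 128)                       # [videos, T, features]
Y = np.eye(10)[np.random.randint(10, size=(4, 1024))]   # one-hot label per frame
M = np.ones((4, 1024))                                  # mask for padded frames
model.fit(X, Y, sample_weight=M, nb_epoch=1, batch_size=2)   # Keras 1.x: nb_epoch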


def Dilated_TCN(num_feat, num_classes, nb_filters, dilation_depth, nb_stacks, max_len,
                activation="wavenet", tail_conv=1, use_skip_connections=True, causal=False,
                optimizer='adam', return_param_str=False):
    """
    dilation_depth : number of layers per stack
    nb_stacks : number of stacks.
    """

    def residual_block(x, s, i, activation):
        original_x = x

        if causal:
            x = ZeroPadding1D(((2 ** i) // 2, 0))(x)
            conv = AtrousConvolution1D(nb_filters, 2, atrous_rate=2 ** i, border_mode='same',
                                       name='dilated_conv_%d_tanh_s%d' % (2 ** i, s))(x)
            conv = Cropping1D((0, (2 ** i) // 2))(conv)
        else:
            conv = AtrousConvolution1D(nb_filters, 3, atrous_rate=2 ** i, border_mode='same',
                                       name='dilated_conv_%d_tanh_s%d' % (2 ** i, s))(x)

        conv = SpatialDropout1D(0.3)(conv)

        if activation == 'norm_relu':
            x = Activation('relu')(conv)
            x = Lambda(channel_normalization)(x)
        elif activation == 'wavenet':
            x = WaveNet_activation(conv)
        else:
            x = Activation(activation)(conv)

        # 1x1 convolution applied before the residual sum
        x = Convolution1D(nb_filters, 1, border_mode='same')(x)

        res_x = Merge(mode='sum')([original_x, x])

        return res_x, x

    input_layer = Input(shape=(max_len, num_feat))  # [T,F]

    skip_connections = []

    x = input_layer
    if causal:
        x = ZeroPadding1D((1, 0))(x)
        x = Convolution1D(nb_filters, 2, border_mode='same', name='initial_conv')(x)
        x = Cropping1D((0, 1))(x)
    else:
        x = Convolution1D(nb_filters, 3, border_mode='same', name='initial_conv')(x)

    for s in range(nb_stacks):
        for i in range(0, dilation_depth + 1):
            x, skip_out = residual_block(x, s, i, activation)
            skip_connections.append(skip_out)

    if use_skip_connections:
        x = Merge(mode='sum')(skip_connections)
    x = Activation('relu')(x)
    x = Convolution1D(nb_filters, tail_conv, border_mode='same')(x)
    x = Activation('relu')(x)
    x = Convolution1D(num_classes, tail_conv, border_mode='same')(x)
    x = Activation('softmax', name='output_softmax')(x)

    model = Model(input_layer, x)
    model.compile(optimizer, loss='categorical_crossentropy', sample_weight_mode='temporal')

    if return_param_str:
        param_str = "D-TCN_C{}_B{}_L{}".format(2, nb_stacks, dilation_depth)
        if causal:
            param_str += "_causal"

        return model, param_str
    else:
        return model
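And a corresponding sketch for the dilated model (hyperparameters are illustrative only; with dilation_depth=4 each stack uses dilation rates 1, 2, 4, 8, 16):

model = Dilated_TCN(num_feat=128, num_classes=10, nb_filters=64,
                    dilation_depth=4, nb_stacks=2, max_len=1024,
                    activation='wavenet', causal=False)
model.summary()   # 2 stacks of 5 dilated residual layers each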

Code link: click here
