【Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition】

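Abstract: The abstract opens with one sentence stating that motion representation plays a vital role in human action recognition, then moves straight to a brief introduction of the model. The paper introduces a new, compact motion representation called the optical flow guided feature (OFF). OFF is derived from the definition of optical flow and is orthogonal to optical flow. The abstract then describes some other properties of OFF and closes with a short summary of the experiments.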

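Since the feature is called the optical flow guided feature and is orthogonal to optical flow, we first need to introduce optical flow. Traditional optical flow relies on a well-known constraint, the brightness constancy constraint:

$$I(x, y, t) = I(x + \Delta x,\; y + \Delta y,\; t + \Delta t)$$

Here I denotes the pixel value, x, y and t are the two spatial dimensions and the one temporal dimension, and Δx, Δy, Δt are small changes. At the feature level, that is, for a feature f(I) computed from the raw pixel values by some function, there is a similar statement:

$$f(I)(x, y, t) = f(I)(x + \Delta x,\; y + \Delta y,\; t + \Delta t)$$

In other words, it is not only the brightness that is continuous in space and time and unchanged under small changes of the coordinates; a feature computed from the brightness has the same stability. Writing p = (x, y, t) for a position, a first-order expansion rewrites the equation above as

$$\frac{\partial f(I)(p)}{\partial x}\,\Delta x + \frac{\partial f(I)(p)}{\partial y}\,\Delta y + \frac{\partial f(I)(p)}{\partial t}\,\Delta t = 0$$

Dividing both sides by Δt gives

$$\frac{\partial f(I)(p)}{\partial x}\,v_x + \frac{\partial f(I)(p)}{\partial y}\,v_y + \frac{\partial f(I)(p)}{\partial t} = 0$$

The physical meaning of v_x and v_y is not yet clear to me; the paper calls (v_x, v_y) the velocity of the feature point at p. The partial derivatives with respect to x, y and t are the gradients in space and time. Read as an inner product, the last equation says that the gradient vector (∂f/∂x, ∂f/∂y, ∂f/∂t) is orthogonal to (v_x, v_y, 1), which is the sense in which OFF is orthogonal to optical flow.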
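To make the constraint concrete, here is a small numerical check, written as a sketch under stated assumptions rather than anything taken from the paper. It assumes a synthetic feature map that translates rigidly with a known velocity (the names `pattern`, `feature` and `off_constraint_residual` are made up for illustration), estimates the three partial derivatives with central finite differences, and verifies that f_x·v_x + f_y·v_y + f_t is close to zero.

```python
import numpy as np


def pattern(x, y):
    """A smooth synthetic feature pattern (hypothetical; any smooth function works)."""
    return np.sin(0.3 * x) + np.cos(0.2 * y)


def feature(x, y, t, vx, vy):
    """Feature map that translates rigidly with velocity (vx, vy)."""
    return pattern(x - vx * t, y - vy * t)


def off_constraint_residual(vx=0.5, vy=-0.25, eps=1e-4):
    """Return f_x * vx + f_y * vy + f_t on a 16x16 grid at t = 0."""
    ys, xs = np.mgrid[0:16, 0:16].astype(float)
    t = 0.0
    # Central finite differences approximate the three partial derivatives.
    f_x = (feature(xs + eps, ys, t, vx, vy) - feature(xs - eps, ys, t, vx, vy)) / (2 * eps)
    f_y = (feature(xs, ys + eps, t, vx, vy) - feature(xs, ys - eps, t, vx, vy)) / (2 * eps)
    f_t = (feature(xs, ys, t + eps, vx, vy) - feature(xs, ys, t - eps, vx, vy)) / (2 * eps)
    return f_x * vx + f_y * vy + f_t


residual = off_constraint_residual()
print("max |residual| =", np.abs(residual).max())
```

The residual is zero only up to finite-difference error. For real video features the constraint holds only approximately, since real motion is not a rigid translation.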

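An overview of the network is shown in the figure below.
[Figure: overview of the network architecture]
The network consists of three sub-networks, each with its own role: the feature generation sub-network, the OFF sub-network, and the classification sub-network. The feature generation sub-network uses an ordinary CNN to extract basic features, and the OFF sub-network then extracts OFF features from them. These features are then fed into several stacked residual blocks to obtain more refined features. The features output by these two sub-networks are passed to the classification sub-network, which produces the final classification result. A more detailed diagram of the network is shown below.
[Figure: detailed network structure]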
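First, the basic feature f(I) is extracted by a convolutional network with ReLU and max-pooling. The backbone chosen is BN-Inception, although the feature generation network can be replaced by other architectures. The OFF sub-network contains several OFF units, and different units use features of f(I) taken from different depths, as the figure shows. Each OFF unit contains an OFF layer that generates the OFF; the process is illustrated below.
[Figure: structure of the OFF layer]
Every position of the input feature map first goes through a 1×1 convolution that maps the number of channels to a fixed size; in this paper the result is always 128 channels, whatever the input size. The spatial gradients are then computed with the Sobel operator, and the temporal gradient is obtained by subtracting the values at corresponding positions in adjacent frames. Once these gradients are computed, they form the OFF. They are concatenated with each other and with the lower-level OFF feature output by the previous level, and the result is fed into the residual network. The Sobel operator itself is simple. In its standard form (the sign convention may differ from the paper's figure) the two kernels are

$$S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad S_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$

These produce the gradients in the x and y directions respectively. Each is simply a fixed-weight kernel that computes differences between neighbouring values.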
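The steps described above (1×1 channel reduction, fixed Sobel kernels for the spatial gradients, frame subtraction for the temporal gradient, then concatenation) can be sketched in PyTorch as follows. This is an illustrative sketch, not the authors' implementation: the class name `OFFLayerSketch` is hypothetical, the Sobel kernels are applied only to the first frame, and the concatenation with the lower-level OFF feature is left out. All three choices are assumptions made for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OFFLayerSketch(nn.Module):
    """Illustrative OFF layer: 1x1 reduction, fixed Sobel gradients, temporal difference."""

    def __init__(self, in_channels, reduced_channels=128):
        super().__init__()
        self.channels = reduced_channels
        # Learnable 1x1 convolution that maps any input width to a fixed width.
        self.reduce = nn.Conv2d(in_channels, reduced_channels, kernel_size=1)
        sobel_x = torch.tensor([[-1.0, 0.0, 1.0],
                                [-2.0, 0.0, 2.0],
                                [-1.0, 0.0, 1.0]])
        sobel_y = sobel_x.t()
        # One fixed 3x3 kernel per channel (depthwise); buffers are not trained.
        self.register_buffer("kx", sobel_x.expand(reduced_channels, 1, 3, 3).clone())
        self.register_buffer("ky", sobel_y.expand(reduced_channels, 1, 3, 3).clone())

    def forward(self, feat_t, feat_next):
        a = self.reduce(feat_t)
        b = self.reduce(feat_next)
        grad_x = F.conv2d(a, self.kx, padding=1, groups=self.channels)
        grad_y = F.conv2d(a, self.ky, padding=1, groups=self.channels)
        grad_t = b - a  # element-wise temporal difference between adjacent frames
        return torch.cat([grad_x, grad_y, grad_t], dim=1)


layer = OFFLayerSketch(in_channels=256, reduced_channels=128)
frame_t = torch.randn(2, 256, 28, 28)
frame_next = torch.randn(2, 256, 28, 28)
off = layer(frame_t, frame_next)
print(off.shape)
```

With 128 reduced channels the output has 3 × 128 = 384 channels at the input's spatial size. Passing the same frame twice makes the temporal part exactly zero, which is a quick sanity check.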

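In the residual blocks that connect different OFF units, the dimension of the OFF is reduced further to save computation. The residual blocks follow ResNet-20 and do not use batch normalization, which the authors say is to avoid overfitting. In addition, OFF units can be added to the layers of an ordinary CNN to assist the model.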
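As a rough illustration of a residual block without batch normalization, here is a minimal PyTorch sketch. The bottleneck layout (1×1, 3×3, 1×1 plus a 1×1 projection shortcut) is an assumption chosen to resemble the `motion_conv1_trans_28a` / `motion_conv2_trans_28a` / `motion_conv3_trans_28a` / `motion_conv_branch_28a` layers in the model printout further down. The class name `BottleneckNoBN` is hypothetical, and the real forward pass may wire these layers differently.

```python
import torch
import torch.nn as nn


class BottleneckNoBN(nn.Module):
    """Bottleneck residual block with no batch normalization (illustrative)."""

    def __init__(self, in_channels, mid_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, mid_channels, kernel_size=1)
        self.conv2 = nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(mid_channels, out_channels, kernel_size=1)
        # Use a 1x1 projection on the shortcut only when the channel count changes.
        if in_channels == out_channels:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        out = self.conv3(out)
        return self.relu(out + self.shortcut(x))


block_a = BottleneckNoBN(64, 64, 256)   # projection shortcut
block_b = BottleneckNoBN(256, 64, 256)  # identity shortcut
x = torch.randn(1, 64, 14, 14)
y = block_b(block_a(x))
print(y.shape)
```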

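Last is the classification sub-network. It takes features from different sources (presumably the three scores in Figure 3, although one thing puzzles me: aren't t and t+Δt both taken from the original video? How would their output scores differ?) and applies an inner-product classifier to each of them to get the corresponding classification score. The classification scores obtained for all sampled frames (which reads as if a classification score were computed for the feature map of every frame extracted from the video) are merged by averaging. At this point it is not really clear how the final classification result is produced.

The authors then explain that they use the same setting as TSN: not every frame in the video takes part in the computation; frames are sampled. Each sampled frame corresponds to a segment (in general a segment is not necessarily a single frame), and each segment outputs a class score. For the OFF sub-network, the segment scores are combined by average pooling into a sub-network-level score. To obtain the video-level score, the output score of the feature generation sub-network also has to be taken into account, and it can be handled with the same average pooling.

In addition, the feature generation sub-network and the OFF sub-network are trained separately. In the first stage the feature generation sub-network is trained with existing methods; in the second stage it is frozen and the OFF sub-network is trained.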
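The score aggregation described above can be sketched as follows. This is only an illustration of average pooling over segments followed by averaging over sub-networks. The function name `fuse_scores`, the branch names, the equal weighting of the two sub-networks, and the toy scores are all assumptions, not values from the paper.

```python
import numpy as np


def fuse_scores(segment_scores_by_branch):
    """Average segment scores per branch, then average the branches.

    segment_scores_by_branch maps a (hypothetical) branch name to an
    array-like of shape (num_segments, num_classes).
    """
    branch_scores = {
        name: np.asarray(scores, dtype=float).mean(axis=0)
        for name, scores in segment_scores_by_branch.items()
    }
    # Equal weighting of the branches is an assumption made for this sketch.
    video_score = np.mean(list(branch_scores.values()), axis=0)
    return branch_scores, video_score


# Toy example: 2 segments, 2 classes, 2 sub-networks.
scores = {
    "feature_generation": [[1.0, 0.0], [3.0, 2.0]],
    "off": [[0.0, 4.0], [2.0, 2.0]],
}
branch_scores, video_score = fuse_scores(scores)
predicted_class = int(np.argmax(video_score))
print(branch_scores, video_score, predicted_class)
```

In this toy case the two sub-network-level scores are [2, 1] and [1, 3], the video-level score is [1.5, 2.0], and class 1 is predicted.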

The model structure of BNInception_OFF can be printed as follows:

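# Print the layer-by-layer structure of BNInception_OFF.
from RGB_OFF import BNInception_OFF  # model definition from the local project
import torch
from thop import profile

model = BNInception_OFF()
print(model)

# Profile one forward pass on a dummy 224x224 RGB image.
# The counts are computed here but not printed in this first run.
dummy_input = torch.randn(1, 3, 224, 224)
flops, params = profile(model, inputs=(dummy_input,))

# print(flops)
# print(params)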

The output is as follows:

C:\Users\24093\anaconda3\envs\tuduipytorch\python.exe D:/Optical-Flow-Guided-Feature-Pytorch/my_test.py
BNInception_OFF(
  (consensus): ConsensusModule()
  (conv1_7x7_s2): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
  (conv1_7x7_s2_bn): BatchNorm2d(64, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (conv1_relu_7x7): ReLU(inplace=True)
  (pool1_3x3_s2): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=0, dilation=(1, 1), ceil_mode=True)
  (conv2_3x3_reduce): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
  (conv2_3x3_reduce_bn): BatchNorm2d(64, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (conv2_relu_3x3_reduce): ReLU(inplace=True)
  (conv2_3x3): Conv2d(64, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2_3x3_bn): BatchNorm2d(192, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (conv2_relu_3x3): ReLU(inplace=True)
  (pool2_3x3_s2): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=0, dilation=(1, 1), ceil_mode=True)
  (inception_3a_1x1): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1))
  (inception_3a_1x1_bn): BatchNorm2d(64, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3a_relu_1x1): ReLU(inplace=True)
  (inception_3a_3x3_reduce): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1))
  (inception_3a_3x3_reduce_bn): BatchNorm2d(64, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3a_relu_3x3_reduce): ReLU(inplace=True)
  (inception_3a_3x3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_3a_3x3_bn): BatchNorm2d(64, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3a_relu_3x3): ReLU(inplace=True)
  (inception_3a_double_3x3_reduce): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1))
  (inception_3a_double_3x3_reduce_bn): BatchNorm2d(64, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3a_relu_double_3x3_reduce): ReLU(inplace=True)
  (inception_3a_double_3x3_1): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_3a_double_3x3_1_bn): BatchNorm2d(96, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3a_relu_double_3x3_1): ReLU(inplace=True)
  (inception_3a_double_3x3_2): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_3a_double_3x3_2_bn): BatchNorm2d(96, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3a_relu_double_3x3_2): ReLU(inplace=True)
  (inception_3a_pool): AvgPool2d(kernel_size=3, stride=1, padding=1)
  (inception_3a_pool_proj): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1))
  (inception_3a_pool_proj_bn): BatchNorm2d(32, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3a_relu_pool_proj): ReLU(inplace=True)
  (inception_3b_1x1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
  (inception_3b_1x1_bn): BatchNorm2d(64, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3b_relu_1x1): ReLU(inplace=True)
  (inception_3b_3x3_reduce): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
  (inception_3b_3x3_reduce_bn): BatchNorm2d(64, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3b_relu_3x3_reduce): ReLU(inplace=True)
  (inception_3b_3x3): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_3b_3x3_bn): BatchNorm2d(96, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3b_relu_3x3): ReLU(inplace=True)
  (inception_3b_double_3x3_reduce): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
  (inception_3b_double_3x3_reduce_bn): BatchNorm2d(64, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3b_relu_double_3x3_reduce): ReLU(inplace=True)
  (inception_3b_double_3x3_1): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_3b_double_3x3_1_bn): BatchNorm2d(96, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3b_relu_double_3x3_1): ReLU(inplace=True)
  (inception_3b_double_3x3_2): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_3b_double_3x3_2_bn): BatchNorm2d(96, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3b_relu_double_3x3_2): ReLU(inplace=True)
  (inception_3b_pool): AvgPool2d(kernel_size=3, stride=1, padding=1)
  (inception_3b_pool_proj): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
  (inception_3b_pool_proj_bn): BatchNorm2d(64, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3b_relu_pool_proj): ReLU(inplace=True)
  (inception_3c_3x3_reduce): Conv2d(320, 128, kernel_size=(1, 1), stride=(1, 1))
  (inception_3c_3x3_reduce_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3c_relu_3x3_reduce): ReLU(inplace=True)
  (inception_3c_3x3): Conv2d(128, 160, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (inception_3c_3x3_bn): BatchNorm2d(160, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3c_relu_3x3): ReLU(inplace=True)
  (inception_3c_double_3x3_reduce): Conv2d(320, 64, kernel_size=(1, 1), stride=(1, 1))
  (inception_3c_double_3x3_reduce_bn): BatchNorm2d(64, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3c_relu_double_3x3_reduce): ReLU(inplace=True)
  (inception_3c_double_3x3_1): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_3c_double_3x3_1_bn): BatchNorm2d(96, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3c_relu_double_3x3_1): ReLU(inplace=True)
  (inception_3c_double_3x3_2): Conv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (inception_3c_double_3x3_2_bn): BatchNorm2d(96, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_3c_relu_double_3x3_2): ReLU(inplace=True)
  (inception_3c_pool): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=0, dilation=(1, 1), ceil_mode=True)
  (inception_4a_1x1): Conv2d(576, 224, kernel_size=(1, 1), stride=(1, 1))
  (inception_4a_1x1_bn): BatchNorm2d(224, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4a_relu_1x1): ReLU(inplace=True)
  (inception_4a_3x3_reduce): Conv2d(576, 64, kernel_size=(1, 1), stride=(1, 1))
  (inception_4a_3x3_reduce_bn): BatchNorm2d(64, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4a_relu_3x3_reduce): ReLU(inplace=True)
  (inception_4a_3x3): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_4a_3x3_bn): BatchNorm2d(96, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4a_relu_3x3): ReLU(inplace=True)
  (inception_4a_double_3x3_reduce): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1))
  (inception_4a_double_3x3_reduce_bn): BatchNorm2d(96, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4a_relu_double_3x3_reduce): ReLU(inplace=True)
  (inception_4a_double_3x3_1): Conv2d(96, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_4a_double_3x3_1_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4a_relu_double_3x3_1): ReLU(inplace=True)
  (inception_4a_double_3x3_2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_4a_double_3x3_2_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4a_relu_double_3x3_2): ReLU(inplace=True)
  (inception_4a_pool): AvgPool2d(kernel_size=3, stride=1, padding=1)
  (inception_4a_pool_proj): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1))
  (inception_4a_pool_proj_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4a_relu_pool_proj): ReLU(inplace=True)
  (inception_4b_1x1): Conv2d(576, 192, kernel_size=(1, 1), stride=(1, 1))
  (inception_4b_1x1_bn): BatchNorm2d(192, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4b_relu_1x1): ReLU(inplace=True)
  (inception_4b_3x3_reduce): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1))
  (inception_4b_3x3_reduce_bn): BatchNorm2d(96, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4b_relu_3x3_reduce): ReLU(inplace=True)
  (inception_4b_3x3): Conv2d(96, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_4b_3x3_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4b_relu_3x3): ReLU(inplace=True)
  (inception_4b_double_3x3_reduce): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1))
  (inception_4b_double_3x3_reduce_bn): BatchNorm2d(96, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4b_relu_double_3x3_reduce): ReLU(inplace=True)
  (inception_4b_double_3x3_1): Conv2d(96, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_4b_double_3x3_1_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4b_relu_double_3x3_1): ReLU(inplace=True)
  (inception_4b_double_3x3_2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_4b_double_3x3_2_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4b_relu_double_3x3_2): ReLU(inplace=True)
  (inception_4b_pool): AvgPool2d(kernel_size=3, stride=1, padding=1)
  (inception_4b_pool_proj): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1))
  (inception_4b_pool_proj_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4b_relu_pool_proj): ReLU(inplace=True)
  (inception_4c_1x1): Conv2d(576, 160, kernel_size=(1, 1), stride=(1, 1))
  (inception_4c_1x1_bn): BatchNorm2d(160, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4c_relu_1x1): ReLU(inplace=True)
  (inception_4c_3x3_reduce): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1))
  (inception_4c_3x3_reduce_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4c_relu_3x3_reduce): ReLU(inplace=True)
  (inception_4c_3x3): Conv2d(128, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_4c_3x3_bn): BatchNorm2d(160, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4c_relu_3x3): ReLU(inplace=True)
  (inception_4c_double_3x3_reduce): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1))
  (inception_4c_double_3x3_reduce_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4c_relu_double_3x3_reduce): ReLU(inplace=True)
  (inception_4c_double_3x3_1): Conv2d(128, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_4c_double_3x3_1_bn): BatchNorm2d(160, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4c_relu_double_3x3_1): ReLU(inplace=True)
  (inception_4c_double_3x3_2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_4c_double_3x3_2_bn): BatchNorm2d(160, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4c_relu_double_3x3_2): ReLU(inplace=True)
  (inception_4c_pool): AvgPool2d(kernel_size=3, stride=1, padding=1)
  (inception_4c_pool_proj): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1))
  (inception_4c_pool_proj_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4c_relu_pool_proj): ReLU(inplace=True)
  (inception_4d_1x1): Conv2d(608, 96, kernel_size=(1, 1), stride=(1, 1))
  (inception_4d_1x1_bn): BatchNorm2d(96, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4d_relu_1x1): ReLU(inplace=True)
  (inception_4d_3x3_reduce): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1))
  (inception_4d_3x3_reduce_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4d_relu_3x3_reduce): ReLU(inplace=True)
  (inception_4d_3x3): Conv2d(128, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_4d_3x3_bn): BatchNorm2d(192, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4d_relu_3x3): ReLU(inplace=True)
  (inception_4d_double_3x3_reduce): Conv2d(608, 160, kernel_size=(1, 1), stride=(1, 1))
  (inception_4d_double_3x3_reduce_bn): BatchNorm2d(160, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4d_relu_double_3x3_reduce): ReLU(inplace=True)
  (inception_4d_double_3x3_1): Conv2d(160, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_4d_double_3x3_1_bn): BatchNorm2d(192, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4d_relu_double_3x3_1): ReLU(inplace=True)
  (inception_4d_double_3x3_2): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_4d_double_3x3_2_bn): BatchNorm2d(192, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4d_relu_double_3x3_2): ReLU(inplace=True)
  (inception_4d_pool): AvgPool2d(kernel_size=3, stride=1, padding=1)
  (inception_4d_pool_proj): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1))
  (inception_4d_pool_proj_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4d_relu_pool_proj): ReLU(inplace=True)
  (inception_4e_3x3_reduce): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1))
  (inception_4e_3x3_reduce_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4e_relu_3x3_reduce): ReLU(inplace=True)
  (inception_4e_3x3): Conv2d(128, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (inception_4e_3x3_bn): BatchNorm2d(192, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4e_relu_3x3): ReLU(inplace=True)
  (inception_4e_double_3x3_reduce): Conv2d(608, 192, kernel_size=(1, 1), stride=(1, 1))
  (inception_4e_double_3x3_reduce_bn): BatchNorm2d(192, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4e_relu_double_3x3_reduce): ReLU(inplace=True)
  (inception_4e_double_3x3_1): Conv2d(192, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_4e_double_3x3_1_bn): BatchNorm2d(256, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4e_relu_double_3x3_1): ReLU(inplace=True)
  (inception_4e_double_3x3_2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (inception_4e_double_3x3_2_bn): BatchNorm2d(256, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_4e_relu_double_3x3_2): ReLU(inplace=True)
  (inception_4e_pool): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=0, dilation=(1, 1), ceil_mode=True)
  (inception_5a_1x1): Conv2d(1056, 352, kernel_size=(1, 1), stride=(1, 1))
  (inception_5a_1x1_bn): BatchNorm2d(352, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_5a_relu_1x1): ReLU(inplace=True)
  (inception_5a_3x3_reduce): Conv2d(1056, 192, kernel_size=(1, 1), stride=(1, 1))
  (inception_5a_3x3_reduce_bn): BatchNorm2d(192, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_5a_relu_3x3_reduce): ReLU(inplace=True)
  (inception_5a_3x3): Conv2d(192, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_5a_3x3_bn): BatchNorm2d(320, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_5a_relu_3x3): ReLU(inplace=True)
  (inception_5a_double_3x3_reduce): Conv2d(1056, 160, kernel_size=(1, 1), stride=(1, 1))
  (inception_5a_double_3x3_reduce_bn): BatchNorm2d(160, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_5a_relu_double_3x3_reduce): ReLU(inplace=True)
  (inception_5a_double_3x3_1): Conv2d(160, 224, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_5a_double_3x3_1_bn): BatchNorm2d(224, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_5a_relu_double_3x3_1): ReLU(inplace=True)
  (inception_5a_double_3x3_2): Conv2d(224, 224, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_5a_double_3x3_2_bn): BatchNorm2d(224, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_5a_relu_double_3x3_2): ReLU(inplace=True)
  (inception_5a_pool): AvgPool2d(kernel_size=3, stride=1, padding=1)
  (inception_5a_pool_proj): Conv2d(1056, 128, kernel_size=(1, 1), stride=(1, 1))
  (inception_5a_pool_proj_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_5a_relu_pool_proj): ReLU(inplace=True)
  (inception_5b_1x1): Conv2d(1024, 352, kernel_size=(1, 1), stride=(1, 1))
  (inception_5b_1x1_bn): BatchNorm2d(352, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_5b_relu_1x1): ReLU(inplace=True)
  (inception_5b_3x3_reduce): Conv2d(1024, 192, kernel_size=(1, 1), stride=(1, 1))
  (inception_5b_3x3_reduce_bn): BatchNorm2d(192, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_5b_relu_3x3_reduce): ReLU(inplace=True)
  (inception_5b_3x3): Conv2d(192, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_5b_3x3_bn): BatchNorm2d(320, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_5b_relu_3x3): ReLU(inplace=True)
  (inception_5b_double_3x3_reduce): Conv2d(1024, 192, kernel_size=(1, 1), stride=(1, 1))
  (inception_5b_double_3x3_reduce_bn): BatchNorm2d(192, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_5b_relu_double_3x3_reduce): ReLU(inplace=True)
  (inception_5b_double_3x3_1): Conv2d(192, 224, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_5b_double_3x3_1_bn): BatchNorm2d(224, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_5b_relu_double_3x3_1): ReLU(inplace=True)
  (inception_5b_double_3x3_2): Conv2d(224, 224, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inception_5b_double_3x3_2_bn): BatchNorm2d(224, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_5b_relu_double_3x3_2): ReLU(inplace=True)
  (inception_5b_pool): MaxPool2d(kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), dilation=(1, 1), ceil_mode=True)
  (inception_5b_pool_proj): Conv2d(1024, 128, kernel_size=(1, 1), stride=(1, 1))
  (inception_5b_pool_proj_bn): BatchNorm2d(128, eps=1e-05, momentum=0.9, affine=True, track_running_stats=True)
  (inception_5b_relu_pool_proj): ReLU(inplace=True)
  (global_pool): AvgPool2d(kernel_size=7, stride=1, padding=0)
  (last_linear): Linear(in_features=1024, out_features=1000, bias=True)
  (motion_conv_gen_3a): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_down_3a): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_grad_3a): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)
  (motion_conv_gen_3b): Conv2d(320, 128, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_down_3b): Conv2d(320, 32, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_grad_3b): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)
  (motion_conv_gen_3c): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_down_3c): Conv2d(576, 32, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_grad_3c): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)
  (motion_conv_trans_28): Conv2d(320, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
  (motion_conv1_trans_28a): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
  (motion_conv2_trans_28a): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (motion_conv3_trans_28a): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
  (motion_conv_branch_28a): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
  (motion_conv1_trans_28b): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
  (motion_conv2_trans_28b): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (motion_conv3_trans_28b): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
  (motion_conv1_trans_28c): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
  (motion_conv2_trans_28c): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (motion_conv3_trans_28c): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
  (motion_conv_gen_4a): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_down_4a): Conv2d(576, 32, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_grad_4a): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)
  (motion_conv_gen_4b): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_down_4b): Conv2d(576, 32, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_grad_4b): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)
  (motion_conv_gen_4c): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_down_4c): Conv2d(608, 32, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_grad_4c): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)
  (motion_conv_gen_4d): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_down_4d): Conv2d(608, 32, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_grad_4d): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)
  (motion_conv_trans_14): Conv2d(1056, 128, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2))
  (motion_conv1_trans_14a): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
  (motion_conv2_trans_14a): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (motion_conv3_trans_14a): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
  (motion_conv_expand_trans_14a): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
  (motion_conv1_trans_14b): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
  (motion_conv2_trans_14b): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (motion_conv3_trans_14b): Conv2d(128, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (motion_conv_gen_5a): Conv2d(1024, 128, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_down_5a): Conv2d(1024, 32, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_grad_5a): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)
  (motion_conv_gen_5b): Conv2d(1024, 128, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_down_5b): Conv2d(1024, 32, kernel_size=(1, 1), stride=(1, 1))
  (motion_spatial_grad_5b): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32)
  (motion_conv_trans): Conv2d(832, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (motion_conv1_trans): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
  (motion_conv2_trans): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (motion_conv3_trans): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1))
  (motion_conv_branch_trans): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1))
  (fc_action_motion): Linear(in_features=1024, out_features=101, bias=True)
  (fc_action_motion_28): Linear(in_features=256, out_features=101, bias=True)
  (fc_action_motion_14): Linear(in_features=512, out_features=101, bias=True)
  (motion_relu_gen_3a): ReLU()
  (motion_relu_gen_3b): ReLU()
  (motion_relu_gen_3c): ReLU()
  (motion_relu_trans_28a): ReLU()
  (motion_relu_trans_28b): ReLU()
  (motion_relu_trans_28c): ReLU()
  (motion_relu_gen_4a): ReLU()
  (motion_relu_gen_4b): ReLU()
  (motion_relu_gen_4c): ReLU()
  (motion_relu_gen_4d): ReLU()
  (motion_relu_trans_14): ReLU()
  (motion_relu_gen_5a): ReLU()
  (motion_relu_gen_5b): ReLU()
  (motion_relu_trans_7): ReLU()
  (motion_pool_trans_28): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=0, dilation=(1, 1), ceil_mode=True)
  (dropout): Dropout(p=0.8, inplace=False)
  (softmax): Softmax(dim=None)
)
[INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
[INFO] Register count_normalization() for <class 'torch.nn.modules.batchnorm.BatchNorm2d'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.activation.ReLU'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.pooling.MaxPool2d'>.
[INFO] Register count_avgpool() for <class 'torch.nn.modules.pooling.AvgPool2d'>.
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.dropout.Dropout'>.
[INFO] Register count_softmax() for <class 'torch.nn.modules.activation.Softmax'>.

Process finished with exit code 0

The FLOPs and parameter count of BNInception_OFF can be computed as follows:

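# Count operations and parameters of BNInception_OFF with thop.
from RGB_OFF import BNInception_OFF  # model definition from the local project
import torch
from thop import profile

model = BNInception_OFF()
# print(model)

# Profile one forward pass on a dummy 224x224 RGB image.
dummy_input = torch.randn(1, 3, 224, 224)
flops, params = profile(model, inputs=(dummy_input,))

print(flops)
print(params)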

The output is as follows:

C:\Users\24093\anaconda3\envs\tuduipytorch\python.exe D:/Optical-Flow-Guided-Feature-Pytorch/my_test.py
[INFO] Register count_convNd() for <class 'torch.nn.modules.conv.Conv2d'>.
[INFO] Register count_normalization() for <class 'torch.nn.modules.batchnorm.BatchNorm2d'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.activation.ReLU'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.pooling.MaxPool2d'>.
[INFO] Register count_avgpool() for <class 'torch.nn.modules.pooling.AvgPool2d'>.
[INFO] Register count_linear() for <class 'torch.nn.modules.linear.Linear'>.
[INFO] Register zero_ops() for <class 'torch.nn.modules.dropout.Dropout'>.
[INFO] Register count_softmax() for <class 'torch.nn.modules.activation.Softmax'>.
2048408096.0
11295240.0

Process finished with exit code 0
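The two raw numbers are easier to read when scaled. The helper below is a small, hypothetical convenience function (not part of thop) that formats the values printed above. Note that thop's own README names the first return value of `profile` `macs`, so the operation count is probably best read as multiply-accumulate operations rather than individual floating-point operations.

```python
def human_readable(count):
    """Format a raw count with a G/M/K suffix (hypothetical helper)."""
    for unit, scale in (("G", 1e9), ("M", 1e6), ("K", 1e3)):
        if count >= scale:
            return f"{count / scale:.2f} {unit}"
    return f"{count:.0f}"


ops_text = human_readable(2048408096.0)    # operation count printed above
params_text = human_readable(11295240.0)   # parameter count printed above
print(ops_text)
print(params_text)
```

For this run that is roughly 2.05 G operations for a single 224×224 input and roughly 11.30 M parameters.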
