Action Recognition paper-reading

TSN(2016)
long-range temporal structure modeling

two-stream ConvNets + a sequence of short snippets + four types of input modalities

The spatial stream ConvNet operates on a single RGB image, and the temporal stream ConvNet takes a stack of consecutive optical flow fields as input.

Instead of working on single frames or frame stacks, TSN operates on a sequence of short snippets (K segments of equal duration) sparsely sampled from the entire video. Each snippet in this sequence produces its own preliminary prediction of the action classes, and a consensus among the snippets is then derived as the video-level prediction.

The segmental consensus function combines the outputs from multiple short snippets to obtain a consensus of class hypotheses among them; the experiments consider evenly averaging, maximum, and weighted averaging.
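
A minimal PyTorch sketch of the sampling-plus-consensus pipeline, assuming a generic 2D `backbone` that maps a batch of frames to class scores (all names here are illustrative, not the authors' code):

```python
import torch

def tsn_forward(backbone, video, num_segments=3, consensus="avg"):
    """video: (T, C, H, W); sample one snippet (here a single frame) per segment."""
    T = video.shape[0]
    # Sparse sampling: K segments of equal duration, one frame from each
    # (training samples a random position inside each segment; uniform here).
    idx = torch.linspace(0, T - 1, num_segments).round().long()
    scores = backbone(video[idx])            # (K, num_classes), per-snippet predictions
    # Segmental consensus over the K snippet-level predictions.
    if consensus == "avg":                   # evenly averaging
        return scores.mean(dim=0)
    if consensus == "max":                   # maximum
        return scores.max(dim=0).values
    raise ValueError(f"unknown consensus: {consensus}")
```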

a single RGB image, stacked RGB difference, stacked optical flow field, and stacked warped optical flow field.
RGB difference between two consecutive frames describes the appearance change, which may correspond to the motion-salient region.
The warped optical flow is extracted by first estimating the homography matrix and then compensating for camera motion; this suppresses the background motion and makes the motion concentrate on the actor.
Optical flow is better at capturing motion information, while RGB difference may sometimes be unstable for describing motion; RGB difference may serve as a low-quality, high-speed alternative motion representation.
Best: Optical Flow + Warped Flow + RGB (for the fusion, take a weighted average with weights 1 : 0.5 : 1).
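
For the cheapest modality above, the stacked RGB difference is just the element-wise difference of consecutive frames (a trivial sketch):

```python
import torch

def rgb_difference_stack(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, C, H, W) -> (T-1, C, H, W) stack of consecutive differences."""
    return frames[1:] - frames[:-1]
```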

A cross-modality pre-training technique plus two new data augmentation techniques: corner cropping (extracted regions are selected only from the corners or the center of the image) and scale jittering (the size of the input image or optical flow fields is fixed at 256×340, the width and height of the cropped region are randomly selected from {256, 224, 192, 168}, and the crop is resized to 224×224 for network training).
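
A sketch of the two augmentations under the stated parameters (the helper names are mine, not TSN's code):

```python
import random

CROP_SIDES = [256, 224, 192, 168]   # candidate crop side lengths on a 256x340 input

def scale_jittered_crop_size():
    # Scale jittering: crop width and height are drawn independently.
    return random.choice(CROP_SIDES), random.choice(CROP_SIDES)

def corner_crop_positions(img_h, img_w, crop_h, crop_w):
    # Corner cropping: crops come only from the four corners and the center,
    # never arbitrary positions; the chosen crop is then resized to 224x224.
    return [(0, 0), (0, img_w - crop_w),
            (img_h - crop_h, 0), (img_h - crop_h, img_w - crop_w),
            ((img_h - crop_h) // 2, (img_w - crop_w) // 2)]
```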

I3D(2017)
Two-Stream Inflated 3D ConvNet
Simply convert successful image (2D) classification models into 3D ConvNets: start with a 2D architecture and inflate all the filters and pooling kernels, endowing them with an additional temporal dimension. Filters go from N×N to N×N×N by repeating the weights of the 2D filters N times along the time dimension and rescaling them by dividing by N.
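
A minimal sketch of the inflation rule (the tensor layout assumes PyTorch's (out, in, kH, kW) convention):

```python
import torch

def inflate_conv2d_weight(w2d: torch.Tensor, t: int) -> torch.Tensor:
    """(out_c, in_c, k, k) -> (out_c, in_c, t, k, k): repeat along time, divide by t.

    Dividing by t makes the inflated network reproduce the 2D network's
    activations on a "boring" video made of a repeated still frame.
    """
    return w2d.unsqueeze(2).repeat(1, 1, t, 1, 1) / t
```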

R(2+1)D(2018)
two new forms of spatiotemporal convolution.
mixed convolution (MC): employ 3D convolutions only in the early layers of the network, with 2D convolutions in the top layers. The intuition is that motion modeling is a low/mid-level operation that can be implemented via 3D convolutions in the early layers of a network, with spatial reasoning over these mid-level motion features implemented by 2D convolutions in the top layers;
R(2+1)D: replace the N_i 3D convolutional filters of size N_{i−1} × t × d × d with a (2+1)D block consisting of M_i 2D convolutional filters of size N_{i−1} × 1 × d × d and N_i temporal convolutional filters of size M_i × t × 1 × 1. The hyperparameter M_i determines the dimensionality of the intermediate subspace where the signal is projected between the spatial and temporal convolutions.

Two advantages: it increases the number of nonlinearities (an extra ReLU between the spatial and temporal convolutions), and it makes optimization easier.
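
A minimal PyTorch sketch of one (2+1)D block. The paper chooses M_i so that the block's parameter count matches the full t×d×d 3D convolution (M_i = ⌊t d² N_{i−1} N_i / (d² N_{i−1} + t N_i)⌋); here M_i is simply an argument:

```python
import torch.nn as nn

class Conv2Plus1D(nn.Sequential):
    """Factor a t x d x d 3D conv into a spatial 1 x d x d and a temporal t x 1 x 1 conv."""
    def __init__(self, in_c, out_c, mid_c, t=3, d=3):
        super().__init__(
            # M_i spatial filters of size N_{i-1} x 1 x d x d
            nn.Conv3d(in_c, mid_c, (1, d, d), padding=(0, d // 2, d // 2), bias=False),
            nn.BatchNorm3d(mid_c),
            nn.ReLU(inplace=True),   # the extra nonlinearity gained by the factorization
            # N_i temporal filters of size M_i x t x 1 x 1
            nn.Conv3d(mid_c, out_c, (t, 1, 1), padding=(t // 2, 0, 0), bias=False),
        )
```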

Non-Local(2018)
non-local operations as a generic family of building blocks for capturing long-range dependencies.

Following the classical non-local means method in computer vision, the non-local operation computes the response at a position as a weighted sum of the features at all positions.
The set of positions can be in space, time, or spacetime, implying that these operations are applicable to image, sequence, and video problems.
In videos, long-range interactions occur between distant pixels in space as well as time.

Non-local matching is also the essence of successful texture synthesis, super-resolution, and inpainting algorithms.
A self-attention module computes the response at a position in a sequence (e.g., a sentence) by attending to all positions and taking their weighted average in an embedding space.
Both flow and trajectories are off-the-shelf modules that may find long-range, non-local dependencies.

A non-local operation is a flexible building block and can be easily used together with convolutional/recurrent layers. It can be added into the earlier part of deep neural networks, unlike fc layers that are often used at the end.

generic non-local operation in deep neural networks

g(x): for simplicity, g is taken as a linear embedding, g(x_j) = W_g x_j, where W_g is a weight matrix to be learned. This is implemented as, e.g., a 1×1 convolution in space or a 1×1×1 convolution in spacetime.
f(x_i, x_j), four variants:
- Gaussian
- Embedded Gaussian
- Dot product
- Concatenation ([·, ·] denotes concatenation)
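
Written out, the generic operation and the pairwise functions above (as given in the paper; θ and φ are linear embeddings like g, and the normalization C(x) is Σ_j f(x_i, x_j) for the Gaussian variants and the number of positions N for the other two):

```latex
\[
  y_i = \frac{1}{\mathcal{C}(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)
\]
\begin{align*}
  \text{Gaussian:}          \quad & f(x_i, x_j) = e^{x_i^{\top} x_j} \\
  \text{Embedded Gaussian:} \quad & f(x_i, x_j) = e^{\theta(x_i)^{\top} \phi(x_j)} \\
  \text{Dot product:}       \quad & f(x_i, x_j) = \theta(x_i)^{\top} \phi(x_j) \\
  \text{Concatenation:}     \quad & f(x_i, x_j) = \mathrm{ReLU}\!\left(w_f^{\top}\,[\theta(x_i), \phi(x_j)]\right)
\end{align*}
```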

non-local block

Thanks to the residual connection (z_i = W_z y_i + x_i, with W_z initialized to zero), a new non-local block can be inserted into any pre-trained model without breaking its initial behavior.
To make it more efficient: set the number of channels represented by W_g, W_θ, and W_φ to half the number of channels in x, and use a subsampling trick (e.g., subsample x by pooling).
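
A minimal sketch of an embedded-Gaussian non-local block over spacetime features of shape (B, C, T, H, W); the channel halving and zero-initialized output projection follow the paper's recipe, while the class name and other details are illustrative:

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        c = channels // 2                      # bottleneck: half the channels of x
        self.theta = nn.Conv3d(channels, c, 1)
        self.phi = nn.Conv3d(channels, c, 1)
        self.g = nn.Conv3d(channels, c, 1)
        self.w_z = nn.Conv3d(c, channels, 1)
        nn.init.zeros_(self.w_z.weight)        # block starts as the identity, so it can
        nn.init.zeros_(self.w_z.bias)          # be inserted into a pre-trained model

    def forward(self, x):
        b, _, t, h, w = x.shape
        theta = self.theta(x).flatten(2).transpose(1, 2)   # (B, N, C/2), N = T*H*W
        phi = self.phi(x).flatten(2)                       # (B, C/2, N)
        g = self.g(x).flatten(2).transpose(1, 2)           # (B, N, C/2)
        attn = torch.softmax(theta @ phi, dim=-1)          # embedded Gaussian = softmax
        y = (attn @ g).transpose(1, 2).reshape(b, -1, t, h, w)
        return x + self.w_z(y)                             # residual connection
```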

TSM(2019)

TSM shifts part of the channels along the temporal dimension, thereby facilitating information exchange among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero extra computation and zero extra parameters.

Shift only a small portion of the channels for efficient temporal fusion instead of shifting all the channels, to cut down the data movement cost and improve accuracy (performance peaks when 1/4 of the channels are shifted, i.e., 1/8 in each direction).
Insert TSM inside the residual branch rather than outside, so that the activation of the current frame is preserved and the spatial feature learning capability of the 2D CNN backbone is not harmed.
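
A minimal sketch of the shift itself, matching the 1/8-per-direction default (in the residual variant this is applied to the input of the residual branch):

```python
import torch

def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """x: (B, T, C, H, W) -> same shape; 2/fold_div of the channels move in time."""
    fold = x.shape[2] // fold_div
    out = torch.zeros_like(x)                              # zero-pad the vacated slots
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # frame t receives channels from t+1
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # frame t receives channels from t-1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining channels stay in place
    return out
```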

SlowFast(2019)

(i) a Slow pathway, operating at a low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating at a high frame rate, to capture motion at fine temporal resolution. The Fast pathway can be made very lightweight by reducing its channel capacity, yet it can learn useful temporal information for video recognition.
The idea is to treat spatial structures and temporal events separately, exploring the potential of different temporal speeds.

Attach one lateral connection between the two pathways for every "stage". Specifically for ResNets, these connections are right after pool1, res2, res3, and res4.

Use non-degenerate temporal convolutions (temporal kernel size > 1) only in res4 and res5 of the Slow pathway; all filters from conv1 to res3 are essentially 2D convolution kernels in this pathway. This is motivated by the experimental observation that using temporal convolutions in earlier layers degrades accuracy.
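
A sketch of the two-rate sampling and one lateral connection, assuming the paper's default speed ratio α = 8 and channel ratio β = 1/8; the time-strided convolution is one of the fusion options reported, and the concrete sizes here are illustrative:

```python
import torch
import torch.nn as nn

alpha = 8        # the Fast pathway samples alpha x more frames than Slow
fast_c = 8       # Fast channels = beta * Slow channels, e.g. 8 vs. 64

def sample_pathways(video: torch.Tensor, tau: int = 16):
    """video: (B, C, T, H, W). Slow keeps every tau-th frame, Fast every (tau/alpha)-th."""
    return video[:, :, ::tau], video[:, :, ::tau // alpha]

# Lateral connection (Fast -> Slow): a time-strided 5x1x1 convolution with
# stride alpha matches the temporal lengths of the two pathways; its
# 2*beta*C output channels are concatenated to the Slow pathway's features.
lateral = nn.Conv3d(fast_c, 2 * fast_c, kernel_size=(5, 1, 1),
                    stride=(alpha, 1, 1), padding=(2, 0, 0))
```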

TIN(2020)

This work is built on the idea of fusing a single frame with neighboring frames in the channel dimension. Temporal fusion is accomplished through: 1. shifting groups of channels; 2. temporal attention, with the offsets and weights learned from the target tasks. To help information flow symmetrically in the temporal dimension, the strategy of reversed offsets is adopted.

Temporal information is fused along the temporal dimension by inserting the module before each convolutional layer in the residual block.

Three steps: 1. split the input feature channel-wise into several groups and obtain the offsets and weights of neighboring frames for mingling the temporal information; 2. apply the learned offsets to their respective groups through a shifting operation and interpolate the shifted feature along the temporal dimension; 3. concatenate the split features and aggregate them temporal-wise with the learned weights.

Deformable Shift Module:
First squeeze global spatial information into a temporal channel descriptor (3D average pooling).

OffsetNet: pooling → 1D conv to aggregate the channel info → 2 FC + ReLU to aggregate the temporal info; the raw offsets are rescaled to the range (−T/2, T/2).

WeightNet: 1D conv + sigmoid.
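
A sketch of OffsetNet and WeightNet as described above; the layer widths and the tanh rescaling are illustrative assumptions, not the authors' exact configuration, and the pooled descriptor comes from the 3D average pooling step:

```python
import torch.nn as nn

class OffsetNet(nn.Module):
    def __init__(self, channels, t, n_offsets):
        super().__init__()
        self.conv = nn.Conv1d(channels, 1, kernel_size=3, padding=1)  # aggregate channel info
        self.fc = nn.Sequential(nn.Linear(t, t), nn.ReLU(inplace=True),
                                nn.Linear(t, n_offsets))              # aggregate temporal info
        self.t = t

    def forward(self, desc):                # desc: (B, C, T) after spatial pooling
        x = self.conv(desc).squeeze(1)      # (B, T)
        raw = self.fc(x)                    # (B, n_offsets) raw offsets
        return raw.tanh() * (self.t / 2)    # rescale to (-T/2, T/2); tanh is an assumption

class WeightNet(nn.Module):
    def __init__(self, channels, n_groups):
        super().__init__()
        self.conv = nn.Conv1d(channels, n_groups, kernel_size=3, padding=1)

    def forward(self, desc):                # desc: (B, C, T)
        return self.conv(desc).sigmoid()    # (B, n_groups, T) temporal weights
```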
Differentiable Temporal-wise Frame Sampling: the input feature map U is split into two parts along the channel dimension; one part is shifted by different offsets according to its group, while the rest remains un-shifted.
With n groups, only the offsets of half the groups (n/2) are learned; the remaining half are derived symmetrically from the previous offsets.
When the split groups of channels are concatenated back into V, the feature map is multiplied by the weight E along the temporal dimension. (A sketch of the fractional shift follows below.)
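
A sketch of the differentiable fractional shift at the heart of step 2: a real-valued offset is realized as a linear interpolation between the two nearest integer shifts, so the gradient with respect to the offset is well defined (zero padding at the boundaries; the function names are mine):

```python
import torch

def integer_shift(x: torch.Tensor, s: int) -> torch.Tensor:
    """Shift x: (B, T, C, H, W) by s frames along T, zero-padding the gap."""
    if s == 0:
        return x
    out = torch.zeros_like(x)
    if s > 0:
        out[:, s:] = x[:, :-s]
    else:
        out[:, :s] = x[:, -s:]
    return out

def fractional_temporal_shift(x: torch.Tensor, offset: torch.Tensor) -> torch.Tensor:
    """offset: scalar tensor in frames (may be fractional, learned by OffsetNet)."""
    s = int(torch.floor(offset).item())
    frac = offset - s            # gradient flows to the offset through frac
    return (1 - frac) * integer_shift(x, s) + frac * integer_shift(x, s + 1)
```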