【Video Analysis in Deep Learning】

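Image dataset: ImageNet

Video dataset: UCF-101

Early work on video representation learning relied mainly on hand-crafted feature extraction.

This family of methods has four clear drawbacks:

  1. They are sensitive to camera motion and illumination changes

  2. They carry no high-level semantic information

  3. The feature dimensionality is too high

  4. They are too slow to compute

Early deep-learning-based video representations were built on 2D convolutional neural networks (2D-CNN).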

Paper1: Large-scale Video Classification with Convolutional Neural Networks

Dataset: The Sports-1M dataset consists of 1 million YouTube videos annotated with 487 classes. 

An effective approach to speeding up the runtime performance of CNNs is to modify the architecture to contain two separate streams of processing: a context stream that learns features on low-resolution frames and a high-resolution fovea stream that only operates on the middle portion of the frame.

Red, green and blue boxes indicate convolutional, normalization and pooling layers respectively. 

Multiresolution CNN architecture. Input frames are fed into two separate streams of processing: a context stream that models low-resolution image and a fovea stream that processes high-resolution center crop. Both streams consist of alternating convolution (red), normalization (green) and pooling (blue) layers. Both streams converge to two fully connected layers (yellow).
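The two inputs described above can be sketched in a few lines. This is only an illustrative sketch, not the paper's code: the function names are hypothetical, the frame is assumed to be a square grayscale grid stored as nested lists, and halving both the downsampled resolution and the crop size is an assumption made here for simplicity.

```python
def center_crop(frame, size):
    """Return the central size x size region of a 2D frame (list of rows)."""
    height, width = len(frame), len(frame[0])
    top, left = (height - size) // 2, (width - size) // 2
    return [row[left:left + size] for row in frame[top:top + size]]


def downsample_by_2(frame):
    """Average each non-overlapping 2x2 block; an odd trailing row/column is dropped."""
    height = len(frame) // 2 * 2
    width = len(frame[0]) // 2 * 2
    return [
        [
            (frame[r][c] + frame[r][c + 1] + frame[r + 1][c] + frame[r + 1][c + 1]) / 4.0
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def multires_inputs(frame):
    """Build the (context, fovea) pair for one square frame.

    context: the whole frame at half resolution (assumed factor of 2).
    fovea:   the center crop at full resolution, same size as the context.
    """
    half = len(frame) // 2
    return downsample_by_2(frame), center_crop(frame, half)


# Usage: a 4x4 toy frame whose pixel value is row * 4 + col.
toy_frame = [[r * 4 + c for c in range(4)] for r in range(4)]
context, fovea = multires_inputs(toy_frame)
```

Both outputs have the same spatial size, so each stream processes a quarter of the original pixels, which is where the runtime saving comes from.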

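The context stream learns more color features, while the high-resolution fovea stream learns high-frequency grayscale filters.

Conclusion: 1. The network uses two separate streams of processing: a context stream that models the low-resolution image and a fovea stream that processes the high-resolution center crop.

2. Slow fusion: temporal information is merged gradually across the layers of the network rather than all at once at the input or at the output.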

Paper2: Two-Stream Convolutional Networks for Action Recognition in Videos

The temporal part, in the form of motion across the frames, conveys the movement of the observer (the camera) and the objects.

The input to the model is formed by stacking optical flow displacement fields between several consecutive frames. Such input explicitly describes the motion between video frames.
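To make the stacking concrete, here is a minimal sketch under stated assumptions: each flow field is given as a hypothetical pair of H x W grids (horizontal and vertical displacement), and the channels are interleaved as dx_1, dy_1, dx_2, dy_2, and so on, so that L flow fields yield an input with 2L channels. The function name and the data layout are chosen here for illustration; computing the optical flow itself is out of scope.

```python
def stack_flow(flows):
    """Stack L optical flow fields into a 2L-channel input.

    flows: list of (dx, dy) pairs, one per pair of consecutive frames,
           where dx and dy are H x W grids (lists of rows).
    Returns a list of 2L grids ordered dx_1, dy_1, dx_2, dy_2, ...
    """
    channels = []
    for dx, dy in flows:
        channels.append(dx)  # horizontal displacement channel
        channels.append(dy)  # vertical displacement channel
    return channels


# Usage: three toy flow fields on a 2x2 grid give a 6-channel input.
toy_flows = [
    ([[k, k], [k, k]], [[-k, -k], [-k, -k]])
    for k in range(1, 4)
]
stacked = stack_flow(toy_flows)
```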

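Conclusion: The video is split into frames, which are fed to the first convolutional network to extract static (appearance) features. In parallel, optical flow maps extracted from the video are fed to a second convolutional network to extract motion features. Finally, the scores output by the softmax layers of the two networks are fused.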
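The final fusion step can be illustrated with a weighted average of the two softmax outputs. This is a sketch of one simple fusion choice, not necessarily the exact scheme used in the paper: the function names, the toy logits, and the default equal weighting are all assumptions made here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(spatial_logits, temporal_logits, w_spatial=0.5):
    """Weighted average of the two streams' softmax scores.

    w_spatial is an assumed weight; 0.5 gives plain averaging.
    """
    spatial = softmax(spatial_logits)
    temporal = softmax(temporal_logits)
    return [
        w_spatial * s + (1.0 - w_spatial) * t
        for s, t in zip(spatial, temporal)
    ]


# Usage: the spatial stream favors class 0, the temporal stream favors class 1.
fused = fuse_scores([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Because each stream's scores sum to one, any convex combination of them is again a valid distribution over classes.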

Paper3: Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Long-term Recurrent Convolutional Networks (LRCNs) combine convolutional layers with long-range temporal recursion and are end-to-end trainable.
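The overall data flow, per-frame features feeding a recurrent state, can be sketched as follows. This is a toy illustration only: `frame_features` is a hypothetical stand-in for the convolutional layers, and the recurrence is a plain element-wise tanh update with fixed assumed weights, whereas LRCN uses LSTM units and learns all parameters end to end.

```python
import math


def frame_features(frame):
    """Hypothetical stand-in for a CNN: mean and max of a flat list of pixels."""
    return [sum(frame) / len(frame), max(frame)]


def recurrent_step(state, features, w_state=0.5, w_input=0.5):
    """Plain tanh recurrence (not an LSTM); the weights are fixed assumptions."""
    return [
        math.tanh(w_state * h + w_input * x)
        for h, x in zip(state, features)
    ]


def encode_clip(frames):
    """Run the per-frame extractor, then update the recurrent state frame by frame.

    Returns the hidden state after every frame, so a classifier could read
    either the last state or all of them.
    """
    state = [0.0, 0.0]
    states = []
    for frame in frames:
        state = recurrent_step(state, frame_features(frame))
        states.append(state)
    return states


# Usage: two toy frames; the final state depends on their order.
dark, bright = [0.0, 0.0], [1.0, 1.0]
forward = encode_clip([dark, bright])
backward = encode_clip([bright, dark])
```

The point of the sketch is that the recurrent state carries information across frames, so the same set of frames in a different order produces a different representation.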
