Ballas, Nicolas, et al. “Delving Deeper into Convolutional Networks for Learning Video Representations.” arXiv preprint arXiv:1511.06432 (2015). (Citations: 14).
1 Motivation
Previous work on recurrent CNNs (RCNs) has tended to focus on high-level features extracted from the top layers of a 2D CNN. While high-level features contain highly discriminative information, they tend to have low spatial resolution. Thus, we argue that current RCN architectures are not well suited for capturing fine motion information; instead, they are more likely to focus on global appearance changes.
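The alternative the paper explores is applying the recurrence to intermediate convolutional maps rather than top-layer features. A back-of-the-envelope count shows why fully-connected recurrent units are impractical there, and why convolutional gates (the paper's GRU-RCN idea) help; the specific map size (7×7×512) and kernel size (3×3) below are illustrative assumptions, not figures from the paper:

```python
# Hypothetical intermediate feature map: 7x7 spatial, 512 channels.
H, W, C = 7, 7, 512
n = H * W * C  # flattened map size used by a fully-connected recurrent unit

# Fully-connected GRU: 3 gates, each with an n x n input-to-hidden matrix.
fc_input_to_hidden = 3 * n * n

# Convolutional GRU (GRU-RCN style): 3 gates, each a kxk conv mapping C -> C channels,
# so parameters no longer depend on the spatial size of the map.
k = 3
conv_input_to_hidden = 3 * (k * k * C * C)

print(f"fully-connected input-to-hidden: {fc_input_to_hidden:,} parameters")
print(f"convolutional input-to-hidden:   {conv_input_to_hidden:,} parameters")
```

Even at this modest resolution the fully-connected variant needs on the order of 10^9 input-to-hidden parameters, versus a few million for the convolutional one, which is the "drastic number of parameters" problem discussed below.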
Applying the recurrence directly to intermediate convolutional maps, however, inevitably results in a drastic number of parameters characterizing the input-to-hidden transformation, due to the size of the convolutional maps. On the other hand,