Long-term Recurrent Convolutional Networks for Visual Recognition and Description
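Overview
Video recognition: the frames of an image sequence are passed through parallel CNNs, which extract the visual features of each frame. These features are fed as input to an LSTM network, whose final output corresponds to the matching description.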
LRCN(Long-term Recurrent Convolutional Network)
For the LSTM part of the network:
$$h_{1}=f_{W}\left(x_{1}, h_{0}\right)=f_{W}\left(x_{1}, 0\right)$$
$$h_{2}=f_{W}\left(x_{2}, h_{1}\right)$$
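The recurrence for h_1 and h_2 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `f_W` here is a simple tanh cell standing in for the LSTM update (a real LSTM cell also carries a cell state and gates), and the weight names `W_xh`, `W_hh`, `b`, the helper `unroll`, and all numeric values are hypothetical. Each `x` plays the role of the feature vector the CNN would produce for one frame.

```python
import math


def f_W(x, h_prev, W_xh, W_hh, b):
    """One recurrent step: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b).

    Assumption: a plain tanh cell used as a stand-in for the LSTM update.
    """
    h = []
    for i in range(len(b)):
        s = b[i]
        s += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        s += sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h.append(math.tanh(s))
    return h


def unroll(xs, W_xh, W_hh, b):
    """Apply f_W along the sequence, starting from h_0 = 0."""
    h = [0.0] * len(b)  # h_0 = 0, as in the first equation
    hs = []
    for x in xs:
        h = f_W(x, h, W_xh, W_hh, b)  # h_t depends on x_t and h_{t-1}
        hs.append(h)
    return hs


# Toy example: 2-d frame features, 2-d hidden state (hypothetical values).
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [0.0, 0.2]]
b = [0.0, 0.1]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one feature vector per frame
hs = unroll(xs, W_xh, W_hh, b)
```

The same weights are reused at every step, so the hidden state h_t summarizes all frames seen up to time t.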
$$P\left(y_{t}=c\right)=\frac{\exp \left(W_{z c} z_{t, c}+b_{c}\right)}{\sum_{c^{\prime} \in C} \exp \left(W_{z c} z_{t, c^{\prime}}+b_{c}\right)}$$
$$\mathcal{L}(V, W)=-\sum_{t=1}^{T} \log P_{V, W}\left(y_{t} | x_{1 : t}, y_{1 : t-1}\right)$$
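The softmax output and the sequence loss can be illustrated with a minimal Python sketch. It assumes the per-step class scores (the affine terms inside the exponentials) are already computed and passed in as lists of logits, with the dependence on x_{1:t} and y_{1:t-1} carried by the recurrent state that produced them. The function names `softmax` and `sequence_nll` and the toy numbers are hypothetical.

```python
import math


def softmax(logits):
    """Turn one time step's class scores into probabilities P(y_t = c)."""
    m = max(logits)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_nll(logits_per_step, targets):
    """Negative log-likelihood: -sum over t of log P(y_t | ...)."""
    loss = 0.0
    for logits, y in zip(logits_per_step, targets):
        probs = softmax(logits)
        loss -= math.log(probs[y])  # penalize low probability on the true label
    return loss


# Toy example: T = 2 time steps, 3 classes (hypothetical scores and labels).
logits_per_step = [[2.0, 0.0, -1.0], [0.0, 0.0, 0.0]]
targets = [0, 2]
loss = sequence_nll(logits_per_step, targets)
```

Minimizing this loss over the CNN parameters V and the recurrent parameters W raises the probability assigned to the correct label at every time step.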
Network Architecture