A quick explanation of Equations 1-6: $x_t$ is the input to the LSTM cell at time $t$.
Then $i_t$, $f_t$, $o_t$, and $g_t$ are the input gate, forget gate, output gate, and input modulation gate at time $t$, respectively.
$c_t$ is the memory cell, which depends on the memory cell of the previous frame.
$h_t$ is the hidden state, an intermediate variable.
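To make the roles of these symbols concrete, here is a minimal NumPy sketch of a single LSTM step. It assumes Equations 1-6 are the standard LSTM update (sigmoid gates, tanh for the input modulation gate and the cell output); the function name `lstm_step`, the stacked weight layout `W`, `U`, `b`, and the gate order are my own choices for illustration, not something taken from the paper.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step under the standard formulation (an assumption here).

    x_t:    input at time t, shape (D,)
    h_prev: previous hidden state h_{t-1}, shape (H,)
    c_prev: previous memory cell c_{t-1}, shape (H,)
    W:      input weights, shape (4H, D), stacked in the order i, f, o, g
    U:      recurrent weights, shape (4H, H), same order
    b:      biases, shape (4H,), same order
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b       # pre-activations for all four gates

    i_t = sigmoid(z[0:H])              # input gate
    f_t = sigmoid(z[H:2 * H])          # forget gate
    o_t = sigmoid(z[2 * H:3 * H])      # output gate
    g_t = np.tanh(z[3 * H:4 * H])      # input modulation gate

    c_t = f_t * c_prev + i_t * g_t     # memory cell: mixes c_{t-1} with new input
    h_t = o_t * np.tanh(c_t)           # hidden state
    return h_t, c_t
```

The line computing `c_t` is where the dependence on the previous frame's memory cell shows up: the forget gate scales `c_prev`, and the input gate scales the new candidate `g_t`.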
This stage is designed to model person-level actions and their temporal evolution.
The second LSTM network, working on top of the temporal representation, is used to directly model the temporal dynamics of group activity.
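So, for each person, the output of the first-layer LSTM is first concatenated with the AlexNet fc7 feature (not drawn in the figure); max pooling is then applied, and the result is fed to the second-layer LSTM to predict the group activity.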
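The concatenate-then-pool step can be sketched in a few lines of NumPy. This is only an illustration under my own assumptions: I assume the max pooling is taken element-wise across the people in each frame, and the function name `frame_descriptors` and the array layouts are hypothetical, not from the paper.

```python
import numpy as np


def frame_descriptors(person_hidden, person_fc7):
    """Build one descriptor per frame for the second-layer LSTM.

    person_hidden: first-layer LSTM outputs, shape (T, N, H)
                   (T frames, N people, H hidden units)
    person_fc7:    AlexNet fc7 features, shape (T, N, F)
    returns:       pooled features, shape (T, H + F)
    """
    # Concatenate the two feature vectors for each person in each frame.
    per_person = np.concatenate([person_hidden, person_fc7], axis=-1)
    # Element-wise max across people (assumed pooling axis).
    return per_person.max(axis=1)


# Tiny example: 1 frame, 2 people, H = 2, F = 1.
hidden = np.array([[[1.0, -2.0], [0.0, 3.0]]])
fc7 = np.array([[[5.0], [4.0]]])
pooled = frame_descriptors(hidden, fc7)   # shape (1, 3)
```

Each row of `pooled` would then play the role of the input $x_t$ to the second-layer LSTM. Because max pooling does not depend on the order or the number of people, the frame descriptor has a fixed size no matter how many people are detected.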