Highlight
- Two-stream: spatial + temporal (optical flow).
- Use a motion network pre-trained on optical flow to predict optical flow (OF); it is also trained end-to-end with the rest of the model in the training phase.
- Fusion of motion and spatial features
- Multi-loss: siamese re-ID loss and classification loss.
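
To make the multi-loss idea above concrete, here is a minimal sketch in plain Python. The notes do not give the exact form of the siamese loss, so the contrastive (hinge) form, the `margin` value, the unweighted sum of the terms, and all function names are assumptions for illustration only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def siamese_loss(feat_a, feat_b, same_identity, margin=2.0):
    """Contrastive (hinge) distance loss on a pair of sequence features.

    Assumed form: pull same-identity pairs together and push different
    identities at least `margin` apart. `margin` is a hypothetical value.
    """
    d = euclidean(feat_a, feat_b)
    if same_identity:
        return 0.5 * d ** 2
    return 0.5 * max(margin - d, 0.0) ** 2

def softmax_cross_entropy(logits, label):
    """Identity classification loss: -log softmax(logits)[label]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def multi_loss(feat_a, feat_b, logits_a, logits_b, id_a, id_b, margin=2.0):
    """Unweighted sum of the pair loss and both classification losses."""
    pair = siamese_loss(feat_a, feat_b, id_a == id_b, margin)
    cls = (softmax_cross_entropy(logits_a, id_a)
           + softmax_cross_entropy(logits_b, id_b))
    return pair + cls
```

In a real model the features and logits would come from the two branches of the siamese network; here they are plain lists so the arithmetic is easy to follow.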
Model
- Structure of the whole model:
- Structure of motion network (pre-trained on LK or Epic optical flow):
- Structure of spatial network:
- Spatial fusion methods: concatenation, sum, max
- Spatial fusion positions: at any layer of the spatial network
- Motion context accumulation: via RNN (not LSTM in this paper)
- Multi-loss: siamese (distance) loss + classification (softmax) loss
- Pre-train motion network on optical flow: smoothed L1 loss (l = 1, 2, 3 indexes the optical flow estimates at different resolutions)
- L(l)(motion)(
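
The pre-training loss for the motion network can be sketched as follows. This is only an illustration: it uses the standard smoothed L1 penalty with threshold 1, sums over flow components, and combines the three resolutions with equal weights. The threshold, the summation, the weights, and the function names are all assumptions, not details taken from the paper.

```python
def smooth_l1(x):
    """Smoothed L1 penalty on one residual (standard form, threshold 1)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def flow_loss_at_scale(pred, target):
    """Sum the smoothed L1 penalty over all flow values at one resolution.

    `pred` and `target` are flat lists of flow values (the u and v
    components of every pixel), already at the same resolution.
    """
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same size")
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))

def motion_pretrain_loss(preds, targets, weights=(1.0, 1.0, 1.0)):
    """Combine the per-resolution losses for l = 1, 2, 3.

    Equal `weights` are a placeholder assumption, not from the paper.
    """
    return sum(w * flow_loss_at_scale(p, t)
               for w, p, t in zip(weights, preds, targets))
```

For example, `motion_pretrain_loss([[0.5], [2.0], [0.0]], [[0.0], [0.0], [0.0]])` adds a quadratic term for the small residual and a linear term for the large one, which is the behaviour that makes the smoothed L1 loss less sensitive to outliers than a squared loss.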