Q1: Image Captioning with Vanilla RNNs
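The assignment code has been uploaded to my GitHub: https://github.com/jingshuangliu22/cs231n. You are welcome to use it for reference, discuss it, and point out mistakes.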
RNN_Captioning.ipynb
Microsoft COCO
idx_to_word <type 'list'> 1004
train_captions <type 'numpy.ndarray'> (400135, 17) int32
val_captions <type 'numpy.ndarray'> (195954, 17) int32
train_image_idxs <type 'numpy.ndarray'> (400135,) int32
val_features <type 'numpy.ndarray'> (40504, 512) float32
val_image_idxs <type 'numpy.ndarray'> (195954,) int32
train_features <type 'numpy.ndarray'> (82783, 512) float32
train_urls <type 'numpy.ndarray'> (82783,) |S63
val_urls <type 'numpy.ndarray'> (40504,) |S63
word_to_idx <type 'dict'> 1004
Look at the data
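To look at the data, the integer caption arrays above (train_captions, val_captions, 17 indices per row) have to be mapped back to words with idx_to_word. Below is a minimal sketch of such a decoder. It assumes the vocabulary uses the special tokens '<NULL>' for padding and '<START>'/'<END>' as sentence markers; the helper name decode_captions and the choice to drop the markers are illustrative, not necessarily what the course's own utility does.

```python
import numpy as np

# Assumed special tokens; check them against word_to_idx before relying on this.
NULL, START, END = '<NULL>', '<START>', '<END>'


def decode_captions(captions, idx_to_word):
    """Map an (N, T) or (T,) array of word indices to caption strings.

    Hypothetical helper: skips padding and the start marker, and stops
    reading a row at the first end marker.
    """
    single = captions.ndim == 1
    rows = captions[None] if single else captions
    decoded = []
    for row in rows:
        words = []
        for idx in row:
            word = idx_to_word[int(idx)]
            if word == END:
                break
            if word not in (NULL, START):
                words.append(word)
        decoded.append(' '.join(words))
    return decoded[0] if single else decoded
```

For example, decode_captions(train_captions[:3], idx_to_word) would return three strings.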
Vanilla RNN: step forward
next_h error: 6.29242142647e-09
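The step checked above is the standard vanilla RNN update next_h = tanh(x·Wx + prev_h·Wh + b). A minimal NumPy sketch, assuming row-vector minibatches (x of shape (N, D), prev_h of shape (N, H)); the function name and the contents of the cache are my own choices rather than the starter code's.

```python
import numpy as np


def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """One vanilla RNN step.

    x: (N, D) inputs, prev_h: (N, H) previous hidden state,
    Wx: (D, H), Wh: (H, H), b: (H,) broadcast over the batch.
    Returns next_h (N, H) and a cache for the backward pass.
    """
    next_h = np.tanh(x @ Wx + prev_h @ Wh + b)
    cache = (x, prev_h, Wx, Wh, next_h)  # next_h is reused to get tanh'
    return next_h, cache
```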
Vanilla RNN: step backward
dx error: 6.88735954327e-11
dprev_h error: 5.28932394133e-10
dWx error: 1.12554920911e-10
dWh error: 4.84496557569e-10
db error: 2.72330774095e-11
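Because tanh'(a) = 1 - tanh(a)^2, the backward step only needs next_h from the forward cache. The sketch below, again assuming the row-vector layout and a self-chosen cache layout, propagates an upstream gradient dnext_h to all five inputs; the error lines above compare gradients like these against numerical estimates.

```python
import numpy as np


def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """Vanilla RNN step: next_h = tanh(x @ Wx + prev_h @ Wh + b)."""
    next_h = np.tanh(x @ Wx + prev_h @ Wh + b)
    return next_h, (x, prev_h, Wx, Wh, next_h)


def rnn_step_backward(dnext_h, cache):
    """Backprop through one step, given the upstream gradient dnext_h (N, H)."""
    x, prev_h, Wx, Wh, next_h = cache
    # Gradient w.r.t. the pre-activation a, using tanh'(a) = 1 - tanh(a)**2.
    da = dnext_h * (1.0 - next_h ** 2)
    dx = da @ Wx.T        # (N, D)
    dprev_h = da @ Wh.T   # (N, H)
    dWx = x.T @ da        # (D, H)
    dWh = prev_h.T @ da   # (H, H)
    db = da.sum(axis=0)   # (H,), summed over the batch
    return dx, dprev_h, dWx, dWh, db
```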
Vanilla RNN: forward
h error: 7.72846618019e-08
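Running the RNN over a whole sequence is a loop over the T time steps, feeding each hidden state into the next step and storing all of them. A sketch, assuming x has shape (N, T, D) and h0 has shape (N, H); names are illustrative.

```python
import numpy as np


def rnn_forward(x, h0, Wx, Wh, b):
    """x: (N, T, D) input sequence, h0: (N, H) initial hidden state.

    Returns h of shape (N, T, H) with every hidden state, plus a cache.
    """
    N, T, _ = x.shape
    h = np.zeros((N, T, h0.shape[1]))
    prev_h = h0
    for t in range(T):
        # Same update as a single step, applied at time t.
        prev_h = np.tanh(x[:, t] @ Wx + prev_h @ Wh + b)
        h[:, t] = prev_h
    cache = (x, h0, Wx, Wh, h)
    return h, cache
```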
Vanilla RNN: backward
dx error: 2.70104774724e-08
dh0 error: 1.7454525052e-09
dWx error: 3.40035760677e-10
dWh error: 2.01095678956e-09
db error: 3.23168709094e-10
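Backpropagation through time walks the sequence in reverse. The key detail is that the gradient on each hidden state h_t is the sum of the upstream term dh[:, t] and the gradient flowing back from step t + 1; the shared weights accumulate contributions from every step. A self-contained sketch under the same shape assumptions (the forward loop is repeated so the block runs on its own):

```python
import numpy as np


def rnn_forward(x, h0, Wx, Wh, b):
    """Forward loop over T steps; returns all hidden states and a cache."""
    N, T, _ = x.shape
    h = np.zeros((N, T, h0.shape[1]))
    prev_h = h0
    for t in range(T):
        prev_h = np.tanh(x[:, t] @ Wx + prev_h @ Wh + b)
        h[:, t] = prev_h
    return h, (x, h0, Wx, Wh, h)


def rnn_backward(dh, cache):
    """dh: (N, T, H) upstream gradient on every hidden state.

    Returns dx, dh0, dWx, dWh, db.
    """
    x, h0, Wx, Wh, h = cache
    T = x.shape[1]
    dx = np.zeros_like(x)
    dWx, dWh = np.zeros_like(Wx), np.zeros_like(Wh)
    db = np.zeros(Wh.shape[0])
    dprev_h = np.zeros_like(h0)  # gradient arriving from step t + 1
    for t in reversed(range(T)):
        prev_h = h[:, t - 1] if t > 0 else h0
        # Total gradient on h_t: direct upstream term plus the recurrent term.
        da = (dh[:, t] + dprev_h) * (1.0 - h[:, t] ** 2)
        dx[:, t] = da @ Wx.T
        dWx += x[:, t].T @ da
        dWh += prev_h.T @ da
        db += da.sum(axis=0)
        dprev_h = da @ Wh.T
    return dx, dprev_h, dWx, dWh, db  # after the loop, dprev_h is dh0
```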
Word embedding: forward
out error: 1.00000000947e-08
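The embedding forward pass is row lookup: each integer index in x (shape (N, T)) selects one D-dimensional row of the weight matrix W (shape (V, D)). A minimal sketch with assumed names:

```python
import numpy as np


def word_embedding_forward(x, W):
    """x: (N, T) integer indices in [0, V); W: (V, D).

    Returns out of shape (N, T, D) and a cache for the backward pass.
    """
    out = W[x]  # advanced indexing gathers one row of W per index
    return out, (x, W.shape[0])
```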
Word embedding: backward
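In the embedding backward pass, a row of W that was looked up several times must receive the sum of all its upstream gradients. Writing dW[x] += dout does not do that, because NumPy's buffered in-place fancy indexing applies a repeated index only once; np.add.at performs an unbuffered accumulation instead. The signature below (passing x and the vocabulary size V directly) is an assumption.

```python
import numpy as np


def word_embedding_backward(dout, x, V):
    """dout: (N, T, D) upstream gradient; x: (N, T) indices; V: vocabulary size.

    Returns dW of shape (V, D).
    """
    dW = np.zeros((V, dout.shape[-1]), dtype=dout.dtype)
    np.add.at(dW, x, dout)  # unbuffered: repeated indices are summed
    return dW
```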
Temporal Affine layer
dx error: 4.98623200795e-11
dw error: 7.54622091734e-11
db error: 5.76987410469e-12
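The temporal affine layer applies the same affine map at every time step, so it can be computed as one matrix product after folding the time axis into the batch axis: reshape x from (N, T, D) to (N*T, D), multiply by w (D, M), add b, and reshape back. The backward pass reuses the same reshape. Function names are illustrative.

```python
import numpy as np


def temporal_affine_forward(x, w, b):
    """x: (N, T, D), w: (D, M), b: (M,). Returns out (N, T, M) and a cache."""
    N, T, D = x.shape
    out = (x.reshape(N * T, D) @ w + b).reshape(N, T, -1)
    return out, (x, w)


def temporal_affine_backward(dout, cache):
    """dout: (N, T, M). Returns dx (N, T, D), dw (D, M), db (M,)."""
    x, w = cache
    N, T, D = x.shape
    dout_flat = dout.reshape(N * T, -1)
    dx = (dout_flat @ w.T).reshape(N, T, D)
    dw = x.reshape(N * T, D).T @ dout_flat  # sums over batch and time
    db = dout_flat.sum(axis=0)
    return dx, dw, db
```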
Temporal Softmax loss
2.30256439876
23.025705242
2.32606402665
dx error: 2.45647211476e-08
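The three printed losses are consistent with a softmax loss that is summed over time steps, averaged over the minibatch, and restricted by a mask to non-padding positions. With near-zero scores over V = 10 classes, each unmasked position costs about ln 10 ≈ 2.303: one time step gives about 2.30, ten time steps about 23.03, and ten steps with only about 10% of positions unmasked fall back to about 2.3 (the exact batch sizes and mask rates of those checks are inferred from the values, not shown here). A minimal sketch under that interpretation:

```python
import numpy as np


def temporal_softmax_loss(x, y, mask):
    """x: (N, T, V) scores, y: (N, T) target indices, mask: (N, T) booleans.

    Loss is summed over unmasked time steps and averaged over N.
    Returns (loss, dx) with dx of shape (N, T, V).
    """
    N, T, V = x.shape
    x_flat = x.reshape(N * T, V)
    y_flat = y.reshape(N * T)
    m_flat = mask.reshape(N * T).astype(x.dtype)
    # Numerically stable log-softmax.
    shifted = x_flat - x_flat.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    rows = np.arange(N * T)
    loss = -np.sum(m_flat * log_probs[rows, y_flat]) / N
    # d(-log softmax)/dscores = probs - one_hot(y), zeroed where masked out.
    dx_flat = np.exp(log_probs)
    dx_flat[rows, y_flat] -= 1.0
    dx_flat *= m_flat[:, None] / N
    return loss, dx_flat.reshape(N, T, V)
```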
RNN for image captioning
loss: 9.83235591003
expected loss: 9.83235591003
difference: 2.61124455392e-12
W_embed relative error: 2.006287e-09
W_proj relative error: 2.435961e-09
W_vocab relative error: 2.411310e-09
Wh relative error: 2.055948e-08
Wx relative error: 3.195020e-07
b relative error: 1.777874e-09
b_proj relative error: 1.159276e-09
b_vocab relative error: 1.960674e-10
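Each relative error above compares an analytic gradient with a numerical one. A minimal sketch of both pieces, assuming the usual definition max(|x - y| / max(1e-8, |x| + |y|)) and a centered finite difference; the values reported above all sit at or below roughly 3e-7, which is the range one expects when the backward pass is correct in double precision.

```python
import numpy as np


def rel_error(x, y):
    """Largest elementwise relative error; the 1e-8 floor avoids 0/0."""
    return np.max(np.abs(x - y) / np.maximum(1e-8, np.abs(x) + np.abs(y)))


def numerical_gradient(f, x, h=1e-6):
    """Centered-difference gradient of the scalar f() with respect to array x.

    x is perturbed in place and restored, so f must read x directly
    (for example through a closure).
    """
    grad = np.zeros_like(x)
    for i in np.ndindex(x.shape):
        old = x[i]
        x[i] = old + h
        f_plus = f()
        x[i] = old - h
        f_minus = f()
        x[i] = old  # restore the original value
        grad[i] = (f_plus - f_minus) / (2 * h)
    return grad
```

For a model, f would be a closure that recomputes the captioning loss from the current parameters, e.g. lambda: loss_fn(params), where loss_fn is a hypothetical name for whatever computes the loss.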
Overfit small data
(Iteration 1 / 100) loss: 82.463010
(Iteration 11 / 100) loss: 27.939999
(Iteration 21 / 100) loss: 8.880015
(Iteration 31 / 100) loss: 1.921411
(Iteration 41 / 100) loss: 0.639671
(Iteration 51 / 100) loss: 0.340682
(Iteration 61 / 100) loss: 0.287836
(Iteration 71 / 100) loss: 0.180632
(Iteration 81 / 100) loss: 0.187963
(Iteration 91 / 100) loss: 0.179619
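Driving the loss from about 82 down to below 0.2 on a small subset within 100 iterations is the usual sanity check that the model and its gradients can fit data at all. The optimizer settings are not shown here; assuming the solver uses Adam (an assumption, though a common choice for this experiment), one update step looks like the sketch below, with defaults set to the values suggested in the Adam paper. The function name and config keys are illustrative.

```python
import numpy as np


def adam(w, dw, config=None):
    """One Adam step on parameter w with gradient dw.

    config carries hyperparameters and the running state (m, v, t)
    between calls. Returns the updated parameter and config.
    """
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-3)
    config.setdefault('beta1', 0.9)
    config.setdefault('beta2', 0.999)
    config.setdefault('epsilon', 1e-8)
    config.setdefault('m', np.zeros_like(w))
    config.setdefault('v', np.zeros_like(w))
    config.setdefault('t', 0)

    config['t'] += 1
    b1, b2 = config['beta1'], config['beta2']
    config['m'] = b1 * config['m'] + (1 - b1) * dw       # first moment
    config['v'] = b2 * config['v'] + (1 - b2) * dw ** 2  # second moment
    # Bias correction compensates for initializing m and v at zero.
    m_hat = config['m'] / (1 - b1 ** config['t'])
    v_hat = config['v'] / (1 - b2 ** config['t'])
    next_w = w - config['learning_rate'] * m_hat / (np.sqrt(v_hat) + config['epsilon'])
    return next_w, config
```

With bias correction, the very first step has magnitude close to learning_rate in each coordinate, regardless of the gradient's scale.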
Test-time sampling
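At test time there is no ground-truth caption to feed in, so the input at each step is the word predicted at the previous step. A minimal greedy-decoding sketch, assuming the initial hidden state is an affine projection of the image features (W_proj, b_proj) and that the parameters named in the gradient check above play the roles given in the docstring; the shapes, the argmax policy, and the start_idx argument are assumptions.

```python
import numpy as np


def sample_greedy(features, params, start_idx, max_length=30):
    """Greedy decoding for a vanilla-RNN captioner.

    features: (N, F) image features. Assumed parameter shapes:
    W_proj (F, H), b_proj (H,), W_embed (V, W), Wx (W, H), Wh (H, H),
    b (H,), W_vocab (H, V), b_vocab (V,).
    Returns an (N, max_length) array of predicted word indices.
    """
    N = features.shape[0]
    h = features @ params['W_proj'] + params['b_proj']  # initial hidden state
    words = np.full(N, start_idx, dtype=np.int64)       # every caption starts here
    captions = np.zeros((N, max_length), dtype=np.int64)
    for t in range(max_length):
        x = params['W_embed'][words]                    # embed previous word
        h = np.tanh(x @ params['Wx'] + h @ params['Wh'] + params['b'])
        scores = h @ params['W_vocab'] + params['b_vocab']
        words = scores.argmax(axis=1)                   # greedy choice
        captions[:, t] = words
    return captions
```

Greedy argmax is the simplest policy; beam search or sampling from the softmax are alternatives that this sketch does not cover.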