Q3: Dropout
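The assignment code has been uploaded to my GitHub: https://github.com/jingshuangliu22/cs231n. Feel free to use it as a reference, discuss it, and point out any mistakes.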
Dropout.ipynb
X_val: (1000, 3, 32, 32)
X_train: (49000, 3, 32, 32)
X_test: (1000, 3, 32, 32)
y_val: (1000,)
y_train: (49000,)
y_test: (1000,)
Dropout forward pass
Running tests with p = 0.3
Mean of input: 10.0029862212
Mean of train-time output: 10.0180516238
Mean of test-time output: 10.0029862212
Fraction of train-time output set to zero: 0.699532
Fraction of test-time output set to zero: 0.0
Running tests with p = 0.6
Mean of input: 10.0029862212
Mean of train-time output: 10.0146605666
Mean of test-time output: 10.0029862212
Fraction of train-time output set to zero: 0.399216
Fraction of test-time output set to zero: 0.0
Running tests with p = 0.75
Mean of input: 10.0029862212
Mean of train-time output: 10.0041925077
Mean of test-time output: 10.0029862212
Fraction of train-time output set to zero: 0.249896
Fraction of test-time output set to zero: 0.0
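The numbers above are consistent with inverted dropout in which p is the probability of keeping a unit: p = 0.3 zeroes about 70% of the outputs, and the train-time mean stays close to the input mean. A minimal NumPy sketch under that assumption follows. The function name dropout_forward, its signature, and the rng argument are illustrative and may differ from the code in the repository.

```python
import numpy as np

def dropout_forward(x, p, mode, rng=None):
    """Inverted dropout sketch; p is assumed to be the KEEP probability."""
    if mode == "train":
        rng = np.random.default_rng() if rng is None else rng
        # Keep each unit with probability p and scale survivors by 1/p,
        # so the expected train-time output equals the input.
        mask = (rng.random(x.shape) < p) / p
        return x * mask, mask
    # Test time: dropout is the identity, so no mask is needed.
    return x, None

rng = np.random.default_rng(0)
x = rng.standard_normal((500, 500)) + 10  # input with mean close to 10

for p in (0.3, 0.6, 0.75):
    out, _ = dropout_forward(x, p, "train", rng)
    out_test, _ = dropout_forward(x, p, "test")
    print("p =", p)
    print("  mean of input:", x.mean())
    print("  mean of train-time output:", out.mean())
    print("  mean of test-time output:", out_test.mean())
    print("  fraction of train-time output set to zero:", (out == 0).mean())
```

Scaling by 1/p at train time is what lets the test-time pass return the input unchanged, which matches the identical input and test-time means in the log.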
Dropout backward pass
dx relative error: 5.44561222172e-11
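A relative error this small indicates that the analytic gradient agrees with a numeric one. The check can be sketched as below, assuming the backward pass simply multiplies the upstream gradient by the mask cached in the forward pass. The names dropout_backward and rel_error and the test sizes are illustrative and are not taken from the repository.

```python
import numpy as np

def dropout_backward(dout, mask, mode):
    # Train: the gradient flows only through kept units, with the same 1/p scaling.
    # Test: dropout is the identity, so the gradient passes through unchanged.
    return dout * mask if mode == "train" else dout

def rel_error(a, b):
    # Maximum elementwise relative error, guarded against division by zero.
    return np.max(np.abs(a - b) / np.maximum(1e-8, np.abs(a) + np.abs(b)))

rng = np.random.default_rng(0)
p = 0.8  # assumed keep probability for this check
x = rng.standard_normal((10, 10)) + 10
dout = rng.standard_normal(x.shape)
mask = (rng.random(x.shape) < p) / p  # mask held fixed, as in a cached forward pass

dx = dropout_backward(dout, mask, "train")

# Centered-difference numeric gradient of sum(dropout(x) * dout) w.r.t. x.
h = 1e-5
dx_num = np.zeros_like(x)
for idx in np.ndindex(*x.shape):
    old = x[idx]
    x[idx] = old + h
    pos = np.sum(x * mask * dout)
    x[idx] = old - h
    neg = np.sum(x * mask * dout)
    x[idx] = old
    dx_num[idx] = (pos - neg) / (2 * h)

print("dx relative error:", rel_error(dx, dx_num))
```

The mask must stay fixed between the forward evaluations; resampling it would make the numeric gradient meaningless.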
Fully-connected nets with Dropout
Running check with dropout = 0
Initial loss: 2.31027832193
W1 relative error: 3.70e-06
W2 relative error: 8.95e-06
W3 relative error: 3.00e-08
b1 relative error: 2.10e-08
b2 relative error: 1.83e-09
b3 relative error: 9.60e-11
Running check with dropout = 0.25
Initial loss: 2.2995556198
W1 relative error: 2.61e-07
W2 relative error: 1.89e-09
W3 relative error: 4.52e-09
b1 relative error: 3.71e-10
b2 relative error: 4.50e-10
b3 relative error: 1.34e-10
Running check with dropout = 0.5
Initial loss: 2.30021447314
W1 relative error: 5.59e-07
W2 relative error: 4.28e-08
W3 relative error: 9.85e-08
b1 relative error: 2.54e-09
b2 relative error: 4.08e-09
b3 relative error: 6.62e-11
Regularization experiment
0
(Iteration 1 / 125) loss: 9.163244
(Epoch 0 / 25) train acc: 0.216000; val_acc: 0.192000
(Epoch 1 / 25) train acc: 0.236000; val_acc: 0.146000
(Epoch 2 / 25) train acc: 0.344000; val_acc: 0.209000
(Epoch 3 / 25) train acc: 0.360000; val_acc: 0.234000
(Epoch 4 / 25) train acc: 0.480000; val_acc: 0.248000
(Epoch 5 / 25) train acc: 0.570000; val_acc: 0.256000
(Epoch 6 / 25) train acc: 0.628000; val_acc: 0.281000
(Epoch 7 / 25) train acc: 0.682000; val_acc: 0.271000
(Epoch 8 / 25) train acc: 0.724000; val_acc: 0.267000
(Epoch 9 / 25) train acc: 0.800000; val_acc: 0.267000
(Epoch 10 / 25) train acc: 0.814000; val_acc: 0.273000
(Epoch 11 / 25) train acc: 0.836000; val_acc: 0.274000
(Epoch 12 / 25) train acc: 0.898000; val_acc: 0.296000
(Epoch 13 / 25) train acc: 0.908000; val_acc: 0.274000
(Epoch 14 / 25) train acc: 0.900000; val_acc: 0.280000
(Epoch 15 / 25) train acc: 0.956000; val_acc: 0.286000
(Epoch 16 / 25) train acc: 0.948000; val_acc: 0.264000
(Epoch 17 / 25) train acc: 0.962000; val_acc: 0.283000
(Epoch 18 / 25) train acc: 0.976000; val_acc: 0.287000
(Epoch 19 / 25) train acc: 0.984000; val_acc: 0.288000
(Epoch 20 / 25) train acc: 0.966000; val_acc: 0.272000
(Iteration 101 / 125) loss: 0.219312
(Epoch 21 / 25) train acc: 0.972000; val_acc: 0.298000
(Epoch 22 / 25) train acc: 0.976000; val_acc: 0.289000
(Epoch 23 / 25) train acc: 0.994000; val_acc: 0.283000
(Epoch 24 / 25) train acc: 0.994000; val_acc: 0.289000
(Epoch 25 / 25) train acc: 0.982000; val_acc: 0.287000
0.75
(Iteration 1 / 125) loss: 10.888994
(Epoch 0 / 25) train acc: 0.224000; val_acc: 0.202000
(Epoch 1 / 25) train acc: 0.300000; val_acc: 0.231000
(Epoch 2 / 25) train acc: 0.314000; val_acc: 0.220000
(Epoch 3 / 25) train acc: 0.404000; val_acc: 0.259000
(Epoch 4 / 25) train acc: 0.408000; val_acc: 0.217000
(Epoch 5 / 25) train acc: 0.478000; val_acc: 0.235000
(Epoch 6 / 25) train acc: 0.586000; val_acc: 0.275000
(Epoch 7 / 25) train acc: 0.634000; val_acc: 0.254000
(Epoch 8 / 25) train acc: 0.680000; val_acc: 0.300000
(Epoch 9 / 25) train acc: 0.748000; val_acc: 0.303000
(Epoch 10 / 25) train acc: 0.796000; val_acc: 0.268000
(Epoch 11 / 25) train acc: 0.870000; val_acc: 0.282000
(Epoch 12 / 25) train acc: 0.856000; val_acc: 0.285000
(Epoch 13 / 25) train acc: 0.880000; val_acc: 0.282000
(Epoch 14 / 25) train acc: 0.918000; val_acc: 0.315000
(Epoch 15 / 25) train acc: 0.906000; val_acc: 0.303000
(Epoch 16 / 25) train acc: 0.932000; val_acc: 0.290000
(Epoch 17 / 25) train acc: 0.942000; val_acc: 0.311000
(Epoch 18 / 25) train acc: 0.966000; val_acc: 0.296000
(Epoch 19 / 25) train acc: 0.938000; val_acc: 0.307000
(Epoch 20 / 25) train acc: 0.966000; val_acc: 0.313000
(Iteration 101 / 125) loss: 2.588185
(Epoch 21 / 25) train acc: 0.964000; val_acc: 0.297000
(Epoch 22 / 25) train acc: 0.968000; val_acc: 0.296000
(Epoch 23 / 25) train acc: 0.984000; val_acc: 0.337000
(Epoch 24 / 25) train acc: 0.984000; val_acc: 0.323000
(Epoch 25 / 25) train acc: 0.968000; val_acc: 0.317000
Question
Explain what you see in this experiment. What does it suggest about dropout?
Answer
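Both networks overfit heavily: training accuracy climbs to about 0.97-0.98 while validation accuracy stays near 0.3. With dropout = 0.75, the final validation accuracy is higher (0.317 vs. 0.287), as is the best validation accuracy (0.337 vs. 0.298), while training accuracy is slightly lower. This suggests that dropout acts as a regularizer: it reduces overfitting, although it does not eliminate it. Note that in this implementation p is the keep probability (p = 0.3 zeroes about 70% of the units), so dropout = 0.75 drops roughly 25% of the activations.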
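To make the comparison concrete, the gap between training and validation accuracy can be computed from the logs. The short sketch below uses only the final-epoch values and the best validation accuracies copied from the two runs above; the dictionary layout is just for illustration.

```python
# Accuracies copied from the two training logs above (epoch 25 and best val_acc).
runs = {
    "dropout = 0":    {"train": 0.982, "val": 0.287, "best_val": 0.298},
    "dropout = 0.75": {"train": 0.968, "val": 0.317, "best_val": 0.337},
}

gaps = {}
for name, r in runs.items():
    # A smaller train/val gap indicates less overfitting.
    gaps[name] = r["train"] - r["val"]
    print(f"{name}: train-val gap = {gaps[name]:.3f}, best val = {r['best_val']:.3f}")
```

This is a single pair of runs, so the difference is suggestive rather than conclusive.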