Q4: Two-Layer Neural Network
My solution code is available on my GitHub: https://github.com/jingshuangliu22/cs231n. Feel free to refer to it, discuss it, and point out any mistakes.
two_layer_net.ipynb
Forward pass: compute scores
Your scores:
[[-0.81233741 -1.27654624 -0.70335995]
[-0.17129677 -1.18803311 -0.47310444]
[-0.51590475 -1.01354314 -0.8504215 ]
[-0.15419291 -0.48629638 -0.52901952]
[-0.00618733 -0.12435261 -0.15226949]]
correct scores:
[[-0.81233741 -1.27654624 -0.70335995]
[-0.17129677 -1.18803311 -0.47310444]
[-0.51590475 -1.01354314 -0.8504215 ]
[-0.15419291 -0.48629638 -0.52901952]
[-0.00618733 -0.12435261 -0.15226949]]
Difference between your scores and correct scores:
3.68027204961e-08
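For context, the scores being checked here come from the standard two-layer forward pass: an affine layer, a ReLU, then a second affine layer. A minimal sketch (not the graded implementation; X, W1, b1, W2, b2 are assumed to be shaped as in TwoLayerNet):

import numpy as np

def forward_scores(X, W1, b1, W2, b2):
    # affine -> ReLU -> affine; no softmax at this stage
    hidden = np.maximum(0, X.dot(W1) + b1)   # shape (N, H)
    scores = hidden.dot(W2) + b2             # shape (N, C)
    return scores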
Forward pass: compute loss
Difference between your loss and correct loss:
1.79412040779e-13
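The loss being compared is the softmax cross-entropy over those scores plus L2 regularization on W1 and W2. A hedged sketch; the exact regularization scaling (with or without the 0.5 factor) may differ from the graded code:

import numpy as np

def softmax_loss(scores, y, W1, W2, reg):
    N = scores.shape[0]
    shifted = scores - scores.max(axis=1, keepdims=True)        # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    data_loss = -np.log(probs[np.arange(N), y]).mean()          # average cross-entropy
    reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))  # 0.5 factor is an assumption
    return data_loss + reg_loss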
Backward pass
W1 max relative error: 3.669858e-09
W2 max relative error: 3.440708e-09
b2 max relative error: 3.865039e-11
b1 max relative error: 1.125423e-09
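These numbers come from comparing the analytic gradients against numerically evaluated ones; the notebook measures them with a small relative-error helper along these lines (errors around 1e-8 or smaller indicate a correct backward pass):

import numpy as np

def rel_error(x, y):
    # max relative error between analytic and numeric gradients
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))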
Train the network (toy data)
Final training loss: 0.0171496079387
Load the data
Train data shape: (49000, 3072)
Train labels shape: (49000,)
Validation data shape: (1000, 3072)
Validation labels shape: (1000,)
Test data shape: (1000, 3072)
Test labels shape: (1000,)
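These shapes come from the usual CIFAR-10 preprocessing: split off 1000 validation and 1000 test images, subtract the mean training image, and flatten each image into a 3072-dimensional row. A rough sketch, assuming the assignment's load_CIFAR10 helper and the default dataset path:

import numpy as np
from cs231n.data_utils import load_CIFAR10

X_train, y_train, X_test, y_test = load_CIFAR10('cs231n/datasets/cifar-10-batches-py')
X_val, y_val = X_train[49000:50000], y_train[49000:50000]   # 1000 validation images
X_train, y_train = X_train[:49000], y_train[:49000]         # 49000 training images
X_test, y_test = X_test[:1000], y_test[:1000]               # 1000 test images

mean_image = np.mean(X_train, axis=0)                # per-pixel mean image
X_train = (X_train - mean_image).reshape(49000, -1)  # -> (49000, 3072)
X_val = (X_val - mean_image).reshape(1000, -1)
X_test = (X_test - mean_image).reshape(1000, -1)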
Train a network
iteration 0 / 1000: loss 2.302970
iteration 100 / 1000: loss 2.302474
iteration 200 / 1000: loss 2.297076
iteration 300 / 1000: loss 2.257328
iteration 400 / 1000: loss 2.230484
iteration 500 / 1000: loss 2.150620
iteration 600 / 1000: loss 2.080736
iteration 700 / 1000: loss 2.054914
iteration 800 / 1000: loss 1.979290
iteration 900 / 1000: loss 2.039101
Validation accuracy: 0.287
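This baseline run uses the notebook's default setup with a small hidden layer. The exact defaults below (hidden_size=50, learning_rate=1e-4, reg=0.5) are assumptions from memory, so treat this only as a sketch of the call that produced the log above:

from cs231n.classifiers.neural_net import TwoLayerNet

input_size = 32 * 32 * 3
net = TwoLayerNet(input_size, 50, 10)   # 50 hidden units, 10 classes (assumed defaults)
stats = net.train(X_train, y_train, X_val, y_val,
                  num_iters=1000, batch_size=200,
                  learning_rate=1e-4, learning_rate_decay=0.95,
                  reg=0.5, verbose=True)
print 'Validation accuracy:', (net.predict(X_val) == y_val).mean()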
Debug the training
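The original notebook shows two diagnostics at this point that did not survive the export: a plot of the loss history and of the train/validation accuracy per epoch, plus a visualization of the learned first-layer weights. A sketch of the plotting part, assuming stats is the dict returned by net.train with the starter code's history keys:

import matplotlib.pyplot as plt

plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
plt.plot(stats['train_acc_history'], label='train')
plt.plot(stats['val_acc_history'], label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()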
Tune your hyperparameters
best_net = None  # store the best model into this

#################################################################################
# TODO: Tune hyperparameters using the validation set. Store your best trained #
# model in best_net.                                                            #
#                                                                               #
# To help debug your network, it may help to use visualizations similar to the #
# ones we used above; these visualizations will have significant qualitative   #
# differences from the ones we saw above for the poorly tuned network.         #
#                                                                               #
# Tweaking hyperparameters by hand can be fun, but you might find it useful to #
# write code to sweep through possible combinations of hyperparameters         #
# automatically like we did on the previous exercises.                         #
#################################################################################
# Grid of hyperparameter values to sweep over.
hidden_size_choice = [400]
learning_rate_choice = [3e-3]
reg_choice = [0.02, 0.05, 0.1]
batch_size_choice = [500]
num_iters_choice = [1200]

best_acc = -1
best_stats = None
input_size = 32 * 32 * 3
num_classes = 10  # defined earlier in the notebook; repeated here for completeness
# TwoLayerNet is imported from cs231n.classifiers.neural_net earlier in the notebook.

for batch_size_curr in batch_size_choice:
    for reg_cur in reg_choice:
        for learning_rate_curr in learning_rate_choice:
            for hidden_size_curr in hidden_size_choice:
                for num_iters_curr in num_iters_choice:
                    print
                    print "current training hidden_size:", hidden_size_curr
                    print "current training learning_rate:", learning_rate_curr
                    print "current training reg:", reg_cur
                    print "current training batch_size:", batch_size_curr
                    net = TwoLayerNet(input_size, hidden_size_curr, num_classes)
                    stats = net.train(X_train, y_train, X_val, y_val,
                                      num_iters=num_iters_curr, batch_size=batch_size_curr,
                                      learning_rate=learning_rate_curr, learning_rate_decay=0.95,
                                      reg=reg_cur, verbose=True)
                    # Keep whichever model does best on the validation set.
                    val_acc = (net.predict(X_val) == y_val).mean()
                    print "current val_acc:", val_acc
                    if val_acc > best_acc:
                        best_acc = val_acc
                        best_net = net
                        best_stats = stats
                    print
                    print "best_acc:", best_acc
                    print "best hidden_size:", best_net.params['W1'].shape[1]
                    # hyper_params is recorded by a modified TwoLayerNet.train
                    # (not part of the starter code).
                    print "best learning_rate:", best_net.hyper_params['learning_rate']
                    print "best reg:", best_net.hyper_params['reg']
                    print "best batch_size:", best_net.hyper_params['batch_size']
#################################################################################
#                               END OF YOUR CODE                                #
#################################################################################
current training hidden_size: 400
current training learning_rate: 0.003
current training reg: 0.02
current training batch_size: 500
iteration 0 / 1200: loss 2.302670
iteration 100 / 1200: loss 1.685716
iteration 200 / 1200: loss 1.599757
iteration 300 / 1200: loss 1.385544
iteration 400 / 1200: loss 1.479385
iteration 500 / 1200: loss 1.466029
iteration 600 / 1200: loss 1.456854
iteration 700 / 1200: loss 1.309732
iteration 800 / 1200: loss 1.236479
iteration 900 / 1200: loss 1.221071
iteration 1000 / 1200: loss 1.210234
iteration 1100 / 1200: loss 1.123294
current val_acc: 0.5
best_acc: 0.5
best hidden_size: 400
best learning_rate: 0.003
best reg: 0.02
best batch_size: 500
current training hidden_size: 400
current training learning_rate: 0.003
current training reg: 0.05
current training batch_size: 500
iteration 0 / 1200: loss 2.302935
iteration 100 / 1200: loss 1.693358
iteration 200 / 1200: loss 1.509740
iteration 300 / 1200: loss 1.572148
iteration 400 / 1200: loss 1.495700
iteration 500 / 1200: loss 1.400046
iteration 600 / 1200: loss 1.370000
iteration 700 / 1200: loss 1.249708
iteration 800 / 1200: loss 1.305766
iteration 900 / 1200: loss 1.342539
iteration 1000 / 1200: loss 1.277757
iteration 1100 / 1200: loss 1.232157
current val_acc: 0.512
best_acc: 0.512
best hidden_size: 400
best learning_rate: 0.003
best reg: 0.05
best batch_size: 500
current training hidden_size: 400
current training learning_rate: 0.003
current training reg: 0.1
current training batch_size: 500
iteration 0 / 1200: loss 2.303187
iteration 100 / 1200: loss 1.815929
iteration 200 / 1200: loss 1.736408
iteration 300 / 1200: loss 1.503271
iteration 400 / 1200: loss 1.571691
iteration 500 / 1200: loss 1.474189
iteration 600 / 1200: loss 1.478976
iteration 700 / 1200: loss 1.355830
iteration 800 / 1200: loss 1.261623
iteration 900 / 1200: loss 1.272220
iteration 1000 / 1200: loss 1.303129
iteration 1100 / 1200: loss 1.320341
current val_acc: 0.517
best_acc: 0.517
best hidden_size: 400
best learning_rate: 0.003
best reg: 0.1
best batch_size: 500