
# Background

Regularization parameter: λ

First, an experiment with η = 10.0 and λ = 1000.0:

```python
import mnist_loader
import network2

training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
net = network2.Network([784, 30, 10])
net.SGD(training_data, 30, 10, 10.0, lmbda=1000.0,
        evaluation_data=validation_data, monitor_evaluation_accuracy=True)
```

Epoch 0 training complete
Accuracy on evaluation data: 1030 / 10000

Epoch 1 training complete
Accuracy on evaluation data: 990 / 10000

Epoch 2 training complete
Accuracy on evaluation data: 1009 / 10000

...

Epoch 27 training complete
Accuracy on evaluation data: 1009 / 10000

Epoch 28 training complete
Accuracy on evaluation data: 983 / 10000

Epoch 29 training complete
Accuracy on evaluation data: 967 / 10000

The accuracy stays near chance level (about 10%), which means the hyperparameters are poorly chosen. There are many of them to pick:

• Network architecture: number of layers, number of neurons per layer
• How to initialize the weights w and biases b
• Choice of cost function
• Regularization: L1, L2
• Sigmoid or softmax output layer?
• Use dropout?
• Training-set size
• mini-batch size
• Learning rate η
• Regularization parameter λ
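For reference, L2 regularization (the form network2.py implements) adds a weight penalty to the unregularized cost $C_0$:

```latex
C = C_0 + \frac{\lambda}{2n} \sum_w w^2
```

where $n$ is the training-set size and the sum runs over all weights; L1 regularization instead penalizes $\sum_w |w|$.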

# Overall strategy

Start from a stripped-down problem (a smaller network and fewer training examples) so that each experiment runs quickly and gives fast feedback.

## The learning rate η

First, keep η = 10.0 and λ = 1000.0, but shrink the network and the data:

```python
net = network2.Network([784, 10])
net.SGD(training_data[:1000], 30, 10, 10.0, lmbda=1000.0,
        evaluation_data=validation_data[:100], monitor_evaluation_accuracy=True)
```
Epoch 0 training complete
Accuracy on evaluation data: 10 / 100

Epoch 1 training complete
Accuracy on evaluation data: 10 / 100

Epoch 2 training complete
Accuracy on evaluation data: 10 / 100

λ was set to 1000.0 before; since the training set is now 50× smaller (1,000 examples instead of 50,000), λ must shrink by the same factor to keep the weight decay per step the same: λ = 1000/50 = 20.0.
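The scaling follows because the per-step weight-decay factor in regularized SGD is (1 − ηλ/n), so keeping λ/n fixed preserves it. A quick check:

```python
# Per-step weight decay is (1 - eta*lmbda/n), so keep lmbda/n constant
# when shrinking the training set.
n_full, lmbda_full = 50000, 1000.0
n_small = 1000
lmbda_small = lmbda_full * n_small / n_full
print(lmbda_small)  # -> 20.0
```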

```python
net = network2.Network([784, 10])
net.SGD(training_data[:1000], 30, 10, 10.0, lmbda=20.0,
        evaluation_data=validation_data[:100], monitor_evaluation_accuracy=True)
```
Epoch 0 training complete
Accuracy on evaluation data: 12 / 100

Epoch 1 training complete
Accuracy on evaluation data: 14 / 100

Epoch 2 training complete
Accuracy on evaluation data: 25 / 100

Epoch 3 training complete
Accuracy on evaluation data: 18 / 100

With η = 10.0 the accuracy now improves, though noisily. Next, try a much larger learning rate, η = 100.0:

```python
net = network2.Network([784, 10])
net.SGD(training_data[:1000], 30, 10, 100.0, lmbda=20.0,
        evaluation_data=validation_data[:100], monitor_evaluation_accuracy=True)
```
Epoch 0 training complete
Accuracy on evaluation data: 10 / 100

Epoch 1 training complete
Accuracy on evaluation data: 10 / 100

Epoch 2 training complete
Accuracy on evaluation data: 10 / 100

Epoch 3 training complete
Accuracy on evaluation data: 10 / 100

η = 100.0 is clearly too large, so go the other way and try η = 1.0:

```python
net = network2.Network([784, 10])
net.SGD(training_data[:1000], 30, 10, 1.0, lmbda=20.0,
        evaluation_data=validation_data[:100], monitor_evaluation_accuracy=True)
```

The results are much better:
Epoch 0 training complete
Accuracy on evaluation data: 62 / 100

Epoch 1 training complete
Accuracy on evaluation data: 42 / 100

Epoch 2 training complete
Accuracy on evaluation data: 43 / 100

Epoch 3 training complete
Accuracy on evaluation data: 61 / 100

## Automated search

For example, the learning rate can be chosen from 0.001, 0.01, 0.1, 1, 10, and λ from 1.0, 10, 100.
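A minimal coarse grid search over those values might look like the sketch below; `evaluate` is a hypothetical stand-in for training a network with the given (η, λ) and returning validation accuracy:

```python
from itertools import product

def evaluate(eta, lmbda):
    """Hypothetical stand-in: train a network with (eta, lmbda) and
    return validation accuracy. Replaced here by a dummy score that
    peaks at eta=1, lmbda=10."""
    return -abs(eta - 1.0) - abs(lmbda - 10.0)

etas = [0.001, 0.01, 0.1, 1, 10]
lmbdas = [1.0, 10, 100]

# Try every (eta, lmbda) pair and keep the best-scoring one.
best = max(product(etas, lmbdas), key=lambda p: evaluate(*p))
print(best)  # -> (1, 10)
```

In practice each `evaluate` call is expensive, so a coarse grid on a reduced problem is run first, then refined around the best cell.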

## Other variants of stochastic gradient descent

• Hessian optimization
• Momentum-based gradient descent
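Momentum-based gradient descent keeps a "velocity" v, updated as v → μv − η∇C and w → w + v, where μ is the momentum coefficient. A minimal sketch minimizing the quadratic C(w) = w²:

```python
# Momentum-based gradient descent on C(w) = w^2 (gradient is 2w).
mu, eta = 0.9, 0.1   # momentum coefficient and learning rate
w, v = 5.0, 0.0      # start away from the minimum, zero velocity
for _ in range(300):
    grad = 2 * w
    v = mu * v - eta * grad   # velocity update: v -> mu*v - eta*grad
    w = w + v                 # position update: w -> w + v
print(abs(w) < 1e-3)  # converged near the minimum at w = 0
```

With μ = 0 this reduces to ordinary gradient descent; μ near 1 lets the update build up speed along consistent gradient directions.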

## Neuron models other than the sigmoid

(1) tanh

tanh is just a rescaled and shifted sigmoid function.

tanh: range (−1, 1)
sigmoid: range (0, 1)

Since tanh's output lies in (−1, 1) rather than (0, 1), the inputs should likewise be rescaled to the range [−1, 1].
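The rescaling is exact: tanh(z) = 2σ(2z) − 1, where σ is the sigmoid. A quick numerical check:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# tanh(z) = 2*sigmoid(2z) - 1: tanh is a rescaled, shifted sigmoid.
for z in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    assert abs(math.tanh(z) - (2 * sigmoid(2 * z) - 1)) < 1e-12
print("identity holds")
```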

(2) There is also the rectified linear neuron:

max(0, w⋅x + b)
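A rectified linear neuron simply clips its weighted input at zero; a minimal sketch:

```python
def relu_neuron(w, x, b):
    """Rectified linear neuron: output max(0, w.x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, z)

print(relu_neuron([0.5, -1.0], [2.0, 1.0], 0.5))  # z = 0.5  -> 0.5
print(relu_neuron([0.5, -1.0], [0.0, 1.0], 0.0))  # z = -1.0 -> 0.0 (clipped)
```

Unlike the sigmoid, the output is unbounded above and the gradient does not saturate for positive z.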
