Experiment Results
- In this experiment, the variables were assumed to be mutually independent and tested with a controlled-variable approach (in hindsight, that assumption doesn't really hold). Since individual runs are not reproducible, each parameter setting was tested twice.
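The controlled-variable procedure above can be sketched as follows. This is an illustration only: `train_and_eval` is a hypothetical placeholder for the real training routine, and the baseline values are the ones used in the first table below.

```python
# Sketch of the controlled-variable sweep: vary one hyperparameter at a
# time and run each setting twice, since single runs are not reproducible.
baseline = {"learning_rate": 0.2, "n_hidden1": 32, "n_hidden2": 16,
            "epoch": 2, "batch_size": 10}

def train_and_eval(cfg):
    # Placeholder: the real routine would train the network and return
    # test accuracy; here it returns a constant so the sketch runs.
    return 0.95

def sweep(param, values, trials=2):
    results = []
    for v in values:
        cfg = dict(baseline, **{param: v})          # override one variable
        accs = [train_and_eval(cfg) for _ in range(trials)]
        results.append((v, accs))
    return results

print(sweep("learning_rate", [0.05, 0.1, 0.2, 0.5]))
```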
Effect of the learning rate
| # | learning_rate | n_hidden1 | n_hidden2 | epoch | batch_size | accuracy |
|---|---|---|---|---|---|---|
| 1 | 0.05 | 32 | 16 | 2 | 10 | 0.9521 |
|   |      | 32 | 16 | 2 | 10 | 0.9487 |
| 2 | 0.1  | 32 | 16 | 2 | 10 | 0.9548 |
|   |      | 32 | 16 | 2 | 10 | 0.9469 |
| 3 | 0.2  | 32 | 16 | 2 | 10 | 0.9532 |
|   |      | 32 | 16 | 2 | 10 | 0.9537 |
| 4 | 0.5  | 32 | 16 | 2 | 10 | 0.9532 |
|   |      | 32 | 16 | 2 | 10 | 0.9563 |
- Conclusion: across these runs, accuracy shows no clear dependence on the learning rate, but an overly low learning rate noticeably lengthens training and may leave the model stuck in a local optimum.
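As a toy illustration of the training-time point (a one-dimensional quadratic, not the experiment's actual network), gradient descent needs many more steps to converge when the learning rate is small:

```python
# Gradient descent on f(w) = (w - 3)^2: count steps until |w - 3| < tol.
def steps_to_converge(lr, w=0.0, target=3.0, tol=1e-3, max_steps=10000):
    for step in range(1, max_steps + 1):
        grad = 2 * (w - target)   # derivative of (w - target)^2
        w -= lr * grad
        if abs(w - target) < tol:
            return step
    return max_steps

# A smaller learning rate needs many more steps on this toy problem.
print(steps_to_converge(0.05), steps_to_converge(0.2))
```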
Effect of the number of epochs
| # | learning_rate | n_hidden1 | n_hidden2 | epoch | batch_size | accuracy |
|---|---|---|---|---|---|---|
| 1 | 0.2 | 32 | 16 | 2  | 10 | 0.9532 |
|   |     | 32 | 16 | 2  | 10 | 0.9537 |
| 2 | 0.2 | 32 | 16 | 5  | 10 | 0.96   |
|   |     | 32 | 16 | 5  | 10 | 0.9621 |
| 3 | 0.2 | 32 | 16 | 10 | 10 | 0.96   |
|   |     | 32 | 16 | 10 | 10 | 0.9628 |
| 4 | 0.2 | 32 | 16 | 15 | 10 | 0.9668 |
|   |     | 32 | 16 | 15 | 10 | 0.966  |
- Conclusion: accuracy improves as the number of epochs increases, but training time grows proportionally with it.
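The proportional cost is easy to see from the number of weight updates per run; the 55000-sample figure below is an assumption (roughly the size of an MNIST-style training set), not stated in the write-up:

```python
# Each epoch performs n_samples // batch_size weight updates, so the
# total update count (and hence training time) scales linearly with epochs.
def num_updates(n_samples, epochs, batch_size):
    return epochs * (n_samples // batch_size)

for epochs in (2, 5, 10, 15):
    print(epochs, num_updates(55000, epochs, 10))
```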
Effect of batch_size
| # | learning_rate | n_hidden1 | n_hidden2 | epoch | batch_size | accuracy |
|---|---|---|---|---|---|---|
| 1 | 0.2 | 32 | 16 | 2 | 2  | 0.9204 |
|   |     | 32 | 16 | 2 | 2  | 0.9243 |
| 2 | 0.2 | 32 | 16 | 2 | 5  | 0.9496 |
|   |     | 32 | 16 | 2 | 5  | 0.9435 |
| 3 | 0.2 | 32 | 16 | 2 | 10 | 0.9565 |
|   |     | 32 | 16 | 2 | 10 | 0.9513 |
| 4 | 0.2 | 32 | 16 | 2 | 15 | 0.9529 |
|   |     | 32 | 16 | 2 | 15 | 0.9575 |
- Conclusion: a batch_size that is either too large or too small is not ideal; 10 works well here.
Effect of the number of hidden-layer neurons
| # | learning_rate | n_hidden1 | n_hidden2 | epoch | batch_size | accuracy |
|---|---|---|---|---|---|---|
| 1 | 0.2 | 32  | 16 | 2 | 10 | 0.9486 |
|   |     | 32  | 16 | 2 | 10 | 0.9559 |
| 2 | 0.2 | 64  | 32 | 2 | 10 | 0.963  |
|   |     | 64  | 32 | 2 | 10 | 0.966  |
| 3 | 0.2 | 128 | 16 | 2 | 10 | 0.9611 |
|   |     | 128 | 16 | 2 | 10 | 0.9684 |
| 4 | 0.2 | 128 | 64 | 2 | 10 | 0.9622 |
|   |     | 128 | 64 | 2 | 10 | 0.9607 |
| 5 | 0.2 | 100 | 20 | 2 | 10 | 0.9652 |
|   |     | 100 | 20 | 2 | 10 | 0.9646 |
- Conclusion: this parameter is the hardest to reason about; the groups differ only slightly in accuracy, but more neurons do mean slower training. 100-20 looks like a good choice.
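The slowdown tracks the parameter count of a fully connected 784 -> n_hidden1 -> n_hidden2 -> 10 network (784 inputs and 10 output classes are assumptions here, MNIST-style; the write-up does not state them):

```python
# Weights plus biases for a two-hidden-layer fully connected network.
def param_count(h1, h2, n_in=784, n_out=10):
    return (n_in * h1 + h1) + (h1 * h2 + h2) + (h2 * n_out + n_out)

# Parameter counts for the configurations tested above.
for h1, h2 in [(32, 16), (64, 32), (128, 16), (128, 64), (100, 20)]:
    print((h1, h2), param_count(h1, h2))
```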
Best combination guessed from the test results
- learning_rate = 0.2
- epoch = 20
- batch_size = 10
- n_hidden1 = 100
- n_hidden2 = 20
- accuracy = 0.9812
Reposted from: https://www.cnblogs.com/JiaoYh98/p/10750984.html