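Build a binary classification network that separates the MNIST digits 0 and 3.
The network is trained once with tanh and once with sigmoid as the activation function. To find out how the two actually differ, they are cross-compared: the convergence criterion δ is fixed, the measurement is repeated many times, and the results are averaged.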
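The measurement procedure can be sketched roughly as follows. This is only an illustration under loud assumptions: the post does not define δ or the network architecture, so here δ is assumed to be a threshold on the change in loss between two iterations, and a single neuron on toy 1-D data stands in for the real 0-vs-3 network. The names `train_once` and `measure` are hypothetical, and accuracy is measured on the training data for brevity.

```python
import math
import random
from statistics import mean


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# name -> (activation, derivative written in terms of the output, (low, high) targets)
ACTIVATIONS = {
    "sigmoid": (sigmoid, lambda a: a * (1.0 - a), (0.0, 1.0)),
    "tanh": (math.tanh, lambda a: 1.0 - a * a, (-1.0, 1.0)),
}


def train_once(name, delta, seed, lr=0.5, max_iter=20_000):
    """One run: train until the loss changes by less than delta (assumed criterion)."""
    act, dact, (lo, hi) = ACTIVATIONS[name]
    rng = random.Random(seed)
    # Toy stand-in data: one class centred at -1, the other at +1.
    xs = [rng.gauss(-1.0, 0.7) for _ in range(30)] + [rng.gauss(1.0, 0.7) for _ in range(30)]
    ts = [lo] * 30 + [hi] * 30
    w, b = rng.uniform(-0.1, 0.1), 0.0
    prev = None
    for n in range(1, max_iter + 1):
        gw = gb = loss = 0.0
        for x, t in zip(xs, ts):
            a = act(w * x + b)
            err = a - t
            loss += 0.5 * err * err
            g = err * dact(a)  # gradient of the squared error w.r.t. the pre-activation
            gw += g * x
            gb += g
        loss /= len(xs)
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
        if prev is not None and abs(prev - loss) < delta:
            break
        prev = loss
    mid = 0.5 * (lo + hi)
    correct = sum((act(w * x + b) > mid) == (t == hi) for x, t in zip(xs, ts))
    return n, correct / len(xs)


def measure(name, delta, runs=199):
    """Repeat the run with different seeds; return (mean n, p-ave, p-max)."""
    results = [train_once(name, delta, seed) for seed in range(runs)]
    return (
        mean(r[0] for r in results),
        mean(r[1] for r in results),
        max(r[1] for r in results),
    )


for name in ("tanh", "sigmoid"):
    print(name, measure(name, 1e-3, runs=20))
```

Each row of the tables below corresponds to one call of something like `measure(name, delta)` with 199 runs; the numbers this toy prints are not comparable to the MNIST results.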
Data 1: tanh. Each convergence criterion was run to convergence 199 times, 25*199 runs in total.
tanh | ||||||||
*03 | ||||||||
f2[0] | f2[1] | iterations n | mean accuracy p-ave | δ | time (ms/run) | time (ms/199 runs) | time (min/199 runs) | max accuracy p-max |
0.500353 | 0.415556 | 29.09045 | 0.715959 | 0.5 | 14.76884 | 2947 | 0.049117 | 0.962312 |
0.651631 | 0.357993 | 44.50754 | 0.754413 | 0.4 | 13.86432 | 2761 | 0.046017 | 0.956281 |
0.70054 | 0.253659 | 63.36683 | 0.890238 | 0.3 | 14.0804 | 2810 | 0.046833 | 0.966834 |
0.622577 | 0.2358 | 82.18593 | 0.94784 | 0.2 | 14.61307 | 2919 | 0.04865 | 0.973869 |
0.632443 | 0.245715 | 117.2714 | 0.959779 | 0.1 | 14.76884 | 2946 | 0.0491 | 0.975879 |
0.468252 | 0.522757 | 1335.035 | 0.985069 | 0.01 | 28.49749 | 5679 | 0.09465 | 0.98995 |
0.537243 | 0.461893 | 9637.683 | 0.987717 | 0.001 | 124.8844 | 24853 | 0.414217 | 0.991457 |
0.512175 | 0.487092 | 10134.37 | 0.987957 | 9.00E-04 | 129.3568 | 25751 | 0.429183 | 0.990955 |
0.517224 | 0.482056 | 11665.16 | 0.987493 | 8.00E-04 | 147.7688 | 29406 | 0.4901 | 0.991457 |
0.637769 | 0.361572 | 14006.98 | 0.987526 | 7.00E-04 | 174.0452 | 34643 | 0.577383 | 0.991457 |
0.703132 | 0.29634 | 16579.93 | 0.986488 | 6.00E-04 | 203.3015 | 40464 | 0.6744 | 0.991457 |
0.738377 | 0.261153 | 22576.23 | 0.987076 | 5.00E-04 | 271.6533 | 54066 | 0.9011 | 0.99196 |
0.78867 | 0.210962 | 33005.37 | 0.986882 | 4.00E-04 | 391.1156 | 77833 | 1.297217 | 0.99196 |
0.783714 | 0.216036 | 47255.99 | 0.987662 | 3.00E-04 | 554.0352 | 110268 | 1.8378 | 0.993467 |
0.688321 | 0.311501 | 66911.05 | 0.989281 | 2.00E-04 | 778.6281 | 154971 | 2.58285 | 0.993467 |
0.592911 | 0.407003 | 116691.6 | 0.991109 | 1.00E-04 | 994.2563 | 197857 | 3.297617 | 0.995477 |
0.557746 | 0.442177 | 115008.1 | 0.99124 | 9.00E-05 | 1326.709 | 264022 | 4.400367 | 0.994472 |
0.623077 | 0.376864 | 119430.7 | 0.991139 | 8.00E-05 | 1373.698 | 273374 | 4.556233 | 0.994975 |
0.55273 | 0.44721 | 133684.6 | 0.991619 | 7.00E-05 | 1522.899 | 303059 | 5.050983 | 0.994975 |
0.648208 | 0.351737 | 142422.2 | 0.991806 | 6.00E-05 | 1594.905 | 317394 | 5.2899 | 0.994975 |
0.63314 | 0.366819 | 161331.8 | 0.992121 | 5.00E-05 | 1797.372 | 357682 | 5.961367 | 0.994975 |
0.60802 | 0.391948 | 184704.5 | 0.992344 | 4.00E-05 | 2078.271 | 413591 | 6.893183 | 0.995477 |
0.638177 | 0.3618 | 205468.7 | 0.992558 | 3.00E-05 | 2290.075 | 455731 | 7.595517 | 0.994975 |
0.57788 | 0.422105 | 233099.6 | 0.993018 | 2.00E-05 | 2604.116 | 518227 | 8.637117 | 0.995477 |
0.567836 | 0.432158 | 290422.3 | 0.993541 | 1.00E-05 | 3223.593 | 641500 | 10.69167 | 0.996482 |
Data 2: sigmoid. 43*199 runs were measured in total.
sig | ||||||||
*03 | ||||||||
f2[0] | f2[1] | iterations n | mean accuracy p-ave | δ | time (ms/run) | time (ms/199 runs) | time (min/199 runs) | max accuracy p-max |
0.502547669 | 0.498213 | 19.13065 | 0.518881 | 0.5 | 8.175879 | 1627 | 0.027117 | 0.866834 |
0.552177615 | 0.447849 | 300.397 | 0.954814 | 0.4 | 9.432161 | 1877 | 0.031283 | 0.973869 |
0.668711431 | 0.331904 | 374.1256 | 0.968738 | 0.3 | 9.899497 | 1986 | 0.0331 | 0.980402 |
0.478841824 | 0.520571 | 449.7286 | 0.977006 | 0.2 | 10.68844 | 2143 | 0.035717 | 0.984925 |
0.124091467 | 0.875997 | 552.6935 | 0.982579 | 0.1 | 11.45226 | 2279 | 0.037983 | 0.984925 |
0.32489657 | 0.675131 | 1213.266 | 0.985134 | 0.01 | 16.34673 | 3269 | 0.054483 | 0.986935 |
0.241692045 | 0.758309 | 3918.683 | 0.986884 | 0.001 | 37.64824 | 7508 | 0.125133 | 0.990452 |
0.226593223 | 0.773408 | 4302.819 | 0.987265 | 9.00E-04 | 41.21106 | 8201 | 0.136683 | 0.990452 |
0.19643095 | 0.803571 | 4589.744 | 0.98746 | 8.00E-04 | 42.98995 | 8555 | 0.142583 | 0.990452 |
0.151206651 | 0.848793 | 5202.563 | 0.988023 | 7.00E-04 | 48.0804 | 9586 | 0.159767 | 0.990452 |
0.136085504 | 0.863914 | 5801.864 | 0.988137 | 6.00E-04 | 53.1005 | 10582 | 0.176367 | 0.990452 |
0.15107437 | 0.848925 | 6836.291 | 0.988031 | 5.00E-04 | 61.19598 | 12193 | 0.203217 | 0.990452 |
0.186158149 | 0.813842 | 7983.03 | 0.987397 | 4.00E-04 | 70.04523 | 13939 | 0.232317 | 0.990452 |
0.306637127 | 0.693364 | 10110.83 | 0.986793 | 3.00E-04 | 86.18593 | 17167 | 0.286117 | 0.989447 |
0.306606847 | 0.693393 | 15261.92 | 0.986586 | 2.00E-04 | 130.3869 | 25956 | 0.4326 | 0.989447 |
0.613044571 | 0.386955 | 36494.64 | 0.987493 | 1.00E-04 | 292.9497 | 58297 | 0.971617 | 0.991457 |
0.698459049 | 0.301541 | 38622.99 | 0.987106 | 9.00E-05 | 308.3266 | 61357 | 1.022617 | 0.991457 |
0.693438295 | 0.306562 | 41566.63 | 0.987473 | 8.00E-05 | 332.3216 | 66132 | 1.1022 | 0.991457 |
0.723588776 | 0.276411 | 44855.03 | 0.987897 | 7.00E-05 | 357.6131 | 71197 | 1.186617 | 0.99196 |
0.683396475 | 0.316604 | 48059.4 | 0.988366 | 6.00E-05 | 383.4271 | 76302 | 1.2717 | 0.992462 |
0.688424754 | 0.311575 | 52975.13 | 0.988556 | 5.00E-05 | 420.8693 | 83753 | 1.395883 | 0.992462 |
0.688428142 | 0.311572 | 56542.35 | 0.98944 | 4.00E-05 | 448.0352 | 89174 | 1.486233 | 0.992965 |
0.718580776 | 0.281419 | 60788.42 | 0.989667 | 3.00E-05 | 481.995 | 95917 | 1.598617 | 0.993467 |
0.723609701 | 0.27639 | 67787.28 | 0.990753 | 2.00E-05 | 536.4774 | 106759 | 1.779317 | 0.99397 |
0.618088243 | 0.381912 | 79939.21 | 0.991601 | 1.00E-05 | 631.8844 | 125760 | 2.096 | 0.994975 |
0.623113546 | 0.376886 | 82122.32 | 0.992013 | 9.00E-06 | 649.0854 | 129184 | 2.153067 | 0.994472 |
0.673364278 | 0.326636 | 83016.99 | 0.991781 | 8.00E-06 | 656.4523 | 130634 | 2.177233 | 0.994975 |
0.60301373 | 0.396986 | 86061.2 | 0.992119 | 7.00E-06 | 681.9698 | 135728 | 2.262133 | 0.994975 |
0.55778833 | 0.442212 | 87802.62 | 0.992025 | 6.00E-06 | 692.598 | 137842 | 2.297367 | 0.994975 |
0.597989092 | 0.402011 | 91195.31 | 0.992056 | 5.00E-06 | 731.9347 | 145660 | 2.427667 | 0.994975 |
0.597989245 | 0.402011 | 94757.74 | 0.992525 | 4.00E-06 | 742.6985 | 147803 | 2.463383 | 0.994975 |
0.618089828 | 0.38191 | 99007.35 | 0.992397 | 3.00E-06 | 774.4724 | 154125 | 2.56875 | 0.994975 |
0.603014704 | 0.396985 | 106014.5 | 0.992432 | 2.00E-06 | 831.4573 | 165469 | 2.757817 | 0.995477 |
0.572864206 | 0.427136 | 117696.3 | 0.993195 | 1.00E-06 | 511.6231 | 101821 | 1.697017 | 0.996482 |
0.643215857 | 0.356784 | 119205.3 | 0.993467 | 9.00E-07 | 933.6683 | 185810 | 3.096833 | 0.99598 |
0.562813988 | 0.437186 | 122532.5 | 0.993563 | 8.00E-07 | 961.1608 | 191280 | 3.188 | 0.99598 |
0.542713524 | 0.457286 | 125011.9 | 0.993773 | 7.00E-07 | 980.6734 | 195156 | 3.2526 | 0.99598 |
0.542713525 | 0.457286 | 127423.5 | 0.993609 | 6.00E-07 | 999.4171 | 198894 | 3.3149 | 0.996482 |
0.572864262 | 0.427136 | 130846.4 | 0.993735 | 5.00E-07 | 1026.085 | 204197 | 3.403283 | 0.996985 |
0.562814031 | 0.437186 | 133950.5 | 0.993881 | 4.00E-07 | 1049.749 | 208907 | 3.481783 | 0.99598 |
0.557788917 | 0.442211 | 140273.8 | 0.993927 | 3.00E-07 | 1099.955 | 218895 | 3.64825 | 0.996985 |
0.507537689 | 0.492462 | 148654.8 | 0.994197 | 2.00E-07 | 1164.583 | 231759 | 3.86265 | 0.996482 |
0.48743719 | 0.512563 | 163688.8 | 0.99452 | 1.00E-07 | 1281.653 | 255059 | 4.250983 | 0.996482 |
Comparing the iteration count n under the same convergence criterion
δ | tanh | sig | tanh/sig |
0.5 | 29.09045 | 19.13065 | 1.52062 |
0.4 | 44.50754 | 300.397 | 0.148162 |
0.3 | 63.36683 | 374.1256 | 0.169373 |
0.2 | 82.18593 | 449.7286 | 0.182746 |
0.1 | 117.2714 | 552.6935 | 0.212182 |
0.01 | 1335.035 | 1213.266 | 1.100364 |
0.001 | 9637.683 | 3918.683 | 2.459419 |
9.00E-04 | 10134.37 | 4302.819 | 2.355287 |
8.00E-04 | 11665.16 | 4589.744 | 2.541571 |
7.00E-04 | 14006.98 | 5202.563 | 2.692323 |
6.00E-04 | 16579.93 | 5801.864 | 2.85769 |
5.00E-04 | 22576.23 | 6836.291 | 3.302408 |
4.00E-04 | 33005.37 | 7983.03 | 4.134442 |
3.00E-04 | 47255.99 | 10110.83 | 4.6738 |
2.00E-04 | 66911.05 | 15261.92 | 4.384182 |
1.00E-04 | 116691.6 | 36494.64 | 3.1975 |
9.00E-05 | 115008.1 | 38622.99 | 2.97771 |
8.00E-05 | 119430.7 | 41566.63 | 2.873235 |
7.00E-05 | 133684.6 | 44855.03 | 2.980371 |
6.00E-05 | 142422.2 | 48059.4 | 2.963461 |
5.00E-05 | 161331.8 | 52975.13 | 3.045426 |
4.00E-05 | 184704.5 | 56542.35 | 3.266658 |
3.00E-05 | 205468.7 | 60788.42 | 3.380063 |
2.00E-05 | 233099.6 | 67787.28 | 3.438693 |
1.00E-05 | 290422.3 | 79939.21 | 3.63304 |
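To reach the same convergence criterion, tanh needs about 2.58 times as many iterations as sigmoid on average (the mean of the tanh/sig column over the 25 values of δ), although for δ between 0.4 and 0.1 tanh actually needs fewer. If a larger iteration count is taken to mean that the two classes are more similar, these data suggest that 0 and 3 are more symmetric with respect to tanh than with respect to sigmoid. In other words, sigmoid speeds up the breaking of the symmetry between 0 and 3.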
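The average ratio quoted above can be recomputed from the table. The snippet below is a small sketch that copies the (δ, tanh n, sigmoid n) values by hand; the variable names are illustrative only.

```python
from statistics import mean

# (delta, tanh iterations, sigmoid iterations), copied from the comparison table
rows = [
    (0.5, 29.09045, 19.13065), (0.4, 44.50754, 300.397),
    (0.3, 63.36683, 374.1256), (0.2, 82.18593, 449.7286),
    (0.1, 117.2714, 552.6935), (0.01, 1335.035, 1213.266),
    (0.001, 9637.683, 3918.683), (9e-4, 10134.37, 4302.819),
    (8e-4, 11665.16, 4589.744), (7e-4, 14006.98, 5202.563),
    (6e-4, 16579.93, 5801.864), (5e-4, 22576.23, 6836.291),
    (4e-4, 33005.37, 7983.03), (3e-4, 47255.99, 10110.83),
    (2e-4, 66911.05, 15261.92), (1e-4, 116691.6, 36494.64),
    (9e-5, 115008.1, 38622.99), (8e-5, 119430.7, 41566.63),
    (7e-5, 133684.6, 44855.03), (6e-5, 142422.2, 48059.4),
    (5e-5, 161331.8, 52975.13), (4e-5, 184704.5, 56542.35),
    (3e-5, 205468.7, 60788.42), (2e-5, 233099.6, 67787.28),
    (1e-5, 290422.3, 79939.21),
]

# Per-delta ratio tanh/sig and its mean over all 25 criteria
ratios = [(delta, tanh_n / sig_n) for delta, tanh_n, sig_n in rows]
overall = mean(r for _, r in ratios)

# Criteria at which tanh needs FEWER iterations than sigmoid
tanh_faster = [delta for delta, r in ratios if r < 1]

print(f"mean tanh/sig iteration ratio: {overall:.2f}")
print("tanh converges in fewer iterations at delta =", tanh_faster)
```

The mean hides the fact that the ratio is not uniform: it is below 1 for the loose criteria 0.4 to 0.1 and climbs above 4 around δ = 3e-4.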
Comparing the mean classification accuracy p-ave
δ | tanh p-ave | sig p-ave | tanh/sig |
0.01 | 0.985069 | 0.985134 | 0.999933 |
0.001 | 0.987717 | 0.986884 | 1.000844 |
9.00E-04 | 0.987957 | 0.987265 | 1.000701 |
8.00E-04 | 0.987493 | 0.98746 | 1.000033 |
7.00E-04 | 0.987526 | 0.988023 | 0.999497 |
6.00E-04 | 0.986488 | 0.988137 | 0.998331 |
5.00E-04 | 0.987076 | 0.988031 | 0.999034 |
4.00E-04 | 0.986882 | 0.987397 | 0.999478 |
3.00E-04 | 0.987662 | 0.986793 | 1.00088 |
2.00E-04 | 0.989281 | 0.986586 | 1.002731 |
1.00E-04 | 0.991109 | 0.987493 | 1.003662 |
9.00E-05 | 0.99124 | 0.987106 | 1.004188 |
8.00E-05 | 0.991139 | 0.987473 | 1.003713 |
7.00E-05 | 0.991619 | 0.987897 | 1.003768 |
6.00E-05 | 0.991806 | 0.988366 | 1.00348 |
5.00E-05 | 0.992121 | 0.988556 | 1.003607 |
4.00E-05 | 0.992344 | 0.98944 | 1.002935 |
3.00E-05 | 0.992558 | 0.989667 | 1.002922 |
2.00E-05 | 0.993018 | 0.990753 | 1.002286 |
1.00E-05 | 0.993541 | 0.991601 | 1.001956 |
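Once δ drops to about 1e-4 and below, the p-ave of tanh is consistently higher than that of sigmoid, by roughly 0.2% to 0.4%. For larger δ the two differ by less than 0.3%, with no consistent winner.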
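To put a number on the size of the gap, the relative difference in p-ave can be computed for the rows with δ ≤ 1e-4. This is a small sketch with the values copied from the table; the names are illustrative only.

```python
# (delta, tanh p-ave, sigmoid p-ave) for delta <= 1e-4, copied from the table
rows = [
    (1e-4, 0.991109, 0.987493), (9e-5, 0.99124, 0.987106),
    (8e-5, 0.991139, 0.987473), (7e-5, 0.991619, 0.987897),
    (6e-5, 0.991806, 0.988366), (5e-5, 0.992121, 0.988556),
    (4e-5, 0.992344, 0.98944), (3e-5, 0.992558, 0.989667),
    (2e-5, 0.993018, 0.990753), (1e-5, 0.993541, 0.991601),
]

# Relative advantage of tanh over sigmoid, in percent
gaps = [100.0 * (tanh_p - sig_p) / sig_p for _, tanh_p, sig_p in rows]

for (delta, _, _), gap in zip(rows, gaps):
    print(f"delta={delta:.0e}  tanh ahead by {gap:.2f}%")
print(f"range: {min(gaps):.2f}% to {max(gaps):.2f}%")
```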
Comparing the maximum classification accuracy p-max under the same convergence criterion
tanh p-max | sig p-max | δ |
0.975879 | 0.984925 | 0.1 |
0.98995 | 0.986935 | 0.01 |
0.991457 | 0.990452 | 0.001 |
0.990955 | 0.990452 | 9.00E-04 |
0.991457 | 0.990452 | 8.00E-04 |
0.991457 | 0.990452 | 7.00E-04 |
0.991457 | 0.990452 | 6.00E-04 |
0.99196 | 0.990452 | 5.00E-04 |
0.99196 | 0.990452 | 4.00E-04 |
0.993467 | 0.989447 | 3.00E-04 |
0.993467 | 0.989447 | 2.00E-04 |
0.995477 | 0.991457 | 1.00E-04 |
0.994472 | 0.991457 | 9.00E-05 |
0.994975 | 0.991457 | 8.00E-05 |
0.994975 | 0.99196 | 7.00E-05 |
0.994975 | 0.992462 | 6.00E-05 |
0.994975 | 0.992462 | 5.00E-05 |
0.995477 | 0.992965 | 4.00E-05 |
0.994975 | 0.993467 | 3.00E-05 |
0.995477 | 0.99397 | 2.00E-05 |
0.996482 | 0.994975 | 1.00E-05 |
This result is clear-cut: once δ is 0.01 or smaller, the p-max of tanh is higher than the p-max of sigmoid at every δ.
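The threshold can be read off the table programmatically. The sketch below uses a hypothetical helper `crossover` that returns the largest δ from which tanh stays ahead at every smaller δ; the rows are copied from the p-max table.

```python
# (delta, tanh p-max, sigmoid p-max), copied from the table
rows = [
    (0.1, 0.975879, 0.984925), (0.01, 0.98995, 0.986935),
    (0.001, 0.991457, 0.990452), (9e-4, 0.990955, 0.990452),
    (8e-4, 0.991457, 0.990452), (7e-4, 0.991457, 0.990452),
    (6e-4, 0.991457, 0.990452), (5e-4, 0.99196, 0.990452),
    (4e-4, 0.99196, 0.990452), (3e-4, 0.993467, 0.989447),
    (2e-4, 0.993467, 0.989447), (1e-4, 0.995477, 0.991457),
    (9e-5, 0.994472, 0.991457), (8e-5, 0.994975, 0.991457),
    (7e-5, 0.994975, 0.99196), (6e-5, 0.994975, 0.992462),
    (5e-5, 0.994975, 0.992462), (4e-5, 0.995477, 0.992965),
    (3e-5, 0.994975, 0.993467), (2e-5, 0.995477, 0.99397),
    (1e-5, 0.996482, 0.994975),
]


def crossover(rows):
    """Largest delta from which tanh beats sigmoid at every smaller delta, or None."""
    threshold = None
    for delta, tanh_p, sig_p in sorted(rows, reverse=True):  # walk from loose to tight
        if tanh_p > sig_p:
            if threshold is None:
                threshold = delta
        else:
            threshold = None  # tanh fell behind again, so reset
    return threshold


print("tanh p-max stays ahead from delta =", crossover(rows))
```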
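Taking the three comparisons together: at the same δ, both the mean and the maximum accuracy of tanh are better than those of sigmoid, but tanh pays for this with a much larger number of iterations.
So which function is better in terms of convergence efficiency?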
activation | iterations n | mean accuracy p-ave | δ | time (ms/run) | time (ms/199 runs) | time (min/199 runs) | max accuracy p-max |
tanh | 161331.8392 | 0.992121 | 5.00E-05 | 1797.372 | 357682 | 5.961367 | 0.994975 |
tanh | 205468.6884 | 0.992558 | 3.00E-05 | 2290.075 | 455731 | 7.595517 | 0.994975 |
tanh | 233099.6432 | 0.993018 | 2.00E-05 | 2604.116 | 518227 | 8.637117 | 0.995477 |
sigmoid | 86061.19598 | 0.992119 | 7.00E-06 | 681.9698 | 135728 | 2.262133 | 0.994975 |
sigmoid | 94757.74372 | 0.992525 | 4.00E-06 | 742.6985 | 147803 | 2.463383 | 0.994975 |
sigmoid | 117696.2714 | 0.993195 | 1.00E-06 | 511.6231 | 101821 | 1.697017 | 0.996482 |
tanh/sig | 1.874617676 | 1.000003 | | 2.635559 | 2.635285 | 2.635285 | 1 |
tanh/sig | 2.168357755 | 1.000033 | | 3.083452 | 3.083368 | 3.083368 | 1 |
tanh/sig | 1.980518504 | 0.999822 | | 5.08991 | 5.089589 | 5.089589 | 0.998991 |
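Three pairs of rows were picked from the two tables so that the p-ave values within each pair are nearly equal. These pairs are used to compare how efficiently tanh and sigmoid reach the same performance.
Take the first pair, with p-ave = 0.9921: tanh needed 161331 iterations and sigmoid 86061, so tanh needed 1.87 times as many iterations and about 2.64 times as much time.
All three pairs show that sigmoid needs fewer iterations and less time than tanh to reach the same performance, so sigmoid converges far more efficiently.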
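The ratios for the three matched pairs can be reproduced with a few lines. This is a sketch with the values copied from the table above; the dictionary keys are illustrative names, not anything from the original experiment.

```python
# Matched pairs with nearly equal p-ave: (tanh row, sigmoid row)
pairs = [
    ({"n": 161331.8392, "p_ave": 0.992121, "ms": 1797.372},
     {"n": 86061.19598, "p_ave": 0.992119, "ms": 681.9698}),
    ({"n": 205468.6884, "p_ave": 0.992558, "ms": 2290.075},
     {"n": 94757.74372, "p_ave": 0.992525, "ms": 742.6985}),
    ({"n": 233099.6432, "p_ave": 0.993018, "ms": 2604.116},
     {"n": 117696.2714, "p_ave": 0.993195, "ms": 511.6231}),
]

# For each pair: how many times more iterations and time tanh needs
efficiency = [
    (tanh["n"] / sig["n"], tanh["ms"] / sig["ms"]) for tanh, sig in pairs
]

for (tanh, _), (n_ratio, t_ratio) in zip(pairs, efficiency):
    print(f"p-ave ~ {tanh['p_ave']:.4f}: iterations x{n_ratio:.2f}, time x{t_ratio:.2f}")
```

One caveat when reading the third pair: the sigmoid timing of 511.6 ms at δ = 1e-6 is lower than its neighbouring rows in the sigmoid table (831.5 ms at 2e-6 and 933.7 ms at 9e-7), so the time ratio of about 5.09 may be overstated.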
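To sum up the comparison of the two functions:
At the same convergence criterion, the average accuracy of tanh is better than that of sigmoid.
At the same number of iterations, the average accuracy of sigmoid is better than that of tanh.
For the same target accuracy, sigmoid converges significantly more efficiently than tanh.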