7 Conclusion
“7 Discussion” (Krizhevsky et al., 2017, p. 8)
“Our results show that a large, deep convolutional neural network is capable of achieving record-breaking results on a highly challenging dataset using purely supervised learning. It is notable that our network’s performance degrades if a single convolutional layer is removed. For example, removing any of the middle layers results in a loss of about 2% for the top-1 performance of the network. So the depth really is important for achieving our results.” (Krizhevsky et al., 2017, p. 8)
“To simplify our experiments, we did not use any unsupervised pre-training even though we expect that it will help, especially if we obtain enough computational power to significantly increase the size of the network without obtaining a corresponding increase in the amount of labeled data. Thus far, our results have improved as we have made our network larger and trained it longer but we still have many orders of magnitude to go in order to match the infero-temporal pathway of the human visual system. Ultimately we would like to use very large and deep convolutional nets on video sequences where the temporal structure provides very helpful information that is missing or far less obvious in static images.” (Krizhevsky et al., 2017, p. 8)
Interpretation
(1) Network depth matters: removing any single middle convolutional layer costs roughly 2% in top-1 accuracy (see the sketch after this list).
(2) The network was trained with purely supervised learning, without any unsupervised pre-training.
(3) The authors ultimately hope to apply even larger and deeper convolutional networks to video sequences, where temporal structure provides useful information that static images lack.
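To make the depth-ablation point in (1) concrete, below is a minimal sketch (not the authors' original cuda-convnet code) of an AlexNet-style five-conv-layer network in which one of the middle convolutional layers can be skipped, so the full and ablated models can be trained under identical settings and compared on top-1 accuracy. The class name `AlexNetAblation` and the `drop_layer` flag are illustrative assumptions; layer sizes follow the AlexNet paper.

```python
# Sketch of the ablation in the quoted paragraph: build the same network
# with or without one middle conv layer, then compare top-1 accuracy.
from typing import Optional

import torch
import torch.nn as nn


class AlexNetAblation(nn.Module):
    def __init__(self, num_classes: int = 1000, drop_layer: Optional[int] = None):
        """drop_layer: index (2, 3, or 4) of the middle conv layer to remove, or None for the full net."""
        super().__init__()
        # (out_channels, kernel_size, stride, padding) for conv layers 1..5, as in AlexNet
        conv_specs = [
            (96, 11, 4, 2),   # conv1
            (256, 5, 1, 2),   # conv2
            (384, 3, 1, 1),   # conv3 (middle)
            (384, 3, 1, 1),   # conv4 (middle)
            (256, 3, 1, 1),   # conv5
        ]
        layers = []
        in_ch = 3
        for i, (out_ch, k, s, p) in enumerate(conv_specs, start=1):
            if i == drop_layer:
                continue  # ablate this conv layer entirely
            layers += [nn.Conv2d(in_ch, out_ch, k, stride=s, padding=p),
                       nn.ReLU(inplace=True)]
            if i in (1, 2, 5):  # max-pooling after conv1, conv2, conv5, as in AlexNet
                layers.append(nn.MaxPool2d(3, stride=2))
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d((6, 6))  # keeps the classifier input size fixed
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(in_ch * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.pool(x)
        return self.classifier(torch.flatten(x, 1))


if __name__ == "__main__":
    full = AlexNetAblation()                 # full 5-conv-layer network
    ablated = AlexNetAblation(drop_layer=4)  # same network with conv4 removed
    x = torch.randn(2, 3, 224, 224)
    print(full(x).shape, ablated(x).shape)   # both: torch.Size([2, 1000])
```

Training both variants on the same data and comparing their top-1 error is the kind of comparison the quoted ~2% figure refers to; the adaptive pooling above simply keeps the classifier input shape fixed regardless of which layer is dropped.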