Wang Shusen study notes

Latest recommended article published on 2024-11-14 16:23:37

王小燊oom

Category: LLM. Tags: study notes

Article link: https://blog.csdn.net/Ives_WangShen/article/details/132852305

Keywords:

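Parameters: the weights (convolution kernels, parameter matrices, ...)
Hyper-parameters: values that must be set before the network is built and training starts. They cover the Architecture (network structure: number of convolution filters, size of each filter, stride, etc.) and the Algorithm (lr, epochs, ...)
hyper-parameters + training data -> parameters -> accuracy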

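Definition: find the neural network architecture that achieves the highest accuracy (or does best on another metric, e.g. efficiency or resource usage)
e.g. ResNet has better accuracy than VGG
MobileNet is more efficient than ResNet but less accurate.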

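Total number of hyper-parameter combinations, i.e. the size of the search space (the set containing all the possible architectures)

If 20 convolutional layers are built from the hyper-parameters above, with 4 * 3 * 2 = 24 possible settings per layer, the number of possible combinations is (4 * 3 * 2) ^ 20 = 24 ^ 20 ≈ 4 x 10 ^ 27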

	layer1	layer2	…	layer20
# of filters	24	48	…	64
size of filters	5x5	3x3	…	3x3
stride	1	1	…	2
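
The (4 * 3 * 2) ^ 20 count can be checked directly, and one architecture can be drawn from the search space. The following is a minimal Python sketch, not code from the lecture. Only the option counts (4, 3, 2) come from the notes; the concrete candidate values in `SEARCH_SPACE` and the helper names are assumptions made for illustration.

```python
import random

# Candidate values per layer. Only the option counts (4, 3, 2) come from the
# notes; the concrete values below are illustrative assumptions.
SEARCH_SPACE = {
    "num_filters": [24, 36, 48, 64],
    "filter_size": [3, 5, 7],
    "stride": [1, 2],
}
NUM_LAYERS = 20


def choices_per_layer(space):
    """Number of distinct configurations for a single layer."""
    total = 1
    for options in space.values():
        total *= len(options)
    return total


def search_space_size(space, num_layers):
    """Number of distinct architectures with `num_layers` layers."""
    return choices_per_layer(space) ** num_layers


def sample_architecture(space, num_layers, rng):
    """Draw one architecture uniformly at random: one config dict per layer."""
    return [
        {name: rng.choice(options) for name, options in space.items()}
        for _ in range(num_layers)
    ]


if __name__ == "__main__":
    print(choices_per_layer(SEARCH_SPACE))  # 24
    print(f"{search_space_size(SEARCH_SPACE, NUM_LAYERS):.2e}")
    for layer in sample_architecture(SEARCH_SPACE, 3, random.Random(0)):
        print(layer)
```

Python integers have arbitrary precision, so `24 ** 20` is computed exactly rather than approximated.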

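Randomly set the hyper-parameters --train--> CNN model --evaluate--> val acc
Repeat this process many times and keep the hyper-parameter configuration with the best val acc
This method is random search: candidates are compared by accuracy on a held-out validation set (the lecture groups this under cross validation)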
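
The sample, train, evaluate, keep-the-best loop can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_val_accuracy` is a synthetic stand-in for "train a CNN and measure val acc" (no network is trained), the candidate values in `SEARCH_SPACE` are assumed, and a single layer's configuration is searched for brevity.

```python
import random

# Illustrative candidate values (assumptions, not taken from the notes).
SEARCH_SPACE = {
    "num_filters": [24, 36, 48, 64],
    "filter_size": [3, 5, 7],
    "stride": [1, 2],
}


def sample_config(rng):
    """Pick one value per hyper-parameter uniformly at random."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def toy_val_accuracy(config):
    """Synthetic stand-in for training a CNN and measuring val acc.

    In a real search this call is the expensive part: it would train a model
    from scratch with `config` and evaluate it on the validation set.
    """
    score = 0.5
    score += 0.2 * config["num_filters"] / 64
    score += 0.1 if config["filter_size"] == 3 else 0.0
    score += 0.1 if config["stride"] == 1 else 0.0
    return score


def random_search(evaluate, num_trials, rng):
    """Evaluate `num_trials` random configs and return the best one."""
    best_config, best_acc = None, float("-inf")
    for _ in range(num_trials):
        config = sample_config(rng)
        acc = evaluate(config)
        if acc > best_acc:
            best_config, best_acc = config, acc
    return best_config, best_acc


if __name__ == "__main__":
    config, acc = random_search(toy_val_accuracy, num_trials=20, rng=random.Random(0))
    print(config, round(acc, 3))
```

Because every trial is independent, the cost grows linearly with `num_trials`, which is why the number of trials stays tiny relative to the search space.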

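Difficulties: 1. every trial is extremely expensive; 2. the search space is huge and the number of trials is tiny by comparison, so a particularly good architecture is unlikely to be found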

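Generate CNN architectures with an RNN
Because the controller RNN's objective (the val acc of the CNN it generates) is not differentiable with respect to the controller's parameters, the RNN is trained with reinforcement learning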

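Difficulty: the computation is enormous. Each time, a CNN has to be trained from scratch, and its val acc is then used as the reward for training the controller RNN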
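
The reward-driven update can be illustrated with REINFORCE on a deliberately simplified controller. This sketch makes several assumptions that differ from the setup in the notes: the controller is a set of independent softmax distributions rather than an RNN, the reward is a synthetic function (`toy_reward`) rather than the val acc of a trained CNN, and `OPTIONS` and `TARGET` are made-up values.

```python
import math
import random

# Illustrative candidate values (assumptions, not taken from the notes).
OPTIONS = {
    "num_filters": [24, 36, 48, 64],
    "filter_size": [3, 5, 7],
    "stride": [1, 2],
}
# Hidden "best" choices, used only by the toy reward below (an assumption).
TARGET = {"num_filters": 64, "filter_size": 3, "stride": 1}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(arch):
    """Synthetic stand-in for val acc: fraction of choices matching TARGET."""
    hits = sum(arch[name] == TARGET[name] for name in OPTIONS)
    return hits / len(OPTIONS)


def train_controller(steps=3000, lr=0.1, seed=0):
    """Train the per-decision logits with REINFORCE and a moving baseline."""
    rng = random.Random(seed)
    logits = {name: [0.0] * len(opts) for name, opts in OPTIONS.items()}
    baseline = 0.0
    for _ in range(steps):
        # Sample one architecture from the current policy.
        picks = {}
        for name, opts in OPTIONS.items():
            probs = softmax(logits[name])
            picks[name] = rng.choices(range(len(opts)), weights=probs)[0]
        arch = {name: OPTIONS[name][idx] for name, idx in picks.items()}

        # The reward is treated as a black box; no gradient flows through it.
        reward = toy_reward(arch)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward

        # REINFORCE: d log pi(a) / d logit_j = 1[j == a] - p_j
        for name, idx in picks.items():
            probs = softmax(logits[name])
            for j, p in enumerate(probs):
                indicator = 1.0 if j == idx else 0.0
                logits[name][j] += lr * advantage * (indicator - p)
    return logits


def greedy_architecture(logits):
    """Pick the highest-scoring option for every decision."""
    return {
        name: OPTIONS[name][max(range(len(vals)), key=vals.__getitem__)]
        for name, vals in logits.items()
    }


if __name__ == "__main__":
    print(greedy_architecture(train_controller()))
```

In the toy version each of the 3000 steps costs microseconds; in the real setting each step means training a CNN from scratch, which is where the enormous cost comes from.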

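MapReduce architecture (Google's, not open source): synchronous, client-server; the next round starts only after every worker has finished the current one; used mostly for big-data processing
- Apache Hadoop (open source)
- Apache Spark (much faster than Hadoop)
Parameter Server: asynchronous, client-server; Ray (the recommended open-source system, better than Spark); requires all workers to be fairly stable
Decentralized Network: peer to peer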
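
The difference between the synchronous (MapReduce-style) and asynchronous (parameter-server-style) update schedules can be sketched with a single-process simulation. This is an illustration under assumptions, not the API of Hadoop, Spark or Ray: the data (`y = 3x`), the round-robin worker order and all function names are made up, and no real parallelism is involved.

```python
def make_shards(num_workers, points_per_worker):
    """Synthetic data y = 3x with x in (0, 1], split evenly across workers."""
    total = num_workers * points_per_worker
    shards = []
    for k in range(num_workers):
        xs = [(k * points_per_worker + i + 1) / total for i in range(points_per_worker)]
        shards.append([(x, 3.0 * x) for x in xs])
    return shards


def shard_gradient(w, shard):
    """Gradient of the mean of 0.5 * (w * x - y) ** 2 over one shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def synchronous_sgd(shards, rounds, lr):
    """Every worker uses the same w; the server waits for all of them."""
    w = 0.0
    for _ in range(rounds):
        grads = [shard_gradient(w, shard) for shard in shards]  # "map"
        w -= lr * sum(grads) / len(grads)                       # "reduce" + update
    return w


def asynchronous_sgd(shards, rounds, lr):
    """Workers push gradients computed on their own, possibly stale, copy of w."""
    w = 0.0
    local = [w] * len(shards)  # each worker's last pulled copy of w
    for _ in range(rounds):
        for k, shard in enumerate(shards):
            grad = shard_gradient(local[k], shard)  # computed on a stale copy
            w -= lr * grad                          # server applies it at once
            local[k] = w                            # worker pulls the new value
    return w


if __name__ == "__main__":
    shards = make_shards(num_workers=4, points_per_worker=5)
    print(synchronous_sgd(shards, rounds=500, lr=0.1))
    print(asynchronous_sgd(shards, rounds=500, lr=0.1))
```

In the synchronous version one slow worker delays the whole round; in the asynchronous version nobody waits, at the price of gradients computed on outdated parameters.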