Systematic CV Learning - Lesson 5
PyTorch review
https://www.jianshu.com/p/26a7dbc15246 (haven't read this yet, but it looks worth reading)
See my own notes for the details
Other references
The batch concept
https://www.zhihu.com/question/32673260/answer/71137399
https://zhuanlan.zhihu.com/p/64864995
https://zhuanlan.zhihu.com/p/148267858
batch_size affects the model's ability to generalize
backward()
https://zhuanlan.zhihu.com/p/83172023
https://zhuanlan.zhihu.com/p/27808095
loss.backward(torch.ones_like(loss)) is equivalent to loss.sum().backward(); both still compute the gradient of every parameter (see the sketch after the links below)
https://zhuanlan.zhihu.com/p/65609544 (important)
https://zhuanlan.zhihu.com/p/29923090 (extension, not read yet)
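A minimal toy check of the equivalence above (my own example, not from the links; tensor shapes are made up):
import torch

x = torch.randn(4, 3, requires_grad=True)
loss = (x ** 2).sum(dim=1)                    # non-scalar loss, shape (4,)

# Variant 1: pass an all-ones gradient argument to backward()
loss.backward(torch.ones_like(loss), retain_graph=True)
grad1 = x.grad.clone()

# Variant 2: reduce to a scalar first, then call backward()
x.grad.zero_()
loss.sum().backward()

print(torch.allclose(grad1, x.grad))          # True: identical gradients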
Matrix derivatives
https://blog.csdn.net/mounty_fsc/article/details/51588794
.data and .detach()
https://blog.csdn.net/u013289254/article/details/102557070
https://www.cnblogs.com/wanghui-garcia/p/10677071.html (important) (modifying through .data raises no error; modifying a .detach()-ed tensor makes backward() raise; without any modification, both simply detach the tensor from the graph)
http://www.bnikolic.co.uk/blog/pytorch-detach.html (important)
https://blog.csdn.net/qq_34430032/article/details/108106649?ops_request_misc=&request_id=&biz_id=102&utm_term=detach&utm_medium=distribute.pc_search_result.none-task-blog-2allsobaiduweb~default-5-.first_rank_v2_pc_rank_v29&spm=1018.2226.3001.4187 (important)
https://blog.csdn.net/qq_27825451/article/details/96837905 (important)
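A small sketch of the difference flagged above (the toy tensors are mine; the behavior is autograd's standard version-counter check):
import torch

# .detach(): an in-place edit of the detached view is caught by autograd
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = a.sigmoid()
out.detach().zero_()          # modifies the very tensor backward() needs
try:
    out.sum().backward()
except RuntimeError:
    print("detach: in-place edit detected, backward() raises")

# .data: the same edit goes unnoticed -> silently wrong gradients
b = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = b.sigmoid()
out.data.zero_()              # autograd does not track this change
out.sum().backward()          # no error...
print(b.grad)                 # ...but all zeros: sigmoid's saved output was clobbered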
CrossEntropyLoss
https://blog.csdn.net/c2250645962/article/details/106014693/
CrossEntropyLoss = LogSoftmax + NLLLoss
??? Targets cannot be one-hot encoded ??? (nn.CrossEntropyLoss takes class indices as targets)
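A quick sketch verifying the decomposition (toy logits and targets are made up):
import torch
import torch.nn as nn

logits = torch.randn(4, 5)                 # batch of 4, 5 classes
targets = torch.tensor([0, 2, 4, 1])       # class indices, not one-hot

ce = nn.CrossEntropyLoss()(logits, targets)

# equivalent two-step version
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

print(torch.allclose(ce, nll))             # True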
dropout
https://zhuanlan.zhihu.com/p/38200980 (explains the dropout procedure)
https://tangshusen.me/Dive-into-DL-PyTorch/#/chapter03_DL-basics/3.13_dropout (explains why the expectation stays unchanged: the neuron activation is a value, the mask is a random variable following a Bernoulli distribution)
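A minimal sketch of inverted dropout as described in the links: the mask is Bernoulli, and survivors are rescaled by 1/(1-p) so the expectation is unchanged (the function name is mine):
import torch

def inverted_dropout(x, p=0.5, training=True):
    # zero each element with probability p, keep with probability 1-p
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()   # Bernoulli keep-mask
    return x * mask / (1.0 - p)               # rescale so E[output] == x

x = torch.ones(1_000_000)
print(inverted_dropout(x, p=0.3).mean())      # ~1.0: expectation preserved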
Optimization methods and optimizers
https://mfy.world/deep-learning/pytorch/pytorchnotes-optimizer/
http://chenhao.space/post/e03223e1.html#%E5%BC%95%E8%A8%80
https://zhuanlan.zhihu.com/p/43506482
https://blog.csdn.net/tsyccnh/article/details/76673073
https://blog.csdn.net/u012759136/article/details/52302426
https://blog.csdn.net/u012328159/article/details/80311892
https://github.com/alphadl/lookahead.pytorch
https://www.cnblogs.com/shiliuxinya/p/12261966.html
https://zhuanlan.zhihu.com/p/32230623
https://zhuanlan.zhihu.com/p/91166049
Parameter groups are defined when the optimizer is created: groups of parameters that follow different update strategies (sketch after the link below)
https://blog.csdn.net/qyhaill/article/details/103043637
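A sketch of parameter groups on a toy two-layer model (the model and values are made up); per-group settings override the constructor defaults:
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))

optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters()},               # uses the default lr
        {"params": model[2].parameters(), "lr": 1e-2},   # head: larger lr
    ],
    lr=1e-3,        # default lr for groups that don't set their own
    momentum=0.9,   # shared by all groups
)
for group in optimizer.param_groups:
    print(group["lr"], group["momentum"])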
learning rate (not read yet)
https://blog.csdn.net/ys1305/article/details/94332643
https://blog.csdn.net/weixin_43722026/article/details/103271611
https://blog.csdn.net/chaipp0607/article/details/112986446
https://zhuanlan.zhihu.com/p/261134624?utm_source=wechat_session
https://zhuanlan.zhihu.com/p/69411064
https://zhuanlan.zhihu.com/p/64864995
learning_rate affects how the model converges
Usually, when the batch size is increased to N times the original, the linear scaling rule says the learning rate should also be multiplied by N, so that the weight update after seeing the same number of samples stays the same [5]; but to keep the variance of the weight update unchanged, the learning rate should instead be multiplied by sqrt(N) [7]. Both strategies have been studied, and the linear rule is clearly the more common one.
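Worked numbers for the two rules (the base lr and batch sizes are made-up examples):
base_lr, base_bs = 0.1, 256
new_bs = 1024
N = new_bs / base_bs              # batch grew 4x

print(base_lr * N)                # linear scaling rule: 0.4
print(base_lr * N ** 0.5)         # sqrt scaling rule:   0.2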
warmup
https://www.cnblogs.com/shona/p/12252940.html
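A minimal linear-warmup sketch using LambdaLR (warmup_steps and the toy model are my own choices; exact warmup schedules vary):
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

warmup_steps = 5
# linear ramp from lr/warmup_steps up to the full lr, then constant
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
)

for step in range(8):
    optimizer.step()
    scheduler.step()
    print(step, optimizer.param_groups[0]["lr"])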
Model construction
https://zhuanlan.zhihu.com/p/75206669
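The standard pattern sketched as a toy nn.Module subclass (layer sizes are made up): define submodules in __init__, compose them in forward:
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim=10, hidden=32, out_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

model = MLP()
print(model(torch.randn(4, 10)).shape)   # torch.Size([4, 2])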
save load
https://www.jianshu.com/p/60fc57e19615
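A minimal save/load sketch via state_dict, the recommended route (the file name is arbitrary):
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
torch.save(model.state_dict(), "model.pt")       # save parameters only

restored = nn.Linear(10, 2)                      # rebuild the same architecture
restored.load_state_dict(torch.load("model.pt"))
restored.eval()                                  # switch to inference mode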
Initialization
https://blog.csdn.net/ys1305/article/details/94332007
https://blog.csdn.net/qq_21578849/article/details/85028333
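A small sketch of custom initialization via Module.apply (the init choices here are just an example):
import torch.nn as nn

def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        nn.init.zeros_(m.bias)

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))
model.apply(init_weights)   # .apply() visits every submodule recursively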
Advanced
self.modules self.children
https://blog.csdn.net/dss_dssssd/article/details/83958518
https://zhuanlan.zhihu.com/p/168787113
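A quick sketch of the difference (the toy nesting is made up): children() yields only direct submodules, while modules() walks the whole tree, including the module itself:
import torch.nn as nn

net = nn.Sequential(
    nn.Sequential(nn.Linear(4, 4), nn.ReLU()),
    nn.Linear(4, 2),
)

print([type(m).__name__ for m in net.children()])
# ['Sequential', 'Linear']

print([type(m).__name__ for m in net.modules()])
# ['Sequential', 'Sequential', 'Linear', 'ReLU', 'Linear']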
view / reshape
https://blog.csdn.net/HuanCaoO/article/details/104794075/
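A sketch of the practical difference (the toy tensor is mine): view() requires contiguous memory, reshape() copies when it has to:
import torch

t = torch.arange(6).reshape(2, 3).t()   # transpose -> non-contiguous

print(t.reshape(6))                     # works: reshape copies if needed
try:
    t.view(6)                           # view needs contiguous memory
except RuntimeError as e:
    print("view failed:", e)
print(t.contiguous().view(6))           # make contiguous first, then view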
permute transpose
https://blog.csdn.net/CCSUXWZ/article/details/111429771
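A sketch of the difference (shapes are made up): transpose() swaps exactly two dimensions, permute() reorders all of them at once:
import torch

x = torch.randn(2, 3, 4)
print(x.transpose(0, 2).shape)   # torch.Size([4, 3, 2])
print(x.permute(2, 0, 1).shape)  # torch.Size([4, 2, 3])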
GPU / CPU
# Skip GPU-only calls when running on the CPU
if device != torch.device("cpu"):
    torch.cuda.synchronize(device)
Specifying the GPU
Use torch.device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
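Typical usage of the device object (the toy model and data are mine): move both the model and the data to the same device:
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 2).to(device)    # moves the parameters
x = torch.randn(4, 10, device=device)        # creates the data on the device
out = model(x)
print(out.device)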
GPU-related commands
https://blog.csdn.net/qq_36955294/article/details/107410093
torch.cuda.empty_cache()
buffer vs parameter (not fully understood yet)
https://zhuanlan.zhihu.com/p/89442276
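A sketch of the distinction as I understand it from the link (the module and names are made up): parameters are optimized, buffers are not, but both live in state_dict and follow .to():
import torch
import torch.nn as nn

class Scaler(nn.Module):
    def __init__(self):
        super().__init__()
        # parameter: updated by the optimizer, part of state_dict
        self.weight = nn.Parameter(torch.ones(1))
        # buffer: not updated, but saved in state_dict and moved by .to()
        self.register_buffer("running_mean", torch.zeros(1))

    def forward(self, x):
        return self.weight * (x - self.running_mean)

m = Scaler()
print([n for n, _ in m.named_parameters()])   # ['weight']
print(list(m.state_dict().keys()))            # ['weight', 'running_mean']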
Reproducibility vs. runtime efficiency: cudnn.benchmark
https://www.cnblogs.com/huanxifan/p/12625036.html
https://zhuanlan.zhihu.com/p/73711222
https://blog.csdn.net/qq_36450004/article/details/106003122
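The trade-off in code (the reproducibility lines are the usual recipe; full determinism may need more flags depending on the version):
import torch

# Speed: let cuDNN benchmark conv algorithms and cache the fastest one.
# Helps when input shapes are fixed; hurts when they change every batch.
torch.backends.cudnn.benchmark = True

# Reproducibility instead: fix the seed and force deterministic kernels.
# torch.manual_seed(0)
# torch.backends.cudnn.deterministic = True
# torch.backends.cudnn.benchmark = False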
Open questions
- hook
- optimizer groups
- nn.ParameterList