Dive into Deep Learning v2 exercises: Linear Regression Implementation from Scratch (PyTorch)

Article link: https://blog.csdn.net/monica1232/article/details/121580826

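1. What would happen if we were to initialize the weights to zero? Would the algorithm still work?
2. Assume that you are Georg Simon Ohm trying to come up with a model between voltage and current. Can you use automatic differentiation to learn the parameters of your model?
3. Can you use Planck's Law to determine the temperature of an object using spectral energy density?
4. What problems might you encounter if you wanted to compute second derivatives? How would you fix them?
5. Why is the reshape function needed in the squared_loss function?
6. Experiment with different learning rates to see how fast the loss function value drops.
7. If the number of examples cannot be divided by the batch size, what happens to the behavior of the data_iter function?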

1. What would happen if we were to initialize the weights to zero? Would the algorithm still work?
This is the original output:

epoch 1, loss 0.048973
epoch 2, loss 0.000205
epoch 3, loss 0.000053

After initializing the weights to zero, it looks like this:

w = torch.zeros((2, 1), requires_grad=True)  # replace the initialization with this

epoch 1, loss 0.032329
epoch 2, loss 0.000108
epoch 3, loss 0.000052

epoch 1, loss 0.040064
epoch 2, loss 0.000152
epoch 3, loss 0.000052
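I ran it twice, and both runs ended slightly below the original (0.000052 vs. 0.000053), a difference small enough to be run-to-run noise. So yes, the algorithm still works. Linear regression has a single layer and a convex squared loss, so there is no symmetry between hidden units to break, and the gradient at w = 0 is generally nonzero.

For a detailed explanation of why this works, see this post.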
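
To make the argument concrete, here is a minimal sketch that evaluates the gradient at an all-zero initialization. The data generation follows the style of the tutorial (true_w = [2, -3.4], true_b = 4.2); the seed and the variable names are my own assumptions, not part of the original code.

```python
import torch

torch.manual_seed(0)  # fixed seed, chosen arbitrarily for reproducibility

# Synthetic data in the style of the tutorial: y = Xw + b + small noise
true_w = torch.tensor([2.0, -3.4])
true_b = 4.2
X = torch.normal(0, 1, (1000, 2))
y = (X @ true_w + true_b + torch.normal(0, 0.01, (1000,))).reshape(-1, 1)

# All-zero initialization of the parameters
w = torch.zeros((2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Mean squared loss (halved) and its gradient at w = 0, b = 0
loss = ((X @ w + b - y) ** 2 / 2).mean()
loss.backward()

print(w.grad)  # nonzero, and different for each weight
print(b.grad)  # nonzero as well
```

Because the gradient at zero is nonzero and differs per weight, gradient descent can move away from the starting point immediately.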
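2. Assume that you are Georg Simon Ohm trying to come up with a model between voltage and current. Can you use automatic differentiation to learn the parameters of your model?

Yes. You can type it out and try it yourself; it only takes a few changes to the tutorial code.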

import random
import torch


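def synthetic_data(r, b, number_examples):
    # b is an offset added only so that the linear regression has a bias to fit;
    # you can also read it as the effect of other factors on the resistance
    I = torch.rand(number_examples, 1)  # currents, uniform in [0, 1)
    u = I * r + b                       # Ohm's law plus the offset
    return I, u.reshape((1, -1))        # labels are returned as a row vector

true_r = torch.tensor([2.0])
true_b = 0.01
features, labels = synthetic_data(true_r, true_b, 1000)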

def data_iter(batch_size,features,labels):
    num_examples = len(features)
    indices =list(range(num_examples))
    random.shuffle(indices)
    for i in range(0,num_examples,batch_size):
        batch_indices=torch.tensor(indices[i:min(i+batch_size,num_examples)])
        yield features[batch_indices],labels[0,batch_indices]

batch_size =10
for I,u in data_iter(batch_size,features,labels):
    print(I,'\n',u)
    break
'''
tensor([[0.1557],
        [0.0186],
        [0.9364],
        [0.6862],
        [0.5516],
        [0.6006],
        [0.5371],
        [0.5926],
        [0.7214],
        [0.6568]]) 
tensor([0.3214, 0.0472, 1.8827, 1.3824, 1.1132, 1.2112, 1.0842, 1.1953, 1.4528,
        1.3237])
'''

r = torch.zeros((1,1) ,requires_grad=True)
b = torch.zeros(1, requires_grad=True)

def linreg(I, r, b):
    '''Linear regression model.'''
    return I*r.T+b

def squared_loss(u_hat, u):
    '''Squared loss.'''
    return (u_hat - u.reshape(u_hat.shape)) ** 2 / 2


def sgd(params, lr, batch_size):
    '''Minibatch stochastic gradient descent.'''
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

lr = 0.01
num_epochs = 3
net = linreg
loss = squared_loss

for epoch in range(num_epochs):
    for I, u in data_iter(batch_size, features, labels):
        l = loss(net(I, r, b), u)  
        l.sum().backward()
        sgd([r, b], lr, batch_size)  # update the parameters using their gradients
    with torch.no_grad():
        train_l = loss(net(features, r, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')
'''
epoch 1, loss 0.134057
epoch 2, loss 0.080569
epoch 3, loss 0.067660
'''

print(f'estimation error of r: {true_r - r.reshape(true_r.shape)}')
print(f'estimation error of b: {true_b - b}')
'''
estimation error of r: tensor([1.2688], grad_fn=<SubBackward0>)
estimation error of b: tensor([-0.6739], grad_fn=<RsubBackward1>)
'''
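
The run above stops while the errors in r and b are still large. With currents drawn from [0, 1) and lr = 0.01, three epochs are probably just too few. Here is a compact sketch of the same idea with a larger learning rate and more epochs. The values lr = 0.5 and num_epochs = 10, the seed, and the use of torch.randperm for shuffling are my own assumptions, not taken from the code above.

```python
import torch

torch.manual_seed(0)  # fixed seed, chosen arbitrarily

true_r, true_b = 2.0, 0.01
I = torch.rand(1000, 1)      # currents in [0, 1), shape (1000, 1)
u = true_r * I + true_b      # noiseless voltages, shape (1000, 1)

r = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr, num_epochs, batch_size = 0.5, 10, 10  # assumed hyperparameters

for epoch in range(num_epochs):
    perm = torch.randperm(len(I))  # new shuffle every epoch
    for start in range(0, len(I), batch_size):
        idx = perm[start:start + batch_size]
        # mean of the halved squared error over the minibatch
        loss = ((I[idx] * r + b - u[idx]) ** 2 / 2).mean()
        loss.backward()
        with torch.no_grad():
            for p in (r, b):
                p -= lr * p.grad
                p.grad.zero_()

print(f'r = {r.item():.4f}, b = {b.item():.4f}')
```

Taking the mean of the loss and stepping by lr is the same update as summing the loss and dividing the gradient by batch_size, as the sgd function above does.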

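5. Why is the reshape function needed in the squared_loss function?
To guard against one operand being a column vector and the other a row vector (or a 1-D tensor). If the shapes differ, the subtraction does not fail; it silently broadcasts to a matrix and the loss is wrong.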
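
A small sketch of what goes wrong without the reshape. The tensors here are made-up toy values, used only to show the shapes.

```python
import torch

y_hat = torch.zeros(3, 1)           # predictions: column vector, shape (3, 1)
y = torch.tensor([1.0, 2.0, 3.0])   # labels: 1-D tensor, shape (3,)

# Without reshape, (3, 1) - (3,) broadcasts to a (3, 3) matrix
wrong = (y_hat - y) ** 2 / 2

# With reshape, the result keeps one loss value per example
right = (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

print(wrong.shape)  # torch.Size([3, 3])
print(right.shape)  # torch.Size([3, 1])
```

No exception is raised in the first case, which is what makes this bug easy to miss.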
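6. Experiment with different learning rates to see how fast the loss function value drops.
You can try this yourself, using the tutorial's code as the reference. Here are the runs I recorded: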

lr=0.01
epoch 1, loss 0.039205
epoch 2, loss 0.000149
epoch 3, loss 0.000048

lr=0.05
epoch 1, loss 0.053974
epoch 2, loss 0.027887
epoch 3, loss 0.014440

lr = 0.1
epoch 1, loss 0.026897
epoch 2, loss 0.006844
epoch 3, loss 0.001785
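
To compare learning rates fairly, it helps to fix the seed so that every run sees the same data and the same initialization, and only lr changes. Here is a minimal sketch of such a sweep. The helper name train, the seed, and the learning rates 0.001, 0.03 and 0.1 are my own assumptions; the data generation follows the style of the tutorial.

```python
import torch

def train(lr, num_epochs=3, batch_size=10, seed=0):
    """Train linear regression from scratch; return the loss after each epoch."""
    torch.manual_seed(seed)  # same data and initialization for every lr
    true_w, true_b = torch.tensor([[2.0], [-3.4]]), 4.2
    X = torch.normal(0, 1, (1000, 2))
    y = X @ true_w + true_b + torch.normal(0, 0.01, (1000, 1))
    w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    history = []
    for _ in range(num_epochs):
        perm = torch.randperm(len(X))
        for start in range(0, len(X), batch_size):
            idx = perm[start:start + batch_size]
            loss = ((X[idx] @ w + b - y[idx]) ** 2 / 2).mean()
            loss.backward()
            with torch.no_grad():
                for p in (w, b):
                    p -= lr * p.grad
                    p.grad.zero_()
        with torch.no_grad():
            # loss over the full dataset at the end of the epoch
            history.append(float(((X @ w + b - y) ** 2 / 2).mean()))
    return history

for lr in (0.001, 0.03, 0.1):
    print(lr, [f'{l:.6f}' for l in train(lr)])
```

In this setup a very small learning rate should still be far from converged after three epochs, while a moderate one should already be near the noise floor of the data.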

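7. If the number of examples cannot be divided by the batch size, what happens to the behavior of the data_iter function?
The last batch is simply smaller. For example, with 100 samples and a batch size of 9, the last batch holds only 1 sample (100 = 11 × 9 + 1), but nothing breaks.
The min in data_iter caps the end index of the final slice. Strictly speaking, indices[i:i+batch_size] alone would behave the same, because Python list slicing clamps an out-of-range end index instead of raising. I did get an error once after removing the original break and setting batch_size to 100 (with the original 1000 samples), but the slice itself cannot be the cause.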
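
The batch sizes can be checked without any training. Here is a small sketch in plain Python that mirrors the index logic of data_iter; the helper name batch_sizes and the use_min flag are hypothetical, introduced only for this illustration.

```python
def batch_sizes(num_examples, batch_size, use_min=True):
    """Return the size of every batch that data_iter-style slicing produces."""
    indices = list(range(num_examples))
    sizes = []
    for i in range(0, num_examples, batch_size):
        # with use_min the end index is capped explicitly, as in data_iter;
        # without it, the slice may run past the end of the list
        end = min(i + batch_size, num_examples) if use_min else i + batch_size
        sizes.append(len(indices[i:end]))
    return sizes

print(batch_sizes(100, 9))                 # eleven batches of 9, then one of 1
print(batch_sizes(100, 9, use_min=False))  # same result, no error
```

Both variants give the same batches, so the min mainly documents the intent.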