These are my notes for the 14-day public *Dive into Deep Learning* course. I hope to keep at it and study well.
1.1 Linear Regression
Linear regression assumes a linear relationship between the output and the inputs.
We generate a dataset with a linear model:
y = \omega_1 x_1 + \omega_2 x_2 + b
where \omega denotes the weights and b is the bias, a single scalar.
import torch
import numpy as np

# number of features
num_inputs = 2
# number of samples
num_examples = 1000
# true weights and bias
true_w = [2.5, -1.8]
true_b = 2.1
features = torch.randn(num_examples, num_inputs, dtype=torch.float32)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
# add Gaussian noise with standard deviation 0.01
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()),
                       dtype=torch.float32)
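The training loop further down reads the dataset in random mini-batches through a `data_iter` function that these notes never define; the following is a minimal sketch in the spirit of the course code:

```python
import random
import torch

def data_iter(batch_size, features, labels):
    """Yield random mini-batches of (features, labels)."""
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # read samples in random order
    for i in range(0, num_examples, batch_size):
        # the last batch may be smaller than batch_size
        j = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])
        yield features.index_select(0, j), labels.index_select(0, j)
```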
Define the model:
def linreg(X, w, b):
    return torch.mm(X, w) + b
The loss function measures the error between the predicted and true values; the squared loss is a common choice:
l^{(i)}(\omega, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2
def squared_loss(y_hat, y):
    return (y_hat - y.view(y_hat.size())) ** 2 / 2
Most deep learning models have no analytical solution; instead, an optimization algorithm is used to lower the value of the loss function, yielding a numerical solution. One example is mini-batch stochastic gradient descent: choose initial values for the parameters, then iteratively update them in the direction of the negative gradient. In each iteration, randomly sample a mini-batch of training examples, compute the derivative (gradient) of the average loss over these examples with respect to the model parameters, and subtract this gradient multiplied by a preset positive number (the learning rate).
def sgd(params, lr, batch_size):
    for param in params:
        # use .data to update param without gradient tracking;
        # param.grad is the gradient, lr is the learning rate (step size)
        param.data -= lr * param.grad / batch_size
Model training:
lr = 0.03
num_epochs = 5
batch_size = 10
net = linreg
loss = squared_loss

# initialize the parameters to be learned
w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)),
                 dtype=torch.float32, requires_grad=True)
b = torch.zeros(1, dtype=torch.float32, requires_grad=True)

# training
for epoch in range(num_epochs):  # training repeats num_epochs times
    # in each epoch, every sample in the dataset is used once;
    # X holds the features and y the labels of a mini-batch
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y).sum()
        # compute the gradient of the mini-batch loss
        l.backward()
        # mini-batch stochastic gradient descent to update the parameters
        sgd([w, b], lr, batch_size)
        # reset the parameter gradients
        w.grad.data.zero_()
        b.grad.data.zero_()
    train_l = loss(net(features, w, b), labels)
# the learned weights are [2.4999], [-1.8002] and the bias is 2.1004, close to the true values
Define the model with PyTorch:
class LinearNet(nn.Module):
    def __init__(self, n_feature):
        super(LinearNet, self).__init__()  # call the parent constructor
        # function prototype: `torch.nn.Linear(in_features, out_features, bias=True)`
        self.linear = nn.Linear(n_feature, 1)

    def forward(self, x):
        y = self.linear(x)
        return y

net = LinearNet(num_inputs)
# ways to define a multilayer network
# method one
net = nn.Sequential(
    nn.Linear(num_inputs, 1)
    # other layers can be added here
)
# method two: build an empty Sequential and add modules to it
net = nn.Sequential()
net.add_module('linear', nn.Linear(num_inputs, 1))
# net.add_module ......
# method three
from collections import OrderedDict
net = nn.Sequential(OrderedDict([
    ('linear', nn.Linear(num_inputs, 1))
    # ......
]))
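Before training, the parameters of the nn-based model can be initialized, for example with `torch.nn.init` (a sketch; the weight is drawn from N(0, 0.01) and the bias set to zero, matching the from-scratch version):

```python
import torch.nn as nn
from torch.nn import init

num_inputs = 2  # matches the data-generation code above
net = nn.Sequential(nn.Linear(num_inputs, 1))

# draw the weight from N(0, 0.01) and set the bias to zero
init.normal_(net[0].weight, mean=0.0, std=0.01)
init.constant_(net[0].bias, val=0.0)
```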
Call nn's mean squared error loss directly:
loss = nn.MSELoss()
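With the nn-based model and MSELoss, an optimizer from `torch.optim` completes the PyTorch version. A minimal end-to-end sketch (the batch size, epoch count, and DataLoader usage here are my assumptions, not from the notes):

```python
import torch
import torch.nn as nn
import torch.utils.data as Data

torch.manual_seed(1)

# same synthetic data as the from-scratch version
num_inputs, num_examples = 2, 1000
true_w, true_b = [2.5, -1.8], 2.1
features = torch.randn(num_examples, num_inputs)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.normal(0, 0.01, labels.shape)

# wrap the data in a DataLoader to get shuffled mini-batches
dataset = Data.TensorDataset(features, labels)
data_iter = Data.DataLoader(dataset, batch_size=10, shuffle=True)

net = nn.Sequential(nn.Linear(num_inputs, 1))
loss = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.03)

for epoch in range(3):
    for X, y in data_iter:
        l = loss(net(X), y.view(-1, 1))
        optimizer.zero_grad()  # clear old gradients
        l.backward()
        optimizer.step()       # update the parameters
```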
1.2 Softmax and Classification Models
Softmax regression is a single-layer neural network for discrete classification. Its output layer is a fully connected layer.
o_i = x\omega_i + b_i
The softmax operator transforms the outputs into a probability distribution whose values are positive and sum to 1:
\hat{y}_1, \hat{y}_2, \hat{y}_3 = \text{softmax}(o_1, o_2, o_3)
where
\hat{y}_j = \frac{\exp(o_j)}{\sum_{i=1}^{3}\exp(o_i)}
The softmax operator does not change the predicted class, since it preserves the ordering of the outputs.
def softmax(X):
    X_exp = X.exp()
    partition = X_exp.sum(dim=1, keepdim=True)
    return X_exp / partition  # broadcasting is applied here

def net(X):
    return softmax(torch.mm(X.view((-1, num_inputs)), W) + b)
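A quick check (with made-up logits) that each row of the softmax output is a valid probability distribution and that the predicted class is unchanged:

```python
import torch

def softmax(X):
    X_exp = X.exp()
    partition = X_exp.sum(dim=1, keepdim=True)
    return X_exp / partition  # broadcasting is applied here

# two samples, three made-up output values (logits) each
O = torch.tensor([[0.1, 2.0, -1.0],
                  [3.0, 0.5, 0.5]])
P = softmax(O)
print(P.sum(dim=1))                       # each row sums to 1
print(O.argmax(dim=1), P.argmax(dim=1))   # argmax is identical before and after
```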
The cross-entropy loss function is better suited to measuring the difference between two probability distributions. The cross entropy is
H\left(y^{(i)}, \hat{y}^{(i)}\right) = -\sum_{j=1}^{q} y_j^{(i)} \log \hat{y}_j^{(i)}
The cross-entropy loss function is simply the mean of the cross entropies over all samples:
def cross_entropy(y_hat, y):
    return - torch.log(y_hat.gather(1, y.view(-1, 1)))
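A small worked example (made-up numbers) of how `gather` picks out each sample's predicted probability for its true label:

```python
import torch

def cross_entropy(y_hat, y):
    return - torch.log(y_hat.gather(1, y.view(-1, 1)))

# two samples, three classes; the true labels are class 0 and class 2
y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])
# gather picks y_hat[0][0] = 0.1 and y_hat[1][2] = 0.5,
# so the losses are -log(0.1) and -log(0.5)
print(cross_entropy(y_hat, y))
```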
Model training:
def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()
            # zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
            l.backward()
            if optimizer is None:
                d2l.sgd(params, lr, batch_size)
            else:
                optimizer.step()
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)
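train_ch3 calls an evaluate_accuracy helper that these notes do not define; a minimal sketch of what it computes:

```python
import torch

def evaluate_accuracy(data_iter, net):
    """Fraction of samples in data_iter that net classifies correctly."""
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n
```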
1.3 Multilayer Perceptron
Assume the multilayer perceptron has a single hidden layer whose output is H. Both the hidden layer and the output layer are fully connected layers with their own weights and biases
W_h, b_h, W_o, b_o
The output is computed as
H = XW_h + b_h
O = HW_o + b_o
Substituting the first equation into the second shows that this is still equivalent to a single-layer neural network.
The solution is to introduce a non-linear transformation, so that the hidden layer's output relates non-linearly to the output layer's output. Such a non-linear function is called an activation function.
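The equivalence to a single layer can be checked numerically: composing two Linear layers with no activation in between yields one affine map (a sketch with arbitrary layer sizes):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(4, 3)

# two stacked linear layers with no activation in between ...
stacked = nn.Sequential(nn.Linear(3, 5), nn.Linear(5, 2))
H = stacked(X)

# ... equal one linear layer with composed weight and bias
W1, b1 = stacked[0].weight, stacked[0].bias
W2, b2 = stacked[1].weight, stacked[1].bias
W = W2 @ W1           # composed weight
b = W2 @ b1 + b2      # composed bias
single = X @ W.t() + b
print(torch.allclose(H, single, atol=1e-5))  # the two outputs agree
```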
Commonly used activation functions:
ReLU function
ReLU(x) = \max(x, 0)
def relu(X):
    return torch.max(input=X, other=torch.tensor(0.0))
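A quick check of relu on a few values (negative inputs are clamped to zero, positive inputs pass through):

```python
import torch

def relu(X):
    # elementwise max against a scalar tensor, via broadcasting
    return torch.max(input=X, other=torch.tensor(0.0))

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(relu(x))  # elementwise: 0, 0, 0, 1.5
```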
ReLU is used only in hidden layers. Since it is cheap to compute, it is the preferred choice when the network has many layers.
Sigmoid function
sigmoid(x) = \frac{1}{1 + \exp(-x)}
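Putting the pieces together, an MLP with one hidden ReLU layer can be written with nn.Sequential. This is a sketch; the layer sizes (784 inputs, 256 hidden units, 10 output classes) are my assumption, matching the course's Fashion-MNIST setting:

```python
import torch
import torch.nn as nn

# assumed sizes: 28x28 images flattened to 784 features, 10 classes
num_inputs, num_hiddens, num_outputs = 784, 256, 10
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(num_inputs, num_hiddens),
    nn.ReLU(),  # the non-linear activation between the two affine layers
    nn.Linear(num_hiddens, num_outputs),
)

X = torch.randn(2, 1, 28, 28)  # a dummy batch of two images
print(net(X).shape)            # one score per class for each sample
```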
Model training:
The MLP is trained with the same train_ch3 function defined in Section 1.2, so the code is not repeated here.